Architecture And Documentation Assessment
Summary
The project has a coherent target architecture. The core decision set in
PLAN.md is sound for this environment: one monorepo,
inventory-first modeling, Ansible for OS convergence, Komodo for Compose
reconciliation, SOPS/age for secrets, Traefik for ingress, Authentik as the
strategic identity layer, and tiered backups. This is a reasonable design for a
single-operator homelab where simplicity, recoverability, and reviewability
matter more than enterprise-scale orchestration.
The problem is not the architectural intent. It is that historical plans, current-state docs, generated outputs, and live observations are not cleanly separated, so contractors can cite different documents and reach different conclusions about what is active, retired, planned, or an accepted risk.
Source-Of-Truth Model
```mermaid
flowchart TD
    planDoc[PLAN Historical Spec]
    readmeLedger[README Owner TODO]
    inventory[Inventory YAML]
    generators[Generators]
    generatedDocs[Generated Docs]
    liveDocs[Live Architecture Docs]
    runbooks[Runbooks]
    services[Service READMEs]
    inventory --> generators
    generators --> generatedDocs
    generators --> ansibleOutput[Ansible And Monitoring Outputs]
    services --> runbooks
    liveDocs --> runbooks
    planDoc -.status drift.-> readmeLedger
    readmeLedger --> liveDocs
```
Recommended contractor revision: make this model explicit in
docs/index.md or a new architecture note:
| Layer | Should Answer | Current Risk |
|---|---|---|
| PLAN.md | Original intent and acceptance criteria | Still self-identifies as master truth. |
| README.md Owner TODO | Human task ledger and phase completion | More current than PLAN.md, but not framed as the status authority. |
| inventory/*.yaml | Entity state and generated artifacts | Generally strong, but generated docs drift in places. |
| docs/architecture/*-live.md | Current observed network/firewall truth | Some scans are dated; manual updates diverge. |
| services/*/README.md | Stack deployment details | Mature docs exist, but MkDocs exposes only a subset. |
Major Documentation Gaps
PLAN.md Is Historical But Reads As Current
PLAN.md describes itself as the source of truth and still lists unchecked early
hotfixes and acceptance items, while the README owner ledger and the security
register show those same areas as closed. See PLAN.md, lines 9-11 and 195-211;
README.md, lines 80-132; and security-register.md, lines 9-19.
Revision: Add a top banner to PLAN.md stating whether it is historical,
then link to the live status authorities. If PLAN.md remains authoritative,
update its checklist state and remove obsolete deferred decisions.
Authentik Status Is Stale In The README
The README architecture table marks Authentik as Planned, but ADR-002 is
accepted and service Compose labels show Authentik middleware on most
user-facing infra services. See README.md, lines 7-21,
and adr-002-authentik-universal-sso.md.
Revision: Mark Authentik as live and add a one-line exception note for Komodo native OIDC and Plex.
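One way to keep this status from drifting again is to derive Authentik coverage from the Compose labels themselves. The sketch below is illustrative only: it assumes each service's labels have already been parsed into a dict, that routers are declared with standard Traefik traefik.http.routers.NAME.* labels, and that the forward-auth middleware's name contains "authentik" (a hypothetical naming convention). The exceptions set stands in for the documented Komodo and Plex exceptions.

```python
# Sketch: derive Authentik coverage from Traefik router labels.
# Assumption: the forward-auth middleware name contains "authentik".

def routers_with_middleware(labels: dict[str, str], needle: str = "authentik") -> dict[str, bool]:
    """Map each Traefik router in the labels to whether it uses a matching middleware."""
    coverage: dict[str, bool] = {}
    for key, value in labels.items():
        parts = key.split(".")
        # Router labels look like traefik.http.routers.<router>.<option>
        if len(parts) >= 5 and parts[:3] == ["traefik", "http", "routers"]:
            router = parts[3]
            coverage.setdefault(router, False)
            if parts[4] == "middlewares":
                names = [name.strip() for name in value.split(",")]
                if any(needle in name for name in names):
                    coverage[router] = True
    return coverage


def unprotected_routers(services: dict[str, dict[str, str]], exceptions: set[str]) -> list[str]:
    """Return "service/router" for routers lacking the middleware, skipping documented exceptions."""
    missing: list[str] = []
    for service, labels in sorted(services.items()):
        if service in exceptions:
            continue  # e.g. services that use native OIDC instead of forward auth
        for router, covered in sorted(routers_with_middleware(labels).items()):
            if not covered:
                missing.append(f"{service}/{router}")
    return missing
```

A router with only a rule label is reported as unprotected, while one whose middlewares label includes a name such as authentik@file is not.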
DNS And PiHole Docs Conflict
network.md still says PiHole (LXC 104) remains in place during the soak period,
while network-live.md says PiHole was destroyed on 2026-06-17 and AdGuard is
authoritative. Inventory also marks blocktopus as retired. See network.md,
lines 73-84; network-live.md, lines 6-9 and 185-194; and blocktopus.yaml,
lines 1-6.
Revision: Rewrite network.md so AdGuard is the current authoritative path
and PiHole is historical. Preserve the old scan only inside a clearly marked
historical section.
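This kind of drift can also be caught mechanically. A minimal sketch, assuming network.md uses Markdown "#" headings and that the retired names (such as blocktopus) are taken from inventory: it reports mentions of retired entities outside any section whose heading contains "Historical". It is a text heuristic, not a Markdown parser, and it does not understand fenced code blocks.

```python
# Sketch: flag retired inventory names mentioned outside "Historical" sections.
# Heuristic assumptions: Markdown "#" headings, no fenced-code awareness.

def retired_mentions(markdown: str, retired: set[str]) -> list[tuple[int, str]]:
    """Return (line_number, name) for retired names found outside historical sections."""
    findings: list[tuple[int, str]] = []
    historical_level = None  # heading level of the enclosing historical section, if any
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if stripped.startswith("#"):
            level = len(stripped) - len(stripped.lstrip("#"))
            # A heading at the same or a higher level closes the historical section.
            if historical_level is not None and level <= historical_level:
                historical_level = None
            if historical_level is None and "historical" in stripped.lower():
                historical_level = level
            continue
        if historical_level is not None:
            continue  # mentions inside a historical section are expected
        lowered = line.lower()
        for name in sorted(retired):
            if name.lower() in lowered:
                findings.append((lineno, name))
    return findings
```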
Firewall Policy And Live Matrix Need Reconciliation
firewall-policy.md states that Management has unrestricted access and includes
a verification step that Management can reach everything. firewall-live.md,
however, shows Management to Appliances, IoT, and Security as blocked. The
design doc also says IoT cannot reach AdGuard unless a DNS exception is added,
while network-live.md says DHCP on every VLAN points clients to AdGuard. See
firewall-policy.md, lines 12-17, 174-178, and 231-237; and firewall-live.md,
lines 118-138.
Revision: Run a fresh UniFi policy scan, then update both files in the same PR. If IoT DNS works through gateway forwarding rather than direct IoT to AdGuard, document that as the actual path.
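To make the reconciliation reviewable, both sides can be put into the same shape before the prose is edited. This sketch assumes each document's rules are transcribed (by hand or from the scan) into a dict keyed by (source, destination) with True meaning allow; that format is hypothetical, since the scan output is not described here. Destinations may be VLANs or specific hosts such as AdGuard.

```python
# Sketch: diff the intended firewall matrix against the observed one.
# Assumption: both are transcribed into {(source, destination): allowed}.

Matrix = dict[tuple[str, str], bool]


def verdict(allowed: bool) -> str:
    return "allow" if allowed else "block"


def reconcile(intended: Matrix, observed: Matrix) -> list[str]:
    """Describe every pair where the policy doc and the live matrix disagree or only one covers it."""
    issues: list[str] = []
    for src, dst in sorted(set(intended) | set(observed)):
        want = intended.get((src, dst))
        have = observed.get((src, dst))
        if want is None:
            issues.append(f"{src} -> {dst}: observed only ({verdict(have)})")
        elif have is None:
            issues.append(f"{src} -> {dst}: intended only ({verdict(want)})")
        elif want != have:
            issues.append(f"{src} -> {dst}: policy says {verdict(want)}, live says {verdict(have)}")
    return issues
```

Each line of output is a candidate edit for one of the two files, which keeps both updates in the same PR.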
Generated Host Index Is Not Trustworthy
docs/hosts/index.md lists k6-loadtest and sqlserver2022 as active hosts in a
stopped state, but their inventory files mark them retired and record
destroy/archive notes. See docs/hosts/index.md, lines 23-25; k6-loadtest.yaml,
lines 1-6 and 30-35; and sqlserver2022.yaml, lines 1-6 and 30-33.
Revision: Regenerate the index or fix render-doc-stubs.py so that index status
comes from inventory, and add a generated-file warning header if the file is
generated.
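A minimal sketch of the intended behavior, assuming inventory YAML has already been loaded into dicts with name and status keys. This is not the actual render-doc-stubs.py; the field names, header text, and table layout are hypothetical. Retired hosts get their own table so they cannot be mistaken for active ones, and the output starts with a generated-file comment.

```python
# Sketch only: not the real render-doc-stubs.py. Field names are hypothetical.

GENERATED_HEADER = "<!-- GENERATED from inventory/*.yaml; do not edit by hand. -->"


def render_host_index(hosts: list[dict[str, str]]) -> str:
    """Render non-retired hosts and retired hosts as separate tables, sorted by name."""
    lines = [GENERATED_HEADER, "", "# Hosts", ""]
    sections = (
        ("Current", [h for h in hosts if h["status"] != "retired"]),
        ("Retired", [h for h in hosts if h["status"] == "retired"]),
    )
    for title, rows in sections:
        if not rows:
            continue  # omit empty sections entirely
        lines += [f"## {title}", "", "| Host | Status |", "|---|---|"]
        lines += [f"| {h['name']} | {h['status']} |" for h in sorted(rows, key=lambda h: h["name"])]
        lines.append("")
    return "\n".join(lines)
```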
Service Documentation Is Underexposed In MkDocs
docs/services/index.md exposes only AdGuard as a first-class service page and
points readers to services/*/README.md for the rest. mkdocs.yml likewise
lists only AdGuard under Services, even though Traefik, Komodo, ARA, monitoring,
Wazuh, Homepage, and Authentik outpost are operational services. See
docs/services/index.md, lines 1-12, and
mkdocs.yml, lines 83-85.
Revision: Generate service docs from services/*/README.md or add nav
entries for every live stack. At minimum, the service index should list all
current service READMEs and state, for each, whether it is live, which backup
tier it belongs to, and whether it is Authentik-protected.
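A sketch of that minimum index, assuming the per-service metadata (live, backup tier, Authentik protection) is available from some structured source such as inventory or README front matter; the ServiceMeta fields and tier labels are hypothetical. discover_service_names lists every services/*/README.md so that no stack is silently left out of the index.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Optional


@dataclass(frozen=True)
class ServiceMeta:
    name: str                   # directory name under services/
    live: bool
    backup_tier: Optional[str]  # hypothetical tier label; None if unassigned
    authentik: bool             # protected by Authentik middleware or native OIDC


def discover_service_names(root: Path) -> list[str]:
    """Names of every services/<name>/README.md under root."""
    return sorted(readme.parent.name for readme in root.glob("*/README.md"))


def render_service_index(services: list[ServiceMeta]) -> str:
    """One Markdown table row per service, sorted by name."""
    def yes_no(flag: bool) -> str:
        return "yes" if flag else "no"

    lines = ["| Service | Live | Backup tier | Authentik |", "|---|---|---|---|"]
    for s in sorted(services, key=lambda s: s.name):
        lines.append(f"| {s.name} | {yes_no(s.live)} | {s.backup_tier or 'none'} | {yes_no(s.authentik)} |")
    return "\n".join(lines)
```

Comparing discover_service_names against the names that have metadata is a cheap way to spot a stack whose README exists but is missing from the index.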
Security Register Headings Are Misleading
docs/security-register.md has an Active Findings section containing closed
findings. See security-register.md, lines 9-19.
Revision: Rename the section to Findings Ledger, or move closed rows to
the closed table and keep active rows only where work remains.
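If the register is kept as structured rows, the split can be computed rather than maintained by hand. This sketch assumes hypothetical status values of open, accepted-risk, and closed, plus a review_by date on accepted risks; accepted risks stay in the active table and are flagged once their review date has passed.

```python
from datetime import date


def triage_findings(findings: list[dict], today: date) -> dict[str, list[str]]:
    """Split findings by status and flag accepted risks whose review date has passed."""
    result: dict[str, list[str]] = {"active": [], "closed": [], "overdue_review": []}
    for finding in findings:
        if finding["status"] == "closed":
            result["closed"].append(finding["id"])
            continue
        # Open findings and accepted risks both stay in the active table.
        result["active"].append(finding["id"])
        if finding["status"] == "accepted-risk" and finding["review_by"] < today:
            result["overdue_review"].append(finding["id"])
    return result
```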
Customer-App Index Is Stale
docs/customer-apps/index.md says customer app pages will appear after Phase 1,
but recordurbate-tiktok.md already exists and is in the MkDocs nav. See
customer-apps/index.md, lines 1-8, and
mkdocs.yml, lines 99-101.
Revision: Link the existing page and summarize the ownership boundary: inventoried for awareness, not reconciled by Komodo or Ansible.
Better Architecture Alternatives
Keep The Current GitOps Model, But Tighten Boundaries
The repo does not need Kubernetes or a larger platform abstraction. Komodo plus Compose is a good fit for the current scale. The stronger alternative is to tighten the existing model:
- Generated files should be clearly marked as generated and always checked for drift in CI.
- Live-state docs should state their capture date, validation command, and known stale areas.
- Service docs should be discoverable from MkDocs, even if implementation details remain in services/*/README.md.
- Security exceptions should be captured as accepted risks with review dates.
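The CI check for generated files can be as simple as re-rendering each generated file in memory and diffing it against the committed copy. A sketch, assuming some renderer (hypothetical here) has already produced the expected text for each path; a non-zero return value is meant to fail the job.

```python
import difflib
import sys
from pathlib import Path


def check_generated(path: Path, expected: str) -> list[str]:
    """Unified diff between the committed file and freshly rendered text; empty when in sync."""
    committed = path.read_text(encoding="utf-8") if path.exists() else ""
    return list(
        difflib.unified_diff(
            committed.splitlines(keepends=True),
            expected.splitlines(keepends=True),
            fromfile=f"{path} (committed)",
            tofile=f"{path} (regenerated)",
        )
    )


def main(targets: dict[Path, str]) -> int:
    """Print every drifted file's diff; return 1 if any file drifted."""
    exit_code = 0
    for path, expected in sorted(targets.items()):
        diff = check_generated(path, expected)
        if diff:
            sys.stdout.writelines(diff)
            exit_code = 1
    return exit_code
```

In a CI step this would be called as sys.exit(main(targets)), with targets built from the project's own renderers.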
Use A Documentation Drift Register
Instead of scattering TODOs across old runbooks, add a small drift register under
docs/architecture/ or docs/reviews/ with fields for source, conflicting
source, owner, last validated date, and remediation PR. This would make drift an
operational queue rather than passive prose.
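A drift register entry needs little more than the fields listed above. This sketch models them as a dataclass and surfaces entries that have no remediation PR yet or whose last validation is older than a threshold; the 90-day default is an assumption, not a project policy.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass(frozen=True)
class DriftEntry:
    source: str                           # document believed to be stale
    conflicting_source: str               # document, inventory file, or scan it disagrees with
    owner: str
    last_validated: date
    remediation_pr: Optional[str] = None  # PR reference once one is opened


def needs_attention(entries: list[DriftEntry], today: date, max_age_days: int = 90) -> list[DriftEntry]:
    """Entries with no remediation PR, or not validated within max_age_days."""
    cutoff = today - timedelta(days=max_age_days)
    return [e for e in entries if e.remediation_pr is None or e.last_validated < cutoff]
```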
Generate Current-State Matrices From Inventory
Host lifecycle, service backup tier, Authentik coverage, exposed ports, and monitoring coverage should be generated or at least validated from structured sources. The current markdown tables are useful, but they have already drifted.
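Where a table stays hand-written, it can at least be validated against inventory. A minimal sketch, assuming the table uses pipe syntax without escaped pipes and that inventory has been reduced to a name-to-status dict; the Host and Status column names are hypothetical defaults.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the first contiguous pipe table in text into row dicts keyed by header."""
    block: list[str] = []
    for line in text.splitlines():
        if line.strip().startswith("|"):
            block.append(line.strip())
        elif block:
            break  # the first table has ended
    if len(block) < 2:
        return []

    def cells(row: str) -> list[str]:
        return [cell.strip() for cell in row.strip("|").split("|")]

    header = cells(block[0])
    return [dict(zip(header, cells(row))) for row in block[2:]]  # block[1] is the separator


def table_drift(rows: list[dict[str, str]], inventory: dict[str, str],
                key: str = "Host", column: str = "Status") -> list[str]:
    """Rows whose column disagrees with inventory, plus inventory entries missing from the table."""
    problems: list[str] = []
    seen: set[str] = set()
    for row in rows:
        name = row.get(key, "")
        seen.add(name)
        expected = inventory.get(name)
        if expected is None:
            problems.append(f"{name}: not in inventory")
        elif row.get(column) != expected:
            problems.append(f"{name}: table says {row.get(column)!r}, inventory says {expected!r}")
    for name in sorted(set(inventory) - seen):
        problems.append(f"{name}: missing from table")
    return problems
```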