Architecture And Documentation Assessment

Summary

The project has a coherent target architecture. The core decision set in PLAN.md is sound for this environment: one monorepo, inventory-first modeling, Ansible for OS convergence, Komodo for Compose reconciliation, SOPS/age for secrets, Traefik for ingress, Authentik as the strategic identity layer, and tiered backups. This is a reasonable design for a single-operator homelab where simplicity, recoverability, and reviewability matter more than enterprise-scale orchestration.

The issue is not architectural intent. It is that historical plans, current-state docs, generated outputs, and live observations are not cleanly separated, so contractors can point to different documents and reach different answers about what is active, retired, planned, or accepted risk.

Source-Of-Truth Model

flowchart TD
  planDoc[PLAN Historical Spec]
  readmeLedger[README Owner TODO]
  inventory[Inventory YAML]
  generators[Generators]
  generatedDocs[Generated Docs]
  liveDocs[Live Architecture Docs]
  runbooks[Runbooks]
  services[Service READMEs]

  inventory --> generators
  generators --> generatedDocs
  generators --> ansibleOutput[Ansible And Monitoring Outputs]
  services --> runbooks
  liveDocs --> runbooks
  planDoc -.status drift.-> readmeLedger
  readmeLedger --> liveDocs

Recommended contractor revision: make this model explicit in docs/index.md or a new architecture note:

| Layer | Should Answer | Current Risk |
| --- | --- | --- |
| PLAN.md | Original intent and acceptance criteria | Still self-identifies as master truth. |
| README.md Owner TODO | Human task ledger and phase completion | More current than PLAN.md, but not framed as the status authority. |
| inventory/*.yaml | Entity state and generated artifacts | Generally strong, but generated docs drift in places. |
| docs/architecture/*-live.md | Current observed network/firewall truth | Some scans are dated; manual updates diverge. |
| services/*/README.md | Stack deployment details | Mature docs exist but MkDocs exposes only a subset. |

Major Documentation Gaps

PLAN.md Is Historical But Reads As Current

PLAN.md says it is the source of truth and includes unchecked early hotfixes and acceptance items. Meanwhile, the README owner ledger and security register show those same areas as closed. See PLAN.md, lines 9-11 and 195-211, README.md, lines 80-132, and security-register.md, lines 9-19.

Revision: Add a top banner to PLAN.md stating whether it is historical, then link to the live status authorities. If PLAN.md remains authoritative, update its checklist state and remove obsolete deferred decisions.

Authentik Status Is Stale In The README

The README architecture table marks Authentik as Planned, but ADR-002 is accepted and service Compose labels show Authentik middleware on most user-facing infra services. See README.md, lines 7-21, and adr-002-authentik-universal-sso.md.

Revision: Mark Authentik as live and add a one-line exception note for Komodo native OIDC and Plex.

DNS And PiHole Docs Conflict

network.md still says PiHole LXC 104 remains during soak, while network-live.md says PiHole was destroyed on 2026-06-17 and AdGuard is authoritative. Inventory also marks blocktopus retired. See network.md, lines 73-84, network-live.md, lines 6-9 and 185-194, and blocktopus.yaml, lines 1-6.

Revision: Rewrite network.md so AdGuard is the current authoritative path and PiHole is historical. Preserve the old scan only inside a clearly marked historical section.

Firewall Policy And Live Matrix Need Reconciliation

firewall-policy.md states that Management has unrestricted access and verifies that Management can reach everything. firewall-live.md states that Management to Appliances, IoT, and Security is blocked. The design doc also says IoT cannot reach AdGuard unless a DNS exception is added, while network-live.md says all VLAN DHCP points to AdGuard. See firewall-policy.md, lines 12-17, 174-178, and 231-237, and firewall-live.md, lines 118-138.

Revision: Run a fresh UniFi policy scan, then update both files in the same PR. If IoT DNS works through gateway forwarding rather than direct IoT to AdGuard, document that as the actual path.

Generated Host Index Is Not Trustworthy

docs/hosts/index.md lists k6-loadtest and sqlserver2022 as active stopped hosts, but their inventory files mark them retired and record destroy/archive notes. See docs/hosts/index.md, lines 23-25, k6-loadtest.yaml, lines 1-6 and 30-35, and sqlserver2022.yaml, lines 1-6 and 30-33.

Revision: Regenerate the index, or fix render-doc-stubs.py if it does not already take host status from inventory. If the file is generated, add a generated-file warning header so nobody edits it by hand.
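
A minimal sketch of an inventory-driven index follows. It does not show how render-doc-stubs.py works today: the `name` and `status` field names, the header wording, and the in-memory records standing in for parsed inventory files are all assumptions made for illustration.

```python
from typing import Iterable, Mapping

# Hypothetical marker; the real generator may use different wording.
GENERATED_HEADER = "<!-- GENERATED from inventory/*.yaml. Do not edit by hand. -->"


def render_host_index(hosts: Iterable[Mapping[str, str]]) -> str:
    """Render a host table whose status column comes only from inventory."""
    lines = [GENERATED_HEADER, "", "| Host | Status |", "| --- | --- |"]
    for host in sorted(hosts, key=lambda h: h["name"]):
        # A missing status is shown as "unknown" instead of defaulting to active.
        status = host.get("status", "unknown")
        lines.append(f"| {host['name']} | {status} |")
    return "\n".join(lines) + "\n"


# Stand-ins for parsed inventory files; the field names are assumptions.
inventory = [
    {"name": "sqlserver2022", "status": "retired"},
    {"name": "k6-loadtest", "status": "retired"},
]
index_md = render_host_index(inventory)
```

Because the status cell is read straight from the inventory record, a retired host cannot appear as active unless the inventory itself is wrong.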

Service Documentation Is Underexposed In MkDocs

docs/services/index.md exposes only AdGuard as a first-class service page and points readers to services/*/README.md for the rest. mkdocs.yml likewise lists only AdGuard under Services, even though Traefik, Komodo, ARA, monitoring, Wazuh, Homepage, and Authentik outpost are operational services. See docs/services/index.md, lines 1-12, and mkdocs.yml, lines 83-85.

Revision: Generate service docs from services/*/README.md or add nav entries for every live stack. At minimum, the service index should list all current service READMEs and state whether each is live, tiered for backup, and Authentik-protected.
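
One way to build the minimum service index is sketched below, assuming each stack lives at services/<name>/README.md as described above. The metadata keys (`live`, `backup_tier`, `authentik`) and the function names are hypothetical; where the real values would come from (inventory, Compose labels, or a hand-kept file) is left open.

```python
from pathlib import Path
from typing import Mapping


def discover_services(repo_root: Path) -> list[str]:
    """Return the name of every directory that has services/<name>/README.md."""
    return sorted(p.parent.name for p in repo_root.glob("services/*/README.md"))


def render_service_index(names: list[str], meta: Mapping[str, Mapping[str, str]]) -> str:
    """Render one row per service; unknown metadata is shown, not guessed."""
    lines = [
        "| Service | Live | Backup tier | Authentik-protected |",
        "| --- | --- | --- | --- |",
    ]
    for name in names:
        row = meta.get(name, {})
        # Hypothetical metadata keys; a gap renders as "unknown".
        cells = [row.get(key, "unknown") for key in ("live", "backup_tier", "authentik")]
        lines.append(f"| {name} | " + " | ".join(cells) + " |")
    return "\n".join(lines) + "\n"
```

Rendering missing metadata as "unknown" keeps the index honest: a service with a README but no recorded backup tier shows up as a visible gap instead of being silently omitted.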

Security Register Headings Are Misleading

docs/security-register.md has an Active Findings section containing closed findings. See security-register.md, lines 9-19.

Revision: Rename the section to Findings Ledger, or move closed rows to the closed table and keep active rows only where work remains.

Customer-App Index Is Stale

docs/customer-apps/index.md says customer app pages will appear after Phase 1, but recordurbate-tiktok.md already exists and is in the MkDocs nav. See customer-apps/index.md, lines 1-8, and mkdocs.yml, lines 99-101.

Revision: Link the existing page and summarize the ownership boundary: inventoried for awareness, not reconciled by Komodo or Ansible.

Better Architecture Alternatives

Keep The Current GitOps Model, But Tighten Boundaries

The repo does not need Kubernetes or a larger platform abstraction. Komodo plus Compose is a good fit for the current scale. The stronger alternative is to tighten the existing model:

  • Generated files should be clearly generated and always checked in CI.
  • Live-state docs should state capture date, validation command, and known stale areas.
  • Service docs should be discoverable from MkDocs, even if implementation details remain in services/*/README.md.
  • Security exceptions should be captured as accepted risks with review dates.
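
The capture-date point could be enforced with a small check along these lines. This is a sketch under assumptions: the `Captured: YYYY-MM-DD` header line is a hypothetical convention, not something the live docs use today, and the 90-day default threshold is arbitrary.

```python
import re
from datetime import date
from typing import Optional

# Hypothetical header line, e.g. "Captured: 2026-06-17".
CAPTURED_RE = re.compile(r"^Captured:\s*(\d{4}-\d{2}-\d{2})\s*$", re.MULTILINE)


def capture_age_days(doc_text: str, today: date) -> Optional[int]:
    """Return the age of the capture in days, or None if no date is declared."""
    match = CAPTURED_RE.search(doc_text)
    if match is None:
        return None
    return (today - date.fromisoformat(match.group(1))).days


def is_stale(doc_text: str, today: date, max_age_days: int = 90) -> bool:
    """Treat a missing capture date as stale so undated docs fail the check."""
    age = capture_age_days(doc_text, today)
    return age is None or age > max_age_days
```

Passing `today` in explicitly keeps the check deterministic in CI and in tests.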

Use A Documentation Drift Register

Instead of scattering TODOs across old runbooks, add a small drift register under docs/architecture/ or docs/reviews/ with fields for source, conflicting source, owner, last validated date, and remediation PR. This would make drift an operational queue rather than passive prose.
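
As an illustration, the register fields listed above map onto a small record type. The class name, the `open_entries` helper, and the rule that an entry is open until it has a remediation PR are assumptions for this sketch, not an existing convention in the repo.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DriftEntry:
    source: str
    conflicting_source: str
    owner: str
    last_validated: Optional[date] = None
    remediation_pr: Optional[str] = None  # None until a PR exists


def open_entries(register: list[DriftEntry]) -> list[DriftEntry]:
    """Entries without a remediation PR: never-validated first, then oldest."""
    pending = [e for e in register if e.remediation_pr is None]
    return sorted(
        pending,
        key=lambda e: (e.last_validated is not None, e.last_validated or date.min),
    )
```

Sorting never-validated entries first puts the least-trusted conflicts at the head of the queue, which is what turns the register into a work list.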

Generate Current-State Matrices From Inventory

Host lifecycle, service backup tier, Authentik coverage, exposed ports, and monitoring coverage should be generated or at least validated from structured sources. The current markdown tables are useful, but they have already drifted.
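
Where full generation is not practical, validation can be as simple as comparing two mappings. The sketch below assumes the inventory and the documented table have each already been parsed into a host-to-status mapping; the parsing step and the function name are hypothetical.

```python
from typing import Mapping


def find_status_drift(inventory: Mapping[str, str], documented: Mapping[str, str]) -> list[str]:
    """Return one message per host whose documented status disagrees with inventory."""
    problems = []
    for host in sorted(set(inventory) | set(documented)):
        expected = inventory.get(host)
        shown = documented.get(host)
        if expected is None:
            problems.append(f"{host}: documented as {shown!r} but absent from inventory")
        elif shown is None:
            problems.append(f"{host}: in inventory as {expected!r} but not documented")
        elif expected != shown:
            problems.append(f"{host}: inventory says {expected!r}, doc says {shown!r}")
    return problems
```

A CI job could fail whenever the returned list is non-empty. The same comparison would apply to backup tier, Authentik coverage, or monitoring coverage once those are available as structured data.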