Appendix

Methodology

This assessment combined static repository review with limited read-only live validation.

Static review covered:

  • Project planning and status docs.
  • Architecture, ADRs, security register, and runbooks.
  • Inventory schemas, host/customer/app/appliance inventory, and generated docs.
  • Ansible playbooks, roles, generated inventory, and generator scripts.
  • Compose stacks, service READMEs, monitoring configs, and backup manifests.
  • GitHub Actions workflows and local guardrails.
  • SOPS configuration and secret-handling scripts.

Live validation covered:

  • SSH to infra-services using the documented Cursor alias.
  • Container status with docker ps.
  • Listening TCP sockets with ss -ltnp.
  • Workstation TCP reachability to selected published ports.
  • Live backup manifest listing under /opt/homelab/services.
  • AdGuard DNS resolution for selected *.infra.realemail.app names.
  • GitHub workflow inventory with gh workflow list.

No destructive commands were run. No secrets were intentionally printed or recorded in this report.
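The workstation TCP reachability check can be reproduced with a short read-only probe. The report does not say which tool was used, so the following is a minimal sketch under that assumption. The function names `tcp_reachable` and `probe` are illustrative, and the port list is copied from the Live Validation Notes.

```python
import socket

# Ports reported as reachable in the Live Validation Notes.
PORTS = [8000, 8080, 8081, 3100, 9090]


def tcp_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # Opening and immediately closing a connection sends no application data.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable networks.
        return False


def probe(host: str, ports: list[int]) -> dict[int, bool]:
    """Map each port to whether a TCP connect succeeded."""
    return {port: tcp_reachable(host, port) for port in ports}
```

Calling `probe("192.168.6.17", PORTS)` from the workstation would repeat the check. A successful connect shows only that the port accepts connections from that network position; it says nothing about whether the service behind it requires authentication.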

Live Validation Notes

| Check | Result |
| --- | --- |
| infra-services SSH | Succeeded. |
| Live containers | Most service containers running; wazuh-dashboard restarting. |
| Listening ports | 8000, 8080, 8081, 3100, 9090 bound to all interfaces. |
| Workstation TCP tests | 8000, 8080, 8081, 3100, 9090 reachable on 192.168.6.17. |
| Backup manifests | No live services/wazuh/backup.yml found. |
| DNS rewrites | komodo.infra.realemail.app and adguard.infra.realemail.app resolved to 192.168.6.17. |
| GitHub workflow list | Workflows are active; repository settings and branch protection were not inspected. |
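The "bound to all interfaces" result comes from reading `ss -ltnp` output. As a rough sketch of how that reading could be automated, the helper below extracts ports whose local address is a wildcard. The function name `wildcard_listeners` is hypothetical, and the sample text used to exercise it is invented for illustration, not captured from infra-services.

```python
# Local-address values that ss prints for sockets bound to every interface.
WILDCARD_HOSTS = {"0.0.0.0", "*", "[::]", "::"}


def wildcard_listeners(ss_output: str) -> list[int]:
    """Return sorted TCP ports that listen on a wildcard address."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; the header row does not.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Fourth column is "Local Address:Port"; split on the last colon
        # so bracketed IPv6 addresses stay intact.
        host, _, port = fields[3].rpartition(":")
        if host in WILDCARD_HOSTS and port.isdigit():
            ports.add(int(port))
    return sorted(ports)


# Illustrative input only (not real output from the assessed host).
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port Process
LISTEN 0      4096   0.0.0.0:8080       0.0.0.0:*
LISTEN 0      4096   127.0.0.1:5432     0.0.0.0:*
LISTEN 0      4096   [::]:9090          [::]:*
LISTEN 0      128    *:3100             *:*
"""

print(wildcard_listeners(SAMPLE))
```

Ports bound only to loopback or to a specific address are deliberately excluded, which matches the distinction the table draws.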

Evidence Index

Architecture And Planning

| File | Why It Matters |
| --- | --- |
| README.md | High-level architecture table, repo structure, production branch model, owner TODO ledger. |
| PLAN.md | Original master spec, decisions, entity classes, CI and phase acceptance criteria. |
| docs/index.md | Published docs landing page. |
| docs/architecture/adr-001-reverse-proxy.md | Reverse proxy decision. |
| docs/architecture/adr-002-authentik-universal-sso.md | Universal Authentik requirement and exceptions. |
| docs/architecture/proxmox-consolidation.md | Compute consolidation target and safety notes. |

Network And Firewall

| File | Why It Matters |
| --- | --- |
| docs/architecture/network.md | Intended network and DNS design. |
| docs/architecture/network-live.md | As-built network snapshot and AdGuard cutover status. |
| docs/architecture/firewall-policy.md | Intended ZBF policy and verification checklist. |
| docs/architecture/firewall-live.md | Live zone matrix and WAN ingress posture from UniFi scan. |
| inventory/networks.yaml | Structured VLAN/subnet inventory. |
| infra/tailscale/acl.json | Tailnet groups, ACLs, SSH policy, and tests. |

Inventory And Generators

| File | Why It Matters |
| --- | --- |
| inventory/schema/host.schema.json | Host inventory validation. |
| inventory/generators/render-ansible.py | Inventory-to-Ansible source of truth. |
| inventory/generators/render-discovery-inventory.py | Discovery inventory generation and check-mode issues. |
| infra/ansible/inventory/generated.yml | Generated downstream Ansible inventory. |
| infra/ansible/inventory/host_vars/prox.yml | Root-only Proxmox SSH model. |
| docs/hosts/index.md | Generated or maintained host lifecycle index with drift. |

Ansible And Operations

| File | Why It Matters |
| --- | --- |
| infra/ansible/playbooks/site.yml | Main convergence playbook. |
| infra/ansible/playbooks/patch.yml | Coordinated OS patching wave order. |
| infra/ansible/roles/common/tasks/ansible-pull.yml | Pull-mode GitOps implementation. |
| docs/architecture/patching.md | Patching architecture and observability. |
| docs/runbooks/coordinated-os-patching.md | Live patching operations runbook. |

Services

| File | Why It Matters |
| --- | --- |
| services/traefik/compose.yml | Ingress, Docker socket, and direct 8080 publish. |
| services/authentik-outpost/compose.yml | Local Authentik outpost. |
| services/komodo/compose.yml | GitOps control plane, native OIDC, and Docker socket. |
| services/ara/compose.yml | ARA direct port, permissive hosts, and CORS. |
| services/homepage/compose.yml | Homepage and Docker socket mount. |
| services/monitoring/compose.yml | Prometheus/Grafana/Loki/cAdvisor exposure and auth settings. |
| services/adguard/compose.yml | DNS binding and Unbound relationship. |
| services/wazuh/compose.yml | Wazuh state, agent ports, dashboard routing, and volumes. |

Monitoring

| File | Why It Matters |
| --- | --- |
| monitoring/prometheus/prometheus.yml | Scrape jobs and alertmanager link. |
| monitoring/targets/nodes.yml | Generated monitored node targets. |
| monitoring/prometheus/alerts/patching.yml | OS patch alert rules. |
| monitoring/alertmanager/alertmanager.yml | Notification routes and receivers. |
| monitoring/promtail/promtail-config.yml | Docker/syslog scraping and positions path. |

Backups And DR

| File | Why It Matters |
| --- | --- |
| backups/policies.yaml | Backup tiers, external critical data, and offsite target. |
| services/traefik/backup.yml | Traefik config and ACME backup path. |
| services/adguard/backup.yml | AdGuard backup path drift. |
| docs/runbooks/restore.md | Per-service restore procedures. |
| docs/runbooks/dr-from-zero.md | Disaster recovery order and exclusions. |
| infra/ansible/roles/backup-client/ | Restic role implementation. |
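The missing services/wazuh/backup.yml noted in the live validation suggests a simple presence check across service directories. The sketch below assumes each service lives in its own directory containing a `compose.yml` and, where backups are expected, a sibling `backup.yml`. That layout is inferred from the paths listed in this appendix, and the helper name `services_missing_backup` is hypothetical.

```python
from pathlib import Path


def services_missing_backup(services_root: Path) -> list[str]:
    """List service directories that have a compose.yml but no backup.yml."""
    missing = []
    for service_dir in sorted(p for p in services_root.iterdir() if p.is_dir()):
        has_compose = (service_dir / "compose.yml").is_file()
        has_backup = (service_dir / "backup.yml").is_file()
        # Only directories that define a stack are expected to carry a manifest.
        if has_compose and not has_backup:
            missing.append(service_dir.name)
    return missing
```

Run against the repository's `services/` directory or the live `/opt/homelab/services` tree, this would list candidates for review. A service intentionally classified as rebuild-only would also appear, so the output is a prompt for a decision rather than a list of defects.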

Security

| File | Why It Matters |
| --- | --- |
| .sops.yaml | SOPS recipient scope. |
| docs/runbooks/secrets.md | Host age-key handling and rotation guidance. |
| scripts/check-sops-encryption.py | SOPS metadata-only check. |
| scripts/render-komodo-compose-env.sh | Komodo plaintext secret rendering path. |
| scripts/update-backup-sops.sh | Backup secret temp-file handling. |
| services/wazuh/config/wazuh_indexer/internal_users.yml | Demo Wazuh users. |
| services/wazuh/config/wazuh_indexer/wazuh.indexer.yml | Disabled disk thresholds. |
| docs/runbooks/cloudflare-pages.md | Public docs accepted risk. |
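If the existing SOPS check inspects only the `sops` metadata block, as the "metadata-only" note suggests, a stricter variant could also confirm that leaf values carry the `ENC[...]` wrapper that SOPS writes for encrypted values. The following is a sketch under that assumption, not the logic of scripts/check-sops-encryption.py. It operates on an already-parsed document (a dict loaded from YAML or JSON), and the name `plaintext_leaves` is hypothetical.

```python
def looks_encrypted(value) -> bool:
    """True if value has the ENC[...] wrapper SOPS uses for encrypted leaves."""
    return isinstance(value, str) and value.startswith("ENC[") and value.endswith("]")


def plaintext_leaves(doc, path=""):
    """Yield paths of leaf values that do not look SOPS-encrypted."""
    if isinstance(doc, dict):
        for key, value in doc.items():
            if path == "" and key == "sops":
                continue  # top-level metadata block is stored in the clear by design
            yield from plaintext_leaves(value, f"{path}.{key}" if path else str(key))
    elif isinstance(doc, list):
        for index, value in enumerate(doc):
            yield from plaintext_leaves(value, f"{path}[{index}]")
    elif doc is not None and not looks_encrypted(doc):
        yield path
```

Files that rely on SOPS options such as `unencrypted_suffix` or `encrypted_regex` intentionally leave some keys in plaintext, so those keys would need an allowlist before a check like this could gate CI.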

CI And Quality Gates

| File | Why It Matters |
| --- | --- |
| .github/workflows/lint.yml | Main CI, generator checks, SOPS checks, self-hosted PR runners. |
| .github/workflows/secrets-scan.yml | Gitleaks and TruffleHog on self-hosted runners. |
| .github/workflows/komodo-deploy.yml | LAN-only deploy relay and ignored Grafana paths. |
| .github/workflows/docs.yml | MkDocs strict build and Cloudflare Pages deploy. |
| .pre-commit-config.yaml | Local guardrails and tool version choices. |

Open Questions For Owner Or Contractors

  1. Are external pull requests possible, or is every PR from a trusted private actor? This affects runner remediation urgency but does not eliminate the self-hosted runner trust issue.
  2. Should Wazuh be tier 1, tier 2, or rebuild-only? The answer changes backup cost, retention, and restore requirements.
  3. Should the public docs site remain an accepted risk after this report, or should Cloudflare Access or a split into public and private docs be reconsidered?
  4. Which direct machine APIs genuinely need LAN reachability rather than Docker network or Traefik-only access?
  5. Is broad tag:server Tailscale reachability intentional for operations, or can it be reduced to explicit flows?
  6. Should PLAN.md remain a living spec, or should it be frozen as historical context with README/journal/live docs becoming current truth?

Suggested Remediation Sequence

flowchart TD
  startNode[Start] --> portHardening[Harden Direct Ports]
  portHardening --> wazuhStabilize[Stabilize And Back Up Wazuh]
  wazuhStabilize --> runnerIsolation[Isolate PR Runners]
  runnerIsolation --> sopsSplit[Split SOPS Recipients]
  sopsSplit --> docReconcile[Reconcile Source Of Truth Docs]
  docReconcile --> generatorCi[Harden Generators And CI]
  generatorCi --> restoreDrills[Run Restore Drills]
  restoreDrills --> acceptance[Contractor Acceptance]

Recommended first PRs:

  1. Remove or restrict direct host publishes for ARA, Prometheus, Loki, cAdvisor, and Traefik metrics.
  2. Add Wazuh backup/restore/retention and fix dashboard stability.
  3. Move PR CI to isolated runners.
  4. Fix SOPS validation and secret-rendering scripts.
  5. Reconcile PLAN.md, README, DNS/firewall docs, host index, and service docs.
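For the first PR, a guardrail could flag Compose port entries that publish on every host interface. The sketch below classifies short-syntax port strings only; the long mapping syntax with `host_ip` is not handled. The helper names and the example service-to-port mapping are hypothetical, since this appendix does not reproduce the actual compose files.

```python
def binds_all_interfaces(port_spec: str) -> bool:
    """True if a short-syntax Compose port entry publishes on every interface."""
    spec = port_spec.split("/", 1)[0]  # drop an optional /tcp or /udp suffix
    if spec.startswith("["):
        # Bracketed IPv6 host address, e.g. "[::1]:8080:8080".
        host_ip = spec[1:spec.index("]")]
    else:
        parts = spec.split(":")
        # "IP:HOST:CONTAINER" names an address; shorter forms do not.
        host_ip = parts[0] if len(parts) == 3 else ""
    return host_ip in {"", "0.0.0.0", "::"}


def exposed_ports(services: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return, per service, the port entries that bind all interfaces."""
    flagged = {
        name: [spec for spec in ports if binds_all_interfaces(spec)]
        for name, ports in services.items()
    }
    return {name: specs for name, specs in flagged.items() if specs}


# Hypothetical example input; real values would come from parsed compose files.
EXAMPLE = {
    "prometheus": ["9090:9090"],
    "loki": ["127.0.0.1:3100:3100"],
}

print(exposed_ports(EXAMPLE))
```

An entry bound to a specific address passes this check whether that address is loopback or a LAN IP, so a stricter policy (loopback or Docker network only) would need an extra comparison.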

Review Limitations

  • No fresh UniFi API scan was run during this assessment.
  • No live Tailscale ACL state was queried.
  • No GitHub repository settings, branch protection rules, or runner host hardening settings were inspected.
  • No restore jobs were executed.
  • No service credentials or secret plaintext values were inspected.

These limitations do not invalidate the confirmed findings, but they should be resolved before the final contractor acceptance review.