Appendix¶
Methodology¶
This assessment combined static repository review with limited read-only live validation.
Static review covered:
- Project planning and status docs.
- Architecture, ADRs, security register, and runbooks.
- Inventory schemas, host/customer/app/appliance inventory, and generated docs.
- Ansible playbooks, roles, generated inventory, and generator scripts.
- Compose stacks, service READMEs, monitoring configs, and backup manifests.
- GitHub Actions workflows and local guardrails.
- SOPS configuration and secret-handling scripts.
Live validation covered:
- SSH to `infra-services` using the documented Cursor alias.
- Container status with `docker ps`.
- Listening TCP sockets with `ss -ltnp`.
- Workstation TCP reachability to selected published ports.
- Live backup manifest listing under `/opt/homelab/services`.
- AdGuard DNS resolution for selected `*.infra.realemail.app` names.
- GitHub workflow inventory with `gh workflow list`.
No destructive commands were run. No secrets were intentionally printed or recorded in this report.
Live Validation Notes¶
| Check | Result |
|---|---|
| `infra-services` SSH | Succeeded. |
| Live containers | Most service containers running; wazuh-dashboard restarting. |
| Listening ports | 8000, 8080, 8081, 3100, 9090 bound to all interfaces. |
| Workstation TCP tests | 8000, 8080, 8081, 3100, 9090 reachable on 192.168.6.17. |
| Backup manifests | No live services/wazuh/backup.yml found. |
| DNS rewrites | komodo.infra.realemail.app and adguard.infra.realemail.app resolved to 192.168.6.17. |
| GitHub workflow list | Workflows are active; repository settings and branch protection were not inspected. |
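The workstation TCP tests in the table above can be repeated with a short read-only probe. The following is a minimal sketch, not the tooling used during the assessment; the function names `port_reachable` and `check_ports` are hypothetical, and the probe only completes a TCP handshake without sending application data.

```python
import socket


def port_reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port completes within timeout."""
    try:
        # create_connection performs the TCP handshake only; nothing is
        # written to the socket, so the probe stays read-only.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_ports(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Map each port to its reachability result."""
    return {port: port_reachable(host, port, timeout) for port in ports}
```

Calling `check_ports("192.168.6.17", [8000, 8080, 8081, 3100, 9090])` from the workstation would be one way to re-verify the result after direct publishes are restricted. A successful connection shows reachability only; it says nothing about whether the service behind the port requires authentication.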
Evidence Index¶
Architecture And Planning¶
| File | Why It Matters |
|---|---|
| `README.md` | High-level architecture table, repo structure, production branch model, owner TODO ledger. |
| `PLAN.md` | Original master spec, decisions, entity classes, CI and phase acceptance criteria. |
| `docs/index.md` | Published docs landing page. |
| `docs/architecture/adr-001-reverse-proxy.md` | Reverse proxy decision. |
| `docs/architecture/adr-002-authentik-universal-sso.md` | Universal Authentik requirement and exceptions. |
| `docs/architecture/proxmox-consolidation.md` | Compute consolidation target and safety notes. |
Network And Firewall¶
| File | Why It Matters |
|---|---|
| `docs/architecture/network.md` | Intended network and DNS design. |
| `docs/architecture/network-live.md` | As-built network snapshot and AdGuard cutover status. |
| `docs/architecture/firewall-policy.md` | Intended ZBF policy and verification checklist. |
| `docs/architecture/firewall-live.md` | Live zone matrix and WAN ingress posture from UniFi scan. |
| `inventory/networks.yaml` | Structured VLAN/subnet inventory. |
| `infra/tailscale/acl.json` | Tailnet groups, ACLs, SSH policy, and tests. |
Inventory And Generators¶
| File | Why It Matters |
|---|---|
| `inventory/schema/host.schema.json` | Host inventory validation. |
| `inventory/generators/render-ansible.py` | Inventory-to-Ansible source of truth. |
| `inventory/generators/render-discovery-inventory.py` | Discovery inventory generation and check-mode issues. |
| `infra/ansible/inventory/generated.yml` | Generated downstream Ansible inventory. |
| `infra/ansible/inventory/host_vars/prox.yml` | Root-only Proxmox SSH model. |
| `docs/hosts/index.md` | Generated or maintained host lifecycle index with drift. |
Ansible And Operations¶
| File | Why It Matters |
|---|---|
| `infra/ansible/playbooks/site.yml` | Main convergence playbook. |
| `infra/ansible/playbooks/patch.yml` | Coordinated OS patching wave order. |
| `infra/ansible/roles/common/tasks/ansible-pull.yml` | Pull-mode GitOps implementation. |
| `docs/architecture/patching.md` | Patching architecture and observability. |
| `docs/runbooks/coordinated-os-patching.md` | Live patching operations runbook. |
Services¶
| File | Why It Matters |
|---|---|
| `services/traefik/compose.yml` | Ingress, Docker socket, and direct 8080 publish. |
| `services/authentik-outpost/compose.yml` | Local Authentik outpost. |
| `services/komodo/compose.yml` | GitOps control plane, native OIDC, and Docker socket. |
| `services/ara/compose.yml` | ARA direct port, permissive hosts, and CORS. |
| `services/homepage/compose.yml` | Homepage and Docker socket mount. |
| `services/monitoring/compose.yml` | Prometheus/Grafana/Loki/cAdvisor exposure and auth settings. |
| `services/adguard/compose.yml` | DNS binding and Unbound relationship. |
| `services/wazuh/compose.yml` | Wazuh state, agent ports, dashboard routing, and volumes. |
Monitoring¶
| File | Why It Matters |
|---|---|
| `monitoring/prometheus/prometheus.yml` | Scrape jobs and alertmanager link. |
| `monitoring/targets/nodes.yml` | Generated monitored node targets. |
| `monitoring/prometheus/alerts/patching.yml` | OS patch alert rules. |
| `monitoring/alertmanager/alertmanager.yml` | Notification routes and receivers. |
| `monitoring/promtail/promtail-config.yml` | Docker/syslog scraping and positions path. |
Backups And DR¶
| File | Why It Matters |
|---|---|
| `backups/policies.yaml` | Backup tiers, external critical data, and offsite target. |
| `services/traefik/backup.yml` | Traefik config and ACME backup path. |
| `services/adguard/backup.yml` | AdGuard backup path drift. |
| `docs/runbooks/restore.md` | Per-service restore procedures. |
| `docs/runbooks/dr-from-zero.md` | Disaster recovery order and exclusions. |
| `infra/ansible/roles/backup-client/` | Restic role implementation. |
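Gaps such as the missing live Wazuh backup manifest could be caught mechanically. The sketch below assumes the `services/<name>/backup.yml` layout seen in the table above; the function name `services_missing_backup_manifest` is hypothetical and is not part of the repository.

```python
from pathlib import Path


def services_missing_backup_manifest(services_root: Path) -> list[str]:
    """Return names of service directories that contain no backup.yml file."""
    missing = []
    for entry in sorted(services_root.iterdir()):
        # Only directories are treated as services; stray files are ignored.
        if entry.is_dir() and not (entry / "backup.yml").is_file():
            missing.append(entry.name)
    return missing
```

Run against a checkout (`Path("services")`) or the live tree (`Path("/opt/homelab/services")`), the returned list is a starting point for review rather than a verdict: a service may be deliberately rebuild-only, in which case the absence of a manifest should be recorded as a decision instead of left implicit.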
Security¶
| File | Why It Matters |
|---|---|
| `.sops.yaml` | SOPS recipient scope. |
| `docs/runbooks/secrets.md` | Host age-key handling and rotation guidance. |
| `scripts/check-sops-encryption.py` | SOPS metadata-only check. |
| `scripts/render-komodo-compose-env.sh` | Komodo plaintext secret rendering path. |
| `scripts/update-backup-sops.sh` | Backup secret temp-file handling. |
| `services/wazuh/config/wazuh_indexer/internal_users.yml` | Demo Wazuh users. |
| `services/wazuh/config/wazuh_indexer/wazuh.indexer.yml` | Disabled disk thresholds. |
| `docs/runbooks/cloudflare-pages.md` | Public docs accepted risk. |
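A check that looks only at SOPS metadata can pass a file whose values are still plaintext. One possible stricter heuristic, sketched below, walks an already-parsed document and reports every leaf value outside the top-level `sops` key that does not look like a SOPS `ENC[...]` string. This is an illustration, not the logic of `scripts/check-sops-encryption.py`. It assumes files are encrypted in full (no `unencrypted_suffix` or `encrypted_regex` rules), and the function name `unencrypted_leaves` is hypothetical.

```python
from typing import Any, Iterator


def unencrypted_leaves(node: Any, path: str = "") -> Iterator[str]:
    """Yield paths of leaf values that do not look like SOPS ENC[...] strings."""
    if isinstance(node, dict):
        for key, value in node.items():
            # The top-level "sops" key holds SOPS's own metadata in the clear.
            if path == "" and key == "sops":
                continue
            child = f"{path}.{key}" if path else str(key)
            yield from unencrypted_leaves(value, child)
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from unencrypted_leaves(value, f"{path}[{index}]")
    else:
        looks_encrypted = (
            isinstance(node, str)
            and node.startswith("ENC[")
            and node.endswith("]")
        )
        if not looks_encrypted:
            yield path
```

The function takes plain Python data, so the caller chooses the parser (for example `json.load` for JSON-format secrets, or a YAML library for YAML). Because it is a pattern match and not a decryption test, it can produce false positives on values SOPS intentionally leaves in the clear, and it does not prove that a matching value decrypts.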
CI And Quality Gates¶
| File | Why It Matters |
|---|---|
| `.github/workflows/lint.yml` | Main CI, generator checks, SOPS checks, self-hosted PR runners. |
| `.github/workflows/secrets-scan.yml` | Gitleaks and TruffleHog on self-hosted runners. |
| `.github/workflows/komodo-deploy.yml` | LAN-only deploy relay and ignored Grafana paths. |
| `.github/workflows/docs.yml` | MkDocs strict build and Cloudflare Pages deploy. |
| `.pre-commit-config.yaml` | Local guardrails and tool version choices. |
Open Questions For Owner Or Contractors¶
- Are external pull requests possible, or is every PR from a trusted private actor? This affects runner remediation urgency but does not eliminate the self-hosted runner trust issue.
- Should Wazuh be tier 1, tier 2, or rebuild-only? The answer changes backup cost, retention, and restore requirements.
- Should the public docs site remain accepted risk after this report, or should Cloudflare Access/private docs split be reconsidered?
- Which direct machine APIs genuinely need LAN reachability rather than Docker network or Traefik-only access?
- Is broad `tag:server` Tailscale reachability intentional for operations, or can it be reduced to explicit flows?
- Should `PLAN.md` remain a living spec, or should it be frozen as historical context with README/journal/live docs becoming current truth?
Suggested Remediation Sequence¶
flowchart TD
startNode[Start] --> portHardening[Harden Direct Ports]
portHardening --> wazuhStabilize[Stabilize And Back Up Wazuh]
wazuhStabilize --> runnerIsolation[Isolate PR Runners]
runnerIsolation --> sopsSplit[Split SOPS Recipients]
sopsSplit --> docReconcile[Reconcile Source Of Truth Docs]
docReconcile --> generatorCi[Harden Generators And CI]
generatorCi --> restoreDrills[Run Restore Drills]
restoreDrills --> acceptance[Contractor Acceptance]
Recommended first PRs:
- Remove or restrict direct host publishes for ARA, Prometheus, Loki, cAdvisor, and Traefik metrics.
- Add Wazuh backup/restore/retention and fix dashboard stability.
- Move PR CI to isolated runners.
- Fix SOPS validation and secret-rendering scripts.
- Reconcile `PLAN.md`, README, DNS/firewall docs, host index, and service docs.
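For the first PR in the list, a small guardrail could flag Compose port entries that publish on every host interface. The sketch below handles only the short string syntax (`[HOST_IP:]HOST_PORT:CONTAINER_PORT[/PROTOCOL]`) and not the long mapping syntax; the function name `publishes_on_all_interfaces` is hypothetical.

```python
def publishes_on_all_interfaces(port_spec: str) -> bool:
    """Return True if a short-syntax Compose port entry binds all host interfaces."""
    spec = port_spec.split("/", 1)[0]  # drop a trailing "/tcp" or "/udp"
    parts = spec.split(":")
    if len(parts) <= 2:
        # "CONTAINER" or "HOST:CONTAINER" carries no host IP, so the
        # publish is not restricted to a single interface.
        return True
    # Everything before the last two fields is the host IP; rejoining on ":"
    # keeps bracketed IPv6 addresses such as "[::1]" intact.
    host_ip = ":".join(parts[:-2]).strip("[]")
    return host_ip in ("0.0.0.0", "::", "")
```

Under this sketch `"8080:8080"` is flagged, while `"127.0.0.1:9090:9090"` is not. Binding to loopback is only one remediation; removing the publish entirely and routing through Traefik or a Docker network is the other option the findings point toward.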
Review Limitations¶
- No fresh UniFi API scan was run during this assessment.
- No live Tailscale ACL state was queried.
- No GitHub repository settings, branch protection rules, or runner host hardening settings were inspected.
- No restore jobs were executed.
- No service credentials or secret plaintext values were inspected.
These limitations do not invalidate the confirmed findings, but they should be resolved before the final contractor acceptance review.