Contractor Assessment Report

Date: 2026-06-28

Repository: notarealemail/homelab

Assessment stance: senior engineering due diligence for owner acceptance of contracted work.

Executive Summary

The project has a strong foundation: clear GitOps intent, inventory-driven generation, SOPS-encrypted secrets, Ansible convergence, Komodo-managed Compose stacks, Authentik-centered ingress, and a substantial runbook culture. The implementation is more mature than a typical homelab repo, but it is not yet at the level where the docs, automation, runtime posture, and recovery evidence can be treated as uniformly production-ready.

The highest-priority contractor revisions are operational security and recovery hardening. Live validation confirmed that several internal observability and ARA ports are bound on all interfaces and reachable from the operator workstation, creating paths around the intended Traefik/Auth layer. Wazuh is deployed and marked complete in project status, but the live dashboard is restarting and no live backup.yml exists for its stateful volumes. Several secret-handling scripts stage decrypted data in process arguments or predictable /tmp paths. Pull-request CI also executes repository code on self-hosted runners, which is a material trust-boundary issue.
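The all-interfaces finding can be rechecked mechanically after each deploy. The following is a minimal sketch, assuming the listener output comes from `ss -ltn` with the local address in the fourth column; the function name `wildcard_listeners` and the sample rows are hypothetical illustrations, not captured output from the host.

```python
# Hypothetical helper: flag listeners bound to every interface.
# Assumes `ss -ltn` style rows: State Recv-Q Send-Q Local:Port Peer:Port
WILDCARD_HOSTS = {"0.0.0.0", "[::]", "*"}


def wildcard_listeners(ss_output: str) -> list[int]:
    """Return the sorted ports whose listener is bound to all interfaces."""
    ports = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Data rows start with the socket state; skip the header and blanks.
        if len(fields) < 4 or fields[0] != "LISTEN":
            continue
        # Split on the last colon so IPv6 forms like [::]:3100 parse correctly.
        host, _, port = fields[3].rpartition(":")
        if host in WILDCARD_HOSTS and port.isdigit():
            ports.add(int(port))
    return sorted(ports)


# Made-up sample rows for illustration only.
SAMPLE = """\
State  Recv-Q Send-Q Local Address:Port Peer Address:Port
LISTEN 0      4096         0.0.0.0:9090      0.0.0.0:*
LISTEN 0      4096       127.0.0.1:8000      0.0.0.0:*
LISTEN 0      4096            [::]:3100         [::]:*
"""

if __name__ == "__main__":
    print(wildcard_listeners(SAMPLE))
```

A loopback-bound listener is deliberately not reported, so a non-empty result points at ports that still need a loopback bind, a management-IP bind, or a firewall allowlist.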

The largest documentation risk is source-of-truth drift. PLAN.md still calls itself the source of truth and contains early unchecked acceptance items, while the README, security register, live network docs, and owner TODO table show many of those items as complete. Network, firewall, host, and service docs also mix historical snapshots with current operational truth. This makes the repo harder to hand to contractors because multiple files can justify different actions.

Top Revisions To Send Back

  1. Close direct service exposure on infra-services: remove unnecessary host publishes, bind machine-only APIs to loopback or a management IP, and add DOCKER-USER allowlists for ports that must remain reachable.
  2. Treat Wazuh as unfinished until dashboard stability, backup policy, restore procedure, retention, demo-user cleanup, and disk-watermark controls are in place.
  3. Move PR validation off persistent self-hosted homelab runners or isolate it with ephemeral, unprivileged runner pools that have no LAN reach.
  4. Split broad SOPS automation access into per-host or per-domain recipients, then rotate the current shared automation key.
  5. Fix secret-rendering scripts so decrypted material is passed through stdin or restrictive temp files and outputs are written atomically with mode 0600.
  6. Reconcile project source-of-truth docs: make PLAN.md historical or add a live status overlay, refresh DNS/firewall docs, regenerate host indexes, and expose all mature service docs in MkDocs.
  7. Harden CI and generators: make SOPS checks detect plaintext values, make generator --check modes fail on missing outputs, and pin CI tool versions.
  8. Add restore-test evidence for tier-1 and external backups, including Authentik DB, HAOS, Harbor, Komodo, Traefik, and any Wazuh tier assignment.
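Revision 5 can be illustrated with a short sketch. It assumes the decrypted material arrives as bytes (for example read from stdin rather than passed as a process argument) and that the target directory is on a POSIX filesystem; `write_secret_atomically` is a hypothetical name, not a function from the repository.

```python
import os
import tempfile


def write_secret_atomically(path: str, data: bytes) -> None:
    """Write data to path with mode 0600, replacing any old file atomically."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with an unpredictable name, readable and
    # writable only by the creating user (0600), in the target directory
    # so the final rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".secret-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push contents to disk before rename
        # os.replace is atomic on POSIX: readers see the old or the new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Never leave a partially written secret behind.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

A caller could pass `sys.stdin.buffer.read()` as `data`, which keeps the plaintext out of process arguments and out of predictable /tmp paths.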

Evidence Confidence

| Area | Confidence | Basis |
| --- | --- | --- |
| Repo architecture and docs | High | Static read of planning docs, ADRs, inventory, generated docs, and runbooks. |
| Compose/service posture | High | Static read plus live docker ps and listening-port checks on infra-services. |
| Direct port exposure | High | Live ss -ltnp showed 0.0.0.0/[::] listeners; workstation TCP tests succeeded for 8000, 8080, 8081, 3100, and 9090. |
| Backup manifest gaps | High | Static repo check plus live /opt/homelab/services/*/backup.yml listing confirmed no Wazuh manifest. |
| GitHub settings and branch protection | Medium | Workflow files and gh workflow list reviewed; repository settings were not inspected. |
| UniFi firewall and VLAN state | Medium | Repo docs and prior scan outputs reviewed; no fresh UniFi API scan was run in this assessment. |
| Tailscale ACL runtime state | Medium | ACL file reviewed; live tailnet policy was not queried. |

System Overview

```mermaid
flowchart LR
  inventoryYaml[Inventory YAML] --> generators[Generators]
  generators --> ansibleInventory[Ansible Inventory]
  generators --> prometheusTargets[Prometheus Targets]
  generators --> homepageConfig[Homepage Config]
  gitRepo[GitHub Repo] --> ansiblePull[Ansible Pull]
  gitRepo --> komodoRelay[GitHub Actions Relay]
  komodoRelay --> komodo[Komodo]
  komodo --> composeStacks[Compose Stacks]
  composeStacks --> traefik[Traefik]
  traefik --> authentikOutpost[Authentik Outpost]
  composeStacks --> monitoring[Monitoring Stack]
  composeStacks --> backups[Restic Backups]
```
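Because every generated artifact in this flow derives from the inventory, a generator check mode is only trustworthy if it treats a missing output as a failure rather than a skip. A minimal sketch of that behavior follows, assuming a generator can produce its expected outputs in memory as a mapping of relative path to content; `check_outputs` and the file names in the usage note are hypothetical.

```python
from pathlib import Path


def check_outputs(expected: dict[str, str], root: Path) -> list[str]:
    """Compare expected generator outputs with files on disk.

    Returns one problem string per file that is missing or stale.
    An empty list means the committed outputs are up to date.
    """
    problems = []
    for rel_path, content in sorted(expected.items()):
        target = root / rel_path
        if not target.is_file():
            # A missing output is a failure, not something to skip silently.
            problems.append(f"missing: {rel_path}")
        elif target.read_text(encoding="utf-8") != content:
            problems.append(f"stale: {rel_path}")
    return problems
```

A check-mode entry point could print the returned problems and exit non-zero whenever the list is non-empty, for example after calling `check_outputs({"targets.json": rendered}, Path("generated"))`.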

Risk Heatmap

| Domain | Current Risk | Main Driver |
| --- | --- | --- |
| Runtime service exposure | High | Direct host-published observability and ARA ports are live and reachable. |
| CI and runner trust | High | PR workflows run repo code on self-hosted runners. |
| Secret blast radius | High | Broad age recipients and global host automation key. |
| Backup and DR confidence | Medium-High | Tier policies exist, but Wazuh and external restore tests are incomplete. |
| Documentation correctness | Medium-High | Multiple source-of-truth layers contradict each other. |
| Observability quality | Medium | Stack exists, but health/readiness, Wazuh, and log-position durability need work. |
| Network segmentation | Medium | Good design intent; stale docs and broad Tailscale ACLs weaken assurance. |

The contractor's work on this phase should not be accepted as complete until every High and Critical finding in findings.md has three things: a merged remediation PR, an updated runbook or ADR wherever operational truth changes, and a validation note showing either live proof or a documented reason that validation is owner-blocked.

For runtime changes, require evidence from the host, not only repo diffs. For example, a port-hardening fix should include both Compose changes and a live listener/reachability check after Komodo deploy. A backup fix should include the manifest, timer status, a successful restic snapshot, and at least a scoped restore drill.
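The reachability half of that evidence can be captured with a small probe run from the operator workstation. This is a sketch only: `reachable_ports` is a hypothetical helper, and the host name and port list in the usage comment are taken from this report's findings rather than from any script in the repository.

```python
import socket


def reachable_ports(host: str, ports: list[int], timeout: float = 1.0) -> list[int]:
    """Return the ports on host that accept a TCP connection."""
    open_ports = []
    for port in ports:
        try:
            # A completed TCP handshake means the port is reachable from here.
            with socket.create_connection((host, port), timeout=timeout):
                open_ports.append(port)
        except OSError:
            # Refused, filtered, timed out, or unresolvable: treat as closed.
            pass
    return open_ports


# Example usage after a port-hardening deploy (not executed here):
# reachable_ports("infra-services", [8000, 8080, 8081, 3100, 9090])
```

An empty result after the Komodo deploy, recorded alongside the listener output from the host itself, would give the validation note both views of the fix.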