Findings Register

Severity reflects owner impact if the issue is left unresolved. Validation is Confirmed when the repo and live checks both support the finding, Repo when the evidence is static code/docs only, and Needs live check when runtime proof is still needed.

ID	Severity	Finding	Validation
CA-001	Critical	Pull-request workflows execute repository code on persistent self-hosted runners.	Repo
CA-002	Critical	Broad SOPS age recipients and host automation keys create a large secret blast radius.	Repo
CA-003	High	Internal service ports are directly reachable outside the intended Authentik path.	Confirmed
CA-004	High	Wazuh is stateful, unstable live, and not covered by backup/restore policy.	Confirmed
CA-005	High	Secret-rendering scripts expose or persist plaintext secrets unsafely.	Repo
CA-006	High	Source-of-truth docs contradict current implementation state.	Repo
CA-007	High	SOPS CI checks can pass partially plaintext encrypted files.	Repo
CA-008	High	ARA is permissive and directly published.	Confirmed
CA-009	Medium-High	Tailscale server ACLs allow broad lateral movement and root SSH.	Repo
CA-010	Medium-High	Public docs expose detailed operational and security information.	Repo
CA-011	Medium	Backup and restore tests do not cover important external/tier-1 data.	Repo
CA-012	Medium	Generator/check-mode behavior can miss missing outputs or use wrong connection vars.	Repo
CA-013	Medium	Firewall and DNS docs contain stale or unresolved network claims.	Repo
CA-014	Medium	Grafana grants Admin to every auth-proxy user.	Repo
CA-015	Medium	Docker socket access is wider than necessary.	Repo
CA-016	Medium	Komodo deployment trigger ignores Grafana dashboard changes.	Repo
CA-017	Low-Medium	Several Compose stacks use floating `latest` images.	Repo
CA-018	Low-Medium	Promtail stores positions in `/tmp`.	Repo

Critical Findings

CA-001: PR Workflows Execute Code On Self-Hosted Runners

Evidence: The main lint workflow runs on [self-hosted, Linux, X64, homelab-ci] for pull_request events, and the same pattern appears in secret, dependency, and Semgrep workflows. See lint.yml, especially lines 3-15, and secrets-scan.yml, lines 7-27 and 41-43.

Impact: Malicious PR code, compromised contributor accounts, or dependency attacks can execute on infrastructure-adjacent runner hosts. Even without GitHub secrets, runner filesystem, network reachability, tool caches, and local credentials are part of the attack surface.

Recommendation: Move PR checks to GitHub-hosted or ephemeral isolated runners. Reserve persistent homelab runners for trusted push, workflow_dispatch, or deploy-only jobs. If self-hosted PR runners must remain, use separate unprivileged hosts with no LAN reach, no repo secrets, no Docker socket, clean workspaces, and strict first-time-contributor approval.
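A check along the following lines could flag the pattern described in CA-001 before it merges. This is a minimal sketch, not part of the audited repo: `risky_pr_jobs` is a hypothetical helper, and it assumes the workflow file has already been parsed into a dictionary by a YAML loader.

```python
from typing import Any


def _as_list(value: Any) -> list:
    """Normalize a scalar, list, or mapping (its keys) to a list."""
    if value is None:
        return []
    if isinstance(value, dict):
        return list(value.keys())
    if isinstance(value, (list, tuple)):
        return list(value)
    return [value]


def risky_pr_jobs(workflow: dict) -> list[str]:
    """Return names of jobs that run on self-hosted runners for PR events."""
    # YAML 1.1 loaders such as PyYAML read a bare `on:` key as boolean True,
    # so both spellings are checked.
    triggers = workflow.get("on", workflow.get(True))
    events = _as_list(triggers)
    if not any(e in ("pull_request", "pull_request_target") for e in events):
        return []

    risky = []
    for name, job in (workflow.get("jobs") or {}).items():
        runs_on = job.get("runs-on")
        # `runs-on` may be a string, a list of labels, or a mapping with `labels`.
        if isinstance(runs_on, dict):
            runs_on = runs_on.get("labels")
        labels = [str(label) for label in _as_list(runs_on)]
        if "self-hosted" in labels:
            risky.append(name)
    return risky
```

Run from a GitHub-hosted job, a non-empty result for any workflow under .github/workflows could fail the build. The sketch does not follow reusable workflows or expressions inside `runs-on`, so it complements review rather than replacing it.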

CA-002: SOPS Automation Key Has Broad Secret Scope

Evidence: .sops.yaml applies the same two age recipients across secrets, service env files, and group var SOPS files. Managed-host docs state that /etc/homelab/age-key.txt is readable by root:docker with mode 440. See .sops.yaml, lines 8-25, and secrets.md, lines 109-126.

Impact: Root compromise on any host with the automation age key, or compromise of an account/container with docker-group-equivalent access, can decrypt unrelated domains such as Cloudflare, Tailscale, backups, Komodo, Authentik outpost, and Saltbox credentials.

Recommendation: Split SOPS recipients by host, service, or trust domain. Stop distributing one global decrypt key to every managed host. Rotate the current automation key after segmentation and document the recovery process per recipient group.

High Findings

CA-003: Direct Host Ports Bypass The Intended Auth Layer

Evidence: Repo Compose publishes ARA 8000, Prometheus 9090, Traefik metrics/dashboard 8080, cAdvisor 8081, and Loki 3100. See ara/compose.yml, lines 7-13, monitoring/compose.yml, lines 16-17 and 111-123, and traefik/compose.yml, lines 9-12.

Live validation on infra-services showed listeners on 0.0.0.0/[::] for those ports. Workstation TCP tests to 192.168.6.17 succeeded for 8000, 8080, 8081, 3100, and 9090.

Impact: These paths can bypass Traefik and Authentik. Docker-published ports can also bypass UFW expectations unless DOCKER-USER rules are explicitly managed.

Recommendation: Remove host publishes where Traefik or internal Docker network access is sufficient. Bind machine-only APIs to 127.0.0.1, or bind to 192.168.6.17 plus explicit DOCKER-USER allowlists when LAN access is required. Disable the Prometheus lifecycle API (enabled with --web.enable-lifecycle, which exposes reload and shutdown endpoints) unless it is operationally required.
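A repo-side lint could catch unbound publishes such as those listed in CA-003 before they reach a host. The following is a minimal sketch: `publishes_on_all_interfaces` is a hypothetical helper that inspects Compose short-syntax port strings only, and it assumes Docker's default bind address has not been changed in the daemon configuration. Unbracketed IPv6 host addresses and long-syntax port mappings are out of scope.

```python
def publishes_on_all_interfaces(port_spec: str) -> bool:
    """Return True if a Compose short-syntax port entry has no specific host IP."""
    # Drop an optional protocol suffix such as "/tcp" or "/udp".
    spec = str(port_spec).split("/", 1)[0].strip()

    # Bracketed IPv6 host address, e.g. "[::1]:8080:8080".
    if spec.startswith("["):
        host = spec[1:spec.index("]")]
        return host == "::"

    parts = spec.split(":")
    # "8000" or "8000:8000" carry no host IP, so Docker publishes on all interfaces.
    if len(parts) < 3:
        return True
    # "IP:HOST_PORT:CONTAINER_PORT": only the wildcard address counts as unbound.
    return parts[0] in ("", "0.0.0.0")
```

For example, `publishes_on_all_interfaces("8000:8000")` returns `True`, while `publishes_on_all_interfaces("127.0.0.1:9090:9090")` returns `False`. A static check like this does not replace the live listener and TCP tests, because DOCKER-USER rules and host firewalls also affect reachability.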

CA-004: Wazuh Is Not Operationally Complete

Evidence: Wazuh defines many stateful volumes in wazuh/compose.yml, lines 17-27 and 55-70, but no services/wazuh/backup.yml exists in the repo or live service directory. Live docker ps showed wazuh-dashboard restarting. The indexer config disables disk thresholds in wazuh.indexer.yml, line 48. Demo internal users remain in internal_users.yml, lines 11-56.

Impact: Losing infra-services can lose SIEM history, manager state, dashboard customizations, filebeat state, and enrollment continuity. Disabled disk thresholds can let Wazuh consume the shared infra-services disk and impair unrelated platform services.

Recommendation: Treat Wazuh as not done. Add a backup manifest and restore runbook, configure index retention and disk watermarks, remove or rotate demo users, narrow proxy trust, and require a live dashboard stability check before closing the work.

CA-005: Secret Scripts Handle Plaintext Unsafely

Evidence: render-komodo-compose-env.sh stores decrypted SOPS JSON in a shell variable, passes it as a Python command-line argument, copies the template to output before validation, and does not set restrictive permissions. See render-komodo-compose-env.sh, lines 29-56. update-backup-sops.sh writes decrypted backup data and private key material to /tmp/backup-dec.yaml without mktemp, umask, or a cleanup trap. See update-backup-sops.sh, lines 9-24.

Impact: Secrets can appear in process listings, remain in predictable temp paths after interruption, or be left in output files with default permissions.

Recommendation: Pass decrypted values via stdin or 0600 temp files created with mktemp. Use umask 077, cleanup traps, atomic replace, and output mode 0600. Validate all required secrets before replacing live env files.
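The write pattern recommended for CA-005 can be sketched as follows. This is an illustrative Python version rather than a patch for the existing shell scripts: `write_secret_file` is a hypothetical helper, and it assumes the temp file and the target live on the same filesystem so the final rename is atomic.

```python
import os
import tempfile


def write_secret_file(path: str, content: str) -> None:
    """Atomically replace `path` with `content`, readable by the owner only."""
    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file with mode 0600 and an unpredictable name.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".secret-")
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(content)
            handle.flush()
            os.fsync(handle.fileno())
        # Set the mode explicitly in case the file was altered after creation.
        os.chmod(tmp_path, 0o600)
        # os.replace is atomic when source and target share a filesystem.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file on any failure or interruption.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Validation of required secrets would happen before this call, so a failed render never replaces the live env file. Decrypted input should reach the process over stdin or a file descriptor instead of a command-line argument, which would be visible in process listings.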

CA-006: Source-Of-Truth Docs Contradict Current State

Evidence: PLAN.md declares itself the source of truth, but it still shows early hotfixes and acceptance items as unchecked while README owner TODO rows and the security register mark those areas closed. See PLAN.md, lines 9-11 and 195-211, README.md, lines 80-132, and security-register.md, lines 9-19.

Impact: Contractors can use the wrong file as authority, re-open completed work, or implement obsolete acceptance criteria.

Recommendation: Reclassify PLAN.md as historical plan, or add a live status overlay that points to the current state ledger. Update the README architecture table to show Authentik as live, not planned.

CA-007: SOPS Encryption Check Is Too Weak

Evidence: check-sops-encryption.py only verifies that a top-level sops metadata key exists. It does not recursively verify that non-metadata scalar values are encrypted. CI runs this script for *.sops.yaml. See check-sops-encryption.py, lines 13-29, and lint.yml, lines 95-113.

Impact: A partially decrypted file with stale SOPS metadata can pass CI and pre-commit checks.

Recommendation: Recursively inspect all non-sops values and fail on plaintext scalars, or use sops filestatus or sops --decrypt checks with the CI age key where available.
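The recursive inspection recommended for CA-007 could look roughly like this. It is a strict sketch, not the current check-sops-encryption.py: `plaintext_paths` is a hypothetical function, it assumes the file has already been parsed into Python objects, and it treats every scalar outside the top-level sops block as something that must carry the `ENC[...]` wrapper. Repos that rely on SOPS options such as unencrypted_suffix or encrypted_regex would need an allowlist on top.

```python
from typing import Any, Iterator


def plaintext_paths(node: Any, path: str = "") -> Iterator[str]:
    """Yield the path of every scalar that is not a SOPS-encrypted value."""
    if isinstance(node, dict):
        for key, value in node.items():
            # The top-level `sops` block is metadata and is plaintext by design.
            if path == "" and key == "sops":
                continue
            yield from plaintext_paths(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from plaintext_paths(value, f"{path}/{index}")
    elif not (
        isinstance(node, str) and node.startswith("ENC[") and node.endswith("]")
    ):
        # Any other scalar (string, number, boolean, null) counts as plaintext.
        yield path
```

A CI wrapper would fail when `list(plaintext_paths(document))` is non-empty, and should also fail when the top-level sops key is missing. The prefix test only shows that a value looks encrypted; decrypting with the CI age key remains the stronger check.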

CA-008: ARA Direct API Is Permissive

Evidence: ARA publishes 8000:8000, uses ARA_ALLOWED_HOSTS: ['*'], and sets ARA_CORS_ORIGIN_ALLOW_ALL=true. See ara/compose.yml, lines 7-13. Live checks confirmed port 8000 is reachable from the operator workstation.

Impact: ARA can expose playbook history, hostnames, paths, and accidental task output outside the intended browser-auth path.

Recommendation: Bind direct ARA callback/API traffic to a restricted address or add network ACLs for managed hosts only. Set explicit allowed hosts and disable broad CORS unless a documented callback use case requires it.

Medium And Low Findings

CA-009: Tailscale ACLs Are Broad

infra/tailscale/acl.json allows tag:server to reach tag:server:* (every port on every server node) and allows admin Tailscale SSH as root on all tag:server nodes. See acl.json, lines 28-45.

Replace broad server-to-server reachability with explicit service ports and role-specific tags. Restrict root SSH to hosts that require it, such as Proxmox.
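A small policy lint could keep all-port rules like the one in CA-009 from returning after cleanup. The sketch below makes several assumptions: `broad_acl_rules` is a hypothetical helper, the policy has already been parsed into a dictionary (the policy file format permits comments, so a plain JSON parser may not accept it as is), and only the `acls` section is inspected.

```python
def broad_acl_rules(policy: dict) -> list[tuple[int, str]]:
    """Return (rule index, destination) pairs that accept traffic on all ports."""
    findings = []
    for index, rule in enumerate(policy.get("acls", [])):
        if rule.get("action") != "accept":
            continue
        for dst in rule.get("dst", []):
            # Destinations are written as "<target>:<ports>"; "*" means all ports.
            ports = dst.rsplit(":", 1)[-1]
            if ports == "*":
                findings.append((index, dst))
    return findings
```

Each finding is a candidate for replacement with explicit service ports and role-specific tags. SSH rules would need a separate check, since they are defined outside the `acls` list.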

CA-010: Public Docs Are An Attack Map

Cloudflare Pages hardening docs record that Cloudflare Access was declined and the public Pages site is accepted. See cloudflare-pages.md, lines 64-70. The published docs contain internal IPs, VLANs, hostnames, WAN ingress, runner labels, SSH patterns, and recovery procedures.

Either place docs behind Cloudflare Access or split public/private docs so operational detail is not published.

CA-011: Backup Restore Coverage Is Incomplete

backups/policies.yaml defines tiers and lists external critical backups, but restore testing covers only the service manifests under /opt/homelab/services. External tier-1 items such as Authentik DB, HAOS, and Harbor need automated or documented restore-test evidence. Wazuh also needs a tier assignment and backup manifest.

CA-012: Generator Checks And Connection Vars Need Hardening

render-discovery-inventory.py --check creates the output directory and only detects drift when files already exist. It also hardcodes managed targets to ansible_user: someone, while prox host vars require root. See render-discovery-inventory.py, lines 71-80 and 145-158, and prox.yml, lines 1-4.

Make check mode fail on missing files and source connection vars from canonical Ansible inventory/host vars.
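The check-mode behavior recommended for CA-012 can be sketched as a read-only comparison. This is an assumption-laden illustration, not the existing render-discovery-inventory.py logic: `check_outputs` is a hypothetical function that receives a mapping from output path to freshly rendered content.

```python
from pathlib import Path


def check_outputs(expected: dict[str, str]) -> list[str]:
    """Compare rendered content with files on disk without writing anything."""
    problems = []
    for name, rendered in expected.items():
        target = Path(name)
        if not target.is_file():
            # A missing output is a failure, not something check mode creates.
            problems.append(f"missing: {name}")
        elif target.read_text() != rendered:
            problems.append(f"drift: {name}")
    return problems
```

A `--check` entry point would exit non-zero whenever the returned list is non-empty, and would never create the output directory as a side effect.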

CA-013: Firewall And DNS Docs Drift

network.md still says PiHole remains during soak, while network-live.md says PiHole was destroyed. firewall-policy.md says Management has unrestricted access and IoT cannot reach AdGuard without an explicit exception, while firewall-live.md documents a narrower live matrix. See network.md, lines 73-84, network-live.md, lines 6-9 and 169-194, firewall-policy.md, lines 12-17, 174-178, and 231-237, and firewall-live.md, lines 118-138.

Run a current UniFi scan, update the policy docs, and label old scan sections as historical snapshots.

CA-014: Grafana Auth Proxy Grants Admin Broadly

Grafana has auth proxy auto-signup enabled and assigns every auto-created user the Admin role. See monitoring/compose.yml, lines 69-78.

Default to Viewer or Editor and map Authentik groups to Grafana roles.

CA-015: Docker Socket Access Is Overused

Traefik, Homepage, Promtail, cAdvisor, and Komodo Periphery mount the Docker socket. See traefik/compose.yml, lines 13-16, homepage/compose.yml, lines 7-9, monitoring/compose.yml, lines 104-109 and 130-133, and komodo/compose.yml, lines 51-58.

Add docker-socket-proxy with per-service allowlists and remove convenience socket mounts where not essential.

CA-016: Grafana Dashboard Changes Do Not Trigger Komodo Deploy

The Komodo deploy workflow ignores monitoring/grafana/**, while the monitoring README treats dashboards as repo-provisioned configuration. See komodo-deploy.yml, lines 10-16.

Remove the ignore or add a sync/reload path for dashboard changes.

CA-017: Floating Images Reduce Reproducibility

ARA and Homepage use the floating latest image tag. See ara/compose.yml, line 4, and homepage/compose.yml, line 4.

Pin versions or digests and use Dependabot/Renovate-style update PRs.
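A lint step could reject floating references like those in CA-017 at review time. The following is a minimal sketch: `is_floating_image` is a hypothetical helper that treats a missing tag or an explicit latest tag as floating, and the image names in the usage note are placeholders. Other moving tags such as stable or main are not detected.

```python
def is_floating_image(image: str) -> bool:
    """Return True if an image reference resolves to an implicit or explicit latest."""
    # A digest pins the exact image content regardless of any tag.
    if "@sha256:" in image:
        return False
    # Look only at the last path segment so a registry port is not read as a tag.
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag means Docker pulls "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag == "latest"
```

For example, `is_floating_image("registry.local:5000/app")` returns `True` because no tag is given, while `is_floating_image("registry.local:5000/app:1.2.3")` returns `False`.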

CA-018: Promtail Positions Are Ephemeral

Promtail stores positions at /tmp/positions.yaml. See promtail-config.yml, lines 5-9.

Move positions to a path backed by a named volume, such as /var/lib/promtail/positions.yaml, to avoid duplicate or missed log ingestion after restarts.