Findings Register¶
Severity reflects the impact on the owner if the issue is left unresolved.
Validation is Confirmed when both the repo and live checks support the finding,
Repo when the evidence is limited to static code and docs, and Needs live check
when runtime proof is still required.
| ID | Severity | Finding | Validation |
|---|---|---|---|
| CA-001 | Critical | Pull-request workflows execute repository code on persistent self-hosted runners. | Repo |
| CA-002 | Critical | Broad SOPS age recipients and host automation keys create a large secret blast radius. | Repo |
| CA-003 | High | Internal service ports are directly reachable outside the intended Authentik path. | Confirmed |
| CA-004 | High | Wazuh is stateful, unstable in live checks, and not covered by the backup/restore policy. | Confirmed |
| CA-005 | High | Secret-rendering scripts expose or persist plaintext secrets unsafely. | Repo |
| CA-006 | High | Source-of-truth docs contradict current implementation state. | Repo |
| CA-007 | High | SOPS CI checks can pass files that are only partially encrypted. | Repo |
| CA-008 | High | ARA is permissive and directly published. | Confirmed |
| CA-009 | Medium-High | Tailscale server ACLs allow broad lateral movement and root SSH. | Repo |
| CA-010 | Medium-High | Public docs expose detailed operational and security information. | Repo |
| CA-011 | Medium | Backup and restore tests do not cover important external/tier-1 data. | Repo |
| CA-012 | Medium | Generator check mode can overlook missing outputs, and generated inventory can use the wrong connection vars. | Repo |
| CA-013 | Medium | Firewall and DNS docs contain stale or unresolved network claims. | Repo |
| CA-014 | Medium | Grafana grants Admin to every auth-proxy user. | Repo |
| CA-015 | Medium | Docker socket access is wider than necessary. | Repo |
| CA-016 | Medium | Komodo deployment trigger ignores Grafana dashboard changes. | Repo |
| CA-017 | Low-Medium | Several Compose stacks use floating latest images. | Repo |
| CA-018 | Low-Medium | Promtail stores positions in /tmp. | Repo |
Critical Findings¶
CA-001: PR Workflows Execute Code On Self-Hosted Runners¶
Evidence: The main lint workflow runs on [self-hosted, Linux, X64,
homelab-ci] for pull_request events, and the same pattern appears in secret,
dependency, and Semgrep workflows. See
lint.yml, especially lines 3-15, and
secrets-scan.yml, lines 7-27
and 41-43.
Impact: Malicious PR code, compromised contributor accounts, or dependency attacks can execute on infrastructure-adjacent runner hosts. Even without GitHub secrets, the runner's filesystem, network reachability, tool caches, and local credentials are part of the attack surface.
Recommendation: Move PR checks to GitHub-hosted or ephemeral isolated
runners. Reserve persistent homelab runners for trusted push,
workflow_dispatch, or deploy-only jobs. If self-hosted PR runners must remain,
use separate unprivileged hosts with no LAN reach, no repo secrets, no Docker
socket, clean workspaces, and strict first-time-contributor approval.
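The pattern described above can be detected mechanically. The following is a minimal sketch, not the repo's tooling: it assumes each workflow file has already been parsed into a Python dict, and the function names are hypothetical. It flags jobs that carry the `self-hosted` label in workflows triggered by pull-request events.

```python
from typing import Any

PR_EVENTS = {"pull_request", "pull_request_target"}


def trigger_names(workflow: dict[Any, Any]) -> set[str]:
    """Return the event names declared under the workflow's `on` key."""
    # PyYAML (YAML 1.1) parses a bare `on` key as the boolean True,
    # so both spellings are accepted here.
    triggers = workflow.get("on", workflow.get(True))
    if triggers is None:
        return set()
    if isinstance(triggers, str):
        return {triggers}
    return set(triggers)  # covers both the list form and the mapping form


def self_hosted_pr_jobs(workflow: dict[Any, Any]) -> list[str]:
    """List job IDs that run on self-hosted runners for PR events."""
    if not trigger_names(workflow) & PR_EVENTS:
        return []
    flagged = []
    for job_id, job in workflow.get("jobs", {}).items():
        runs_on = job.get("runs-on", [])
        if isinstance(runs_on, dict):  # `runs-on: {group: ..., labels: ...}`
            runs_on = runs_on.get("labels", [])
        labels = [runs_on] if isinstance(runs_on, str) else list(runs_on)
        if "self-hosted" in labels:
            flagged.append(job_id)
    return flagged
```

A check like this could run on a GitHub-hosted runner as a required status, so that a workflow change cannot quietly reintroduce self-hosted PR execution. It does not resolve expressions or reusable workflows, so it is a first filter rather than proof.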
CA-002: SOPS Automation Key Has Broad Secret Scope¶
Evidence: .sops.yaml applies the same two age recipients across secrets,
service env files, and group var SOPS files. Managed-host docs state that
/etc/homelab/age-key.txt is readable by root:docker with mode 440. See
.sops.yaml, lines 8-25, and
secrets.md, lines 109-126.
Impact: Root compromise on any host with the automation age key, or compromise of an account/container with docker-group-equivalent access, can decrypt unrelated domains such as Cloudflare, Tailscale, backups, Komodo, Authentik outpost, and Saltbox credentials.
Recommendation: Split SOPS recipients by host, service, or trust domain. Stop distributing one global decrypt key to every managed host. Rotate the current automation key after segmentation and document the recovery process per recipient group.
High Findings¶
CA-003: Direct Host Ports Bypass The Intended Auth Layer¶
Evidence: Repo Compose publishes ARA 8000, Prometheus 9090, Traefik
metrics/dashboard 8080, cAdvisor 8081, and Loki 3100. See
ara/compose.yml, lines 7-13,
monitoring/compose.yml, lines
16-17 and 111-123, and
traefik/compose.yml, lines 9-12.
Live validation on infra-services showed listeners on 0.0.0.0/[::] for
those ports. Workstation TCP tests to 192.168.6.17 succeeded for 8000,
8080, 8081, 3100, and 9090.
Impact: These paths can bypass Traefik and Authentik. Docker-published ports
can also bypass UFW expectations unless DOCKER-USER rules are explicitly
managed.
Recommendation: Remove host publishes where Traefik or internal Docker
network access is sufficient. Bind machine-only APIs to 127.0.0.1, or bind to
192.168.6.17 plus explicit DOCKER-USER allowlists when LAN access is
required. Disable the Prometheus lifecycle API (--web.enable-lifecycle) unless
it is operationally required.
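To make the binding recommendation reviewable, a lint step could classify each Compose `ports` entry by the host address it binds. The sketch below is illustrative only: it handles the short port syntax, the function name is hypothetical, and it assumes the Docker daemon's default bind address has not been changed.

```python
def classify_port_binding(mapping: str) -> str:
    """Classify a Compose short-syntax port mapping by its host bind address.

    Returns "all-interfaces", "loopback", or "specific-address".
    """
    spec = mapping.split("/", 1)[0]  # drop an optional /tcp or /udp suffix
    if spec.startswith("["):
        # Bracketed IPv6 host address, e.g. "[::1]:8081:8080".
        host_ip = spec[1:spec.index("]")]
    else:
        parts = spec.split(":")
        if len(parts) < 3:
            # "8000:8000" or "8000": no host IP, so Docker binds all interfaces.
            return "all-interfaces"
        host_ip = parts[0]
    if host_ip in ("0.0.0.0", "::", ""):
        return "all-interfaces"
    if host_ip == "::1" or host_ip.startswith("127."):
        return "loopback"
    return "specific-address"
```

Entries classified as `all-interfaces` match the listeners seen in live validation. Entries classified as `specific-address` still need a matching DOCKER-USER allowlist, which this check cannot see.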
CA-004: Wazuh Is Not Operationally Complete¶
Evidence: Wazuh defines many stateful volumes in
wazuh/compose.yml, lines 17-27 and
55-70, but no services/wazuh/backup.yml exists in the repo or live service
directory. Live docker ps showed wazuh-dashboard restarting. The indexer
config disables disk thresholds in
wazuh.indexer.yml,
line 48. Demo internal users remain in
internal_users.yml,
lines 11-56.
Impact: Losing infra-services can lose SIEM history, manager state,
dashboard customizations, filebeat state, and enrollment continuity. Disabled
disk thresholds can let Wazuh consume the shared infra-services disk and impair
unrelated platform services.
Recommendation: Treat the Wazuh deployment as incomplete. Add a backup manifest and restore runbook, configure index retention and disk watermarks, remove or rotate demo users, narrow proxy trust, and require a live dashboard stability check before closing the work.
CA-005: Secret Scripts Handle Plaintext Unsafely¶
Evidence: render-komodo-compose-env.sh stores decrypted SOPS JSON in a
shell variable, passes it as a Python command-line argument, copies the template
to output before validation, and does not set restrictive permissions. See
render-komodo-compose-env.sh,
lines 29-56. update-backup-sops.sh writes decrypted backup data and private
key material to /tmp/backup-dec.yaml without mktemp, umask, or a cleanup
trap. See
update-backup-sops.sh, lines 9-24.
Impact: Secrets can appear in process listings, remain in predictable temp paths after interruption, or be left in output files with default permissions.
Recommendation: Pass decrypted values via stdin or 0600 temp files created
with mktemp. Use umask 077, cleanup traps, atomic replace, and output mode
0600. Validate all required secrets before replacing live env files.
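The write pattern recommended above can be sketched as follows. This is a hedged illustration, not a drop-in replacement for either script: the function name and the `KEY=value` output format are assumptions. It validates required keys first, writes to a temp file that `mkstemp` creates with mode 0600 in the target directory, and then swaps it into place atomically.

```python
import os
import tempfile


def write_secret_env(path: str, values: dict[str, str], required: list[str]) -> None:
    """Validate, then atomically write KEY=value lines with mode 0600."""
    # Fail before touching the live file if any required secret is absent.
    missing = [key for key in required if not values.get(key)]
    if missing:
        raise ValueError(f"missing required secrets: {', '.join(missing)}")

    directory = os.path.dirname(os.path.abspath(path))
    # mkstemp creates the file readable and writable by the owner only,
    # with an unpredictable name, in the same directory as the target.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".env-")
    try:
        with os.fdopen(fd, "w") as handle:
            for key, value in values.items():
                handle.write(f"{key}={value}\n")
        os.chmod(tmp_path, 0o600)
        os.replace(tmp_path, path)  # atomic rename on the same filesystem
    except BaseException:
        # Do not leave plaintext behind after an error or interruption.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

Creating the temp file next to the target keeps the rename on one filesystem, which is what makes `os.replace` atomic. Decrypted input would still need to reach the process via stdin or a file rather than a command-line argument.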
CA-006: Source-Of-Truth Docs Contradict Current State¶
Evidence: PLAN.md declares itself the source of truth, but it still shows
early hotfixes and acceptance items as unchecked while README owner TODO rows and
the security register mark those areas closed. See PLAN.md,
lines 9-11 and 195-211, README.md, lines 80-132, and
security-register.md, lines 9-19.
Impact: Contractors can use the wrong file as authority, re-open completed work, or implement obsolete acceptance criteria.
Recommendation: Reclassify PLAN.md as historical plan, or add a live
status overlay that points to the current state ledger. Update the README
architecture table to show Authentik as live, not planned.
CA-007: SOPS Encryption Check Is Too Weak¶
Evidence: check-sops-encryption.py only verifies that a top-level sops
metadata key exists. It does not recursively verify that non-metadata scalar
values are encrypted. CI runs this script for *.sops.yaml. See
check-sops-encryption.py, lines
13-29, and lint.yml, lines 95-113.
Impact: A partially decrypted file with stale SOPS metadata can pass CI and pre-commit checks.
Recommendation: Recursively inspect all non-sops values and fail on
plaintext scalars, or use sops filestatus/sops --decrypt checks with the CI
age key where available.
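A recursive check along these lines might look like the sketch below. It is not the existing check-sops-encryption.py: it assumes the file has already been parsed into Python data, that encrypted values use the standard `ENC[AES256_GCM,...]` envelope, and that the default `_unencrypted` key suffix is the only permitted plaintext exception. Empty strings and nulls are treated as acceptable here, which is also an assumption to confirm against the SOPS version in use.

```python
import re
from typing import Any, Iterator

# Shape of a SOPS-encrypted scalar, e.g.
# ENC[AES256_GCM,data:...,iv:...,tag:...,type:str]
ENC_PATTERN = re.compile(r"^ENC\[AES256_GCM,data:.+,iv:.+,tag:.+,type:\w+\]$")


def find_plaintext(node: Any, path: str = "") -> Iterator[str]:
    """Yield the path of every scalar that is not SOPS-encrypted."""
    if isinstance(node, dict):
        for key, value in node.items():
            if path == "" and key == "sops":
                continue  # top-level SOPS metadata is plaintext by design
            if str(key).endswith("_unencrypted"):
                continue  # assumed default unencrypted suffix
            yield from find_plaintext(value, f"{path}/{key}")
    elif isinstance(node, list):
        for index, item in enumerate(node):
            yield from find_plaintext(item, f"{path}[{index}]")
    elif node is None or node == "":
        return  # assumption: empty values carry no secret
    elif not (isinstance(node, str) and ENC_PATTERN.match(node)):
        yield path
```

CI would fail the file when `list(find_plaintext(doc))` is non-empty and the `sops` metadata key is present. If `.sops.yaml` uses `encrypted_regex` or similar rules, the skip logic would need to mirror them.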
CA-008: ARA Direct API Is Permissive¶
Evidence: ARA publishes 8000:8000, uses ARA_ALLOWED_HOSTS: ['*'], and
sets ARA_CORS_ORIGIN_ALLOW_ALL=true. See
ara/compose.yml, lines 7-13. Live checks
confirmed port 8000 is reachable from the operator workstation.
Impact: ARA can expose playbook history, hostnames, paths, and accidental task output outside the intended browser-auth path.
Recommendation: Bind direct ARA callback/API traffic to a restricted address or add network ACLs for managed hosts only. Set explicit allowed hosts and disable broad CORS unless a documented callback use case requires it.
Medium And Low Findings¶
CA-009: Tailscale ACLs Are Broad¶
infra/tailscale/acl.json allows tag:server to tag:server:* and admin
Tailscale SSH to root on all tag:server nodes. See
acl.json, lines 28-45.
Replace broad server-to-server reachability with explicit service ports and role-specific tags. Restrict root SSH to hosts that require it, such as Proxmox.
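As an illustration, broad rules of this kind could be flagged automatically before an ACL change is merged. The sketch assumes the policy file has been parsed into a dict (comments stripped if it is HuJSON) and uses the standard `acls` and `ssh` sections. The function name and sample values are hypothetical.

```python
from typing import Any


def broad_rules(policy: dict[str, Any]) -> list[str]:
    """Report ACL destinations open on every port and SSH rules allowing root."""
    findings = []
    for index, rule in enumerate(policy.get("acls", [])):
        for dst in rule.get("dst", []):
            # Destinations are written as "<target>:<ports>"; "*" means all ports.
            if dst.endswith(":*"):
                findings.append(f"acls[{index}]: {dst} allows every port")
    for index, rule in enumerate(policy.get("ssh", [])):
        if "root" in rule.get("users", []):
            targets = ", ".join(rule.get("dst", []))
            findings.append(f"ssh[{index}]: root login allowed on {targets}")
    return findings
```

The output is a review aid, not a verdict: a root SSH rule limited to a Proxmox-only tag would still be reported and can be accepted explicitly.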
CA-010: Public Docs Are An Attack Map¶
Cloudflare Pages hardening docs record that Cloudflare Access was declined and
that leaving the Pages site public was accepted. See
cloudflare-pages.md, lines 64-70. The
published docs contain internal IPs, VLANs, hostnames, WAN ingress, runner
labels, SSH patterns, and recovery procedures.
Either place docs behind Cloudflare Access or split public/private docs so operational detail is not published.
CA-011: Backup Restore Coverage Is Incomplete¶
backups/policies.yaml defines tiers and external critical backups, but restore
testing covers only the service manifests under /opt/homelab/services.
External tier-1 items such as Authentik DB, HAOS, and Harbor need automated or
documented restore-test evidence. Wazuh also needs a tier assignment and backup
manifest.
CA-012: Generator Checks And Connection Vars Need Hardening¶
render-discovery-inventory.py --check creates the output directory and only
detects drift when files already exist. It also hardcodes managed targets to
ansible_user: someone, while prox host vars require root. See
render-discovery-inventory.py,
lines 71-80 and 145-158, and
prox.yml, lines 1-4.
Make check mode fail on missing files and source connection vars from canonical Ansible inventory/host vars.
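A stricter check mode could follow the shape below. This is a sketch under assumptions, not the current render-discovery-inventory.py: `rendered` stands for a hypothetical mapping of relative file names to freshly generated content. The check never creates directories and treats a missing file as a failure rather than skipping it.

```python
from pathlib import Path


def check_outputs(rendered: dict[str, str], out_dir: Path) -> list[str]:
    """Compare generated content with files on disk without writing anything."""
    problems = []
    for name, content in sorted(rendered.items()):
        target = out_dir / name
        if not target.is_file():
            # A missing output (or missing directory) is drift, not a pass.
            problems.append(f"missing: {name}")
        elif target.read_text() != content:
            problems.append(f"drift: {name}")
    return problems
```

The caller would exit non-zero whenever the returned list is non-empty, so a deleted or never-committed output fails CI the same way a stale one does.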
CA-013: Firewall And DNS Docs Drift¶
network.md still says PiHole remains during soak, while network-live.md says
PiHole was destroyed. firewall-policy.md says Management has unrestricted
access and IoT cannot reach AdGuard without an explicit exception, while
firewall-live.md documents a narrower live matrix. See
network.md, lines 73-84,
network-live.md, lines 6-9 and 169-194,
firewall-policy.md, lines 12-17,
174-178, and 231-237, and
firewall-live.md, lines 118-138.
Run a current UniFi scan, update the policy docs, and label old scan sections as historical snapshots.
CA-014: Grafana Auth Proxy Grants Admin Broadly¶
Grafana has auth proxy auto-signup enabled and assigns every auto-created user
the Admin role. See
monitoring/compose.yml, lines
69-78.
Default to Viewer or Editor and map Authentik groups to Grafana roles.
CA-015: Docker Socket Access Is Overused¶
Traefik, Homepage, Promtail, cAdvisor, and Komodo Periphery mount the Docker
socket. See
traefik/compose.yml, lines 13-16,
homepage/compose.yml, lines 7-9,
monitoring/compose.yml, lines
104-109 and 130-133, and
komodo/compose.yml, lines 51-58.
Add docker-socket-proxy with per-service allowlists and remove convenience
socket mounts where not essential.
CA-016: Grafana Dashboard Changes Do Not Trigger Komodo Deploy¶
The Komodo deploy workflow ignores monitoring/grafana/**, while the monitoring
README treats dashboards as repo-provisioned configuration. See
komodo-deploy.yml, lines
10-16.
Remove the ignore or add a sync/reload path for dashboard changes.
CA-017: Floating Images Reduce Reproducibility¶
ARA and Homepage use latest. See
ara/compose.yml, line 4, and
homepage/compose.yml, line 4.
Pin versions or digests and use Dependabot/Renovate-style update PRs.
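To keep floating tags from returning, a lint step could test each Compose `image` value. The helper below is a minimal sketch with a hypothetical name. It treats an image as floating when it has no digest and either no tag or the `latest` tag.

```python
def is_floating(image: str) -> bool:
    """Return True when an image reference is not pinned to a version or digest."""
    if "@sha256:" in image:
        return False  # digest pins are immutable
    # Only the last path segment can carry a tag; earlier colons belong to
    # a registry port such as "registry.example:5000/app".
    last_segment = image.rsplit("/", 1)[-1]
    if ":" not in last_segment:
        return True  # no tag means Docker resolves it as "latest"
    tag = last_segment.rsplit(":", 1)[1]
    return tag == "latest"
```

Mutable tags other than `latest`, such as `stable` or a major-version tag, pass this check, so digest pinning remains the stronger control.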
CA-018: Promtail Positions Are Ephemeral¶
Promtail stores positions at /tmp/positions.yaml. See
promtail-config.yml, lines
5-9.
Move positions to a named volume, such as /var/lib/promtail/positions.yaml, to
avoid duplicate or missed log ingestion after restarts.