
Phase 7R Network & Infrastructure Audit — Owner Questionnaire

Audit date: 2026-06-18 (infra-services SSH verified same day)

Auditor context: Takeover-style review; documentation may be stale. Live data pulled where API access exists; gaps are called out explicitly.

Live scan: UDM SE UniFi Network 10.5.43 (gateway uptime ~31 hours at capture).

Branch: docs/phase-7r-network-audit (working notes; raw JSON in gitignored .scratch/audit/).

Answer inline ([ ] / free text) or reply in chat keyed by question ID (e.g. Q4.2). Sections are ordered: access gaps → network/ZBF → DNS → compute → Synology → security → housekeeping. Home Assistant findings are in §0.3 and Q3.1.


0. What we could and could not see

| Source | Status | Notes |
| --- | --- | --- |
| UniFi API (labctl unifi dump) | OK | 46 clients, 6 devices, 119 ZBF policies, 8 zones |
| Proxmox API (labctl proxmox) | OK | 6 QEMU + 16 LXC; configs for key guests |
| Synology DSM API (labctl synology) | OK | System info, network, packages, shares |
| Inventory + network-scan.py | OK | 31 unknown clients; 10 inventory IPs not in stat/sta |
| SSH infra-services (WSL) | OK | Batch SSH via infra-services alias; 10 containers observed |
| SSH proxbox (WSL, 1Password) | OK | Host alias proxbox → 192.168.6.71; shell + pvesh verified 2026-06-18 |
| SSH saltierpoop | Not attempted | Same key gap expected |
| Home Assistant API | OK | Long-lived token in .env; pulled 2026-06-18 (see §0.3) |
| Tailscale from this workstation | Not attempted | No tailnet session in Cursor shell |

graph TB
    subgraph observed["Observed live"]
        UDM["UDM SE · ZBF"]
        PVE["Proxmox prox · 8 running guests"]
        DSM["Synology whrrr · 3 LAN IPs"]
        INFRA["infra-services · 6 compose stacks"]
        PROX["proxbox · 8 running guests"]
        HA["Home Assistant · 54 integrations"]
        INV["Inventory YAML + docs"]
    end
    subgraph blind["Could not observe"]
        SALT["saltierpoop containers"]
    end
    observed --> blind

0.1 infra-services — compose stacks (observed 2026-06-18)

SSH: wslssh -o BatchMode=yes infra-services (owner fixed WSL access).

Running containers (10)

| Stack | Container | Image | Notes |
| --- | --- | --- | --- |
| traefik | traefik | traefik:v3.6 | :80, :443, :8080 on host |
| homepage | homepage | gethomepage/homepage | :3000 on host (healthy) |
| ara | ara | recordsansible/ara-api | :8000 on host |
| komodo | komodo-core, komodo-periphery, komodo-mongo | moghtech/komodo-* + mongo:7.0 | internal ports only |
| monitoring | prometheus, grafana, loki, promtail, alertmanager | prom/grafana stack | prometheus :9090, loki :3100 |

Present on disk but not running

| Stack | Path | Notes |
| --- | --- | --- |
| adguard | /opt/homelab/services/adguard/ | compose.yml, dns-rewrites.yaml, unbound.conf synced; no compose.env / .env; docker compose ps empty |
| backup-client | | No /opt/homelab/services/backup-client/ directory on host |

Host DNS on :53

Only systemd-resolved listens on 127.0.0.53 / 127.0.0.54, not AdGuard. PiHole at 192.168.6.80 remains the network DNS path via UDM WAN settings.
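
The listener check above can be repeated from a saved socket listing. This is a minimal sketch that assumes the capture comes from `ss -H -lntu`, where the fifth column is the local address and port; the `SAMPLE` text and both function names are illustrative, not the audit's actual data or tooling.

```python
def dns_listeners(ss_output: str) -> list[str]:
    """Return the sorted local addresses that have something bound to port 53."""
    found = set()
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or malformed line
        # Column 5 of `ss -H -lntu` is "Local Address:Port".
        address, _, port = fields[4].rpartition(":")
        if port == "53":
            found.add(address.split("%")[0])  # drop interface suffix such as %lo
    return sorted(found)


def can_bind(listeners: list[str], wanted: str) -> bool:
    """True if neither the wanted address nor a wildcard already holds :53."""
    wildcards = {"0.0.0.0", "*", "[::]"}
    return wanted not in listeners and not wildcards.intersection(listeners)


# Hypothetical capture shaped like `ss -H -lntu` output on infra-services.
SAMPLE = (
    "udp UNCONN 0 0 127.0.0.53%lo:53 0.0.0.0:*\n"
    "udp UNCONN 0 0 127.0.0.54:53 0.0.0.0:*\n"
    "tcp LISTEN 0 4096 127.0.0.53%lo:53 0.0.0.0:*\n"
    "tcp LISTEN 0 4096 0.0.0.0:3000 0.0.0.0:*\n"
)

listeners = dns_listeners(SAMPLE)
print("port 53 listeners:", listeners)
print("192.168.6.17:53 free:", can_bind(listeners, "192.168.6.17"))
```

With loopback-only listeners, a later AdGuard bind on 192.168.6.17:53 would not collide with systemd-resolved; a wildcard listener would.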

Ansible pull

ansible-pull-apply.timer and ansible-pull-check.timer are active (not the generic ansible-pull.timer). Host checkout on main at 2e073fd with local untracked/modified files (backups/restore-test.sh, services/monitoring/.env.sops.yaml, homepage config drift).

Docker networks

traefik (external), monitoring_default, komodo_default, plus default bridge/host.


0.2 proxbox — Proxmox guests (observed 2026-06-18)

SSH: wslssh proxbox (owner SSH config alias; 1Password agent for auth).

Hostname on host: prox. Matches Proxmox API inventory.

Running (8)

| ID | Type | Name | RAM (max) |
| --- | --- | --- | --- |
| 100 | qemu | saltierpoop | 30720 MB |
| 123 | qemu | infra-services | 8192 MB |
| 200 | qemu | haos | 6144 MB |
| 104 | lxc | blocktopus | 2048 MB |
| 105 | lxc | k6-loadtest | 6144 MB |
| 111 | lxc | influxdb | 2048 MB |
| 116 | lxc | pulse | 1024 MB |
| 119 | lxc | harbor-registry | 4096 MB |

Stopped (14)

aiProject, unmanic, metrimon, dnsproject, graylog, sqlserver2022, mysql, nfs-monitoring, ollama, caddy, reactive-resume, octoprint, penpot, netboot.xyz
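
The running/stopped split can be derived from the cluster resource list rather than read by eye. A minimal sketch, assuming JSON shaped like `pvesh get /cluster/resources --type vm --output-format json` (fields `vmid`, `type`, `name`, `status`, `maxmem` in bytes); the three sample rows are a partial, illustrative subset, and the octoprint memory size is made up.

```python
import json


def summarize_guests(resources_json: str):
    """Split guests into running (with RAM in MB) and stopped (names only)."""
    guests = json.loads(resources_json)
    running = sorted(
        (g["vmid"], g["type"], g["name"], g["maxmem"] // 2**20)
        for g in guests
        if g["status"] == "running"
    )
    stopped = sorted(g["name"] for g in guests if g["status"] != "running")
    return running, stopped


# Illustrative subset; the octoprint maxmem value is an assumption.
SAMPLE = json.dumps([
    {"vmid": 100, "type": "qemu", "name": "saltierpoop",
     "status": "running", "maxmem": 30720 * 2**20},
    {"vmid": 200, "type": "qemu", "name": "haos",
     "status": "running", "maxmem": 6144 * 2**20},
    {"vmid": 120, "type": "lxc", "name": "octoprint",
     "status": "stopped", "maxmem": 1024 * 2**20},
])

running, stopped = summarize_guests(SAMPLE)
for vmid, kind, name, ram_mb in running:
    print(f"{vmid} {kind} {name} {ram_mb} MB")
print("stopped:", ", ".join(stopped))
```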

Note: 1Password SSH works interactively; unattended agents may need a dedicated key later (same pattern as infra-services-cursor).


0.3 Home Assistant — integration health (2026-06-18)

Source: REST API via .env token (HA_URL / HA_TOKEN). Raw summary in .scratch/ha-audit.json (gitignored).

| Metric | Value |
| --- | --- |
| HA version | 2026.6.3 |
| Config entries | 54 |
| Entities (live states) | ~800+ |
| Unavailable entities | 394 |
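
The entity counts can be recomputed from Home Assistant's `/api/states` REST endpoint using the long-lived token as a bearer header. A minimal sketch, not the audit's actual script: `fetch_states` and `summarize` are illustrative names, and the sample entity IDs are hypothetical.

```python
import json
import urllib.request


def fetch_states(base_url: str, token: str) -> list[dict]:
    """GET /api/states with a long-lived access token (HA_URL / HA_TOKEN)."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/states",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def summarize(states: list[dict]) -> dict[str, int]:
    """Count all entities and those whose state is 'unavailable'."""
    unavailable = [s for s in states if s.get("state") == "unavailable"]
    return {"entities": len(states), "unavailable": len(unavailable)}


# Hypothetical states, shaped like the /api/states response.
SAMPLE_STATES = [
    {"entity_id": "light.kitchen", "state": "on"},
    {"entity_id": "switch.ep10_plug", "state": "unavailable"},
    {"entity_id": "sensor.slzb_temperature", "state": "unavailable"},
]

print(summarize(SAMPLE_STATES))
```

Against the live instance the call would be `summarize(fetch_states(HA_URL, HA_TOKEN))`, with both values read from `.env`.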

Integrations not healthy (20 of 54)

| Integration | State | Error / target | Likely cause (ZBF / VLAN) |
| --- | --- | --- | --- |
| smlight SLZB-06M | setup_retry | Connection failed | Servers → IoT blocked (coordinator at 192.168.7.132) → Q3.1 |
| homekit_controller Aqara-Hub-M2-7E74 | setup_in_progress | | Hub on GenPop 192.168.1.82, not IoT → Q2.1 |
| homekit_controller Doorbell Repeater | setup_retry | timeout 192.168.7.107:43507 | Servers → IoT blocked → Q3.1 |
| mqtt | loaded | zigbee2mqtt entities exist | Bridge likely dead while SLZB unreachable |
| tplink ×3 (EP10 plugs) | setup_retry | timeout 192.168.1.248/107/39:9999 | Plugs on GenPop; Servers → GenPop blocked → new rule or move plugs to IoT/Appliances |
| octoprint | setup_retry | 192.168.6.222:5000 connect failed | LXC 120 stopped → Q5.1 |
| otbr OpenThread Border Router | setup_retry | Unable to connect | Thread/Matter; may need IoT reachability |
| unifiprotect UDM SE | setup_error | Authentication failed | Re-auth in HA UI (not ZBF) |
| synology_dsm ×2 | not_loaded | stale 192.168.1.88 / .105 | Remove or fix IPs (NAS is 192.168.6.215) |
| ipp / syncthru Samsung printer | setup_retry | IPP timeout | Printer on GenPop .167 → VLAN + reachability |
| tuya_local feeder | migration_error | | Integration migration (not ZBF) |
| tuya cloud | not_loaded | | Disabled / unused? |
| upnp UDM | not_loaded | | Optional discovery |
| apple_tv Bedroom | setup_in_progress | | May need Personal → IoT → Q3.2 |
| srp_energy | setup_retry | | Utility API (unrelated) |

flowchart LR
    HA["HA 192.168.6.227<br/>Servers VLAN"]
    SLZB["SLZB .132<br/>IoT"]
    Aqara["Aqara hub .82<br/>GenPop"]
    TPL["TP-Link .1.x<br/>GenPop"]

    HA x--x|blocked| SLZB
    HA x--x|blocked| Aqara
    HA x--x|blocked| TPL

Takeaway: A large share of HA pain maps directly to missing east-west firewall allows (Servers→IoT, Servers→GenPop) and wrong VLAN placement (Aqara, printer, TP-Link). Fixing Q3.1 + Q2.1 should move the needle before chasing individual integrations.
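
The mapping from failing targets to VLANs can be made mechanical. A minimal sketch using the standard `ipaddress` module; the /24 masks are an assumption inferred from the addresses in this audit (only Personal's 192.168.3.0/24 is stated explicitly), and `vlan_of` / `cross_vlan_targets` are illustrative names.

```python
import ipaddress

# Assumed /24 per VLAN, inferred from addresses seen in this audit.
VLAN_SUBNETS = {
    "GenPop": "192.168.1.0/24",
    "Personal": "192.168.3.0/24",
    "Appliances": "192.168.5.0/24",
    "Servers": "192.168.6.0/24",
    "IoT": "192.168.7.0/24",
    "Security": "192.168.8.0/24",
}


def vlan_of(ip: str) -> str:
    """Return the VLAN name whose subnet contains ip, or 'unknown'."""
    address = ipaddress.ip_address(ip)
    for name, cidr in VLAN_SUBNETS.items():
        if address in ipaddress.ip_network(cidr):
            return name
    return "unknown"


def cross_vlan_targets(source_ip: str, targets: dict[str, str]) -> dict[str, str]:
    """Targets outside the source's VLAN, which therefore need a firewall allow."""
    home = vlan_of(source_ip)
    return {
        name: vlan_of(ip)
        for name, ip in targets.items()
        if vlan_of(ip) != home
    }


HA_IP = "192.168.6.227"
TARGETS = {
    "SLZB-06M": "192.168.7.132",
    "Aqara Hub M2": "192.168.1.82",
    "Doorbell Repeater": "192.168.7.107",
    "TP-Link plug": "192.168.1.248",
    "OctoPrint": "192.168.6.222",
}

for name, vlan in cross_vlan_targets(HA_IP, TARGETS).items():
    print(f"{name}: Servers -> {vlan} must be allowed")
```

OctoPrint drops out of the result because it shares the Servers subnet with HA, which matches the table: its failure is a stopped LXC, not a firewall block.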


1. Access — resolved

All audit SSH/API paths are working except saltierpoop (not attempted).

~~Q1.1 — SSH to infra-services~~ ✅ Resolved

2026-06-18: Owner fixed WSL SSH to infra-services. Agent verified batch SSH and enumerated compose stacks — see §0.1 above.

~~Q1.2 — SSH to Proxmox (proxbox)~~ ✅ Resolved

2026-06-18: Owner confirmed SSH config alias proxbox (1Password auth). Agent verified ssh proxbox → hostname prox, full guest list in §0.2.


~~Q1.3 — Home Assistant long-lived access token~~ ✅ Resolved

2026-06-18: Token stored in local .env. API pull complete — see §0.3.

Token creation steps (reference)

  1. Open `http://192.168.6.227:8123` → profile (username, bottom-left sidebar).
  2. **Security → Long-Lived Access Tokens → Create Token** (name e.g. `homelab-audit-cursor`).
  3. Add `HA_URL` + `HA_TOKEN` to repo-root `.env` (see `.env.example`).

2. Network topology & VLAN placement

Live client counts (UniFi stat/sta)

| VLAN / network | Clients | Notes |
| --- | --- | --- |
| Servers (4) | 14 | prox, infra-services, saltierpoop, HA, PiHole, NAS NICs, harbor, influx, pulse, k6, VMs on NAS |
| IoT (5) | 14 | SLZB coordinator now has IP .132 (was missing in June scan) |
| Personal (2) | 7 | Includes Fiio R7 .44 |
| Appliances (3) | 5 | Fellow Aiden on Appliances, not IoT |
| Security (6) | 3 | Rack cam .11 live (inventory says .10) |
| GenPop (1) | 3 | See below; design says guests-only |

Q2.1 — GenPop has non-guest devices

Answers (2026-06-18):

| IP | Device | Decision |
| --- | --- | --- |
| 192.168.1.167 | Samsung printer | Stay GenPop: print from Personal; GenPop → Servers already allowed |
| 192.168.1.82 | Aqara Hub M2 | Move to IoT WiFi |
| 192.168.1.218 | OnePlus 8 Pro | Personal VLAN 2, on IsThisTheKrustyKrab (owner confirmed) |

Printer note: CaptainKangapoo at 192.168.3.17 is Personal VLAN 2 (192.168.3.0/24), not GenPop. Policy 3 targets that network.

2026-06-18: The allow rule exists (Action=Allow, 527k+ hits) but sits below Block inter-VLAN; it must be reordered above the block. See troubleshooting.
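
Why ordering matters can be shown with a toy evaluator. This sketch assumes policies are checked top-down and the first match wins, which is how ordered rule lists generally behave; it is not a model of the UDM's actual engine, and the policy names and zone labels are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    name: str
    src: str      # source zone, or "any"
    dst: str      # destination zone, or "any"
    action: str   # "ALLOW" or "BLOCK"


def first_match(policies: list[Policy], src: str, dst: str) -> str:
    """Return the action of the first policy matching src -> dst."""
    for policy in policies:
        if policy.src in (src, "any") and policy.dst in (dst, "any"):
            return policy.action
    return "BLOCK"  # assumed default when nothing matches


block_all = Policy("Block inter-VLAN", "any", "any", "BLOCK")
allow_print = Policy("Allow Personal to printer", "Personal", "GenPop", "ALLOW")

as_found = [block_all, allow_print]    # allow sits below the block
reordered = [allow_print, block_all]   # allow moved above the block

print("as found: ", first_match(as_found, "Personal", "GenPop"))
print("reordered:", first_match(reordered, "Personal", "GenPop"))
```

Under that assumption the allow never takes effect until it is moved above the block, which is the reorder called for here.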

  • [x] Printer stays GenPop; Personal must reach .167 for printing
  • [x] Aqara → IoT WiFi
  • [x] OnePlus back on Personal SSID — confirmed

Follow-up (owner):

  • [ ] Move Aqara Hub M2 to IoT WiFi (in progress)
  • [ ] Move TP-Link plugs when not in active use → README Owner TODO

Q2.2 — WiFi SSID leakage

Observed: Several Personal and IoT devices still associate via The LAN Before Time (GenPop SSID) per client metadata in earlier scans; today many show correct network names but GenPop still has 3 clients.

  • [ ] Should The LAN Before Time be disabled except when guests visit?
  • [ ] Should UniFi client isolation or minimum RSSI be used to force trusted devices onto EAP SSIDs?

Q2.3 — Security camera IP drift

| Device | Inventory | Live (2026-06-18) |
| --- | --- | --- |
| Rack cam (G5 Flex) | 192.168.8.10 | 192.168.8.11 |

  • [ ] Update inventory to .11, or re-reserve .10 on the camera?

3. Zone firewall (ZBF) — intent vs reality

Live user-defined policies (unchanged since design)

| Policy | Effect |
| --- | --- |
| GenPop → Servers | ALLOW |
| Personal → Servers | ALLOW |
| Personal → Appliances | ALLOW |
| Management → Internal | ALLOW |
| Block inter-VLAN (Internal) | BLOCK new (except above) |
| IoT/Security → Gateway mgmt ports | BLOCK 22/80/443 |

Not present: allows for Servers → IoT, Personal → IoT, GenPop → IoT, or any other source into IoT VLAN 5. The only existing path toward smart devices is Personal → Appliances.

graph LR
    HA["HA 192.168.6.227<br/>Servers VLAN"]
    ZB["SLZB .132<br/>IoT VLAN"]
    PH["Phone<br/>Personal VLAN"]
    HP["HomePod .124<br/>IoT VLAN"]

    HA -.->|"BLOCKED"| ZB
    PH -.->|"mDNS only"| HP

Q3.1 — Home Assistant → IoT (critical for your symptoms)

Answer (2026-06-18): B — Narrow allow: 192.168.6.227 → IoT zone only. → Policy 1 in remediation runbook

Observed: HAOS VM 200 on Servers VLAN (192.168.6.227). Zigbee coordinator (SLZB-06M) on IoT (192.168.7.132). No firewall allow for Servers → IoT.

HA evidence (§0.3): smlight SLZB-06M Connection failed; HomeKit Doorbell Repeater timeout to 192.168.7.107; 394 unavailable entities.

  • [x] B. Narrow allow: source 192.168.6.227 only → IoT zone (Appliances + IoT VLANs)

Q3.1b (TP-Link plugs on GenPop), answer (2026-06-18): B. Move smart plugs to IoT or Appliances WiFi and re-pair. → Owner actions in remediation runbook

  • [x] B. Move smart plugs to IoT or Appliances WiFi and re-pair?

Q3.2 — Personal → IoT (AirPlay / HomeKit / phones)

Answer (2026-06-18): Yes — AirPlay must work. → Policy 2 in remediation runbook

  • [x] Add Personal → IoT (VLAN 5) allow

Q3.3 — Personal → IoT VLAN 5 (not just Appliances)

Answer (2026-06-18): Not intentional — the Appliances-only allow was an accidental side effect of zone layout, not a deliberate “block smart speakers” rule. Covered by Q3.2 (add Personal → IoT VLAN 5).

Plain English: You have two “smart device” WiFis. The firewall already let your phone talk to Appliances WiFi (VLAN 3) but blocked IoT WiFi (VLAN 5) where HomePod and Apple TV live. mDNS made speakers visible; streaming was blocked. → Full explanation: phase-7r-zbf-remediation.md

  • [x] Not intentional — allow Personal → IoT VLAN 5 (same as Q3.2)

Q3.4 — IoT → DNS on Servers (future AdGuard)

Answer (2026-06-18): Yes — allow IoT → 192.168.6.17:53 only when AdGuard is deployed (defer until DNS cutover). → Deferred policy in remediation runbook

  • [x] When cutting over DNS, OK to add IoT → 192.168.6.17:53 only

Q3.5 — Fellow Aiden on Appliances VLAN

Observed: Coffee brewer at 192.168.5.39 (Appliances). Older mapping expected IoT.

  • [ ] Correct VLAN: Appliances or IoT?

4. DNS & Phase 7 checklist ordering

Q4.1 — AdGuard deploy status

Observed (SSH 2026-06-18):

  • /opt/homelab/services/adguard/ exists with compose.yml, dns-rewrites.yaml, unbound.conf.
  • No compose.env or .env on host — stack never bootstrapped.
  • docker compose ps for adguard: empty (not running).
  • Host :53 is systemd-resolved only; no AdGuard listener on 192.168.6.17:53.
  • Homepage already binds host :3000 (AdGuard initial-setup docs also mention :3000).

Confirmed: AdGuard is not deployed. PiHole LXC 104 (blocktopus, .80) is still the live DNS filter; UDM WAN DNS points there.

  • [ ] Proceed with AdGuard deploy per Phase 7 runbook after ZBF remediation (Q3.4)?
  • [ ] For first-time setup: use Traefik route only (adguard.infra.realemail.app) and avoid binding AdGuard admin to host :3000 (conflicts with homepage)?
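
The :3000 collision can be checked before bootstrapping the stack. A minimal sketch: the host ports come from the §0.1 table, while the set of ports AdGuard would want (53 for DNS, 3000 for initial setup) is an assumption about how the stack would be configured, and `find_conflicts` is an illustrative helper.

```python
def find_conflicts(in_use: dict[str, set[int]], wanted: set[int]) -> dict[int, str]:
    """Map each wanted port that is already published to its current owner."""
    conflicts = {}
    for owner, ports in in_use.items():
        for port in ports & wanted:
            conflicts[port] = owner
    return conflicts


# Container-published host ports observed in §0.1.
HOST_PORTS = {
    "traefik": {80, 443, 8080},
    "homepage": {3000},
    "ara": {8000},
    "prometheus": {9090},
    "loki": {3100},
}

# Assumed AdGuard needs: DNS on 53, first-run setup UI on 3000.
ADGUARD_WANTED = {53, 3000}

for port, owner in find_conflicts(HOST_PORTS, ADGUARD_WANTED).items():
    print(f"port {port} already published by {owner}")
```

Routing the admin UI through Traefik only, as proposed above, removes 3000 from the wanted set and clears the conflict.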

Q4.2 — PiHole still WAN upstream

Observed: wan_dns1 = 192.168.6.80 (Blocktopus). Entire house depends on LXC 104 PiHole for resolution.

  • [ ] Is this intentional interim, or oversight since migration?
  • [ ] OK to proceed with AdGuard only after Q3.4 answered?

5. Compute: Proxmox guests

Running (live)

| ID | Name | Role (inferred) | IP (UniFi) |
| --- | --- | --- | --- |
| 100 | saltierpoop | Saltbox media VM | .243 |
| 123 | infra-services | Docker/Komodo stack | .17 |
| 200 | haos | Home Assistant | .227 |
| 104 | blocktopus | PiHole LXC | .80 |
| 105 | k6-loadtest | Load testing | .223 |
| 111 | influxdb | Metrics TSDB | .132 |
| 116 | pulse | Proxmox pulse? | .199 |
| 119 | harbor-registry | Container registry | .119 |

Stopped but still allocated (selected)

| ID | Name | Inventory IP | onboot | Lab audit disposition |
| --- | --- | --- | --- | --- |
| 120 | octoprint | .222 | 0 | KEEP? (Prusa on Appliances WiFi) |
| 114 | nfs-monitoring | .107 | 0 | MIGRATE per nfs-monitoring doc |
| 117 | caddy | .244 | | DECOMMISSION? |
| 109 | graylog | .197 | | DECOMMISSION |
| 102–122 | various | many null | mostly 0 | CONSOLIDATE/DECOM per lab-audit |

Q5.1 — OctoPrint LXC stopped

Observed: LXC 120 stopped, onboot=0. Prusa (192.168.5.59) on Appliances WiFi.

  • [ ] Should OctoPrint be running? If yes: start LXC 120 and confirm .222 lease?
  • [ ] Or printer controlled another way now?

Q5.2 — nfs-monitoring LXC stopped

Observed: LXC 114 stopped, onboot=0, inventory .107, not on network.

  • [ ] Still needed for NFS monitoring, or safe to decommission per consolidation plan?

Q5.3 — k6-loadtest always on

Observed: LXC 105 running, on network at .223, onboot=1.

  • [ ] Intentional 24/7, or should it be stopped when not testing?

Q5.4 — harbor-registry

Observed: LXC 119 running at .119; in inventory; not in June device map.

  • [ ] What uses Harbor today? Keep / decom?

Q5.5 — saltierpoop memory pressure

Observed: VM 100 allocated 30720 MB, using ~30 GB of 30 GB (nearly maxed). Inventory says ballooning: true; live QEMU config has balloon: 0.

  • [ ] Is performance acceptable? Plan to reduce RAM or enable ballooning?
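
The inventory/live mismatch can be detected from the guest config text. A minimal sketch, assuming `key: value` lines as printed by `qm config 100` and assuming that `balloon: 0` disables ballooning while an absent key leaves it enabled; the sample config and the `net0` value are illustrative.

```python
def parse_qm_config(text: str) -> dict[str, str]:
    """Parse 'key: value' lines; values may themselves contain colons."""
    config = {}
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator:
            config[key.strip()] = value.strip()
    return config


def ballooning_enabled(config: dict[str, str]) -> bool:
    """Assumption: only an explicit 'balloon: 0' turns ballooning off."""
    return config.get("balloon") != "0"


# Illustrative config text for VM 100; the MAC address is made up.
SAMPLE = (
    "balloon: 0\n"
    "memory: 30720\n"
    "name: saltierpoop\n"
    "net0: virtio=AA:BB:CC:DD:EE:FF,bridge=vmbr0\n"
)

INVENTORY_BALLOONING = True  # what the inventory claims

live = ballooning_enabled(parse_qm_config(SAMPLE))
if live != INVENTORY_BALLOONING:
    print(f"drift: inventory says {INVENTORY_BALLOONING}, live config says {live}")
```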

Q5.6 — Stopped guest backlog

Observed: 13+ stopped guests per lab-audit; still present on disk.

  • [ ] Confirm destroy list from lab-audit.md §2 or provide updated keep/migrate/decom for: graylog, caddy, metrimon, dnsproject, penpot, reactive-resume, netboot.xyz, unmanic, ollama, aiproject, mysql, sqlserver2022.

6. Synology (whrrr)

Q6.1 — NAS DNS

Observed (DSM API): NAS uses DNS 192.168.6.1 (gateway), not PiHole directly.

  • [ ] Is NAS DNS behavior intentional?

Q6.2 — Customer VMs (Ubuncap / Recordurbate)

Observed: 192.168.6.100 (Ubuncap), 192.168.6.98 (Recordurbate) on Servers VLAN. Inventory: customer-app, hosted on whrrr via VMM.

  • [ ] Both VMs still active and correct?
  • [ ] Any firewall rules needed between them and homelab services?

Q6.3 — Multi-homed NICs

Observed: LAN1 .215, LAN2 .214, LAN3 .216 on Servers VLAN; ovs_eth3/4 link-local only.

  • [ ] Still plan per-VLAN IPs on separate NICs (deferred in lab-audit), or keep all on Servers VLAN?

7. Security & exposure

Q7.1 — WAN port forwards (unchanged)

Observed: TCP/UDP 80, 8080, 443 → 192.168.6.243 (saltierpoop).

  • [ ] Still required for all three ports?
  • [ ] Traefik on infra-services vs saltierpoop ingress — still dual-stack by design?

Q7.2 — SEC-001 SMB forward

Security register: SEC-001 still Open (public SMB to NAS).

  • [ ] Confirm removed on UDM, or still present? (API port_forwards did not show SMB.)

8. Inventory & documentation hygiene

Q8.1 — network-scan.py “unknown” clients

Observed: 31 clients not in inventory — mostly personal phones, IoT gadgets, appliances (expected). Inventory is not meant to list every consumer device.

  • [ ] Should we extend inventory with an iot-devices.yaml / endpoints.yaml for smart home gear, or keep inventory infra-only?

Q8.2 — Missing from UniFi stat/sta but in inventory

Includes: octoprint (stopped), nfs-monitoring (stopped), caddy, graylog, sqlserver2022, plus UniFi gear (often not in stat/sta as clients).

  • [ ] Should any of these be online but aren't?
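
Q8.1 and Q8.2 are the two directions of one set comparison. A minimal sketch of that comparison by IP; it is not the actual logic of network-scan.py, and the sample inventory and live set are a small illustrative subset using addresses from this audit.

```python
def reconcile(live_ips: set[str], inventory: dict[str, str]):
    """Return (live IPs missing from inventory, inventory entries not seen live)."""
    inventory_ips = set(inventory.values())
    unknown = sorted(live_ips - inventory_ips)
    missing = {name: ip for name, ip in inventory.items() if ip not in live_ips}
    return unknown, missing


# Illustrative subset: inventory name -> IP.
INVENTORY = {
    "infra-services": "192.168.6.17",
    "haos": "192.168.6.227",
    "octoprint": "192.168.6.222",
}

# Illustrative subset of IPs from UniFi stat/sta.
LIVE = {"192.168.6.17", "192.168.6.227", "192.168.1.218"}

unknown, missing = reconcile(LIVE, INVENTORY)
print("unknown clients:", unknown)
print("inventory not on network:", missing)
```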

Q8.3 — infra-services host drift vs repo

Observed (SSH 2026-06-18):

  • No services/backup-client/ on host (Phase 6 TODO still open).
  • Host /opt/homelab at 2e073fd with local modifications/untracked files (backups/restore-test.sh, services/monitoring/.env.sops.yaml, homepage config).

  • [ ] OK to reconcile host checkout (pull + ansible-pull) before AdGuard deploy?

  • [ ] Priority for backup-client deploy relative to ZBF remediation?
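
Before reconciling, the drift on the host checkout can be split into untracked files and changes to tracked files. A minimal sketch that parses `git status --porcelain` (v1) output, where the first two characters are the status and `??` marks untracked paths; which of the observed files fall in which bucket is not known from the audit, so the sample lines, including the homepage path, are hypothetical.

```python
def classify(porcelain: str) -> dict[str, list[str]]:
    """Group `git status --porcelain` lines into untracked vs tracked changes."""
    result = {"untracked": [], "tracked_changes": []}
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY path"
        status, path = line[:2], line[3:]
        if status == "??":
            result["untracked"].append(path)
        else:
            result["tracked_changes"].append(path)
    return result


# Hypothetical output; leading spaces in the status column are significant.
SAMPLE = (
    " M services/homepage/config/settings.yaml\n"
    "?? backups/restore-test.sh\n"
    "?? services/monitoring/.env.sops.yaml\n"
)

drift = classify(SAMPLE)
print("untracked:", drift["untracked"])
print("tracked changes:", drift["tracked_changes"])
```

Untracked files survive a pull untouched, while tracked changes can block or be overwritten by it, so the second bucket is the one to review first.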

9. Path decision (your call after answering)

We are not asking you to pick TODO-sweep vs Phase 7 blindly. Based on this audit, the likely sequence is:

  1. Apply UDM policies 1–4 → phase-7r-zbf-remediation.md
  2. Move Aqara hub to IoT WiFi (Q2.1)
  3. Move TP-Link plugs (Q3.1b) and re-pair in HA
  4. Then AdGuard deploy + DNS cutover (Q4.x) + IoT→AdGuard:53 (Q3.4)

  • [ ] Agree with remediation-first sequence? If not, what would you prioritize instead?


10. Freeform

Anything else broken, recently changed, or “works but shouldn’t” that we didn’t ask:

(Your notes here)

After you return answers, next deliverables: updated network-live.md, ZBF remediation runbook, inventory fixes, and a short journal entry.