Phase 7R Network & Infrastructure Audit — Owner Questionnaire¶
Audit date: 2026-06-18 (infra-services SSH verified same day)
Auditor context: Takeover-style review; documentation may be stale. Live data
pulled where API access exists; gaps are called out explicitly.
Live scan: UDM SE UniFi Network 10.5.43 (gateway uptime ~31 hours at capture).
Branch: docs/phase-7r-network-audit (working notes; raw JSON in gitignored .scratch/audit/).
Answer inline ([ ] / free text) or reply in chat keyed by question ID (e.g. Q4.2).
Sections are ordered: what we could see (including Home Assistant health) → access → network topology → ZBF → DNS → compute → Synology → security → housekeeping.
0. What we could and could not see¶
| Source | Status | Notes |
|---|---|---|
| UniFi API (labctl unifi dump) | OK | 46 clients, 6 devices, 119 ZBF policies, 8 zones |
| Proxmox API (labctl proxmox) | OK | 6 QEMU + 16 LXC; configs for key guests |
| Synology DSM API (labctl synology) | OK | System info, network, packages, shares |
| Inventory + network-scan.py | OK | 31 unknown clients; 10 inventory IPs not in stat/sta |
| SSH infra-services (WSL) | OK | Batch SSH via infra-services alias; 10 containers observed |
| SSH proxbox (WSL, 1Password) | OK | Host alias proxbox → 192.168.6.71; shell + pvesh verified 2026-06-18 |
| SSH saltierpoop | Not attempted | Same key gap expected |
| Home Assistant API | OK | Long-lived token in .env; pulled 2026-06-18 (see §0.3) |
| Tailscale from this workstation | Not attempted | No tailnet session in Cursor shell |
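The inventory row above comes down to two set differences: inventory IPs with no live client, and live clients the inventory does not list. A minimal sketch, assuming both sources have already been reduced to lists of IP strings; the function name and the sample addresses are illustrative, and this is not the actual network-scan.py logic:

```python
def diff_inventory(inventory_ips, live_ips):
    """Return (missing_live, unknown) as sorted lists.

    missing_live: inventory IPs with no matching live client.
    unknown: live client IPs that the inventory does not list.
    """
    inventory, live = set(inventory_ips), set(live_ips)
    return sorted(inventory - live), sorted(live - inventory)


# Sample data only; addresses borrowed from examples elsewhere in this audit.
inventory = ["192.168.6.17", "192.168.6.222", "192.168.8.10"]
live = ["192.168.6.17", "192.168.8.11"]
missing_live, unknown = diff_inventory(inventory, live)
print(missing_live)
print(unknown)
```

With this sample, the stopped OctoPrint address and the old rack cam reservation land in the first list, and the rack cam's new lease lands in the second.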
graph TB
subgraph observed["Observed live"]
UDM["UDM SE · ZBF"]
PVE["Proxmox prox · 8 running guests"]
DSM["Synology whrrr · 3 LAN IPs"]
INFRA["infra-services · 6 compose stacks"]
PROX["proxbox · 8 running guests"]
HA["Home Assistant · 54 integrations"]
INV["Inventory YAML + docs"]
end
subgraph blind["Could not observe"]
SALT["saltierpoop containers"]
end
observed --> blind
0.1 infra-services — compose stacks (observed 2026-06-18)¶
SSH: wsl → ssh -o BatchMode=yes infra-services (owner fixed WSL access).
Running containers (10)¶
| Stack | Container | Image | Notes |
|---|---|---|---|
| traefik | traefik | traefik:v3.6 | :80, :443, :8080 on host |
| homepage | homepage | gethomepage/homepage | :3000 on host (healthy) |
| ara | ara | recordsansible/ara-api | :8000 on host |
| komodo | komodo-core, komodo-periphery, komodo-mongo | moghtech/komodo-* + mongo:7.0 | internal ports only |
| monitoring | prometheus, grafana, loki, promtail, alertmanager | prom/grafana stack | prometheus :9090, loki :3100 |
Present on disk but not running¶
| Stack | Path | Notes |
|---|---|---|
| adguard | /opt/homelab/services/adguard/ | compose.yml, dns-rewrites.yaml, unbound.conf synced; no compose.env / .env; docker compose ps empty |
| backup-client | — | No /opt/homelab/services/backup-client/ directory on host |
Host DNS on :53¶
Only systemd-resolved on 127.0.0.53 / 127.0.0.54 — not AdGuard. PiHole
at 192.168.6.80 remains the network DNS path via UDM WAN settings.
Ansible pull¶
ansible-pull-apply.timer and ansible-pull-check.timer are active (not the
generic ansible-pull.timer). Host checkout on main at 2e073fd with local
untracked/modified files (backups/restore-test.sh, services/monitoring/.env.sops.yaml,
homepage config drift).
Docker networks¶
traefik (external), monitoring_default, komodo_default, plus default bridge/host.
0.2 proxbox — Proxmox guests (observed 2026-06-18)¶
SSH: wsl → ssh proxbox (owner SSH config alias; 1Password agent for auth).
Hostname on host: prox. Matches Proxmox API inventory.
Running (8)¶
| ID | Type | Name | RAM (max) |
|---|---|---|---|
| 100 | qemu | saltierpoop | 30720 MB |
| 123 | qemu | infra-services | 8192 MB |
| 200 | qemu | haos | 6144 MB |
| 104 | lxc | blocktopus | 2048 MB |
| 105 | lxc | k6-loadtest | 6144 MB |
| 111 | lxc | influxdb | 2048 MB |
| 116 | lxc | pulse | 1024 MB |
| 119 | lxc | harbor-registry | 4096 MB |
Stopped (14)¶
aiProject, unmanic, metrimon, dnsproject, graylog, sqlserver2022, mysql,
nfs-monitoring, ollama, caddy, reactive-resume, octoprint, penpot, netboot.xyz
Note: 1Password SSH works interactively; unattended agents may need a dedicated
key later (same pattern as infra-services-cursor).
0.3 Home Assistant — integration health (2026-06-18)¶
Source: REST API via .env token (HA_URL / HA_TOKEN). Raw summary in
.scratch/ha-audit.json (gitignored).
| Metric | Value |
|---|---|
| HA version | 2026.6.3 |
| Config entries | 54 |
| Entities (live states) | ~800+ |
| Unavailable entities | 394 |
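The entity counts above can be re-derived from the Home Assistant REST API. A minimal sketch, assuming HA_URL and HA_TOKEN are exported from the repo-root .env; the helper names are illustrative and this is not necessarily how the audit pulled its numbers:

```python
import json
import os
import urllib.request


def count_unavailable(states):
    """Count entities whose state string is exactly 'unavailable'."""
    return sum(1 for s in states if s.get("state") == "unavailable")


def fetch_states(base_url, token):
    """GET /api/states using a long-lived access token."""
    req = urllib.request.Request(
        base_url.rstrip("/") + "/api/states",
        headers={"Authorization": f"Bearer {token}"},
    )
    with urllib.request.urlopen(req, timeout=10) as resp:
        return json.load(resp)


if __name__ == "__main__" and os.environ.get("HA_URL") and os.environ.get("HA_TOKEN"):
    # Only runs when both variables are set; prints total and unavailable counts.
    states = fetch_states(os.environ["HA_URL"], os.environ["HA_TOKEN"])
    print(len(states), count_unavailable(states))
```

Entities in the `unknown` state are deliberately not counted, so the result is comparable with the "Unavailable entities" row only.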
Integrations not healthy (20 of 54)¶
| Integration | State | Error / target | Likely cause (ZBF / VLAN) |
|---|---|---|---|
| smlight SLZB-06M | setup_retry | Connection failed | Servers → IoT blocked (coordinator at 192.168.7.132) → Q3.1 |
| homekit_controller Aqara-Hub-M2-7E74 | setup_in_progress | — | Hub on GenPop 192.168.1.82, not IoT → Q2.1 |
| homekit_controller Doorbell Repeater | setup_retry | timeout 192.168.7.107:43507 | Servers → IoT blocked → Q3.1 |
| mqtt | loaded | zigbee2mqtt entities exist | Bridge likely dead while SLZB unreachable |
| tplink ×3 (EP10 plugs) | setup_retry | timeout 192.168.1.248/107/39:9999 | Plugs on GenPop; Servers → GenPop blocked → new rule or move plugs to IoT/Appliances |
| octoprint | setup_retry | 192.168.6.222:5000 connect failed | LXC 120 stopped → Q5.1 |
| otbr OpenThread Border Router | setup_retry | Unable to connect | Thread/Matter; may need IoT reachability |
| unifiprotect UDM SE | setup_error | Authentication failed | Re-auth in HA UI (not ZBF) |
| synology_dsm ×2 | not_loaded | stale 192.168.1.88 / .105 | Remove or fix IPs (NAS is 192.168.6.215) |
| ipp / syncthru Samsung printer | setup_retry | IPP timeout | Printer on GenPop .167 → VLAN + reachability |
| tuya_local feeder | migration_error | — | Integration migration (not ZBF) |
| tuya cloud | not_loaded | — | Disabled / unused? |
| upnp UDM | not_loaded | — | Optional discovery |
| apple_tv Bedroom | setup_in_progress | — | May need Personal → IoT → Q3.2 |
| srp_energy | setup_retry | — | Utility API (unrelated) |
flowchart LR
HA["HA 192.168.6.227<br/>Servers VLAN"]
SLZB["SLZB .132<br/>IoT"]
Aqara["Aqara hub .82<br/>GenPop"]
TPL["TP-Link .1.x<br/>GenPop"]
HA x--x|blocked| SLZB
HA x--x|blocked| Aqara
HA x--x|blocked| TPL
Takeaway: A large share of HA pain maps directly to missing east-west firewall allows (Servers→IoT, Servers→GenPop) and wrong VLAN placement (Aqara, printer, TP-Link). Fixing Q3.1 + Q2.1 should move the needle before chasing individual integrations.
1. Access — resolved¶
All audit SSH/API paths are working except saltierpoop SSH and Tailscale (neither attempted).
~~Q1.1 — SSH to infra-services~~ ✅ Resolved¶
2026-06-18: Owner fixed WSL SSH to infra-services. Agent verified batch SSH
and enumerated compose stacks — see §0.1 above.
~~Q1.2 — SSH to Proxmox (proxbox)~~ ✅ Resolved¶
2026-06-18: Owner confirmed SSH config alias proxbox (1Password auth). Agent
verified ssh proxbox → hostname prox, full guest list in §0.2.
~~Q1.3 — Home Assistant long-lived access token~~ ✅ Resolved¶
2026-06-18: Token stored in local .env. API pull complete — see §0.3.
Token creation steps (reference)
1. Open `http://192.168.6.227:8123` → profile (username, bottom-left sidebar).
2. **Security → Long-Lived Access Tokens → Create Token** (name e.g. `homelab-audit-cursor`).
3. Add `HA_URL` + `HA_TOKEN` to repo-root `.env` (see `.env.example`).

2. Network topology & VLAN placement¶
Live client counts (UniFi stat/sta)¶
| VLAN / network | Clients | Notes |
|---|---|---|
| Servers (4) | 14 | prox, infra-services, saltierpoop, HA, PiHole, NAS NICs, harbor, influx, pulse, k6, VMs on NAS |
| IoT (5) | 14 | SLZB coordinator now has IP .132 (was missing in June scan) |
| Personal (2) | 7 | Includes Fiio R7 .44 |
| Appliances (3) | 5 | Fellow Aiden on Appliances not IoT |
| Security (6) | 3 | Rack cam .11 live (inventory says .10) |
| GenPop (1) | 3 | See below — design says guests-only |
Q2.1 — GenPop has non-guest devices¶
Answers (2026-06-18):
| IP | Device | Decision |
|---|---|---|
| 192.168.1.167 | Samsung printer | Stay GenPop — print from Personal; GenPop → Servers already allowed |
| 192.168.1.82 | Aqara Hub M2 | Move to IoT WiFi |
| 192.168.1.218 | OnePlus 8 Pro | Personal VLAN 2 — on IsThisTheKrustyKrab (owner confirmed) |
Printer note: CaptainKangapoo at 192.168.3.17 is Personal VLAN 2
(192.168.3.0/24), not GenPop. Policy 3 targets that network.
2026-06-18: Allow rule exists (Action=Allow, 527k+ hits) but sits below Block inter-VLAN — must reorder above block. See troubleshooting.
- [x] Printer stays GenPop; Personal must reach .167 for printing
- [x] Aqara → IoT WiFi
- [x] OnePlus back on Personal SSID — confirmed
Follow-up (owner):
- [ ] Move Aqara Hub M2 to IoT WiFi (in progress)
- [ ] Move TP-Link plugs when not in active use → README Owner TODO
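The reorder note in Q2.1 depends on policy order. A minimal sketch of why position matters, assuming policies are evaluated top-down and the first match wins (which the note implies); the policy names, zone names, and tuple layout are hypothetical and do not mirror the UniFi API:

```python
def first_match(policies, src, dst):
    """Return the action of the first policy matching (src, dst), else None.

    Each policy is (name, src_zone, dst_zone, action); "*" matches any zone.
    """
    for _name, p_src, p_dst, action in policies:
        if p_src in ("*", src) and p_dst in ("*", dst):
            return action
    return None


# Hypothetical policies standing in for the real rule pair.
block_all = ("Block inter-VLAN", "*", "*", "BLOCK")
allow_rule = ("Allow rule", "ZoneA", "ZoneB", "ALLOW")

below_block = first_match([block_all, allow_rule], "ZoneA", "ZoneB")
above_block = first_match([allow_rule, block_all], "ZoneA", "ZoneB")
print(below_block, above_block)
```

In the first ordering the allow is never reached for that flow, which is the situation the note describes; moving it above the block changes the outcome without editing either rule.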
Q2.2 — WiFi SSID leakage¶
Observed: Several Personal and IoT devices still associate via
The LAN Before Time (GenPop SSID) per client metadata in earlier scans; today
many show correct network names but GenPop still has 3 clients.
- [ ] Should The LAN Before Time be disabled except when guests visit?
- [ ] Should UniFi client isolation or minimum RSSI be used to force trusted devices onto EAP SSIDs?
Q2.3 — Security camera IP drift¶
| Device | Inventory | Live (2026-06-18) |
|---|---|---|
| Rack cam (G5 Flex) | 192.168.8.10 | 192.168.8.11 |
- [ ] Update inventory to .11, or re-reserve .10 on the camera?
3. Zone firewall (ZBF) — intent vs reality¶
Live user-defined policies (unchanged since design)¶
| Policy | Effect |
|---|---|
| GenPop → Servers | ALLOW |
| Personal → Servers | ALLOW |
| Personal → Appliances | ALLOW |
| Management → Internal | ALLOW |
| Block inter-VLAN (Internal) | BLOCK new (except above) |
| IoT/Security → Gateway mgmt ports | BLOCK 22/80/443 |
Not present: Servers → IoT, Personal → IoT, GenPop → IoT, any → IoT VLAN 5 except Personal→Appliances zone.
graph LR
HA["HA 192.168.6.227<br/>Servers VLAN"]
ZB["SLZB .132<br/>IoT VLAN"]
PH["Phone<br/>Personal VLAN"]
HP["HomePod .124<br/>IoT VLAN"]
HA -.->|"BLOCKED"| ZB
PH -.->|"mDNS only"| HP
Q3.1 — Home Assistant → IoT (critical for your symptoms)¶
Answer (2026-06-18): B — Narrow allow: 192.168.6.227 → IoT zone only.
→ Policy 1 in remediation runbook
Observed: HAOS VM 200 on Servers VLAN (192.168.6.227). Zigbee coordinator
(SLZB-06M) on IoT (192.168.7.132). No firewall allow for Servers → IoT.
HA evidence (§0.3): smlight SLZB-06M Connection failed; HomeKit Doorbell
Repeater timeout to 192.168.7.107; 394 unavailable entities.
- [x] B. Narrow allow: source 192.168.6.227 only → IoT zone (Appliances + IoT VLANs)
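The scope of option B can be written down as a predicate over source and destination addresses. A minimal sketch, assuming /24 prefixes for the IoT (192.168.7.x) and Appliances (192.168.5.x) networks; the prefix lengths and helper name are assumptions, not values read from the UDM:

```python
import ipaddress

HA_HOST = ipaddress.ip_address("192.168.6.227")
# Assumed /24 prefixes for the IoT zone (IoT + Appliances VLANs).
IOT_ZONE = [
    ipaddress.ip_network("192.168.7.0/24"),  # IoT
    ipaddress.ip_network("192.168.5.0/24"),  # Appliances
]


def narrow_allow_matches(src, dst):
    """True only for flows from the HA host to an address in the IoT zone."""
    src_ip, dst_ip = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    return src_ip == HA_HOST and any(dst_ip in net for net in IOT_ZONE)


ha_to_slzb = narrow_allow_matches("192.168.6.227", "192.168.7.132")
other_server = narrow_allow_matches("192.168.6.17", "192.168.7.132")
print(ha_to_slzb, other_server)
```

Other Servers hosts, infra-services at .17 for example, stay blocked under this rule, which is the difference between option B and a zone-wide Servers → IoT allow.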
Q3.1b — Home Assistant → GenPop (TP-Link plugs)¶
Answer (2026-06-18): B — Move smart plugs to IoT or Appliances WiFi and re-pair. → Owner actions in remediation runbook
- [x] B. Move smart plugs to IoT or Appliances WiFi and re-pair?
Q3.2 — Personal → IoT (AirPlay / HomeKit / phones)¶
Answer (2026-06-18): Yes — AirPlay must work. → Policy 2 in remediation runbook
- [x] Add Personal → IoT (VLAN 5) allow
Q3.3 — Personal → IoT VLAN 5 (not just Appliances)¶
Answer (2026-06-18): Not intentional — the Appliances-only allow was an accidental side effect of zone layout, not a deliberate “block smart speakers” rule. Covered by Q3.2 (add Personal → IoT VLAN 5).
Plain English: You have two “smart device” WiFis. The firewall already let your phone talk to Appliances WiFi (VLAN 3) but blocked IoT WiFi (VLAN 5) where HomePod and Apple TV live. mDNS made speakers visible; streaming was blocked. → Full explanation: phase-7r-zbf-remediation.md
- [x] Not intentional — allow Personal → IoT VLAN 5 (same as Q3.2)
Q3.4 — IoT → DNS on Servers (future AdGuard)¶
Answer (2026-06-18): Yes — allow IoT → 192.168.6.17:53 only when AdGuard
is deployed (defer until DNS cutover).
→ Deferred policy in remediation runbook
- [x] When cutting over DNS, OK to add IoT → 192.168.6.17:53 only
Q3.5 — Fellow Aiden on Appliances VLAN¶
Observed: Coffee brewer at 192.168.5.39 (Appliances). Older mapping
expected IoT.
- [ ] Correct VLAN: Appliances or IoT?
4. DNS & Phase 7 checklist ordering¶
Q4.1 — AdGuard deploy status¶
Observed (SSH 2026-06-18):
- /opt/homelab/services/adguard/ exists with compose.yml, dns-rewrites.yaml, unbound.conf.
- No compose.env or .env on host — stack never bootstrapped.
- docker compose ps for adguard: empty (not running).
- Host :53 is systemd-resolved only; no AdGuard listener on 192.168.6.17:53.
- Homepage already binds host :3000 (AdGuard initial-setup docs also mention :3000).
Confirmed: AdGuard is not deployed. PiHole LXC 104 (blocktopus, .80) is still
the live DNS filter; UDM WAN DNS points there.
- [ ] Proceed with AdGuard deploy per Phase 7 runbook after ZBF remediation (Q3.4)?
- [ ] For first-time setup: use Traefik route only (adguard.infra.realemail.app) and avoid binding AdGuard admin to host :3000 (conflicts with homepage)?
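The :3000 concern in Q4.1 is a host-port collision, which is easy to check before bringing a new stack up. A minimal sketch using the host ports listed in §0.1 plus a hypothetical AdGuard admin binding; the function name and the pair-based input format are illustrative:

```python
from collections import defaultdict


def host_port_conflicts(bindings):
    """Map each host port claimed by more than one service to its claimants.

    bindings: iterable of (service, host_port) pairs.
    """
    claims = defaultdict(list)
    for service, port in bindings:
        claims[port].append(service)
    return {port: names for port, names in claims.items() if len(names) > 1}


# Host ports from §0.1; the adguard entry is hypothetical (not deployed yet).
bindings = [
    ("traefik", 80), ("traefik", 443), ("traefik", 8080),
    ("homepage", 3000), ("ara", 8000),
    ("adguard", 3000),
]
conflicts = host_port_conflicts(bindings)
print(conflicts)
```

Routing the AdGuard admin UI through Traefik only would drop the last binding and leave the result empty.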
Q4.2 — PiHole still WAN upstream¶
Observed: wan_dns1 = 192.168.6.80 (Blocktopus). Entire house depends on
LXC 104 PiHole for resolution.
- [ ] Is this intentional interim, or oversight since migration?
- [ ] OK to proceed with AdGuard only after Q3.4 answered?
5. Compute: Proxmox guests¶
Running (live)¶
| ID | Name | Role (inferred) | IP (UniFi) |
|---|---|---|---|
| 100 | saltierpoop | Saltbox media VM | .243 |
| 123 | infra-services | Docker/Komodo stack | .17 |
| 200 | haos | Home Assistant | .227 |
| 104 | blocktopus | PiHole LXC | .80 |
| 105 | k6-loadtest | Load testing | .223 |
| 111 | influxdb | Metrics TSDB | .132 |
| 116 | pulse | Proxmox pulse? | .199 |
| 119 | harbor-registry | Container registry | .119 |
Stopped but still allocated (selected)¶
| ID | Name | Inventory IP | onboot | Lab audit disposition |
|---|---|---|---|---|
| 120 | octoprint | .222 | 0 | KEEP? (Prusa on Appliances WiFi) |
| 114 | nfs-monitoring | .107 | 0 | MIGRATE per nfs-monitoring doc |
| 117 | caddy | .244 | — | DECOMMISSION? |
| 109 | graylog | .197 | — | DECOMMISSION |
| 102–122 | various | many null | mostly 0 | CONSOLIDATE/DECOM per lab-audit |
Q5.1 — OctoPrint LXC stopped¶
Observed: LXC 120 stopped, onboot=0. Prusa (192.168.5.59) on Appliances WiFi.
- [ ] Should OctoPrint be running? If yes: start LXC 120 and confirm .222 lease?
- [ ] Or printer controlled another way now?
Q5.2 — nfs-monitoring LXC stopped¶
Observed: LXC 114 stopped, onboot=0, inventory .107, not on network.
- [ ] Still needed for NFS monitoring, or safe to decommission per consolidation plan?
Q5.3 — k6-loadtest always on¶
Observed: LXC 105 running, on network at .223, onboot=1.
- [ ] Intentional 24/7, or should it be stopped when not testing?
Q5.4 — harbor-registry¶
Observed: LXC 119 running at .119; in inventory; not in June device map.
- [ ] What uses Harbor today? Keep / decom?
Q5.5 — saltierpoop memory pressure¶
Observed: VM 100 allocated 30720 MB, using ~30 GB of 30 GB (nearly maxed).
Inventory says ballooning: true; live QEMU config has balloon: 0.
- [ ] Is performance acceptable? Plan to reduce RAM or enable ballooning?
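The inventory/live mismatch in Q5.5 can be detected mechanically. A minimal sketch, assuming the guest config is available as `key: value` text and that a `balloon` value of 0 means ballooning is disabled; the helper names are hypothetical and the sample config contains only the two values quoted above:

```python
def parse_qemu_config(text):
    """Parse 'key: value' lines from a guest config into a dict of strings."""
    config = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and not line.startswith("#"):
            config[key.strip()] = value.strip()
    return config


def ballooning_enabled(config):
    """Treat 'balloon: 0' as disabled; an absent key is assumed enabled."""
    return config.get("balloon") != "0"


live = parse_qemu_config("memory: 30720\nballoon: 0\n")
inventory_ballooning = True  # inventory says ballooning: true
in_sync = ballooning_enabled(live) == inventory_ballooning
print(in_sync)
```

A result of `False` here flags the drift; it does not say which side should change, which is the question Q5.5 puts to the owner.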
Q5.6 — Stopped guest backlog¶
Observed: 13+ stopped guests per lab-audit; still present on disk.
- [ ] Confirm destroy list from lab-audit.md §2 or provide updated keep/migrate/decom for: graylog, caddy, metrimon, dnsproject, penpot, reactive-resume, netboot.xyz, unmanic, ollama, aiproject, mysql, sqlserver2022.
6. Synology (whrrr)¶
Q6.1 — NAS DNS¶
Observed (DSM API): NAS uses DNS 192.168.6.1 (gateway), not PiHole directly.
- [ ] Is NAS DNS behavior intentional?
Q6.2 — Customer VMs (Ubuncap / Recordurbate)¶
Observed: 192.168.6.100 (Ubuncap), 192.168.6.98 (Recordurbate) on Servers
VLAN. Inventory: customer-app, hosted on whrrr via VMM.
- [ ] Both VMs still active and correct?
- [ ] Any firewall rules needed between them and homelab services?
Q6.3 — Multi-homed NICs¶
Observed: LAN1 .215, LAN2 .214, LAN3 .216 on Servers VLAN; ovs_eth3/4
link-local only.
- [ ] Still plan per-VLAN IPs on separate NICs (deferred in lab-audit), or keep all on Servers VLAN?
7. Security & exposure¶
Q7.1 — WAN port forwards (unchanged)¶
Observed: TCP/UDP 80, 8080, 443 → 192.168.6.243 (saltierpoop).
- [ ] Still required for all three ports?
- [ ] Traefik on infra-services vs saltierpoop ingress — still dual-stack by design?
Q7.2 — SEC-001 SMB forward¶
Security register: SEC-001 still Open (public SMB to NAS).
- [ ] Confirm removed on UDM, or still present? (API port_forwards did not show SMB.)
8. Inventory & documentation hygiene¶
Q8.1 — network-scan.py “unknown” clients¶
Observed: 31 clients not in inventory — mostly personal phones, IoT gadgets, appliances (expected). Inventory is not meant to list every consumer device.
- [ ] Should we extend inventory with an iot-devices.yaml / endpoints.yaml for smart home gear, or keep inventory infra-only?
Q8.2 — Missing from UniFi stat/sta but in inventory¶
Includes: octoprint (stopped), nfs-monitoring (stopped), caddy, graylog,
sqlserver2022, plus UniFi gear (often not in stat/sta as clients).
- [ ] Any of these should be online but aren't?
Q8.3 — infra-services host drift vs repo¶
Observed (SSH 2026-06-18):
- No services/backup-client/ on host (Phase 6 TODO still open).
- Host /opt/homelab at 2e073fd with local modifications/untracked files (backups/restore-test.sh, services/monitoring/.env.sops.yaml, homepage config).

- [ ] OK to reconcile host checkout (pull + ansible-pull) before AdGuard deploy?
- [ ] Priority for backup-client deploy relative to ZBF remediation?
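Before reconciling, it helps to separate locally modified files from untracked ones, since a pull treats them differently. A minimal sketch that classifies `git status --porcelain` (v1) output; the sample lines are illustrative, and which of the listed files is modified versus untracked is an assumption:

```python
def classify_porcelain(output):
    """Split `git status --porcelain` (v1) lines into (modified, untracked)."""
    modified, untracked = [], []
    for line in output.splitlines():
        if len(line) < 4:
            continue
        # Format is two status characters, one space, then the path.
        status, path = line[:2], line[3:]
        if status == "??":
            untracked.append(path)
        elif "M" in status:
            modified.append(path)
    return modified, untracked


# Sample output only; real status codes on the host were not recorded here.
sample = "?? backups/restore-test.sh\n M services/monitoring/.env.sops.yaml\n"
modified, untracked = classify_porcelain(sample)
print(modified)
print(untracked)
```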
9. Path decision (your call after answering)¶
We are not asking you to pick TODO-sweep vs Phase 7 blindly. Based on this audit, the likely sequence is:
- Apply UDM policies 1–4 — phase-7r-zbf-remediation.md
- Move Aqara hub to IoT WiFi (Q2.1)
- Move TP-Link plugs (Q3.1b) and re-pair in HA
- Then AdGuard deploy + DNS cutover (Q4.x) + IoT→AdGuard:53 (Q3.4)

- [ ] Agree with remediation-first sequence? If not, what would you prioritize instead?
10. Freeform¶
Anything else broken, recently changed, or “works but shouldn’t” that we didn’t ask:
After you return answers, next deliverables: updated network-live.md, ZBF
remediation runbook, inventory fixes, and a short journal entry.