
Phase 7 Owner Actions — Network, ACLs, and Security Tightening

Everything you need to do for Phase 7. Tasks are ordered by dependency — do them top-to-bottom. Each section tells you where to go, what to click, and how to verify.


Progress

| # | Task | SEC | Status |
|---|------|-----|--------|
| 1 | Zone-Based Firewall policies | SEC-002 | Done (2026-05-14) |
| 2 | Delete DSM port forwards | SEC-003 | Done (2026-05-15) |
| 3 | WiFi SSID-to-VLAN mapping | SEC-005 | Done (2026-05-15) |
| 4 | Confirm DSM via Tailscale | SEC-003 | To do |
| 5-pre | Generate + encrypt three Tailscale keys (Ansible + secrets/tailscale/*) | SEC-007 | To do |
| 5a-c | Deploy Tailscale (Ansible-managed hosts) | SEC-007 | To do |
| 5d-g | Deploy Tailscale (manual hosts) | SEC-007 | To do |
| 6 | Create Tailscale API key + GitHub secrets | SEC-007 | To do |
| 7 | Deploy AdGuard + Unbound | | To do |
| 8 | Cut over DNS to AdGuard | | To do |
| 9 | Decommission PiHole (LXC 104) | | To do |

4. Confirm DSM Accessible via Tailscale (SEC-003)

Why: DSM port forwards are now deleted. The only way to reach DSM remotely is via Tailscale. Verify this works before moving on.

Steps

  1. Connect to Tailscale on your phone or laptop (off home WiFi — cellular or remote network)
  2. Open https://100.71.93.130:5001 in a browser
  3. DSM login should appear

If it doesn't work

  • Check that the whrrr node is online: run tailscale status on any connected device and look for whrrr
  • If whrrr is offline, SSH to the Synology and run sudo tailscale up
  • If the Tailscale package isn't installed on DSM, see Section 5 below
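
The whrrr check above can be scripted. This is a minimal sketch, not part of the repo: peer_state is a hypothetical helper name, and it assumes the usual column layout of tailscale status (IP, hostname, owner, OS, state) with the word offline on the line of an offline peer.

```sh
# peer_state HOST: reads `tailscale status` text on stdin and prints
# "online", "offline", or "missing" for HOST.
# Assumption: column 2 is the hostname and offline peers show "offline".
peer_state() {
  awk -v host="$1" '
    $2 == host { found = 1; state = ($0 ~ /offline/) ? "offline" : "online" }
    END { print (found ? state : "missing") }
  '
}
```

Usage on any connected device: tailscale status | peer_state whrrr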

5. Deploy Tailscale to All Hosts

Read this scope carefully — easy to over-read. Phase 7 does not mean “every endpoint, URL, or service in the lab magically runs through Tailscale.” It means: (1) the seven machines listed below each get a Tailscale client (or you deliberately skip one), and (2) optional LAN reachability into 192.168.6.0/24 only via subnet routing from infra-services, which is extra operator steps (approve routes, ACLs, --accept-routes on clients — see below). Other VLANs, WAN-only names (Cloudflare), and random LXCs are out of scope unless you add routes or install Tailscale there too.

Tailscale deployment is split into two tracks:

  • Ansible-managed hosts (infra-services, prox, saltierpoop): Fully automated via the tailscale Ansible role. Tags, routes, IP forwarding, and UFW rules are all driven from inventory YAML.
  • Non-Ansible hosts (whrrr, haos, recordurbate, ubuncap): Manual install. Synology uses a DSM package; HAOS uses a first-party add-on; customer-app VMs use curl | sh.

Deployment checklist

| # | Host | Type | Managed by | Tag | Status |
|---|------|------|------------|-----|--------|
| 5a | infra-services | VM (Ubuntu) | Ansible | tag:server | To do |
| 5b | prox | Bare metal (Debian/PVE) | Ansible | tag:server | To do |
| 5c | saltierpoop | VM (Ubuntu) | Ansible | tag:server | To do |
| 5d | whrrr | Synology RS2421+ | Manual | tag:nas | To do |
| 5e | haos | VM (HAOS) | Manual | tag:server | To do |
| 5f | recordurbate | VM (Linux) | Manual | tag:customer-app | To do |
| 5g | ubuncap | VM (Linux) | Manual | tag:customer-app | To do |

Subnet routing (optional path to some LAN targets, not “all endpoints”): When infra-services is up as a subnet router, it can advertise 192.168.6.0/24 so tailnet clients may reach IP addresses on that VLAN (LXCs, bare services) without installing Tailscale on each box. That path only works after you approve the route in the admin console, ensure ACLs allow the traffic, and turn on subnet route acceptance on every client device you use (see “Post-deployment: accept routes” later). It does not give you Tailscale HTTPS names for every internal service, does not cover other subnets/VLANs, and does not replace manual Tailscale on the four non-Ansible hosts above.
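
To make the scope concrete: only addresses inside 192.168.6.0/24 are candidates for the subnet-route path. The sketch below is illustrative only; ip_to_int and ip_in_cidr are hypothetical helper names, and it checks address membership, not whether the route is approved or ACLs allow the traffic.

```sh
# ip_to_int A.B.C.D: print the IPv4 address as a 32-bit integer.
# The function body is a subshell so the IFS change stays local.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# ip_in_cidr IP NET/BITS: succeed if IP lies inside the CIDR block.
ip_in_cidr() (
  net=${2%/*}
  bits=${2#*/}
  mask=$(( (0xFFFFFFFF << (32 - bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & mask )) -eq $(( $(ip_to_int "$net") & mask )) ]
)
```

For example, ip_in_cidr 192.168.6.17 192.168.6.0/24 succeeds, while an address on another VLAN such as 192.168.7.5 fails and would need its own route or a Tailscale client.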


5-pre. Tailscale auth keys (three-key split)

Use separate reusable keys so the automation key never carries tag:nas or tag:customer-app. Full operator checklist (when to create each key, encrypt, commit, decrypt one-liners): secrets/tailscale/README.md.

5-pre-a. Ansible key (tag:server only)

The tailscale Ansible role reads infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml. The key must be pre-authorized for only tag:server (covers infra-services, prox, saltierpoop).

  1. Go to https://login.tailscale.com/admin/settings/keys
  2. Click Generate auth key
  3. Set: Reusable = yes, Pre-authorized = yes
  4. Under Tags, add only tag:server
  5. Copy the key (starts with tskey-auth-...)

On infra-services, encrypt it into the Ansible group vars:

ssh someone@192.168.6.17
cd /opt/homelab

# Write plaintext at the target path (SOPS matches creation rules by path)
cat > infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml <<'EOF'
---
tailscale_auth_key: "tskey-auth-PASTE_YOUR_KEY_HERE"
EOF

# Encrypt in-place
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
  sops -e -i infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml

# Commit and push
git add infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml
git commit -m "chore: add SOPS-encrypted Tailscale auth key"
git push

5-pre-b. NAS and customer-app keys (tag:nas / tag:customer-app)

Manual installs use secrets/tailscale/nas.sops.yaml and secrets/tailscale/customer-app.sops.yaml (see examples in that directory). Create one Tailscale key per file (each key pre-authorized for one tag family only), encrypt with SOPS, commit ciphertext, then use §5d / §5f–5g.

Do not store those keys in tailscale.sops.yaml and do not reuse the Ansible key on Synology or customer-app VMs.
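
Before committing any of the three key files, a guard like the following can catch a file that was never encrypted. It is a sketch under two assumptions: SOPS is using its default format (a top-level sops: metadata block and ENC[AES256_GCM,...] values), and safe_to_commit is a hypothetical helper, not something the repo provides.

```sh
# safe_to_commit FILE: succeeds only if FILE looks SOPS-encrypted and
# contains no plaintext Tailscale auth key.
safe_to_commit() {
  grep -q '^sops:' "$1" || return 1            # SOPS metadata block present
  grep -q 'ENC\[AES256_GCM' "$1" || return 1   # values are ciphertext
  ! grep -q 'tskey-auth-' "$1"                 # no plaintext key left behind
}
```

Usage: safe_to_commit secrets/tailscale/nas.sops.yaml && git add secrets/tailscale/nas.sops.yaml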


5a–5c. Ansible-managed hosts (infra-services, prox, saltierpoop)

These three hosts are in the tailscale Ansible group. The role handles:

  • Installing Tailscale via official apt repo
  • Enabling IP forwarding on subnet routers (infra-services)
  • Running tailscale up with the correct tags and routes
  • Opening UFW port 41641/udp and allowing the tailscale0 interface
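
For orientation, the firewall part of that list corresponds roughly to the commands below. This is a hand-written approximation, not the role's actual tasks; run and open_tailscale_ufw are hypothetical helper names, and DRY_RUN=1 only prints the commands.

```sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Rough manual equivalent of the role's UFW changes (assumption: the
# real tasks may differ in detail).
open_tailscale_ufw() {
  run sudo ufw allow 41641/udp           # Tailscale's UDP port from the list above
  run sudo ufw allow in on tailscale0    # allow traffic arriving on the tailnet interface
}
```

Set DRY_RUN=1 and call open_tailscale_ufw to preview the commands without touching the firewall.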

If ansible-pull dies on Authenticate Tailscale with auth key (Ansible hides output with no_log): on the host run sudo journalctl -u tailscaled -n 40 --no-pager. A common message is requested tags [tag:…] are invalid or not permitted — the key used on that host was not pre-authorized for the tag passed to tailscale up.

  • Ansible hosts: the key in tailscale.sops.yaml must allow only tag:server (see §5-pre-a).
  • whrrr: use the nas key from secrets/tailscale/nas.sops.yaml (§5-pre-b).
  • recordurbate / ubuncap: use secrets/tailscale/customer-app.sops.yaml.

Fix keys in the Tailscale admin keys UI, update the right SOPS file, commit, push, and retry.
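
To see quickly which tag was rejected, the journal output can be filtered for the message quoted above. A small sketch; rejected_tags is a hypothetical helper and it assumes the message wording matches the one shown in this section.

```sh
# rejected_tags: reads journal text on stdin and prints the tag list from
# any "requested tags [...] are invalid or not permitted" line.
rejected_tags() {
  sed -n 's/.*requested tags \[\([^]]*\)\] are invalid or not permitted.*/\1/p'
}
```

Usage on the failing host: sudo journalctl -u tailscaled -n 40 --no-pager | rejected_tags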

Option A — Wait for ansible-pull (automatic, 30-minute cycle):

The auth key commit above will be picked up on the next ansible-pull cycle. Check status after ~30 minutes:

ssh someone@192.168.6.17
tailscale status

Option B — Run manually now (recommended for first deployment):

ansible-pull keeps its own clone under /var/lib/ansible-pull/homelab (common_pull_workdir in Ansible). That directory is only a Git repo after ansible-pull-apply has run successfully at least once. If cd /var/lib/ansible-pull/homelab && git status says not a git repository, either the timer has never completed a pull, or the workdir was never populated — check:

sudo systemctl status ansible-pull-apply.service
sudo journalctl -u ansible-pull-apply.service -n 50 --no-pager
ls -la /var/lib/ansible-pull/homelab

You can kick a run once: sudo systemctl start ansible-pull-apply.service then re-check for .git.

Two different Git trees: ansible-pull always runs from /var/lib/ansible-pull/homelab. A git pull in /opt/homelab only updates your operator clone. If main on GitHub already contains a fix but ansible-pull logs still show an old failure, the pull workdir is probably behind — reconcile it explicitly (as root, same SSH key rules as the unit), then re-enable the timer if you stopped it:

sudo git -C /var/lib/ansible-pull/homelab fetch origin
sudo git -C /var/lib/ansible-pull/homelab status
sudo git -C /var/lib/ansible-pull/homelab merge --ff-only origin/main
sudo systemctl start ansible-pull-apply.timer
sudo systemctl start ansible-pull-apply.service
sudo journalctl -u ansible-pull-apply.service -n 80 --no-pager

Use sudo journalctl … for unit output (otherwise you may only see your own user messages).
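
One way to confirm that the pull workdir is behind is to count the commits it is missing after a fetch. A minimal sketch; commits_behind is a hypothetical helper, and it assumes the remote is named origin with a main branch, as in the commands above.

```sh
# commits_behind DIR: number of commits on origin/main that DIR's HEAD lacks.
# Run `git fetch origin` in DIR first so origin/main is current.
commits_behind() {
  git -C "$1" rev-list --count HEAD..origin/main
}
```

Usage (prefix with sudo if your user cannot read the workdir): commits_behind /var/lib/ansible-pull/homelab. Any value other than 0 means the fast-forward merge above still has work to do.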

If you already maintain /opt/homelab as a normal clone (manual git pull / pushes), you can run the playbook from there instead — paths are the same relative to repo root:

ssh someone@192.168.6.17
cd /opt/homelab/infra/ansible   # ansible.cfg + roles_path live here
git pull   # your usual clone; deploy key for fetch

# When you run ON infra-services, use local connection (no SSH loopback to .17).
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
ansible-playbook \
  -i inventory/generated.yml \
  playbooks/site.yml \
  --tags tailscale \
  --limit infra-services \
  -e ansible_connection=local

Same paths from repo root if you prefer to stay in /opt/homelab (add -e ansible_connection=local when running on infra-services itself):

export ANSIBLE_CONFIG=/opt/homelab/infra/ansible/ansible.cfg
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
ansible-playbook \
  -i infra/ansible/inventory/generated.yml \
  infra/ansible/playbooks/site.yml \
  --tags tailscale \
  --limit infra-services \
  -e ansible_connection=local

If you use the ansible-pull workdir (and it is a valid clone):

ssh someone@192.168.6.17
cd /var/lib/ansible-pull/homelab/infra/ansible
git pull   # uses deploy key; must fast-forward / rebase per that clone’s config

SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
ansible-playbook \
  -i inventory/generated.yml \
  playbooks/site.yml \
  --tags tailscale \
  --limit infra-services \
  -e ansible_connection=local

Shared checkout (operators + root): The common role creates UNIX group homelab-pull, adds common_github_ssh_users to it, sets the workdir to root:homelab-pull mode 2770 (setgid), normalizes the clone once, sets git config core.sharedRepository group, and adds UMask=0002 to the ansible-pull systemd units. After one successful apply, log out and back in (or newgrp homelab-pull) so your session has the group; then cd /var/lib/ansible-pull/homelab && git pull as someone needs no sudo and no safe.directory workaround.

Repeat with --limit prox and --limit saltierpoop (from a control node that can SSH to those hosts, or run on each host from its own clone if you set one up there).

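Connection note: without -e ansible_connection=local, --limit infra-services connects over SSH to ansible_host from inventory (e.g. 192.168.6.17), even when the playbook runs on infra-services itself. In that case, ensure login as someone works to the VM’s own IP (the loopback path is fine). With -e ansible_connection=local, as in the commands above, no SSH connection is made.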

If ansible-pull-apply fails with fatal: could not read Username for 'https://github.com': No such device or address:

The systemd unit is cloning over HTTPS. Unattended pulls must use the SSH URL (git@github.com:notarealemail/homelab.git) so ~/.ssh/config can use the read-only deploy key (no interactive username).

  1. Fix both units, reload, retry:
sudo sed -i 's#https://github.com/notarealemail/homelab.git#git@github.com:notarealemail/homelab.git#g' \
  /etc/systemd/system/ansible-pull-apply.service \
  /etc/systemd/system/ansible-pull-check.service
sudo systemctl daemon-reload
sudo systemctl start ansible-pull-apply.service
sudo journalctl -u ansible-pull-apply.service -n 30 --no-pager
  2. If /var/lib/ansible-pull/homelab/.git already exists, point origin at SSH as well:
cd /var/lib/ansible-pull/homelab
git remote set-url origin git@github.com:notarealemail/homelab.git
  3. Prefer re-applying the common role from /opt/homelab so units match repo templates: ansible-playbook … --tags ansible-pull (after units use SSH, timers stay correct).
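
The HTTPS-to-SSH rewrite used in the sed command can be expressed as a small function, which is handy for checking what a given URL would become. A sketch only; to_ssh_url is a hypothetical helper and handles github.com URLs exclusively.

```sh
# to_ssh_url URL: rewrite an https://github.com/... clone URL to its
# git@github.com:... SSH form; other URLs pass through unchanged.
to_ssh_url() {
  printf '%s\n' "$1" | sed 's#^https://github\.com/#git@github.com:#'
}
```

To list units that still reference HTTPS: grep -l 'https://github.com' /etc/systemd/system/ansible-pull-*.service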

If sudo systemctl start ansible-pull-apply “hangs” (no prompt for many minutes):

ansible-pull-apply is Type=oneshot: systemctl start blocks until the entire playbook finishes. A full site.yml run can easily take 10–25+ minutes (apt, downloads, multiple roles). That is normal, not a frozen shell.

  • In another SSH session, follow logs live:
sudo journalctl -fu ansible-pull-apply.service
  • When it completes, systemctl start returns and systemctl status shows inactive (dead) with a result.

If the journal stops on tailscale up for a long time, the Tailscale role now wraps that in a timeout so a bad/missing auth key cannot block forever (pull latest repo before relying on that).
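
If you would rather not have the shell block at all, systemctl's --no-block option queues the job and returns immediately. The sketch below pairs it with the log follow; run and start_pull_nonblocking are hypothetical helper names, and DRY_RUN=1 only prints the commands.

```sh
# run CMD...: execute CMD, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Queue the oneshot unit without waiting for it, then follow its journal.
start_pull_nonblocking() {
  run sudo systemctl start --no-block ansible-pull-apply.service
  run sudo journalctl -fu ansible-pull-apply.service   # Ctrl-C stops following, not the unit
}
```

The unit keeps running after you stop following the journal; systemctl status shows the result once it finishes.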

After each host registers:

  1. Go to https://login.tailscale.com/admin/machines
  2. For infra-services: Click three-dot menu > Edit route settings > Approve the 192.168.6.0/24 subnet route
  3. Verify: tailscale status on each host should list the host and its peers with their Tailscale IPs, not a "Logged out." or "Tailscale is stopped." message

What the role configures per host:

| Host | Tags | Routes | IP forwarding | UFW |
|------|------|--------|---------------|-----|
| infra-services | tag:server | 192.168.6.0/24 | Yes (sysctl) | 41641/udp + tailscale0 |
| prox | tag:server | | No | Skipped (pve-firewall) |
| saltierpoop | tag:server | | No | 41641/udp + tailscale0 |
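
The IP forwarding column can be spot-checked on a host by reading the kernel setting directly. A minimal sketch; forwarding_enabled is a hypothetical helper, and the optional argument exists only so the check can be pointed at a different root for testing.

```sh
# forwarding_enabled [PROC_SYS_ROOT]: succeed if IPv4 forwarding is on.
# Defaults to the live kernel settings under /proc/sys.
forwarding_enabled() {
  [ "$(cat "${1:-/proc/sys}/net/ipv4/ip_forward")" = 1 ]
}
```

On infra-services this should succeed after the role has run; on prox and saltierpoop the role does not enable forwarding.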

5d. whrrr (Synology DSM) — Manual

Prerequisite: secrets/tailscale/nas.sops.yaml exists in git (encrypted) per secrets/tailscale/README.md and §5-pre-b.

Tailscale on Synology runs as a DSM package. Upstream documents Synology limitations (no tailscale up --accept-routes, no exit-node client behavior on the NAS itself): tailscale/tailscale#1995. whrrr can still join the tailnet with tag:nas and you can reach DSM at the Synology Tailscale IP; you cannot expect whrrr to follow subnet routes advertised by other nodes.

  1. Open DSM > Package Center > verify Tailscale is installed and running
  2. From a host that has this repo and sops (usually your laptop), run the NAS one-liner in secrets/tailscale/README.md (it SSHs to Synology and passes --authkey=…). If you cannot run sops from that path, decrypt locally, then paste the key into sudo tailscale up --authkey=… in an SSH session to the NAS (avoid leaving the key in shell history where possible).

If tailscale up prints “Some peers are advertising routes but --accept-routes is false”, that is expected on Synology: DSM cannot enable --accept-routes (Tailscale #1995). It is a notice, not a failed login — your NAS is still on the tailnet. Ignore it unless you need whrrr itself to reach LAN-only targets via another node’s advertised subnets (unsupported on DSM).

If the CLI isn't accessible, apply the tag:nas tag from the Tailscale admin console:

  1. Go to https://login.tailscale.com/admin/machines
  2. Find whrrr
  3. Click three-dot menu > Edit ACL tags > add tag:nas

5e. haos (Home Assistant OS) — Manual (HA add-on)

HAOS has a first-party Tailscale add-on. It does not read tailscale.sops.yaml or secrets/tailscale/*.yaml; authenticate with the URL from the add-on log unless you configure a key there separately. If you ever need a reusable key for HA, create another tag:server-only key and store it in 1Password; do not reuse the Ansible automation key outside ansible-pull.

  1. Open Home Assistant > Settings > Add-ons > Add-on Store
  2. Search for Tailscale and install it
  3. Go to the add-on Configuration tab and set:
tags:
  - tag:server
accept_routes: true

Then start the add-on. Check the add-on Log tab for the auth URL — open it to authenticate.

After install: Home Assistant at http://<tailscale-ip>:8123 works from anywhere on the tailnet.


5f–5g. recordurbate and ubuncap — Manual

Prerequisite: secrets/tailscale/customer-app.sops.yaml committed (encrypted) per secrets/tailscale/README.md and §5-pre-b.

These are customer-app VMs on whrrr's VMM. Use the same customer-app key on both hosts. Run the customer-app one-liner in secrets/tailscale/README.md per host (from a machine with the repo and sops), or decrypt locally and use sudo tailscale up --authkey=… --advertise-tags=tag:customer-app --accept-routes on each VM.


Post-deployment: accept routes on your devices

After infra-services is advertising 192.168.6.0/24, enable route acceptance on each client device you use:

  • macOS/Linux: tailscale up --accept-routes or toggle in the Tailscale menu bar app
  • iOS/Android: Tailscale app > Settings > "Use Tailscale subnets"
  • Windows: Tailscale system tray > "Use subnet routes"
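
On a Linux client you can confirm that route acceptance took effect by checking which interface the kernel picks for a Servers VLAN address. A sketch, assuming the Tailscale interface is named tailscale0 (as on the Ansible-managed hosts) and that ip route get prints a "dev NAME" field; route_dev is a hypothetical helper.

```sh
# route_dev: reads `ip route get ...` output on stdin and prints the
# outgoing interface name.
route_dev() {
  sed -n 's/.* dev \([^ ]*\).*/\1/p'
}
```

Usage from a remote network: ip route get 192.168.6.17 | route_dev. An answer of tailscale0 suggests the advertised subnet route is in use; a physical interface suggests it is not.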

6. Create Tailscale API Key + GitHub Secrets (SEC-007)

Where: Tailscale Admin Console + GitHub repo settings

Steps

  1. Go to https://login.tailscale.com/admin/settings/keys
  2. Create a new API key
  3. Note your tailnet name (visible at the top of the admin console)
  4. Go to your GitHub repo > Settings > Secrets and variables > Actions
  5. Add two secrets:
     • TS_API_KEY: paste the API key
     • TS_TAILNET: paste the tailnet name (e.g., yourtailnet.ts.net)

Verify

  • Push any change to infra/tailscale/acl.json and merge to main
  • The Tailscale ACL Sync workflow should run and succeed
  • Check Tailscale admin > Access Controls — it should match acl.json
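
The last check can also be done from a terminal by fetching the live policy through the Tailscale API. This sketch assumes the v2 API path shown in the code and HTTP basic auth with the API key as the username; confirm both against the Tailscale API documentation before relying on it. acl_url is a hypothetical helper.

```sh
# acl_url TAILNET: Tailscale API URL for a tailnet's policy file.
# Assumption: the v2 API layout /api/v2/tailnet/<tailnet>/acl.
acl_url() {
  printf 'https://api.tailscale.com/api/v2/tailnet/%s/acl\n' "$1"
}
```

Usage with the same values stored as GitHub secrets: curl -fsS -u "$TS_API_KEY:" -H 'Accept: application/json' "$(acl_url "$TS_TAILNET")", then compare the result with infra/tailscale/acl.json.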

Reference: infra/tailscale/README.md


7. Deploy AdGuard + Unbound on infra-services

Where: SSH to infra-services

Steps

  1. Pull the latest code:
cd /opt/homelab
git pull
  2. Add a DNS record for adguard.infra.realemail.app on the UDM:
     • Settings > Policy Table > add adguard.infra.realemail.app → 192.168.6.17

  3. Start the stack:

cd /opt/homelab/services/adguard
docker compose up -d
  4. Open https://adguard.infra.realemail.app in your browser for initial setup (not :3000; homepage uses that port on the host):
     • Set admin username and password
     • Set the listen interface to 0.0.0.0:53 for DNS
     • Set upstream DNS to udp://unbound with bootstrap 127.0.0.11 (see AdGuard upstream (repo README))

  5. Import DNS rewrites from inventory (optional but recommended):

cd /opt/homelab
# The file is already generated at services/adguard/dns-rewrites.yaml
# Add them in the AdGuard UI: Filters > DNS rewrites
# Or script it with curl (see services/adguard/README.md)

Verify

  • dig @192.168.6.17 google.com should return an A record
  • dig @192.168.6.17 infra-services.lab.local should return 192.168.6.17 (if you imported the rewrites)
  • https://adguard.infra.realemail.app should load the AdGuard dashboard
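
The two dig checks can be wrapped so they pass or fail without reading the output by eye. A minimal sketch; expect_a is a hypothetical helper, the DIG variable exists only so the resolver command can be swapped out, and only the first answer line is compared.

```sh
# Resolver command; override DIG to substitute another tool or a stub.
DIG=${DIG:-dig}

# expect_a NAME EXPECTED [SERVER]: succeed if the first A answer for NAME
# from SERVER (default 192.168.6.17) equals EXPECTED.
expect_a() {
  answer=$("$DIG" +short "$1" A "@${3:-192.168.6.17}" | head -n 1)
  [ "$answer" = "$2" ]
}
```

Usage: expect_a infra-services.lab.local 192.168.6.17 && echo "rewrite OK"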

Reference: services/adguard/README.md


8. Cut Over DNS from PiHole to AdGuard

Where: UDM SE > Settings > Networks + Settings > Internet

Prerequisite: AdGuard running and verified (step 7). infra-services must have Tailscale prefer-main routing for 192.168.6.0/24 (Ansible tailscale role — see AdGuard — Servers VLAN caveat).

Steps

  1. Internet → WAN1 → DNS (IPv4 only for now):
     • Manual DNS 192.168.6.17
     • Clear any legacy PiHole IPv4 (192.168.6.80) and IPv6 entries
     • IPv6 DNS: leave blank until AdGuard publishes v6 (optional later)

  2. For each VLAN network on the UDM:
     • Settings > Networks > click the network
     • Under DHCP > DNS Server → 192.168.6.17
     • Save

Servers VLAN (4): can use 192.168.6.17 directly now that the Tailscale routing fix is in place. Gateway-as-DNS (.1) also works if WAN upstream points at AdGuard.

  3. Renew DHCP on clients (or reboot). On Ubuntu hosts, avoid one-off resolvectl dns changes; use netplan dhcp4-overrides: use-dns: false only if you need static overrides.

Verify

  • dig @192.168.6.17 google.com from a Servers VLAN host (e.g. saltierpoop)
  • dig google.com on that host (systemd-resolved / DHCP DNS)
  • Browse the web; check AdGuard dashboard for query activity
  • dig @192.168.6.17 infra-services.lab.local → 192.168.6.17
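
To spot hosts that still hold a lease pointing at the old resolver, the resolver configuration can be searched for the PiHole address. A sketch; still_on_pihole is a hypothetical helper that only looks for 192.168.6.80 in whatever text it is given.

```sh
# still_on_pihole: reads resolver config text on stdin; succeeds if the
# old PiHole address 192.168.6.80 appears as a whole address.
still_on_pihole() {
  grep -Eq '(^|[^0-9.])192\.168\.6\.80([^0-9]|$)'
}
```

Usage on a systemd-resolved host: resolvectl dns | still_on_pihole && echo "still using PiHole; renew DHCP". On other hosts, feed it /etc/resolv.conf instead.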

Parallel run

Keep PiHole (LXC 104) running for 48 hours after cutover. If anything breaks, revert DHCP DNS to 192.168.6.80 and WAN DNS to PiHole on the UDM.


9. Decommission PiHole LXC 104 (blocktopus)

Where: Proxmox UI or CLI

Prerequisite: 48-hour parallel run with AdGuard (step 8) verified.

Steps

  1. Verify no clients are still using PiHole:
     • SSH to PiHole (192.168.6.80) and check query logs
     • If queries are still coming in, something still points at the old DNS

  2. Stop the LXC:

ssh root@192.168.6.71  # Proxmox
pct stop 104
  3. Wait 24-48 hours. If nothing breaks, destroy it:
pct destroy 104 --purge
  4. Let me know when done so I can update inventory and docs.
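
For the query-log check in the first step, a per-client count makes leftover users of the old DNS easy to spot. A sketch, assuming the dnsmasq-style log lines PiHole writes ("query[A] name from CLIENT") and a log at /var/log/pihole/pihole.log; the path varies by PiHole version, and pihole_clients is a hypothetical helper.

```sh
# pihole_clients: reads a dnsmasq-style query log on stdin and prints
# "COUNT CLIENT" per client IP, busiest first.
pihole_clients() {
  awk '/query\[/ && $(NF-1) == "from" { n[$NF]++ }
       END { for (c in n) print n[c], c }' | sort -rn
}
```

Usage on the PiHole LXC: tail -n 5000 /var/log/pihole/pihole.log | pihole_clients. An empty result over a reasonable window suggests nothing points at 192.168.6.80 any more.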