# Phase 7 Owner Actions: Network, ACLs, and Security Tightening
Everything you need to do for Phase 7. Tasks are ordered by dependency — do them top-to-bottom. Each section tells you where to go, what to click, and how to verify.
## Progress
| # | Task | SEC | Status |
|---|---|---|---|
| 1 | Zone-Based Firewall policies | SEC-002 | Done (2026-05-14) |
| 2 | Delete DSM port forwards | SEC-003 | Done (2026-05-15) |
| 3 | WiFi SSID-to-VLAN mapping | SEC-005 | Done (2026-05-15) |
| 4 | Confirm DSM via Tailscale | SEC-003 | To do |
| 5-pre | Generate + encrypt three Tailscale keys (Ansible + `secrets/tailscale/*`) | SEC-007 | To do |
| 5a-c | Deploy Tailscale (Ansible-managed hosts) | SEC-007 | To do |
| 5d-g | Deploy Tailscale (manual hosts) | SEC-007 | To do |
| 6 | Create Tailscale API key + GitHub secrets | SEC-007 | To do |
| 7 | Deploy AdGuard + Unbound | — | To do |
| 8 | Cut over DNS to AdGuard | — | To do |
| 9 | Decommission PiHole (LXC 104) | — | To do |
## 4. Confirm DSM Accessible via Tailscale (SEC-003)
Why: DSM port forwards are now deleted. The only way to reach DSM remotely is via Tailscale. Verify this works before moving on.
### Steps
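- Connect to Tailscale on your phone or laptop while off the home WiFi (cellular or a remote network)
- Open `https://100.71.93.130:5001` in a browser (expect a certificate warning unless DSM has a certificate that covers that address)
- The DSM login page should appear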
### If it doesn't work
- Check that the `whrrr` node is online: run `tailscale status` on any connected device and look for `whrrr`
- If `whrrr` is offline, SSH to the Synology and run `sudo tailscale up`
- If the Tailscale package isn't installed on DSM, see §5d below
## 5. Deploy Tailscale to All Hosts
Read this scope carefully; it is easy to over-read. Phase 7 does not mean that every endpoint, URL, or service in the lab automatically runs through Tailscale. It means:

1. each of the seven machines listed below gets a Tailscale client (or you deliberately skip one), and
2. optionally, addresses in `192.168.6.0/24` become reachable through subnet routing from infra-services, which needs extra operator steps (approve routes, ACLs, `--accept-routes` on clients; see below).

Other VLANs, WAN-only names (Cloudflare), and arbitrary LXCs are out of scope unless you add routes or install Tailscale there too.
Tailscale deployment is split into two tracks:

- Ansible-managed hosts (infra-services, prox, saltierpoop): fully automated via the `tailscale` Ansible role. Tags, routes, IP forwarding, and UFW rules are all driven from inventory YAML.
- Non-Ansible hosts (whrrr, haos, recordurbate, ubuncap): manual install. Synology uses a DSM package, HAOS uses a first-party add-on, and the customer-app VMs use `curl | sh`.
### Deployment checklist
| # | Host | Type | Managed by | Tag | Status |
|---|---|---|---|---|---|
| 5a | infra-services | VM (Ubuntu) | Ansible | `tag:server` | To do |
| 5b | prox | Bare metal (Debian/PVE) | Ansible | `tag:server` | To do |
| 5c | saltierpoop | VM (Ubuntu) | Ansible | `tag:server` | To do |
| 5d | whrrr | Synology RS2421+ | Manual | `tag:nas` | To do |
| 5e | haos | VM (HAOS) | Manual | `tag:server` | To do |
| 5f | recordurbate | VM (Linux) | Manual | `tag:customer-app` | To do |
| 5g | ubuncap | VM (Linux) | Manual | `tag:customer-app` | To do |
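To track this checklist from any tailnet device, a sketch that lists which of the seven hosts do not yet appear in `tailscale status`. It assumes the Tailscale machine names match the host names in the table (adjust the list if you renamed machines) and that the machine name is the second column of the plain-text output; `missing_tailnet_hosts` and `PHASE7_HOSTS` are hypothetical names.

```bash
# Hypothetical helper: hosts from the Phase 7 checklist (5a-5g).
PHASE7_HOSTS="infra-services prox saltierpoop whrrr haos recordurbate ubuncap"

# Print each checklist host that is absent from `tailscale status`.
missing_tailnet_hosts() {
  present=$(tailscale status 2>/dev/null | awk '{ print $2 }')
  for host in $PHASE7_HOSTS; do
    # -x: whole-line match, -F: literal string (host names contain "-").
    if ! printf '%s\n' "$present" | grep -qxF -e "$host"; then
      echo "$host"
    fi
  done
}
```

A registered but offline machine still appears in `tailscale status`, so this tracks registration only; use the admin console's Machines page to see which hosts are actually connected.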
Subnet routing (an optional path to some LAN targets, not “all endpoints”): when infra-services runs as a subnet router, it can advertise `192.168.6.0/24` so tailnet clients can reach IP addresses on that VLAN (LXCs, bare services) without installing Tailscale on each box. That path works only after you approve the route in the admin console, ensure the ACLs allow the traffic, and turn on subnet route acceptance on every client device you use (see “Post-deployment: accept routes on your devices” below). It does not give you Tailscale HTTPS names for internal services, does not cover other subnets/VLANs, and does not replace the manual Tailscale installs on the four non-Ansible hosts above.
### 5-pre. Tailscale auth keys (three-key split)
Use separate reusable keys so the automation key never carries `tag:nas` or `tag:customer-app`. The full operator checklist (when to create each key, how to encrypt and commit it, decrypt one-liners) is in `secrets/tailscale/README.md`.
#### 5-pre-a. Ansible key (`tag:server` only)
The `tailscale` Ansible role reads `infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml`. The key must be pre-authorized for `tag:server` only (this covers infra-services, prox, and saltierpoop).
- Go to https://login.tailscale.com/admin/settings/keys
- Click Generate auth key
- Set Reusable = yes, Pre-authorized = yes
- Under Tags, add only `tag:server`
- Copy the key (it starts with `tskey-auth-`)
On infra-services, encrypt it into the Ansible group vars:
```bash
ssh someone@192.168.6.17
cd /opt/homelab

# Write plaintext at the target path (SOPS matches creation rules by path)
cat > infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml <<'EOF'
---
tailscale_auth_key: "tskey-auth-PASTE_YOUR_KEY_HERE"
EOF

# Encrypt in place
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
  sops -e -i infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml

# Commit and push
git add infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml
git commit -m "chore: add SOPS-encrypted Tailscale auth key"
git push
```
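Before the `git add`, it is worth confirming that the file really holds ciphertext. A sketch based on how SOPS writes YAML (values become `ENC[AES256_GCM,...]` strings and a top-level `sops:` metadata key is added); the `tskey-auth-` check catches a plaintext key that slipped through. `assert_sops_encrypted` is a hypothetical helper.

```bash
# Hypothetical helper: fail if a YAML file does not look SOPS-encrypted
# or still contains a plaintext Tailscale auth key.
assert_sops_encrypted() {
  f=$1
  # Base64 ciphertext never contains "-", so this only matches plaintext keys.
  if grep -q 'tskey-auth-' "$f"; then
    echo "plaintext Tailscale key in $f" >&2
    return 1
  fi
  if ! grep -q '^sops:' "$f" || ! grep -q 'ENC\[AES256_GCM,' "$f"; then
    echo "$f does not look SOPS-encrypted" >&2
    return 1
  fi
  echo "ok: $f"
}
```

For example: `assert_sops_encrypted infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml && git add infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml`.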
#### 5-pre-b. NAS and customer-app keys (`tag:nas` / `tag:customer-app`)
Manual installs use `secrets/tailscale/nas.sops.yaml` and `secrets/tailscale/customer-app.sops.yaml` (see the examples in that directory). Create one Tailscale key per file (each pre-authorized for a single tag family), encrypt it with SOPS, commit the ciphertext, then continue with §5d / §5f–5g. Do not store those keys in `tailscale.sops.yaml`, and do not reuse the Ansible key on Synology or the customer-app VMs.
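To check that the three files really hold three different keys (so the Ansible key was not reused), a sketch that decrypts each file and compares the `tskey-auth-…` values without printing them. It assumes each file contains exactly one such key, whatever its YAML field name, and that your SOPS age key is available (for example via `SOPS_AGE_KEY_FILE`); `tailscale_keys_distinct` is hypothetical.

```bash
# Hypothetical helper: decrypt each SOPS file and fail if any two share an auth key.
tailscale_keys_distinct() {
  seen=""
  for f in "$@"; do
    k=$(sops -d "$f" | grep -o 'tskey-auth-[A-Za-z0-9_-]*' | head -n 1)
    if [ -z "$k" ]; then
      echo "no auth key found in $f" >&2
      return 1
    fi
    # Compare against keys already seen; key material is never echoed.
    if printf '%s\n' "$seen" | grep -qxF -e "$k"; then
      echo "duplicate key: $f reuses an earlier key" >&2
      return 1
    fi
    seen=$(printf '%s\n%s' "$seen" "$k")
  done
  echo "ok: $# distinct keys"
}
```

Usage: `tailscale_keys_distinct infra/ansible/inventory/group_vars/tailscale/tailscale.sops.yaml secrets/tailscale/nas.sops.yaml secrets/tailscale/customer-app.sops.yaml`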
### 5a–5c. Ansible-managed hosts (infra-services, prox, saltierpoop)
These three hosts are in the `tailscale` Ansible group. The role handles:

- Installing Tailscale from the official apt repo
- Enabling IP forwarding on subnet routers (infra-services)
- Running `tailscale up` with the correct tags and routes
- Opening UFW port 41641/udp and allowing the `tailscale0` interface
If ansible-pull dies on the task “Authenticate Tailscale with auth key” (Ansible hides its output with `no_log`), run `sudo journalctl -u tailscaled -n 40 --no-pager` on the host. A common message is `requested tags [tag:…] are invalid or not permitted`, which means the key used on that host was not pre-authorized for the tag passed to `tailscale up`. Each host family has its own key:

- Ansible hosts: the key in `tailscale.sops.yaml` must allow only `tag:server` (see §5-pre-a).
- whrrr: use the NAS key from `secrets/tailscale/nas.sops.yaml` (§5-pre-b).
- recordurbate / ubuncap: use `secrets/tailscale/customer-app.sops.yaml`.

Fix the key in the Tailscale admin keys UI, update the matching SOPS file, commit, push, and retry.
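The journal check can be wrapped so it prints a hint instead of raw log lines. A sketch that only greps for the message quoted above; any other failure falls through to a generic hint. `diagnose_tailscale_auth` is a hypothetical name.

```bash
# Hypothetical helper: look for the tag/auth-key mismatch in recent tailscaled logs.
diagnose_tailscale_auth() {
  logs=$(sudo journalctl -u tailscaled -n 40 --no-pager 2>/dev/null)
  if printf '%s\n' "$logs" | grep -q 'are invalid or not permitted'; then
    echo "auth key not allowed to use the requested tags: fix the key and its SOPS file"
    return 1
  fi
  echo "no tag error found; read the last 40 tailscaled log lines manually"
}
```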
Option A: wait for ansible-pull (automatic, 30-minute cycle):
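The auth key commit above is picked up on the next ansible-pull cycle. Check status after about 30 minutes, for example with `sudo systemctl status ansible-pull-apply.service` and `tailscale status` on the host.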
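Rather than re-checking by hand, you can poll the host until Tailscale reports that it is up. A sketch relying on the `BackendState` field of `tailscale status --json`, which reads `Running` once the node is logged in; the function name and the default budget (60 tries × 30 s, matching the ansible-pull cycle) are assumptions.

```bash
# Hypothetical helper: poll until `tailscale status --json` reports BackendState "Running".
# $1 = max attempts (default 60), $2 = seconds between attempts (default 30).
wait_for_tailscale() {
  tries=${1:-60}
  delay=${2:-30}
  i=0
  while [ "$i" -lt "$tries" ]; do
    if tailscale status --json 2>/dev/null | grep -q '"BackendState": *"Running"'; then
      echo "tailscale is running"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "tailscale not running after $tries attempts" >&2
  return 1
}
```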
Option B: run manually now (recommended for first deployment):
ansible-pull keeps its own clone under `/var/lib/ansible-pull/homelab` (`common_pull_workdir` in Ansible). That directory is a Git repo only after `ansible-pull-apply` has run successfully at least once. If `cd /var/lib/ansible-pull/homelab && git status` reports `not a git repository`, either the timer has never completed a pull or the workdir was never populated. Check:

```bash
sudo systemctl status ansible-pull-apply.service
sudo journalctl -u ansible-pull-apply.service -n 50 --no-pager
ls -la /var/lib/ansible-pull/homelab
```

You can trigger a single run with `sudo systemctl start ansible-pull-apply.service`, then re-check for `.git`.
Two different Git trees: ansible-pull always runs from `/var/lib/ansible-pull/homelab`, while a `git pull` in `/opt/homelab` only updates your operator clone. If `main` on GitHub already contains a fix but the ansible-pull logs still show an old failure, the pull workdir is probably behind. Reconcile it explicitly (as root, with the same SSH key rules as the unit), then restart the timer if you stopped it:

```bash
sudo git -C /var/lib/ansible-pull/homelab fetch origin
sudo git -C /var/lib/ansible-pull/homelab status
sudo git -C /var/lib/ansible-pull/homelab merge --ff-only origin/main
sudo systemctl start ansible-pull-apply.timer
sudo systemctl start ansible-pull-apply.service
sudo journalctl -u ansible-pull-apply.service -n 80 --no-pager
```

Use `sudo journalctl …` for unit output; without sudo you may see only your own user's messages.
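To tell at a glance whether the pull workdir is missing or simply behind `origin/main`, a sketch built on plain `git` commands. It assumes the remote is named `origin` and the branch `main`, as in the commands above, and should run as root or as a member of `homelab-pull`; `pull_workdir_status` is a hypothetical helper.

```bash
# Hypothetical helper: is the ansible-pull workdir a clone, and how far does it lag origin/main?
pull_workdir_status() {
  dir=${1:-/var/lib/ansible-pull/homelab}
  if ! git -C "$dir" rev-parse --is-inside-work-tree >/dev/null 2>&1; then
    echo "not a git repository: $dir (ansible-pull has not completed a pull)"
    return 2
  fi
  git -C "$dir" fetch --quiet origin || return 1
  # Number of commits on origin/main that HEAD does not have yet.
  behind=$(git -C "$dir" rev-list --count HEAD..origin/main)
  if [ "$behind" -eq 0 ]; then
    echo "up to date with origin/main"
  else
    echo "behind origin/main by $behind commit(s)"
  fi
}
```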
If you already maintain `/opt/homelab` as a normal clone (manual `git pull` / pushes), you can run the playbook from there instead; paths are the same relative to the repo root:

```bash
ssh someone@192.168.6.17
cd /opt/homelab/infra/ansible   # ansible.cfg + roles_path live here
git pull                        # your usual clone; deploy key for fetch

# When you run ON infra-services, use a local connection (no SSH loopback to .17).
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
  ansible-playbook \
    -i inventory/generated.yml \
    playbooks/site.yml \
    --tags tailscale \
    --limit infra-services \
    -e ansible_connection=local
```
The same run from the repo root, if you prefer to stay in `/opt/homelab` (keep `-e ansible_connection=local` when running on infra-services itself):

```bash
export ANSIBLE_CONFIG=/opt/homelab/infra/ansible/ansible.cfg
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
  ansible-playbook \
    -i infra/ansible/inventory/generated.yml \
    infra/ansible/playbooks/site.yml \
    --tags tailscale \
    --limit infra-services \
    -e ansible_connection=local
```
If you use the ansible-pull workdir (and it is a valid clone):

```bash
ssh someone@192.168.6.17
cd /var/lib/ansible-pull/homelab/infra/ansible
git pull   # uses the deploy key; must fast-forward / rebase per that clone's config
SOPS_AGE_KEY_FILE=/etc/homelab/age-key.txt \
  ansible-playbook \
    -i inventory/generated.yml \
    playbooks/site.yml \
    --tags tailscale \
    --limit infra-services \
    -e ansible_connection=local
```
Shared checkout (operators + root): the common role creates the UNIX group `homelab-pull`, adds `common_github_ssh_users` to it, sets the workdir to `root:homelab-pull` mode `2770` (setgid), normalizes the clone once, sets `git config core.sharedRepository group`, and adds `UMask=0002` to the ansible-pull systemd units. After one successful apply, log out and back in (or run `newgrp homelab-pull`) so your session has the group. After that, `cd /var/lib/ansible-pull/homelab && git pull` as `someone` needs no sudo and no `safe.directory` workaround.
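A quick check that your session has picked up the group and that the workdir has the ownership and setgid mode described above. This sketch assumes GNU `stat` (the `-c` format, as on Ubuntu/Debian), whose `%a` includes the setgid bit, and the default workdir path; `check_shared_checkout` is hypothetical.

```bash
# Hypothetical helper: session in homelab-pull, workdir root:homelab-pull 2770?
check_shared_checkout() {
  dir=${1:-/var/lib/ansible-pull/homelab}
  if ! id -nG | tr ' ' '\n' | grep -qx homelab-pull; then
    echo "session lacks homelab-pull: log out and back in, or run newgrp homelab-pull"
    return 1
  fi
  perms=$(stat -c '%U:%G %a' "$dir")
  if [ "$perms" != "root:homelab-pull 2770" ]; then
    echo "unexpected owner/mode on $dir: $perms"
    return 1
  fi
  echo "shared checkout ok"
}
```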
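Repeat with `--limit prox` and `--limit saltierpoop`, either from a control node that can SSH to those hosts or on each host from its own clone (if you set one up there). Pass `-e ansible_connection=local` only when the playbook runs on the host it targets; from a control node it would make Ansible configure the control node itself instead of the remote host.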
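The three runs differ only in the target host and in whether the local connection applies. A sketch of a wrapper to run from `infra/ansible` in either clone; it assumes the inventory host name equals `hostname -s` on the machine you run it on, which is how it decides to add `-e ansible_connection=local`. `run_tailscale_role` is a hypothetical name.

```bash
# Hypothetical wrapper: run only the tailscale tag against one host.
# Body in ( ) runs in a subshell, so `set --` and the export do not leak.
run_tailscale_role() (
  target=$1
  export SOPS_AGE_KEY_FILE="${SOPS_AGE_KEY_FILE:-/etc/homelab/age-key.txt}"
  set -- -i inventory/generated.yml playbooks/site.yml --tags tailscale --limit "$target"
  # Local connection only when targeting the machine we are on (no SSH to self).
  if [ "$target" = "$(hostname -s)" ]; then
    set -- "$@" -e ansible_connection=local
  fi
  ansible-playbook "$@"
)
```

On infra-services, `cd /opt/homelab/infra/ansible && run_tailscale_role infra-services`; from a control node that can reach them, `run_tailscale_role prox` and `run_tailscale_role saltierpoop`.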
Connection note: without `-e ansible_connection=local`, `--limit infra-services` connects over SSH to `ansible_host` from inventory (e.g. `192.168.6.17`), even when you run the playbook on infra-services itself. In that case, make sure login as `someone` works to the VM's own IP (the loopback path is fine).
If `ansible-pull-apply` fails with `fatal: could not read Username for 'https://github.com': No such device or address`, the systemd unit is cloning over HTTPS. Unattended pulls must use the SSH URL (`git@github.com:notarealemail/homelab.git`) so `~/.ssh/config` can select the read-only deploy key (no interactive username).

- Fix both units, reload, and retry:

  ```bash
  sudo sed -i 's#https://github.com/notarealemail/homelab.git#git@github.com:notarealemail/homelab.git#g' \
    /etc/systemd/system/ansible-pull-apply.service \
    /etc/systemd/system/ansible-pull-check.service
  sudo systemctl daemon-reload
  sudo systemctl start ansible-pull-apply.service
  sudo journalctl -u ansible-pull-apply.service -n 30 --no-pager
  ```

- If `/var/lib/ansible-pull/homelab/.git` already exists, point `origin` at SSH as well: `sudo git -C /var/lib/ansible-pull/homelab remote set-url origin git@github.com:notarealemail/homelab.git`
- Prefer re-applying the common role from `/opt/homelab` so the units match the repo templates: `ansible-playbook … --tags ansible-pull` (once the units use SSH, the timers stay correct).
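To confirm that nothing still points at the HTTPS URL after the fix, a sketch that scans the two unit files and, if the workdir is a clone, its `origin` remote. The default paths are the ones used above; `find_https_remotes` is hypothetical.

```bash
# Hypothetical helper: report unit files or git remotes still using the HTTPS GitHub URL.
# Returns 0 when everything uses SSH, 1 otherwise.
find_https_remotes() {
  units_dir=${1:-/etc/systemd/system}
  workdir=${2:-/var/lib/ansible-pull/homelab}
  found=0
  for unit in ansible-pull-apply.service ansible-pull-check.service; do
    if grep -q 'https://github.com/' "$units_dir/$unit" 2>/dev/null; then
      echo "HTTPS URL in $units_dir/$unit"
      found=1
    fi
  done
  if [ -d "$workdir/.git" ]; then
    case $(git -C "$workdir" remote get-url origin 2>/dev/null) in
      https://*) echo "origin in $workdir uses HTTPS"; found=1 ;;
    esac
  fi
  return "$found"
}
```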
If `sudo systemctl start ansible-pull-apply` “hangs” (no prompt for many minutes): `ansible-pull-apply` is `Type=oneshot`, so `systemctl start` blocks until the entire playbook finishes. A full `site.yml` run can easily take 10–25+ minutes (apt, downloads, multiple roles). That is normal, not a frozen shell.

- In another SSH session, follow the logs live: `sudo journalctl -u ansible-pull-apply.service -f`
- When the run completes, `systemctl start` returns and `systemctl status` shows `inactive (dead)` with a result.
If the journal stops on `tailscale up` for a long time: the Tailscale role now wraps that command in a timeout, so a bad or missing auth key cannot block forever (pull the latest repo before relying on that).
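From another session, `systemctl is-active` shows whether the oneshot is still working: for a `Type=oneshot` unit it prints `activating` while the playbook runs, then `inactive` after a successful run or `failed` after a failed one. `pull_run_state` is a hypothetical wrapper that turns that into a hint.

```bash
# Hypothetical helper: summarise the state of the ansible-pull oneshot unit.
pull_run_state() {
  # is-active exits non-zero for non-active states but still prints the state.
  state=$(systemctl is-active ansible-pull-apply.service 2>/dev/null)
  case $state in
    activating) echo "still running: follow with journalctl -u ansible-pull-apply.service -f" ;;
    inactive)   echo "finished (or never started): check systemctl status for the result" ;;
    failed)     echo "last run failed: sudo journalctl -u ansible-pull-apply.service -n 80 --no-pager" ;;
    *)          echo "unexpected state: ${state:-unknown}" ;;
  esac
}
```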
After each host registers:
- Go to https://login.tailscale.com/admin/machines
- For `infra-services`: click the three-dot menu > Edit route settings > approve the `192.168.6.0/24` subnet route
- Verify: `tailscale status` on each host should list its own tailnet IP and peers, rather than reporting that Tailscale is stopped or logged out
What the role configures per host:
| Host | Tags | Routes | IP forwarding | UFW |
|---|---|---|---|---|
| infra-services | `tag:server` | `192.168.6.0/24` | Yes (sysctl) | 41641/udp + `tailscale0` |
| prox | `tag:server` | None | No | Skipped (pve-firewall) |
| saltierpoop | `tag:server` | None | No | 41641/udp + `tailscale0` |
### 5d. whrrr (Synology DSM): Manual
Prerequisite: `secrets/tailscale/nas.sops.yaml` exists in git (encrypted) per `secrets/tailscale/README.md` and §5-pre-b.
Tailscale on Synology runs as a DSM package. Upstream documents Synology limitations (no `tailscale up --accept-routes`, no exit-node client behavior on the NAS itself) in tailscale/tailscale#1995. whrrr can still join the tailnet with `tag:nas`, and you can reach DSM at the Synology's Tailscale IP, but you cannot expect whrrr to follow subnet routes advertised by other nodes.
- Open DSM > Package Center > verify Tailscale is installed and running
- From a host that has this repo and `sops` (usually your laptop), run the NAS one-liner in `secrets/tailscale/README.md` (it SSHs to the Synology and passes `--authkey=…`). If you cannot run `sops` from that path, decrypt locally, then paste the key into `sudo tailscale up --authkey=…` in an SSH session to the NAS (avoid leaving the key in shell history where possible).
If `tailscale up` prints “Some peers are advertising routes but --accept-routes is false”, that is expected on Synology: DSM cannot enable `--accept-routes` (tailscale/tailscale#1995). It is a notice, not a failed login, and your NAS is still on the tailnet. Ignore it unless you need whrrr itself to reach LAN-only targets via another node's advertised subnets (unsupported on DSM).
If the CLI isn't accessible, apply the `tag:nas` tag from the Tailscale admin console:

- Go to https://login.tailscale.com/admin/machines
- Find `whrrr`
- Click the three-dot menu > Edit ACL tags > add `tag:nas`
### 5e. haos (Home Assistant OS): Manual (HA add-on)
HAOS has a first-party Tailscale add-on. It does not read `tailscale.sops.yaml` or `secrets/tailscale/*.yaml`; authenticate with the URL from the add-on log unless you configure a key there separately. If you ever need a reusable key for HA, create another `tag:server`-only key and store it in 1Password. Do not reuse the Ansible automation key outside ansible-pull.
- Open Home Assistant > Settings > Add-ons > Add-on Store
- Search for Tailscale and install it
- Go to the add-on Configuration tab and set:
Then start the add-on. Check the add-on Log tab for the auth URL — open it to authenticate.
After install, Home Assistant is reachable at `http://<tailscale-ip>:8123` from anywhere on the tailnet.
### 5f–5g. recordurbate and ubuncap: Manual
Prerequisite: `secrets/tailscale/customer-app.sops.yaml` committed (encrypted) per `secrets/tailscale/README.md` and §5-pre-b.
These are customer-app VMs on whrrr's VMM (Synology Virtual Machine Manager). Use the same customer-app key on both hosts. Run the customer-app one-liner in `secrets/tailscale/README.md` once per host (from a machine with the repo and `sops`), or decrypt locally and run `sudo tailscale up --authkey=… --advertise-tags=tag:customer-app --accept-routes` on each VM.
### Post-deployment: accept routes on your devices
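After infra-services is advertising `192.168.6.0/24`, enable route acceptance on each client device you use:

- macOS/Linux: `tailscale up --accept-routes`, or toggle it in the Tailscale menu bar app. On recent clients, `tailscale set --accept-routes=true` changes only this setting, whereas `tailscale up` may ask you to repeat your other non-default flags.
- iOS/Android: Tailscale app > Settings > "Use Tailscale subnets"
- Windows: Tailscale system tray > "Use subnet routes"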
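On a Linux client you can confirm that route acceptance took effect. A minimal sketch, assuming Tailscale's Linux behavior of installing accepted subnet routes in its own routing table and sending them through `tailscale0`, so `ip route get` names that device; run it while off the home network. `uses_tailscale_route` is a hypothetical helper, and `192.168.6.80` is only an example address on the advertised subnet.

```bash
# Hypothetical helper (Linux only): does the kernel route $1 through tailscale0?
# `ip route get` prints the chosen output device as "dev <name>".
uses_tailscale_route() {
  ip route get "$1" 2>/dev/null | grep -q 'dev tailscale0'
}
```

Usage: `uses_tailscale_route 192.168.6.80 && echo "via tailnet" || echo "not via tailnet: check --accept-routes and the route approval"`.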
## 6. Create Tailscale API Key + GitHub Secrets (SEC-007)
Where: Tailscale Admin Console + GitHub repo settings
### Steps
- Go to https://login.tailscale.com/admin/settings/keys
- Create a new API key
- Note your tailnet name (visible at the top of the admin console)
- Go to your GitHub repo > Settings > Secrets and variables > Actions
- Add two secrets:
  - `TS_API_KEY`: paste the API key
  - `TS_TAILNET`: paste the tailnet name (e.g. `yourtailnet.ts.net`)
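If you use the GitHub CLI, you can confirm both secrets exist without opening the web UI. A sketch assuming `gh` is authenticated and run inside the repo; `gh secret list` shows secret names (never values) in its first column. `check_ts_secrets` is hypothetical.

```bash
# Hypothetical helper: fail unless both Tailscale secrets exist for the current repo.
check_ts_secrets() {
  names=$(gh secret list 2>/dev/null | awk '{ print $1 }')
  missing=0
  for s in TS_API_KEY TS_TAILNET; do
    if ! printf '%s\n' "$names" | grep -qxF -e "$s"; then
      echo "missing secret: $s"
      missing=1
    fi
  done
  [ "$missing" -eq 0 ] && echo "both secrets present"
}
```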
### Verify
- Push any change to `infra/tailscale/acl.json` and merge it to `main`
- The Tailscale ACL Sync workflow should run and succeed
- Check Tailscale admin > Access Controls; the policy should match `acl.json`

Reference: `infra/tailscale/README.md`
## 7. Deploy AdGuard + Unbound on infra-services
Where: SSH to infra-services
### Steps
- Pull the latest code: `cd /opt/homelab && git pull`
- Add a DNS record for `adguard.infra.realemail.app` on the UDM:
  - Settings > Policy Table > add `adguard.infra.realemail.app` → `192.168.6.17`
- Start the stack:
- Open `https://adguard.infra.realemail.app` in your browser for initial setup (not `:3000`, which homepage uses on the host):
  - Set the admin username and password
  - Set the listen interface to `0.0.0.0:53` for DNS
  - Set upstream DNS to `udp://unbound` with bootstrap `127.0.0.11` (see “AdGuard upstream” in the repo README)
- Import DNS rewrites from inventory (optional but recommended):

  ```bash
  cd /opt/homelab
  # The file is already generated at services/adguard/dns-rewrites.yaml
  # Add them in the AdGuard UI: Filters > DNS rewrites
  # Or script it with curl (see services/adguard/README.md)
  ```
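For the curl route, AdGuard Home exposes `POST /control/rewrite/add`, which takes a JSON body with `domain` and `answer` and accepts HTTP basic auth with the admin credentials set above. The sketch assumes you have already turned `dns-rewrites.yaml` into whitespace-separated `domain answer` lines (that file's layout is not shown here, so the conversion is left out); `add_adguard_rewrites`, `ADGUARD_URL`, and `ADGUARD_AUTH` are hypothetical names. Prefer the repo's own script if `services/adguard/README.md` provides one.

```bash
# Hypothetical helper: read "domain answer" pairs on stdin and add each as a DNS rewrite.
# ADGUARD_URL defaults to the UI address above; ADGUARD_AUTH is "user:password".
add_adguard_rewrites() {
  url=${ADGUARD_URL:-https://adguard.infra.realemail.app}
  failed=0
  # `|| [ -n "$domain" ]` keeps a final line that has no trailing newline.
  while read -r domain answer || [ -n "$domain" ]; do
    [ -n "$domain" ] || continue
    if curl -fsS -u "$ADGUARD_AUTH" -H 'Content-Type: application/json' \
         -d "{\"domain\":\"$domain\",\"answer\":\"$answer\"}" \
         "$url/control/rewrite/add" >/dev/null; then
      echo "added $domain -> $answer"
    else
      echo "failed $domain" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

Example: `ADGUARD_AUTH='admin:…' add_adguard_rewrites < rewrites.txt`, where `rewrites.txt` is your converted list.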
### Verify
- `dig @192.168.6.17 google.com` should return an A record
- `dig @192.168.6.17 infra-services.lab.local` should return `192.168.6.17` (if you imported the rewrites)
- `https://adguard.infra.realemail.app` should load the AdGuard dashboard
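The three checks can be run in one go. A sketch assuming `dig` (from dnsutils / bind-utils) and `curl` are installed; the rewrite check only passes if you imported the rewrites, and `verify_adguard` is a hypothetical name.

```bash
# Hypothetical helper: run the step 7 verification checks against AdGuard.
verify_adguard() {
  server=${1:-192.168.6.17}
  rc=0
  # 1. A public name resolves to at least one IPv4 address.
  if dig +short @"$server" google.com A | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
    echo "ok: google.com resolves"
  else
    echo "FAIL: google.com did not resolve"; rc=1
  fi
  # 2. The inventory rewrite returns the expected address.
  if [ "$(dig +short @"$server" infra-services.lab.local A)" = "192.168.6.17" ]; then
    echo "ok: infra-services.lab.local -> 192.168.6.17"
  else
    echo "FAIL: infra-services.lab.local rewrite missing"; rc=1
  fi
  # 3. The dashboard answers over HTTPS (-f fails only on HTTP 4xx/5xx).
  if curl -fsS -o /dev/null --max-time 10 https://adguard.infra.realemail.app/; then
    echo "ok: dashboard reachable"
  else
    echo "FAIL: dashboard unreachable"; rc=1
  fi
  return "$rc"
}
```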
Reference: services/adguard/README.md
## 8. Cut Over DNS from PiHole to AdGuard
Where: UDM SE > Settings > Networks + Settings > Internet
Prerequisite: AdGuard running and verified (step 7). infra-services must
have Tailscale prefer-main routing for 192.168.6.0/24 (Ansible tailscale
role — see AdGuard — Servers VLAN caveat).
### Steps
- Internet → WAN1 → DNS (IPv4 only for now):
  - Manual DNS `192.168.6.17`
  - Clear any legacy PiHole IPv4 (`192.168.6.80`) and IPv6 entries
  - IPv6 DNS: leave blank until AdGuard publishes v6 (optional later)
- For each VLAN network on the UDM:
  - Settings > Networks > click the network
  - Under DHCP > DNS Server → `192.168.6.17`
  - Save

  Servers VLAN (4) can use `192.168.6.17` directly now that the Tailscale routing fix is in place. Gateway-as-DNS (`.1`) also works if the WAN upstream points at AdGuard.
- Renew DHCP on clients (or reboot). On Ubuntu hosts, avoid one-off `resolvectl dns` changes; use netplan `dhcp4-overrides: use-dns: false` only if you need static overrides.
### Verify
- `dig @192.168.6.17 google.com` from a Servers VLAN host (e.g. saltierpoop)
- `dig google.com` on that host (systemd-resolved / DHCP DNS)
- Browse the web; check the AdGuard dashboard for query activity
- `dig @192.168.6.17 infra-services.lab.local` → `192.168.6.17`
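To confirm that a systemd-resolved host picked up the new DHCP DNS server and dropped PiHole, a sketch that inspects `resolvectl dns`, which lists the DNS servers per link. The addresses are the ones from this section; if you chose gateway-as-DNS for a VLAN, expect the gateway address instead and adjust. `check_resolver_cutover` is hypothetical.

```bash
# Hypothetical helper: resolved should list AdGuard and no longer list PiHole.
check_resolver_cutover() {
  servers=$(resolvectl dns 2>/dev/null)
  # -w avoids matching longer addresses such as 192.168.6.170.
  if printf '%s\n' "$servers" | grep -qwF '192.168.6.80'; then
    echo "still using PiHole (192.168.6.80): renew DHCP or check netplan overrides"
    return 1
  fi
  if printf '%s\n' "$servers" | grep -qwF '192.168.6.17'; then
    echo "using AdGuard (192.168.6.17)"
    return 0
  fi
  echo "neither address found; resolvectl dns reports:"
  printf '%s\n' "$servers"
  return 1
}
```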
### Parallel run
Keep PiHole (LXC 104) running for 48 hours after cutover. If anything
breaks, revert DHCP DNS to 192.168.6.80 and WAN DNS to PiHole on the UDM.
## 9. Decommission PiHole LXC 104 (blocktopus)
Where: Proxmox UI or CLI
Prerequisite: 48-hour parallel run with AdGuard (step 8) verified.
### Steps
- Verify no clients are still using PiHole:
  - SSH to PiHole (`192.168.6.80`) and check the query logs
  - If queries are still coming in, something still points at the old DNS
- Stop the LXC (on the Proxmox host): `pct stop 104`
- Wait 24–48 hours. If nothing breaks, destroy it: `pct destroy 104`
- Let me know when done so I can update inventory and docs.
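For the first step (checking whether clients still use PiHole), you can ask Pi-hole's long-term database which clients queried recently instead of scrolling the log. A sketch to run as root inside LXC 104, assuming FTL's database at its default path `/etc/pihole/pihole-FTL.db`, its `queries` view with Unix-epoch `timestamp` and `client` columns, and that `sqlite3` is installed; FTL writes to the database periodically, so the newest minute or so may be missing. `recent_pihole_clients` is hypothetical.

```bash
# Hypothetical helper: list clients that queried Pi-hole in the last N hours (default 24).
recent_pihole_clients() {
  hours=${1:-24}
  # Only allow a whole number, since it is interpolated into the SQL.
  case $hours in ''|*[!0-9]*) echo "hours must be a whole number" >&2; return 2 ;; esac
  sqlite3 -readonly /etc/pihole/pihole-FTL.db \
    "SELECT client, COUNT(*) FROM queries
     WHERE timestamp > strftime('%s','now') - $hours*3600
     GROUP BY client ORDER BY COUNT(*) DESC;"
}
```

An empty result means no client queried in that window; entries for `127.0.0.1` usually come from the container itself.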