Coordinated OS Patching
Phase 8 delivers the original homelab goal: one command-and-control host
(infra-services) applies Linux package updates to all Ansible-managed hosts on a
defined schedule.
Why separate from ansible-pull?
| Mechanism | Purpose | Cadence |
|---|---|---|
| `ansible-pull` | Git config convergence (roles, secrets, timers) | Every 30 min |
| `homelab-patch-orchestrate` | OS package upgrades (apt) | Weekly (Sun 04:00 PT) |
Mixing apt upgrade into every pull would be noisy, hard to schedule around
media workloads, and risky during incident response. Patching is deliberate and
wave-ordered.
Architecture
flowchart LR
IS[infra-services C&C]
SP[saltierpoop wave0]
PX[prox wave1]
IS2[infra-services wave2]
IS -->|push SSH| SP
IS -->|push SSH root| PX
IS -->|local| IS2
Components
| Piece | Location |
|---|---|
| `roles/patch-controller/` | infra-services only: SSH key, systemd timer |
| `roles/patching/` | All `patching_targets`: apt, reboot, metrics |
| `playbooks/patch.yml` | Invoked by timer; three wave plays |
| `homelab-patch-orchestrate.timer` | Weekly schedule on infra-services |
| `patch.prom` | node_exporter textfile on each target |
| `patch_orchestrate.prom` | infra-services: orchestrator exit code + last run time |
| `ansible_pull.prom` | Each host: last successful `ansible-pull` apply |
| `/etc/homelab/patching-notify.env` | Discord + ntfy vars for wrapper and playbook |
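
The "three wave plays" in `playbooks/patch.yml` could look roughly like the sketch below. This is an illustration built from the group and role names on this page, not the repository's actual playbook; the play names and the use of `become` are assumptions.

```yaml
# playbooks/patch.yml (illustrative sketch, not the real file)
# Plays run top to bottom, so the wave order is the play order.
- name: Wave 0 (saltierpoop)
  hosts: patching_wave0
  become: true          # assumption: privilege escalation for apt
  roles:
    - patching

- name: Wave 1 (prox)
  hosts: patching_wave1
  become: true
  roles:
    - patching

- name: Wave 2 (infra-services patches itself last)
  hosts: patching_wave2
  become: true
  roles:
    - patching
```

Because Ansible runs plays sequentially, a later wave only starts after the earlier play has finished.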
Inventory groups
| Group | Members (today) |
|---|---|
| `patch_controller` | infra-services |
| `patching_targets` | infra-services, prox, saltierpoop |
| `patching_wave0` | saltierpoop |
| `patching_wave1` | prox |
| `patching_wave2` | infra-services (C&C patches itself last) |
To onboard a new Ansible-managed Linux host, add it to `patching_targets` and to
one wave group in the inventory, then re-run `render-ansible.py`.
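
As a rough illustration, the rendered groups might look like this in Ansible's YAML inventory format. The host `examplehost` is hypothetical, and the real input that `render-ansible.py` consumes may be structured differently.

```yaml
# Illustrative inventory fragment; "examplehost" is a hypothetical new host.
all:
  children:
    patch_controller:
      hosts:
        infra-services:
    patching_targets:
      hosts:
        infra-services:
        prox:
        saltierpoop:
        examplehost:      # hypothetical: added to the target group
    patching_wave0:
      hosts:
        saltierpoop:
    patching_wave1:
      hosts:
        prox:
        examplehost:      # hypothetical: patched alongside prox
    patching_wave2:
      hosts:
        infra-services:
```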
Policy defaults
Defined in infra/ansible/inventory/group_vars/patching_targets.yml:
- `patching_upgrade_type: security` runs apt upgrade `safe` (installed packages only)
- `patching_reboot: auto_if_required` reboots when `/var/run/reboot-required` exists
- The `unattended-upgrades` package is removed on targets
Override per host in host_vars/ (e.g. patching_upgrade_type: dist before
an LTS bump).
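
A minimal sketch of how the group default and a per-host override could sit side by side. Only the two variables and values named above are taken from this page; the file `host_vars/examplehost.yml` is hypothetical.

```yaml
# infra/ansible/inventory/group_vars/patching_targets.yml (sketch)
patching_upgrade_type: security
patching_reboot: auto_if_required
---
# host_vars/examplehost.yml (hypothetical host)
# Temporary override ahead of an LTS bump; remove it afterwards
# so the host falls back to the group default.
patching_upgrade_type: dist
```

Host variables take precedence over group variables in Ansible, so the override applies to that one host only.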
SSH access
| Target | User | Notes |
|---|---|---|
| saltierpoop | someone | NOPASSWD sudo via common role |
| prox | root | `host_vars/prox.yml` |
| infra-services (wave 2) | local | ansible_connection: local |
The patch-controller generates /etc/homelab/patch-controller/id_ed25519 on
infra-services. site.yml distributes the public key to other targets when
ansible-pull converges on infra-services.
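
The connection settings in the table could be expressed as host variables along these lines. This is a sketch: the user and connection values come from the table, but pointing `ansible_ssh_private_key_file` at the controller key is an assumption about how the key is selected.

```yaml
# host_vars/prox.yml (sketch)
ansible_user: root
# Assumption: the generated controller key is referenced per host like this.
ansible_ssh_private_key_file: /etc/homelab/patch-controller/id_ed25519
---
# host_vars/infra-services.yml (sketch)
# The controller patches itself without SSH.
ansible_connection: local
```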
Maintenance hold
/etc/homelab/MAINTENANCE on a host:
- Skips `ansible-pull` (existing behavior)
- Skips the `homelab-patch-orchestrate` service (`ExecCondition`)
- Skips that host inside `roles/patching/` (`meta: end_host`)
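
The per-host skip inside `roles/patching/` can be sketched with a `stat` check followed by `meta: end_host`. The task names, the variable `maintenance_marker`, and the placement at the top of the role's task file are assumptions for illustration.

```yaml
# Sketch of the first tasks in roles/patching/tasks/main.yml (layout assumed)
- name: Check for the maintenance hold marker
  ansible.builtin.stat:
    path: /etc/homelab/MAINTENANCE
  register: maintenance_marker    # hypothetical variable name

- name: Skip this host while the hold is in place
  ansible.builtin.meta: end_host
  when: maintenance_marker.stat.exists
```

`end_host` ends the play for the current host only, without marking it failed, so the remaining hosts in the wave continue.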
Observability and notifications
flowchart TB
subgraph orchestrator [infra-services]
Timer[homelab-patch-orchestrate.timer]
Wrapper[homelab-patch-orchestrate.sh]
PB[patch.yml]
Timer --> Wrapper --> PB
end
subgraph targets [patching_targets]
SP[saltierpoop]
PX[prox]
IS[infra-services wave2]
end
PB --> SP
PB --> PX
PB --> IS
Wrapper -->|failure| NtfyF[ntfy critical]
Wrapper -->|failure| DiscordF[Discord]
PB -->|success summary| DiscordS[Discord]
SP -->|pre-reboot| NtfyR[ntfy critical]
PX -->|pre-reboot| NtfyR
IS -->|pre-reboot| NtfyR
SP --> Prom[patch.prom]
PX --> Prom
IS --> Prom
Wrapper --> OrchProm[patch_orchestrate.prom]
Prom --> Alertmanager
OrchProm --> Alertmanager
Alertmanager --> DiscordA[Discord all severities]
Alertmanager -->|critical patching| NtfyA[ntfy]
| Signal | Where |
|---|---|
| ARA | Every orchestrator and playbook run |
| Prometheus | homelab_patch_last_success_unixtime, homelab_patch_reboot_required, homelab_patch_orchestrate_last_exit_code |
| Grafana | monitoring/grafana/dashboards/patching.json |
| Discord | Rich embeds + link buttons (success, failure, pre-reboot, validation); Alertmanager markdown with runbook links |
| ntfy | Orchestrator failure + pre-reboot (Ansible/wrapper); critical patch alerts (Alertmanager) |
Alerts (monitoring/prometheus/alerts/patching.yml):
| Alert | Severity | When |
|---|---|---|
| `OSPatchStale` | warning | `patch.prom` older than 8 days |
| `OSPatchOrchestrateFailed` | critical | orchestrator exit code ≠ 0 |
| `PatchRebootPending` | warning | `homelab_patch_reboot_required == 1` for 30m |
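
For orientation, rules matching the table might be written as follows. This is a sketch, not the contents of `patching.yml`: the group name is made up, and measuring staleness from `homelab_patch_last_success_unixtime` is an assumption (the real rule may check the textfile's modification time instead).

```yaml
# Illustrative Prometheus rule group; expressions are assumptions
# built from the metric names listed on this page.
groups:
  - name: patching              # hypothetical group name
    rules:
      - alert: OSPatchStale
        # 8 days expressed in seconds
        expr: time() - homelab_patch_last_success_unixtime > 8 * 86400
        labels:
          severity: warning
      - alert: OSPatchOrchestrateFailed
        expr: homelab_patch_orchestrate_last_exit_code != 0
        labels:
          severity: critical
      - alert: PatchRebootPending
        expr: homelab_patch_reboot_required == 1
        for: 30m
        labels:
          severity: warning
```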
Secrets:
- Ansible: notify.sops.yaml.example
- Alertmanager: `.env.sops.yaml.example` (`NTFY_PATCH_TOPIC`)
Topic: homelab-patch-critical-b4e9
Out of scope
- Saltbox containers: OS layer only on saltierpoop; use Saltbox/Komodo for image updates
- Proxmox guests not in inventory: patch when onboarded to Ansible
- Appliances (UDM, Synology DSM, HAOS): vendor UI / firmware
- customer-app class: own lifecycle