Coordinated OS Patching

Phase 8 delivers the original homelab goal: one command-and-control host (infra-services) applies Linux package updates to all Ansible-managed hosts on a defined schedule.

Why separate from ansible-pull?

| Mechanism | Purpose | Cadence |
| --- | --- | --- |
| ansible-pull | Git config convergence (roles, secrets, timers) | Every 30 min |
| homelab-patch-orchestrate | OS package upgrades (apt) | Weekly (Sun 04:00 PT) |

Mixing apt upgrade into every pull would be noisy, hard to schedule around media workloads, and risky during incident response. Patching is deliberate and wave-ordered.

Architecture

flowchart LR
  IS[infra-services C&C]
  SP[saltierpoop wave0]
  PX[prox wave1]
  IS2[infra-services wave2]

  IS -->|push SSH| SP
  IS -->|push SSH root| PX
  IS -->|local| IS2
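
The wave ordering in the diagram can be sketched in a few lines of Python. This is only an illustration of "patch in wave order, controller last". The real work is done by the three wave plays in playbooks/patch.yml. The `patch_host` callback and the stop-on-first-failed-wave behavior are assumptions made for the sketch, not something this page specifies.

```python
from typing import Callable

# Wave order taken from the diagram above; the C&C host patches itself last.
WAVES: list[tuple[str, list[str]]] = [
    ("wave0", ["saltierpoop"]),
    ("wave1", ["prox"]),
    ("wave2", ["infra-services"]),
]


def run_waves(
    patch_host: Callable[[str], bool],
    waves: list[tuple[str, list[str]]] = WAVES,
) -> list[str]:
    """Patch hosts wave by wave and return the hosts that succeeded.

    patch_host is a hypothetical callback returning True on success.
    Assumption: a failed wave stops later waves from running.
    """
    patched: list[str] = []
    for _name, hosts in waves:
        results = {host: patch_host(host) for host in hosts}
        patched.extend(host for host, ok in results.items() if ok)
        if not all(results.values()):
            break  # do not touch later waves (including the controller)
    return patched


if __name__ == "__main__":
    print(run_waves(lambda host: True))
```

Under that assumption, a failure on prox in wave 1 would leave infra-services unpatched, so the controller stays available for investigation.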

Components

| Piece | Location / purpose |
| --- | --- |
| roles/patch-controller/ | infra-services only: SSH key, systemd timer |
| roles/patching/ | All patching_targets: apt, reboot, metrics |
| playbooks/patch.yml | Invoked by timer; three wave plays |
| homelab-patch-orchestrate.timer | Weekly schedule on infra-services |
| patch.prom | node_exporter textfile on each target |
| patch_orchestrate.prom | infra-services: orchestrator exit code + last run time |
| ansible_pull.prom | Each host: last successful ansible-pull apply |
| /etc/homelab/patching-notify.env | Discord + ntfy vars for wrapper and playbook |

Inventory groups

| Group | Members (today) |
| --- | --- |
| patch_controller | infra-services |
| patching_targets | infra-services, prox, saltierpoop |
| patching_wave0 | saltierpoop |
| patching_wave1 | prox |
| patching_wave2 | infra-services (C&C patches itself last) |

To onboard a new Ansible-managed Linux host, add it to patching_targets and to a wave group in inventory, then re-run render-ansible.py.
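
A quick consistency check can catch a host that was added to patching_targets but never assigned a wave. The sketch below hard-codes the group table above as a dictionary. How the rendered inventory is actually loaded is not covered here, and the "exactly one wave per target" rule is an assumption inferred from the current layout.

```python
from collections import Counter

# Group membership copied from the table above. In practice this would come
# from the rendered inventory; loading it is out of scope for this sketch.
INVENTORY: dict[str, list[str]] = {
    "patching_targets": ["infra-services", "prox", "saltierpoop"],
    "patching_wave0": ["saltierpoop"],
    "patching_wave1": ["prox"],
    "patching_wave2": ["infra-services"],
}


def check_wave_membership(inventory: dict[str, list[str]]) -> list[str]:
    """Return human-readable problems; an empty list means consistent."""
    targets = set(inventory.get("patching_targets", []))
    # Count how many wave groups each host appears in.
    wave_counts = Counter(
        host
        for group, hosts in inventory.items()
        if group.startswith("patching_wave")
        for host in hosts
    )
    problems: list[str] = []
    for host in sorted(targets):
        count = wave_counts.get(host, 0)
        if count != 1:  # assumption: each target belongs to exactly one wave
            problems.append(f"{host}: in {count} wave groups, expected exactly 1")
    for host in sorted(set(wave_counts) - targets):
        problems.append(f"{host}: in a wave group but not in patching_targets")
    return problems


if __name__ == "__main__":
    for line in check_wave_membership(INVENTORY) or ["inventory OK"]:
        print(line)
```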

Policy defaults

Defined in infra/ansible/inventory/group_vars/patching_targets.yml:

  • patching_upgrade_type: security (apt upgrade safe, installed packages only)
  • patching_reboot: auto_if_required — reboot when /var/run/reboot-required exists
  • unattended-upgrades package removed on targets

Override per host in host_vars/ (e.g. patching_upgrade_type: dist before an LTS bump).
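
The override and reboot rules above amount to "host_vars win over group defaults" and "reboot only if the marker file exists". A minimal Python sketch of that logic follows. The real evaluation happens inside Ansible and roles/patching/; the function names are hypothetical, and the default values are copied from the list above.

```python
from pathlib import Path

# Defaults as listed above (group_vars/patching_targets.yml).
GROUP_DEFAULTS: dict[str, str] = {
    "patching_upgrade_type": "security",
    "patching_reboot": "auto_if_required",
}

REBOOT_MARKER = Path("/var/run/reboot-required")


def effective_policy(host_vars: dict[str, str]) -> dict[str, str]:
    """Merge per-host overrides over the group defaults (host_vars win)."""
    return {**GROUP_DEFAULTS, **host_vars}


def should_reboot(policy: dict[str, str], marker: Path = REBOOT_MARKER) -> bool:
    """Reboot only when the policy allows it and the marker file exists."""
    return policy.get("patching_reboot") == "auto_if_required" and marker.exists()


if __name__ == "__main__":
    policy = effective_policy({"patching_upgrade_type": "dist"})
    print(policy, should_reboot(policy))
```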

SSH access

| Target | User | Notes |
| --- | --- | --- |
| saltierpoop | someone | NOPASSWD sudo via common role |
| prox | root | host_vars/prox.yml |
| infra-services (wave 2) | local | ansible_connection: local |

The patch-controller role generates /etc/homelab/patch-controller/id_ed25519 on infra-services. site.yml distributes the public key to the other targets when ansible-pull converges on infra-services.

Maintenance hold

/etc/homelab/MAINTENANCE on a host:

  • Skips ansible-pull (existing behavior)
  • Skips homelab-patch-orchestrate service (ExecCondition)
  • Skips that host inside roles/patching/ (meta: end_host)
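
All three skips key off the same flag file. As an illustration, a helper suitable for a systemd ExecCondition= line might look like the sketch below. systemd treats exit status 0 as "run the unit" and 1 through 254 as "skip without marking it failed". The helper itself is hypothetical; this page does not show how the actual condition is implemented.

```python
import sys
from pathlib import Path

MAINTENANCE_FLAG = Path("/etc/homelab/MAINTENANCE")


def maintenance_active(flag: Path = MAINTENANCE_FLAG) -> bool:
    """True when the maintenance hold file is present on this host."""
    return flag.exists()


def exec_condition_status(flag: Path = MAINTENANCE_FLAG) -> int:
    """Exit status for an ExecCondition= helper: 0 runs the unit, 1 skips it."""
    return 1 if maintenance_active(flag) else 0


if __name__ == "__main__":
    sys.exit(exec_condition_status())
```

Placing or removing the hold is then just a matter of creating or deleting /etc/homelab/MAINTENANCE on the host.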

Observability and notifications

flowchart TB
  subgraph orchestrator [infra-services]
    Timer[homelab-patch-orchestrate.timer]
    Wrapper[homelab-patch-orchestrate.sh]
    PB[patch.yml]
    Timer --> Wrapper --> PB
  end
  subgraph targets [patching_targets]
    SP[saltierpoop]
    PX[prox]
    IS[infra-services wave2]
  end
  PB --> SP
  PB --> PX
  PB --> IS
  Wrapper -->|failure| NtfyF[ntfy critical]
  Wrapper -->|failure| DiscordF[Discord]
  PB -->|success summary| DiscordS[Discord]
  SP -->|pre-reboot| NtfyR[ntfy critical]
  PX -->|pre-reboot| NtfyR
  IS -->|pre-reboot| NtfyR
  SP --> Prom[patch.prom]
  PX --> Prom
  IS --> Prom
  Wrapper --> OrchProm[patch_orchestrate.prom]
  Prom --> Alertmanager
  OrchProm --> Alertmanager
  Alertmanager --> DiscordA[Discord all severities]
  Alertmanager -->|critical patching| NtfyA[ntfy]

| Signal | Where |
| --- | --- |
| ARA | Every orchestrator and playbook run |
| Prometheus | homelab_patch_last_success_unixtime, homelab_patch_reboot_required, homelab_patch_orchestrate_last_exit_code |
| Grafana | monitoring/grafana/dashboards/patching.json |
| Discord | Rich embeds + link buttons (success, failure, pre-reboot, validation); Alertmanager markdown with runbook links |
| ntfy | Orchestrator failure + pre-reboot (Ansible/wrapper); critical patch alerts (Alertmanager) |
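
For readers unfamiliar with the node_exporter textfile collector, the sketch below shows one way a patch.prom file could be produced. The metric names come from the table above. The write-to-temp-then-rename approach and the function names are assumptions for illustration, and the textfile directory is left to the caller because this page does not name it.

```python
import os
import tempfile
import time
from pathlib import Path


def render_patch_metrics(success_time: float, reboot_required: bool) -> str:
    """Render the two per-target metrics in Prometheus text exposition format."""
    lines = [
        "# TYPE homelab_patch_last_success_unixtime gauge",
        f"homelab_patch_last_success_unixtime {int(success_time)}",
        "# TYPE homelab_patch_reboot_required gauge",
        f"homelab_patch_reboot_required {1 if reboot_required else 0}",
    ]
    return "\n".join(lines) + "\n"


def write_textfile(directory: Path, name: str, body: str) -> Path:
    """Write atomically so a scrape never sees a half-written file.

    The temporary file ends in .tmp; the textfile collector only reads *.prom.
    """
    directory.mkdir(parents=True, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as handle:
        handle.write(body)
    os.chmod(tmp, 0o644)  # mkstemp creates 0600; the exporter must be able to read it
    target = directory / name
    os.replace(tmp, target)  # atomic rename within the same filesystem
    return target


if __name__ == "__main__":
    print(render_patch_metrics(time.time(), reboot_required=False), end="")
```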

Alerts (monitoring/prometheus/alerts/patching.yml):

| Alert | Severity | When |
| --- | --- | --- |
| OSPatchStale | warning | patch.prom older than 8 days |
| OSPatchOrchestrateFailed | critical | orchestrator exit code ≠ 0 |
| PatchRebootPending | warning | homelab_patch_reboot_required == 1 for 30m |
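
The three conditions can be restated as plain Python to make the thresholds explicit. This is not the rule file itself: the actual expressions live in monitoring/prometheus/alerts/patching.yml. Measuring staleness from the last-success timestamp is an assumption, since the real rule may use the file's modification time, and the `Snapshot` type is hypothetical.

```python
from dataclasses import dataclass

STALE_AFTER_S = 8 * 24 * 3600   # "older than 8 days"
REBOOT_PENDING_FOR_S = 30 * 60  # "for 30m"


@dataclass
class Snapshot:
    now: float                    # current Unix time
    last_success: float           # homelab_patch_last_success_unixtime
    orchestrate_exit_code: int    # homelab_patch_orchestrate_last_exit_code
    reboot_required_for_s: float  # seconds homelab_patch_reboot_required has been 1


def firing_alerts(s: Snapshot) -> list[tuple[str, str]]:
    """Return (alert name, severity) pairs that would fire for this snapshot."""
    alerts: list[tuple[str, str]] = []
    if s.now - s.last_success > STALE_AFTER_S:
        alerts.append(("OSPatchStale", "warning"))
    if s.orchestrate_exit_code != 0:
        alerts.append(("OSPatchOrchestrateFailed", "critical"))
    if s.reboot_required_for_s >= REBOOT_PENDING_FOR_S:
        alerts.append(("PatchRebootPending", "warning"))
    return alerts


if __name__ == "__main__":
    day = 24 * 3600
    print(firing_alerts(Snapshot(now=10 * day, last_success=day,
                                 orchestrate_exit_code=2,
                                 reboot_required_for_s=0)))
```

The 8-day threshold leaves one day of slack over the weekly schedule, so a single late run does not page anyone.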

Secrets:

Topic: homelab-patch-critical-b4e9

Out of scope

  • Saltbox containers — OS layer only on saltierpoop; use Saltbox/Komodo for image updates
  • Proxmox guests not in inventory — patch when onboarded to Ansible
  • Appliances (UDM, Synology DSM, HAOS) — vendor UI / firmware
  • customer-app class — own lifecycle