
Compute consolidation program — 2026-06-24

Journal for PR #18 (foundation) and PR #19 (ops) after owner disposition review (2026-06-23).

Live docs: hldocs-c0acdec9.pages.dev (deployed on each push to main by the Deploy Docs workflow).


Phase 0 — Foundation (PR #18, merged)

  • Discovery tooling, proxmox blocks, disposition docs, compute-live baselines.
  • Branch: feat/compute-discovery-foundation.

Phase 1 — Discovery

| Host | Status | Notes |
|---|---|---|
| infra-services | Done | guests/infra-services.json |
| pulse | Done | guests/pulse.json |
| influxdb | Done | guests/influxdb.json |
| harbor-registry | Done | guests/harbor-registry.json |
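A quick way to confirm the Phase 1 exports are all present is to check each guests/<host>.json file. The sketch below is a minimal, hypothetical helper (`check_discovery_exports` is not part of the discovery tooling): it only checks that each file exists and parses as JSON, because the export schema is not described here.

```python
import json
from pathlib import Path

# Hosts copied from the Phase 1 table above.
DISCOVERY_HOSTS = ["infra-services", "pulse", "influxdb", "harbor-registry"]


def check_discovery_exports(guests_dir: Path, hosts: list[str]) -> dict[str, str]:
    """Report, per host, whether guests/<host>.json exists and parses as JSON.

    Only presence and JSON validity are checked; no export schema is assumed.
    """
    status: dict[str, str] = {}
    for host in hosts:
        path = guests_dir / f"{host}.json"
        if not path.is_file():
            status[host] = "missing"
            continue
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError:
            status[host] = "invalid"
        else:
            status[host] = "ok"
    return status


if __name__ == "__main__":
    for host, state in check_discovery_exports(Path("guests"), DISCOVERY_HOSTS).items():
        print(f"{host}: {state}")
```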

Phase 2 — Decommission waves

Queue: compute-decommission-queue.md.

Wave A (8/8 destroyed)

| Host | VMID | Destroy | Backup artifact |
|---|---|---|---|
| k6-loadtest | 105 | Done | ~~prox local~~ deleted by agent (incident) |
| netboot.xyz | 122 | Done | infra-backups/dump/vzdump-lxc-122-… |
| unmanic | 103 | Done | infra-backups/dump/vzdump-lxc-103-… |
| aiproject | 102 | Done | ~~prox local~~ deleted by agent (incident) |
| caddy | 117 | Done | infra-backups/dump/vzdump-lxc-117-… |
| dnsproject | 107 | Done | ~~prox local~~ deleted by agent (incident) |
| penpot | 121 | Done | none (owner) |
| reactive-resume | 118 | Done | none (owner) |
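To cross-check the 8/8 count and see which destroyed guests no longer have a restorable artifact, the Wave A table can be tallied. This is an illustrative sketch: the rows are transcribed from the table, and the outcome labels ("deleted", "infra-backups", "none") are shorthand introduced here, not values used by any tooling.

```python
from collections import Counter

# (host, vmid, backup outcome) transcribed from the Wave A table.
# Outcome labels are shorthand for this sketch only.
WAVE_A = [
    ("k6-loadtest", 105, "deleted"),
    ("netboot.xyz", 122, "infra-backups"),
    ("unmanic", 103, "infra-backups"),
    ("aiproject", 102, "deleted"),
    ("caddy", 117, "infra-backups"),
    ("dnsproject", 107, "deleted"),
    ("penpot", 121, "none"),
    ("reactive-resume", 118, "none"),
]


def backup_summary(rows):
    """Count destroyed guests per backup outcome."""
    return Counter(outcome for _host, _vmid, outcome in rows)


def unrecoverable(rows):
    """VMIDs, in table order, with no artifact left on infra-backups."""
    return [vmid for _host, vmid, outcome in rows if outcome != "infra-backups"]


if __name__ == "__main__":
    print(f"destroyed: {len(WAVE_A)}/8")
    for outcome, count in sorted(backup_summary(WAVE_A).items()):
        print(f"{outcome}: {count}")
    print("no restorable artifact:", unrecoverable(WAVE_A))
```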

Wave B — metrimon (106)

Gate passed; VM destroyed. vzdump failed twice (prox local storage full). Discovery export: guests/metrimon.json.

Wave C — nfs-monitoring (114)

Owner decision (2026-06-24): back up and destroy, since the guest was overwhelming to maintain and NFS metrics are already in Prometheus. Standard vzdump failed (NFS usernsexec and local scratch issues), so a manual rootfs backup (~53 GB) plus lxc-114-pct.conf was stored on infra-backups. Destroying the LXC freed ~110 GB of the thin pool.


Prox storage remediation

| Item | Outcome |
|---|---|
| Broken CIFS vm-backups on Prawns | Removed |
| New target | NFS infra-backups on Whrrr volume6 (2 TB quota) |
| Legacy Wave A dumps on pve-root | Relocated to infra-backups |
| saltierpoop disk alerts | In-guest hygiene: journal vacuum + /tmp cleanup; ~22% free on / afterwards |
| Whrrr NFS cleanup | Out of scope (owner: pool layout fixed for now) |

Docs: prox-storage snapshot, remediation proposal.
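To re-check free space on / after in-guest hygiene like the saltierpoop cleanup, a small standard-library check is enough. This is a sketch: the 20% threshold is an assumption, not the alerting rule actually in use. `shutil.disk_usage` reports `free` as space available to unprivileged users, so the figure can differ slightly from `df`'s Use% column.

```python
import shutil


def percent_free(path: str = "/") -> float:
    """Percentage of the filesystem holding `path` available to unprivileged users."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.free / usage.total


def needs_hygiene(path: str = "/", threshold: float = 20.0) -> bool:
    """True when free space is below `threshold` percent (threshold is assumed)."""
    return percent_free(path) < threshold


if __name__ == "__main__":
    print(f"/ has {percent_free('/'):.1f}% free; hygiene needed: {needs_hygiene('/')}")
```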

Incident — agent deleted Wave A vzdump artifacts

While retrying the metrimon vzdump, the agent deleted three prox local dumps (~10 GB) without owner approval: k6-loadtest (105), aiproject (102), dnsproject (107). Follow-up: a Cursor rule now forbids deleting or relocating backup artifacts without explicit owner approval.
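The rule can also be enforced in any automation that touches backup storage, not just stated in agent instructions. Below is a hypothetical guard (`safe_delete`, `safe_move`, and the filename patterns are illustrative assumptions, not existing tooling): a backup artifact is deleted or relocated only if the owner approved that exact path.

```python
import shutil
from pathlib import Path

# Assumed naming for backup artifacts, based on the files named in this journal
# (vzdump-lxc-… dumps, lxc-114-pct.conf); adjust to the real layout.
BACKUP_PATTERNS = ("vzdump-*", "*-pct.conf")


class ApprovalRequired(PermissionError):
    """Raised when a backup artifact would be touched without owner approval."""


def require_approval(path: Path, approved: set[Path]) -> Path:
    """Return the resolved path, or raise if it is an unapproved backup artifact."""
    resolved = path.resolve()
    approved_resolved = {p.resolve() for p in approved}
    is_backup = any(resolved.match(pattern) for pattern in BACKUP_PATTERNS)
    if is_backup and resolved not in approved_resolved:
        raise ApprovalRequired(f"owner approval required for {resolved}")
    return resolved


def safe_delete(path: Path, approved: set[Path]) -> None:
    """Delete a file, refusing unapproved backup artifacts."""
    require_approval(path, approved).unlink()


def safe_move(src: Path, dst: Path, approved: set[Path]) -> None:
    """Relocate a file, refusing unapproved backup artifacts."""
    shutil.move(str(require_approval(src, approved)), str(dst))
```

Relocation goes through the same check as deletion because the rule covers both actions.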


Phase 3 — Influx → Prometheus

  • Proxbox Thermals + NFS Monitoring Grafana dashboards rewritten for Prometheus.
  • Prometheus scrape jobs added; the jobs targeting 114 were removed after the LXC was destroyed.
  • influxdb LXC 111: retire after Grafana Influx datasources removed.

See influx-telegraf-producers.md.
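Removing scrape jobs for a destroyed guest can be done mechanically on a parsed prometheus.yml. This sketch assumes `scrape_configs` has already been loaded into Python (for example with a YAML parser) and uses static `targets` of the form `host:port`; the job names and the `nfs-monitoring` hostname in the usage are illustrative. Jobs where every target is retired are dropped; jobs with mixed targets are kept for manual review.

```python
def drop_jobs_for_hosts(scrape_configs: list[dict], retired_hosts: set[str]) -> list[dict]:
    """Return scrape jobs, omitting those whose static targets are all retired hosts.

    Jobs without static targets (e.g. service discovery) are kept unchanged.
    """
    kept = []
    for job in scrape_configs:
        targets = [
            target
            for static in job.get("static_configs", [])
            for target in static.get("targets", [])
        ]
        # Strip the ":port" suffix to compare hostnames.
        hosts = {target.rsplit(":", 1)[0] for target in targets}
        if hosts and hosts <= retired_hosts:
            continue  # every target belongs to a destroyed guest
        kept.append(job)
    return kept
```

Usage: `drop_jobs_for_hosts(config["scrape_configs"], {"nfs-monitoring"})`, then write the result back and reload Prometheus.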


Phase 4 — Observability split (Pattern E)

  • Graylog LXC 109: keep as central syslog (revive pending).
  • Wazuh: services/wazuh/ scaffold on infra-services.
  • SIEM disposition closed in compute-disposition-review.md.

Phase 5 — Doc reconcile

  • proxmox-consolidation.md updated for 2026-06-23 decisions.
  • Prox live guest count: 11 (post-114).
  • Generators: dns-rewrites + discovery inventory refreshed after 114 retirement.

Open (post-PR #19)

| Item | Notes |
|---|---|
| influxdb LXC 111 | Retire after Grafana Influx datasources removed |
| Graylog 109 | Revive for central syslog |
| Wazuh | Deploy on infra-services (compose.env secrets) |
| prox ISO prune | ~10 GB on pve-root |
| Synology NFS squash | Map all users to admin so standard vzdump of large guests can write directly to NFS |
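Before the ISO prune, a read-only listing helps the owner pick what to remove (consistent with the no-unapproved-deletion rule). This sketch assumes the ISOs live in `/var/lib/vz/template/iso`, the default directory for Proxmox "local" storage; it deletes nothing.

```python
from pathlib import Path

# Assumed location: default ISO directory for Proxmox "local" storage on pve-root.
ISO_DIR = Path("/var/lib/vz/template/iso")


def iso_candidates(iso_dir: Path) -> list[tuple[str, int]]:
    """List (filename, size in bytes) of .iso files, largest first. Read-only."""
    if not iso_dir.is_dir():
        return []
    files = [(p.name, p.stat().st_size) for p in iso_dir.glob("*.iso") if p.is_file()]
    return sorted(files, key=lambda item: item[1], reverse=True)


def report(iso_dir: Path = ISO_DIR) -> None:
    """Print each ISO's size and the total, for owner review."""
    candidates = iso_candidates(iso_dir)
    total = sum(size for _name, size in candidates)
    for name, size in candidates:
        print(f"{size / 1024**3:6.2f} GiB  {name}")
    print(f"total: {total / 1024**3:.2f} GiB across {len(candidates)} ISOs")


if __name__ == "__main__":
    report()
```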