Proxmox Storage: Live Snapshot

Snapshot captured: 2026-06-24 (read-only SSH audit of prox / 192.168.6.71)
Host: prox (ASUS PN64, 1 TB NVMe)
Context: Post–Wave A/B decommission; metrimon vzdump failed twice on local

Point-in-time snapshot

This document is a frozen audit of storage at capture time. It is not regenerated from inventory. After intentional changes (ISO prune, CIFS fix, guest resize, Synology cleanup), capture a new snapshot or update the compute live index with a dated successor file.

Collection method: ssh infra-services-cursor → patch-controller key → root@192.168.6.71 — pvesm status, df, vgs/lvs, qm list, pct list, pct config, /var/lib/vz sizing, mount write tests.

Post-snapshot update (2026-06-24)

Backup target replaced. Owner created NFS share infra-backups on Whrrr volume6; agent configured Proxmox and validated end-to-end.

Item	Value
Storage ID	`infra-backups`
Export	`192.168.6.215:/volume6/infra-backups`
Mount	`/mnt/pve/infra-backups`
NFS rule	`192.168.6.71` Read/Write, squash Map root to admin
Quota (DSM)	2 TB visible to PVE
Retention	`prune-backups keep-last=3`
Retired	`vm-backups` CIFS on Prawns (`pvesm remove`)

Validation: vzdump 120 --storage infra-backups succeeded (octoprint LXC, 479 MB archive on NFS).
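For reference, registering an NFS export like the one in the table could look like the sketch below. It is built from the table values and is not a record of the commands the agent actually ran; the function name `add_infra_backups` and the `RUN` switch are hypothetical. By default it only prints the `pvesm add` call so it can be reviewed first.

```shell
# Print (or, with RUN=1, execute) a pvesm call that registers the NFS export.
# Server, export path, and retention are taken from the table above.
add_infra_backups() {
    set -- pvesm add nfs infra-backups \
        --server 192.168.6.215 \
        --export /volume6/infra-backups \
        --content backup \
        --prune-backups keep-last=3
    if [ "${RUN:-0}" = "1" ]; then
        "$@"            # run on the Proxmox host as root
    else
        echo "$*"       # dry run: show the command only
    fi
}

add_infra_backups
```

With `RUN=1` on the host, `pvesm status` should afterwards list `infra-backups` as active before any `vzdump --storage infra-backups` test is attempted.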

Relocated (2026-06-24): ~2.7 GB of legacy Wave A vzdump archives (103 unmanic, 117 caddy, 122 netboot), plus the metrimon failure log and the 2024 saltierpoop log, were moved from prox local to infra-backups/dump/ (~3.1 GB total on NFS). The ISO prune is still pending.
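A relocation like this is safest when each archive is verified on the target before the source copy is deleted. The sketch below shows one way to do that; `relocate_dumps` is a hypothetical helper, and the directory arguments are assumptions based on the paths named in this document, not the exact procedure that was used.

```shell
# relocate_dumps SRC DST: copy each vzdump file, compare it byte for byte,
# and delete the source copy only after the comparison succeeds.
relocate_dumps() {
    src=$1
    dst=$2
    mkdir -p "$dst" || return 1
    for f in "$src"/vzdump-*; do
        [ -e "$f" ] || continue                     # glob matched nothing
        cp -p "$f" "$dst/" || return 1
        cmp -s "$f" "$dst/$(basename "$f")" || return 1
        rm -f "$f"
    done
}

# Example (on the host, as root):
# relocate_dumps /var/lib/vz/dump /mnt/pve/infra-backups/dump
```

The function stops at the first copy or compare failure, so a half-written archive on the NFS side never causes the local original to be removed.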

Executive summary

Pressure	Severity	Headline
`pve-root` / vzdump scratch	Critical	24 GB free on 96 GB root; large `vzdump` to `local` needs more scratch than available
`vm-backups` CIFS	Critical	Mounted but not writable — PVE marks storage inactive
Synology Prawns ~99%	High	~130 GB free on 30 TB; caps NFS + backup offload
`local-lvm` thin pool	Moderate	67.5% used (~536 GB / 794 GB); room after decom waves
LXC 114 nfs-monitoring	High (guest)	91% full inside (96 GB / 111 GB)

Two separate pools matter: local (pve-root) holds ISOs and vzdump files; local-lvm holds guest disks. Freeing thin-pool space does not fix root pressure during backup.

Physical and logical layout

flowchart TB
  subgraph nvme["NVMe ~930 GB (VG pve)"]
    root["pve-root 96 GB<br/>local — ISO, vzdump, vztmpl<br/>66 GB used · 24 GB free"]
    swap["swap 8 GB"]
    thin["local-lvm thin pool 794 GB<br/>67.5% data used<br/>~536 GB used · ~258 GB free"]
    vgfree["VG unallocated ~8 GB"]
  end

  subgraph nas["Synology Whrrr 192.168.6.215 — Prawns ~30 TB"]
    prawns["99.6% full · ~130 GB free"]
    cifs["CIFS //Prawns/backups/proxbox<br/>vm-backups — WRITE FAIL"]
    nfs["NFS /volume9/Prawns<br/>synorpn — LXC bind mounts"]
  end

  root -->|"vzdump default when CIFS broken"| dump["/var/lib/vz/dump 2.7 GB"]
  root --> iso["/var/lib/vz/template/iso 12 GB"]
  root --> cache["/var/lib/vz/template/cache 2.3 GB"]
  thin --> guests["12 guests (see Guests at snapshot)"]
  nfs --> lxc114["LXC 114 mp0/mp1"]
  cifs -.->|"Permission denied"| root

`pve-root` breakdown

Mount / path	Size	Role
`/` (`pve-root`)	96 GB total	Proxmox OS + `local` storage
`/var/lib/vz/template/iso`	12 GB	VM install ISOs
`/var/lib/vz/template/cache`	2.3 GB	LXC templates
`/var/lib/vz/dump`	2.7 GB	vzdump archives (`local` backup target)
Other `/var`, `/usr`, …	~49 GB	packages, logs, PVE state

Proxmox storage targets

Storage	Type	PVE status	Total	Used	Avail	Content	Notes
local	dir	active	94 GB	65 GB	23 GB	iso, backup, vztmpl	Backs onto `pve-root`
local-lvm	lvmthin	active	795 GB	537 GB	264 GB	images, rootdir	`data` thin pool
synorpn	nfs	active	30 TB	29.9 TB	~130 GB	rootdir	`192.168.6.215:/volume9/Prawns`
vm-backups	cifs	inactive	30 TB	29.9 TB	~130 GB	backup	`touch` → Permission denied
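The `touch` result in the Notes column comes from the kind of mount write test listed under the collection method. A minimal probe could look like this, assuming storages are mounted under `/mnt/pve/<storage-id>`; the function name `probe_write` is illustrative.

```shell
# probe_write DIR: succeed only if a file can be created and removed in DIR.
probe_write() {
    dir=$1
    probe="$dir/.pve-write-probe.$$"
    if touch "$probe" 2>/dev/null; then
        rm -f "$probe"
        echo "writable: $dir"
    else
        echo "NOT writable: $dir"
        return 1
    fi
}

# Example (on the host):
# probe_write /mnt/pve/vm-backups
```

A mounted share that fails this probe matches the situation in the table: the mount exists, but PVE cannot use it as a backup target.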

storage.cfg excerpt (paths only):

local: /var/lib/vz
local-lvm: thin pool data, VG pve
synorpn: export /volume9/Prawns → /mnt/pve/synorpn
vm-backups: //192.168.6.215/Prawns subdir /backups/proxbox, user proxbox

Why vzdump to `local` failed (metrimon VM 106)

flowchart TD
  A["vzdump requested"] --> B{"--storage?"}
  B -->|"vm-backups"| C["CIFS: Permission denied"]
  B -->|"local (default/fallback)"| D["Stream to /var/lib/vz/dump on pve-root"]
  C --> D
  D --> E{"Free space on 96 GB root?"}
  E -->|"Large VM e.g. 96 GB disk"| F["Fails ~77% — broken pipe / disk full"]
  E -->|"Small LXC"| G["Succeeds — e.g. 240 MB–1.5 GB archives"]

Metrimon (VM 106, 96 GB provisioned) failed twice at ~77% of the disk read, with only ~24 GB free on root. The compressed archive would have been smaller than the provisioned disk, but the writer still needs substantial temporary headroom on the target while the job runs.
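A pre-flight check before any vzdump to `local` can catch this case early. The sketch below compares free space with a fraction of the guest's provisioned disk. The 50% default is an illustrative assumption, not a measured compression ratio, and `enough_scratch` is a hypothetical helper.

```shell
# enough_scratch FREE_GB DISK_GB [PCT]
# Succeeds when FREE_GB covers PCT percent of the provisioned disk (default 50).
enough_scratch() {
    free_gb=$1
    disk_gb=$2
    pct=${3:-50}                                   # assumed safety ratio
    need_gb=$(( (disk_gb * pct + 99) / 100 ))      # round up
    [ "$free_gb" -ge "$need_gb" ]
}

# Figures from this snapshot: 24 GB free on root, 96 GB metrimon disk.
if enough_scratch 24 96; then
    echo "ok to dump to local"
else
    echo "insufficient scratch on local; use a remote backup storage"
fi
```

On the host, the free figure could be read with `df --output=avail -BG /var/lib/vz` (GNU coreutils) and the disk size from the guest config.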

ISO and template inventory (`local`)

ISOs: 12 GB total

File	Size	Candidate to remove?
`ubuntu-22.04.2-desktop-amd64.iso`	4.6 GB	Yes (desktop; saltierpoop uses server ISO)
`ubuntu-24.04.1-live-server-amd64.iso`	2.6 GB	Keep one server ISO
`ubuntu-22.04.4-live-server-amd64.iso`	2.0 GB	Dedupe vs 24.04
`ubuntu-22.04.2-live-server-amd64.iso`	1.9 GB	Dedupe vs 24.04
`alpine-standard-3.21.0-x86_64.iso`	241 MB	Keep if needed for tiny installs

Quick win: remove the desktop ISO and the two 22.04 server ISOs → ~8.5 GB back on root (4.6 + 2.0 + 1.9 GB).
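The reclaimable figure can be checked by summing the sizes of the three Ubuntu ISOs flagged for removal or dedupe in the table. This is a small arithmetic sketch; `reclaimable_gb` is a hypothetical helper and the sizes are the ones listed above.

```shell
# reclaimable_gb SIZE_GB...: sum the given sizes and print one decimal place.
reclaimable_gb() {
    printf '%s\n' "$@" | awk '{ sum += $1 } END { printf "%.1f\n", sum }'
}

# Desktop 22.04.2, server 22.04.4, server 22.04.2
reclaimable_gb 4.6 2.0 1.9
```

This prints 8.5. Before deleting anything, it is worth confirming that no VM config still references an ISO; removal could then go through the GUI or `pvesm free local:iso/<file>`.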

CT templates: 2.3 GB

Largest item: noble-server-cloudimg-amd64.img (601 MB); the remainder is TurnKey, Debian, and Ubuntu tarballs. Prune unused templates after confirming that no LXC creates are planned.

Surviving vzdump on `local`: 2.7 GB

Guest (retired)	Archive	Size
caddy (117)	`vzdump-lxc-117-2026_06_23-18_47_55.tar.zst`	1.5 GB
unmanic (103)	`vzdump-lxc-103-2026_06_23-18_37_39.tar.zst`	976 MB
netboot.xyz (122)	`vzdump-lxc-122-2026_06_23-18_32_33.tar.zst`	240 MB

Note (2026-06-24): Three other Wave A dumps (k6, aiproject, dnsproject) were removed from prox during a failed metrimon backup retry — see journal incident. Paths in inventory mark those artifacts as deleted.

Thin pool (`local-lvm`): provisioned vs actual

xychart-beta
    title "Top guests by ~actual thin usage (GB, estimated)"
    x-axis ["100 saltierpoop", "114 nfs-mon", "119 harbor", "110 sqlserver", "115 ollama", "200 haos", "123 infra", "109 graylog", "111 influx", "113 mysql", "116 pulse"]
    y-axis "GB (approx)" 0 --> 220
    bar [207, 110, 70, 40, 29, 30, 19, 14, 7, 2, 4]

Estimates: provisioned_GB × LVM_data_percent / 100, using lvs values at capture time.

LV	Guest	Prov.	Data %	~Actual
vm-100-disk-0	saltierpoop	260 GB	79.7%	~207 GB
vm-114-disk-0	nfs-monitoring	112 GB	98.7%	~110 GB
vm-119-disk-0	harbor-registry	80 GB	87.8%	~70 GB
vm-110-disk-0	sqlserver2022	60 GB	66.7%	~40 GB
vm-115-disk-0	ollama	35 GB	82.7%	~29 GB
vm-200-disk-1	haos	32 GB	95.3%	~30 GB
vm-123-disk-1	infra-services	30 GB	65.0%	~19 GB
vm-109-disk-0	graylog	30 GB	48.2%	~14 GB
vm-111-disk-0	influxdb	8 GB	86.7%	~7 GB
vm-113-disk-0	mysql	8 GB	25.2%	~2 GB
vm-116-disk-0	pulse	4 GB	99.5%*	~4 GB

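* Pulse: LVM reports the thin volume as 99.5% allocated, but in-guest df showed 59% used (2.2 GB / 3.9 GB). The gap is likely blocks that were freed in the guest but never discarded back to the pool.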

Pool totals: 794 GB thin, 67.5% data used (~536 GB), 2.22% metadata, leaving ~258 GB thin free for growth.
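The per-guest estimates in this section can be recomputed from `lvs` output. The sketch below reads rows of `lv_name size_gb data_percent` and prints size × percent / 100; `thin_actual` is a hypothetical helper, and the sample rows reuse values from the table.

```shell
# thin_actual: read "lv_name size_gb data_percent" rows on stdin and print
# the estimated actual usage in GB. Rows without a data_percent column
# (for example root or swap) are skipped.
thin_actual() {
    awk 'NF >= 3 { printf "%s %.1f\n", $1, $2 * $3 / 100 }'
}

# Sample rows taken from the table above.
printf '%s\n' 'vm-100-disk-0 260 79.7' 'vm-113-disk-0 8 25.2' | thin_actual
```

On the host, the input could come from `lvs --noheadings --units g --nosuffix -o lv_name,lv_size,data_percent pve`. Note that lowercase `g` means GiB in LVM, so results can differ slightly from figures rounded in decimal GB.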

Guests at snapshot (12 on prox)

Post–Wave A/B decommission. Destroyed since prior baseline: 102, 103, 105, 106, 107, 117, 118, 121, 122.

VMs

VMID	Name	State	RAM	Disk prov.	Storage	Notes
100	saltierpoop	running	30 GB	260 GB	local-lvm	Largest consumer; virtio-scsi, discard=on
123	infra-services	running	8 GB	30 GB	local-lvm	Homelab control plane
200	haos	running	6 GB	32 GB	local-lvm	Thin 95%; review HA retention

LXCs

VMID	Name	State	RAM	Disk prov.	In-guest `/`	Mounts	Notes
109	graylog	stopped	8 GB	30 GB	—	—	Pattern E revive candidate
110	sqlserver2022	stopped	23 GB	60 GB	—	—	Owner keep
111	influxdb	running	2 GB	8 GB	—	—	Retire after Influx cutover
113	mysql	stopped	1 GB	8 GB	—	—	Owner keep
114	nfs-monitoring	running	8 GB	112 GB	96G/111G (91%)	synorpn, prawns NFS	ES + Docker; hottest guest
115	ollama	stopped	10 GB	35 GB	—	—	Phase 9 keep
116	pulse	running	1 GB	4 GB	2.2G/3.9G	—	Pulse Monitoring Server
119	harbor-registry	running	4 GB	80 GB	45G/79G (61%)	—	Harbor v2.14 stack
120	octoprint	stopped	1 GB	4 GB	—	—	Owner keep

LXC 114 mount detail (capture)

rootfs: local-lvm:vm-114-disk-0,size=112G
mp0:   /mnt/pve/synorpn,mp=/mnt/synorpn
mp1:   /mnt/prawns,mp=/mnt/nfs-prawns
nameserver: 192.168.6.17  (fixed 2026-06-24; was Tailscale DNS)

Inside 114 at capture: /mnt/synorpn on Whrrr NFS 100% (30T volume); rootfs 91% full (Elasticsearch + Docker stack).

Relief options (prioritized)

flowchart LR
  subgraph immediate["Immediate — root / backups"]
    A1["Fix vm-backups CIFS ACLs"]
    A2["Prune ISOs ~8.5 GB"]
    A3["Relocate vzdump to infra-backups ✓"]
  end
  subgraph upstream["Upstream — NAS"]
    B1["Synology Prawns cleanup"]
    B2["Expand or tier cold data"]
  end
  subgraph guests["Guests — thin pool"]
    C1["114 ES/Docker cleanup"]
    C2["100 saltierpoop retention"]
    C3["200 HAOS history"]
  end
  A1 --> B1
  A3 --> B1
  B1 --> C1

Priority	Action	Impact	Owner gate
1	Fix vm-backups write (Synology creds / ACL / PVE storage test)	Unblocks all future vzdump off root	CIFS password / DSM share ACL
2	Prune redundant ISOs on prox	~8.5 GB on `pve-root`	Confirm which ISOs to keep
3	Synology Prawns capacity	NFS + backup headroom	capacity runbook
4	Move surviving vzdump to NAS (after #1)	~2.7 GB on root	Approve destination path
5	114 in-guest cleanup (ES indices, Docker)	Guest + thin pressure	Ops — no destroy
6	Set vm-backups as default backup storage in PVE	Prevents repeat	After #1 verified

Regenerate

No automated script yet. To refresh this snapshot:

# From operator workstation (Cursor: infra-services-cursor → prox)
# Read-only: storage status, filesystems, LVM, guest lists, and /var/lib/vz sizing
ssh infra-services-cursor "sudo ssh -i /etc/homelab/patch-controller/id_ed25519 root@192.168.6.71 \
  'pvesm status; df -hT; vgs; lvs -o+data_percent; qm list; pct list; du -sh /var/lib/vz/*'"

Copy the output into a new dated file, prox-storage-YYYY-MM-DD.md, and link it from the compute live index.
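The dated filename can be generated rather than typed by hand. A small sketch follows; `snapshot_file` is a hypothetical helper that follows the naming pattern given above.

```shell
# snapshot_file [YYYY-MM-DD]: print the snapshot filename for the given day,
# defaulting to today's date.
snapshot_file() {
    day=${1:-$(date +%F)}
    printf 'prox-storage-%s.md\n' "$day"
}

snapshot_file 2026-06-24
```

Redirecting the raw capture output to `"$(snapshot_file)"` gives a starting file that can then be edited into the same layout as this document.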

Changelog

Date	Change
2026-06-24	Initial snapshot after consolidation Wave A/B and metrimon vzdump failure
2026-06-24	infra-backups NFS on volume6 live; vm-backups CIFS removed; test vzdump 120 OK
2026-06-24	Legacy ~2.7 GB vzdump relocated from prox `local` to infra-backups