
Proxmox Storage — Live Snapshot

Snapshot captured: 2026-06-24 (read-only SSH audit of prox / 192.168.6.71)
Host: prox (ASUS PN64, 1 TB NVMe)
Context: Post–Wave A/B decommission; metrimon vzdump failed twice on local

Point-in-time snapshot

This document is a frozen audit of storage at capture time. It is not regenerated from inventory. After intentional changes (ISO prune, CIFS fix, guest resize, Synology cleanup), capture a new snapshot or update the compute live index with a dated successor file.

Collection method: ssh infra-services-cursor → patch-controller key → root@192.168.6.71, then pvesm status, df, vgs/lvs, qm list, pct list, pct config, /var/lib/vz sizing, and mount write tests.

Related: Synology capacity ntfy · Compute disposition review · Prox storage remediation proposal · prox-2026-06-23.json


Post-snapshot update (2026-06-24)

Backup target replaced. Owner created NFS share infra-backups on Whrrr volume6; agent configured Proxmox and validated end-to-end.

| Item | Value |
| --- | --- |
| Storage ID | infra-backups |
| Export | 192.168.6.215:/volume6/infra-backups |
| Mount | /mnt/pve/infra-backups |
| NFS rule | 192.168.6.71 Read/Write, squash: Map root to admin |
| Quota (DSM) | 2 TB visible to PVE |
| Retention | prune-backups keep-last=3 |
| Retired | vm-backups CIFS on Prawns (pvesm remove) |

Validation: vzdump 120 --storage infra-backups succeeded (octoprint LXC, 479 MB archive on NFS).
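The exact commands the agent ran are not recorded in this snapshot. As a rough sketch, assuming the standard `pvesm` and `vzdump` CLIs and the values from the table above, the change could be reproduced along these lines. The `run` helper and the `RUN` variable are illustrative and not part of Proxmox; by default the script only prints what it would execute.

```shell
#!/bin/sh
# Sketch only: prints commands by default; set RUN=1 to execute on the PVE host.
# Values come from the table above; `run` and RUN are hypothetical helpers.

run() {
  if [ "${RUN:-0}" = "1" ]; then "$@"; else echo "$*"; fi
}

configure_infra_backups() {
  # Register the NFS export as backup-only storage with keep-last=3 retention.
  run pvesm add nfs infra-backups \
    --server 192.168.6.215 \
    --export /volume6/infra-backups \
    --content backup \
    --prune-backups keep-last=3
  # Retire the old CIFS storage definition.
  run pvesm remove vm-backups
  # Validate end-to-end with a small guest (octoprint, LXC 120).
  run vzdump 120 --storage infra-backups
}

configure_infra_backups
```

Printing first makes it easy to review the three steps before touching storage.cfg on the host.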

Relocated (2026-06-24): ~2.7 GB legacy Wave A vzdump (103 unmanic, 117 caddy, 122 netboot) plus metrimon failure log and 2024 saltierpoop log moved from prox local to infra-backups/dump/ (~3.1 GB total on NFS). ISO prune unchanged.


Executive summary

| Pressure | Severity | Headline |
| --- | --- | --- |
| pve-root / vzdump scratch | Critical | 24 GB free on 96 GB root; large vzdump to local needs more scratch than available |
| vm-backups CIFS | Critical | Mounted but not writable; PVE marks storage inactive |
| Synology Prawns ~99% | High | ~130 GB free on 30 TB; caps NFS + backup offload |
| local-lvm thin pool | Moderate | 67.5% used (~562 GB / 794 GB); room after decom waves |
| LXC 114 nfs-monitoring | High (guest) | 91% full inside (96 GB / 111 GB) |

Two separate pools matter: local (pve-root) holds ISOs and vzdump files; local-lvm holds guest disks. Freeing thin-pool space does not fix root pressure during backup.


Physical and logical layout

flowchart TB
  subgraph nvme["NVMe ~930 GB (VG pve)"]
    root["pve-root 96 GB<br/>local — ISO, vzdump, vztmpl<br/>66 GB used · 24 GB free"]
    swap["swap 8 GB"]
    thin["local-lvm thin pool 794 GB<br/>67.5% data used<br/>~562 GB used · ~271 GB free"]
    vgfree["VG unallocated ~8 GB"]
  end

  subgraph nas["Synology Whrrr 192.168.6.215 — Prawns ~30 TB"]
    prawns["99.6% full · ~130 GB free"]
    cifs["CIFS //Prawns/backups/proxbox<br/>vm-backups — WRITE FAIL"]
    nfs["NFS /volume9/Prawns<br/>synorpn — LXC bind mounts"]
  end

  root -->|"vzdump default when CIFS broken"| dump["/var/lib/vz/dump 2.7 GB"]
  root --> iso["/var/lib/vz/template/iso 12 GB"]
  root --> cache["/var/lib/vz/template/cache 2.3 GB"]
  thin --> guests["12 guests — see §4"]
  nfs --> lxc114["LXC 114 mp0/mp1"]
  cifs -.->|"Permission denied"| root

pve-root breakdown

| Mount / path | Size | Role |
| --- | --- | --- |
| / (pve-root) | 96 GB total | Proxmox OS + local storage |
| /var/lib/vz/template/iso | 12 GB | VM install ISOs |
| /var/lib/vz/template/cache | 2.3 GB | LXC templates |
| /var/lib/vz/dump | 2.7 GB | vzdump archives (local backup target) |
| Other /var, /usr, … | ~49 GB | packages, logs, PVE state |

Proxmox storage targets

| Storage | Type | PVE status | Total | Used | Avail | Content | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| local | dir | active | 94 GB | 65 GB | 23 GB | iso, backup, vztmpl | Backs onto pve-root |
| local-lvm | lvmthin | active | 795 GB | 537 GB | 264 GB | images, rootdir | data thin pool |
| synorpn | nfs | active | 30 TB | 29.9 TB | ~130 GB | rootdir | 192.168.6.215:/volume9/Prawns |
| vm-backups | cifs | inactive | 30 TB | 29.9 TB | ~130 GB | backup | touch → Permission denied |

storage.cfg excerpt (paths only):

  • local: /var/lib/vz
  • local-lvm: thin pool data, VG pve
  • synorpn: export /volume9/Prawns, mounted at /mnt/pve/synorpn
  • vm-backups: //192.168.6.215/Prawns subdir /backups/proxbox, user proxbox

Why vzdump to local failed (metrimon VM 106)

flowchart TD
  A["vzdump requested"] --> B{"--storage?"}
  B -->|"vm-backups"| C["CIFS: Permission denied"]
  B -->|"local (default/fallback)"| D["Stream to /var/lib/vz/dump on pve-root"]
  C --> D
  D --> E{"Free space on 96 GB root?"}
  E -->|"Large VM e.g. 96 GB disk"| F["Fails ~77% — broken pipe / disk full"]
  E -->|"Small LXC"| G["Succeeds — e.g. 240 MB–1.5 GB archives"]

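Metrimon (96 GB provisioned) failed twice at ~77% of the read phase with ~24 GB free on root. The finished compressed archive would have been smaller than the provisioned disk, but the job still needs enough free space on the target to hold the partially written archive while it runs, and root could not provide it.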
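A pre-flight check would have caught this before the job started. The following is a minimal sketch, not an existing script on prox: `avail_gb` and `has_headroom` are hypothetical helpers, and requiring free space equal to the full provisioned size is a deliberately conservative assumption (it ignores compression and sparse blocks).

```shell
#!/bin/sh
# Sketch: refuse a vzdump to a directory storage unless free space covers
# the guest's provisioned disk (a deliberately conservative assumption).

# Free space in whole gigabytes (GiB, as GNU df -BG reports) for the
# filesystem that holds path $1.
avail_gb() {
  df --output=avail -BG "$1" | tail -n 1 | tr -dc '0-9'
}

# Succeeds only if available GB ($1) >= provisioned GB ($2).
has_headroom() {
  [ "$1" -ge "$2" ]
}

# Worked example with the figures from this snapshot: 24 GB free, 96 GB disk.
if has_headroom 24 96; then
  echo "ok: enough scratch"
else
  echo "refuse: 24 GB free < 96 GB provisioned"
fi
```

On the host the live value would come from something like `has_headroom "$(avail_gb /var/lib/vz/dump)" 96`, run before invoking vzdump against local.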


ISO and template inventory (local)

ISOs — 12 GB total

| File | Size | Candidate to remove? |
| --- | --- | --- |
| ubuntu-22.04.2-desktop-amd64.iso | 4.6 GB | Yes (desktop; saltierpoop uses server ISO) |
| ubuntu-24.04.1-live-server-amd64.iso | 2.6 GB | Keep one server ISO |
| ubuntu-22.04.4-live-server-amd64.iso | 2.0 GB | Dedupe vs 24.04 |
| ubuntu-22.04.2-live-server-amd64.iso | 1.9 GB | Dedupe vs 24.04 |
| alpine-standard-3.21.0-x86_64.iso | 241 MB | Keep if needed for tiny installs |

Quick win: remove redundant Ubuntu ISOs → ~10 GB back on root.
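As a quick check of that estimate, the sizes of the removal candidates can be summed directly. This sketch assumes "Dedupe vs 24.04" means the two 22.04 server ISOs are removed along with the desktop ISO; `sum_gb` is a hypothetical helper. By the listed sizes the three candidates come to 8.5 GB, somewhat under the ~10 GB headline.

```shell
#!/bin/sh
# Sum the sizes (GB) of the ISOs flagged as removal candidates above.
# Sizes are copied from the table; the 24.04.1 server and Alpine ISOs are kept.
sum_gb() {
  awk '{ total += $2 } END { printf "%.1f\n", total }'
}

sum_gb <<'EOF'
ubuntu-22.04.2-desktop-amd64.iso 4.6
ubuntu-22.04.4-live-server-amd64.iso 2.0
ubuntu-22.04.2-live-server-amd64.iso 1.9
EOF
```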

CT templates — 2.3 GB

Largest: noble-server-cloudimg-amd64.img (601 MB), TurnKey/Debian/Ubuntu tarballs. Prune unused templates after confirming no planned LXC creates.

Surviving vzdump on local — 2.7 GB

| Guest (retired) | Archive | Size |
| --- | --- | --- |
| caddy (117) | vzdump-lxc-117-2026_06_23-18_47_55.tar.zst | 1.5 GB |
| unmanic (103) | vzdump-lxc-103-2026_06_23-18_37_39.tar.zst | 976 MB |
| netboot.xyz (122) | vzdump-lxc-122-2026_06_23-18_32_33.tar.zst | 240 MB |

Note (2026-06-24): Three other Wave A dumps (k6, aiproject, dnsproject) were removed from prox during a failed metrimon backup retry — see journal incident. Paths in inventory mark those artifacts as deleted.


Thin pool (local-lvm) — provisioned vs actual

xychart-beta
    title "Top guests by ~actual thin usage (GB, estimated)"
    x-axis ["100 saltierpoop", "114 nfs-mon", "119 harbor", "110 sqlserver", "115 ollama", "200 haos", "123 infra", "109 graylog", "111 influx", "113 mysql", "116 pulse"]
    y-axis "GB (approx)" 0 --> 220
    bar [207, 110, 70, 40, 29, 30, 19, 14, 7, 2, 4]

Estimates: provisioned_GB × LVM data_percent / 100, from lvs at capture time.

| LV | Guest | Prov. | Data % | ~Actual |
| --- | --- | --- | --- | --- |
| vm-100-disk-0 | saltierpoop | 260 GB | 79.7% | ~207 GB |
| vm-114-disk-0 | nfs-monitoring | 112 GB | 98.7% | ~110 GB |
| vm-119-disk-0 | harbor-registry | 80 GB | 87.8% | ~70 GB |
| vm-110-disk-0 | sqlserver2022 | 60 GB | 66.7% | ~40 GB |
| vm-115-disk-0 | ollama | 35 GB | 82.7% | ~29 GB |
| vm-200-disk-1 | haos | 32 GB | 95.3% | ~30 GB |
| vm-123-disk-1 | infra-services | 30 GB | 65.0% | ~19 GB |
| vm-109-disk-0 | graylog | 30 GB | 48.2% | ~14 GB |
| vm-111-disk-0 | influxdb | 8 GB | 86.7% | ~7 GB |
| vm-113-disk-0 | mysql | 8 GB | 25.2% | ~2 GB |
| vm-116-disk-0 | pulse | 4 GB | 99.5%* | ~4 GB |

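* Pulse: the thin LV reports 99.5% of its blocks allocated, while in-guest df showed 59% used (2.2 GB / 3.9 GB). The gap is likely blocks that were written once and never discarded.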

Pool totals: 794 GB thin, 67.5% data used, 2.22% metadata — ~271 GB thin free for growth.
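The per-LV estimates can be recomputed and totalled from the provisioned size and data percent columns. A minimal sketch, with `thin_estimates` as a hypothetical helper and the input rows copied from the table above; the estimates total roughly 534 GB, close to the 537 GB used that pvesm status reports for local-lvm.

```shell
#!/bin/sh
# Estimate per-LV thin usage as provisioned_GB * data_percent / 100,
# then total the estimates. Input rows: "<lv> <provisioned_GB> <data_percent>".
thin_estimates() {
  awk '{
    est = $2 * $3 / 100
    total += est
    printf "%s %.1f\n", $1, est
  } END { printf "TOTAL %.1f\n", total }'
}

# Figures copied from the table above.
thin_estimates <<'EOF'
vm-100-disk-0 260 79.7
vm-114-disk-0 112 98.7
vm-119-disk-0 80 87.8
vm-110-disk-0 60 66.7
vm-115-disk-0 35 82.7
vm-200-disk-1 32 95.3
vm-123-disk-1 30 65.0
vm-109-disk-0 30 48.2
vm-111-disk-0 8 86.7
vm-113-disk-0 8 25.2
vm-116-disk-0 4 99.5
EOF
```

On the host, similar input could probably be produced with `lvs --noheadings --units g --nosuffix -o lv_name,lv_size,data_percent pve`, filtered to the guest volumes; that pipeline is an assumption and was not part of this capture.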


Guests at snapshot (12 on prox)

Post–Wave A/B decommission. Destroyed since prior baseline: 102, 103, 105, 106, 107, 117, 118, 121, 122.

VMs

| VMID | Name | State | RAM | Disk prov. | Storage | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 100 | saltierpoop | running | 30 GB | 260 GB | local-lvm | Largest consumer; virtio-scsi, discard=on |
| 123 | infra-services | running | 8 GB | 30 GB | local-lvm | Homelab control plane |
| 200 | haos | running | 6 GB | 32 GB | local-lvm | Thin 95%; review HA retention |

LXCs

| VMID | Name | State | RAM | Disk prov. | In-guest / | Mounts | Notes |
| --- | --- | --- | --- | --- | --- | --- | --- |
| 109 | graylog | stopped | 8 GB | 30 GB | | | Pattern E revive candidate |
| 110 | sqlserver2022 | stopped | 23 GB | 60 GB | | | Owner keep |
| 111 | influxdb | running | 2 GB | 8 GB | | | Retire after Influx cutover |
| 113 | mysql | stopped | 1 GB | 8 GB | | | Owner keep |
| 114 | nfs-monitoring | running | 8 GB | 112 GB | 96G/111G (91%) | synorpn, prawns NFS | ES + Docker; hottest guest |
| 115 | ollama | stopped | 10 GB | 35 GB | | | Phase 9 keep |
| 116 | pulse | running | 1 GB | 4 GB | 2.2G/3.9G | | Pulse Monitoring Server |
| 119 | harbor-registry | running | 4 GB | 80 GB | 45G/79G (61%) | | Harbor v2.14 stack |
| 120 | octoprint | stopped | 1 GB | 4 GB | | | Owner keep |

LXC 114 mount detail (capture)

rootfs: local-lvm:vm-114-disk-0,size=112G
mp0:   /mnt/pve/synorpn,mp=/mnt/synorpn
mp1:   /mnt/prawns,mp=/mnt/nfs-prawns
nameserver: 192.168.6.17  (fixed 2026-06-24; was Tailscale DNS)

Inside 114 at capture: /mnt/synorpn on Whrrr NFS 100% (30T volume); rootfs 91% full (Elasticsearch + Docker stack).
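To spot filesystems in the same state on a later audit, in-guest df output can be filtered by a usage threshold. A minimal sketch, assuming POSIX `df -P` output on stdin; `flag_full` is a hypothetical helper and the 90% threshold in the usage line is only an example.

```shell
#!/bin/sh
# Print mount points whose usage is at or above a threshold (percent).
# Reads POSIX `df -P` output on stdin; mount points with spaces are not handled.
flag_full() {
  awk -v limit="$1" 'NR > 1 {
    use = $5
    sub(/%/, "", use)          # "91%" -> "91"
    if (use + 0 >= limit + 0) print $6, $5
  }'
}

# Usage on the PVE host (not executed here):
#   pct exec 114 -- df -P | flag_full 90
```

With the figures from this capture, the rootfs of 114 (91%) and /mnt/synorpn (100%) would both be reported at a threshold of 90.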


Relief options (prioritized)

flowchart LR
  subgraph immediate["Immediate — root / backups"]
    A1["Fix vm-backups CIFS ACLs"]
    A2["Prune ISOs ~10 GB"]
    A3["Relocate vzdump to infra-backups ✓"]
  end
  subgraph upstream["Upstream — NAS"]
    B1["Synology Prawns cleanup"]
    B2["Expand or tier cold data"]
  end
  subgraph guests["Guests — thin pool"]
    C1["114 ES/Docker cleanup"]
    C2["100 saltierpoop retention"]
    C3["200 HAOS history"]
  end
  A1 --> B1
  A3 --> B1
  B1 --> C1

| Priority | Action | Impact | Owner gate |
| --- | --- | --- | --- |
| 1 | Fix vm-backups write (Synology creds / ACL / PVE storage test) | Unblocks all future vzdump off root | CIFS password / DSM share ACL |
| 2 | Prune redundant ISOs on prox | ~10 GB on pve-root | Confirm which ISOs to keep |
| 3 | Synology Prawns capacity | NFS + backup headroom | capacity runbook |
| 4 | Move surviving vzdump to NAS (after #1) | ~2.7 GB on root | Approve destination path |
| 5 | 114 in-guest cleanup (ES indices, Docker) | Guest + thin pressure | Ops; no destroy |
| 6 | Set vm-backups as default backup storage in PVE | Prevents repeat | After #1 verified |

Regenerate

No automated script yet. To refresh this snapshot:

# From operator workstation (Cursor: infra-services-cursor → prox)
ssh infra-services-cursor "sudo ssh -i /etc/homelab/patch-controller/id_ed25519 root@192.168.6.71 \
  'pvesm status; df -hT; vgs; lvs -o+data_percent; du -sh /var/lib/vz/*'"

Copy output into a new dated file prox-storage-YYYY-MM-DD.md and link from compute live index.
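Until an automated script exists, a small wrapper can at least standardize the file name and keep the capture command in one place. This is a sketch under assumptions: `snapshot_file`, `capture`, and the `RUN` variable are hypothetical, and the raw command output still needs to be edited into the document format by hand.

```shell
#!/bin/sh
# Sketch: wrap the capture command and name the output file by date.
# Prints the ssh command unless RUN=1; RUN and these helpers are illustrative.

snapshot_file() {
  # $1 = date in YYYY-MM-DD form; defaults to today.
  echo "prox-storage-${1:-$(date +%F)}.md"
}

capture() {
  remote="pvesm status; df -hT; vgs; lvs -o+data_percent; du -sh /var/lib/vz/*"
  inner="sudo ssh -i /etc/homelab/patch-controller/id_ed25519 root@192.168.6.71 '$remote'"
  if [ "${RUN:-0}" = "1" ]; then
    ssh infra-services-cursor "$inner"
  else
    echo "ssh infra-services-cursor \"$inner\""
  fi
}

capture   # on a real run: RUN=1 capture > "$(snapshot_file)"
```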


Changelog

| Date | Change |
| --- | --- |
| 2026-06-24 | Initial snapshot after consolidation Wave A/B and metrimon vzdump failure |
| 2026-06-24 | infra-backups NFS on volume6 live; vm-backups CIFS removed; test vzdump 120 OK |
| 2026-06-24 | Legacy ~2.7 GB vzdump relocated from prox local to infra-backups |