Longhorn Backup Policy
Longhorn provides replicated block storage across the three bare-metal nodes. Snapshot scheduling and off-site backup to MinIO are enforced cluster-wide by a labelling script.
Label Groups
longhorn-backup-labels.sh runs as part of the weekly maintenance flow and ensures every PVC has the correct labels. Longhorn maps labels to recurring job groups.
| Label | Applied to | Effect |
|---|---|---|
| recurring-job.longhorn.io/source=enabled | All PVCs | Opts the PVC into the recurring job system |
| recurring-job-group.longhorn.io/default=enabled | All PVCs | Daily local snapshots |
| recurring-job-group.longhorn.io/weekly=enabled | Selected namespaces | Weekly backup to MinIO S3 |
| recurring-job-group.longhorn.io/nosnapshots=enabled | Cache / metrics PVCs | No snapshots (prometheus, redis, elasticsearch, ...) |
Namespaces in the weekly group: appdaemon, bootstrap, envuassu, esphome, frigate, home-assistant, openldap, and others where data loss would be significant.
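The label selection described above can be sketched roughly as follows. This is not the actual longhorn-backup-labels.sh, which is not shown here. The function names are hypothetical, the namespace list contains only the namespaces named above, and the nosnapshots handling for cache and metrics PVCs is left out.

```shell
#!/bin/sh
# Sketch only: the real longhorn-backup-labels.sh may work differently.
# Only the namespaces named in this document; the real list is longer.
WEEKLY_NAMESPACES="appdaemon bootstrap envuassu esphome frigate home-assistant openldap"

# Print the labels a PVC in namespace $1 should carry, one per line.
labels_for_namespace() {
  echo "recurring-job.longhorn.io/source=enabled"
  echo "recurring-job-group.longhorn.io/default=enabled"
  for ns in $WEEKLY_NAMESPACES; do
    if [ "$ns" = "$1" ]; then
      echo "recurring-job-group.longhorn.io/weekly=enabled"
    fi
  done
}

# Apply those labels to PVC $2 in namespace $1.
# --overwrite makes repeated runs idempotent.
label_pvc() {
  # shellcheck disable=SC2046  # word splitting of the label list is intended
  kubectl label pvc "$2" -n "$1" --overwrite $(labels_for_namespace "$1")
}
```

Calling `label_pvc frigate <pvc-name>` would apply all three labels, while a PVC in a namespace outside the weekly list would receive only the source and default labels.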
Backup Target
Backups ship to MinIO at minio.home.tillo.ch:30000. The backup target URL is configured in Longhorn settings as s3://longhorn-backups@mdapi/.
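Longhorn S3 backup targets follow the pattern s3://<bucket>@<region>/<path>, so in the URL above longhorn-backups is the bucket and mdapi fills the region field. The MinIO endpoint is not part of the URL; Longhorn normally reads it from the backup credential secret. The helper below is a hypothetical illustration of how the URL breaks down, not part of Longhorn.

```shell
# Split a Longhorn S3 backup target of the form s3://<bucket>@<region>/<path>.
parse_backup_target() {
  rest="${1#s3://}"            # drop the scheme
  bucket="${rest%%@*}"         # text before the first "@"
  region_path="${rest#*@}"     # text after the first "@"
  region="${region_path%%/*}"  # text before the first "/"
  path="${region_path#*/}"     # remainder, possibly empty
  echo "bucket=$bucket region=$region path=$path"
}

parse_backup_target 's3://longhorn-backups@mdapi/'
# prints: bucket=longhorn-backups region=mdapi path=
```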
VM Backups
Longhorn also backs up KubeVirt VMs (e.g. the CipherTrust Manager appliance) as standard Longhorn volumes. This is what makes the CipherTrust Manager recoverable: a Longhorn snapshot restore brings the entire VM disk back without needing to reconfigure Akeyless.
Monitoring Thresholds
The weekly_infra_health Windmill flow monitors backup ages:
| Job group | Expected cadence | Alert threshold |
|---|---|---|
| default (daily) | Daily | 35 days without snapshot |
| weekly (MinIO) | Weekly | 70 days without backup |
Thresholds are intentionally wider than the cadence to absorb missed runs without false-positive alerts.
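A threshold check of this kind could look like the sketch below. It is not the weekly_infra_health implementation, which is not shown here. The function name and its arguments are hypothetical, and treating "35 days without snapshot" as "age strictly greater than 35 whole days" is an assumption.

```shell
# Hypothetical helper, not the actual Windmill flow.
# $1 = job group (default|weekly)
# $2 = epoch seconds of the newest snapshot or backup
# $3 = current epoch seconds (optional, defaults to now)
backup_age_status() {
  case "$1" in
    default) limit_days=35 ;;   # threshold from the table above
    weekly)  limit_days=70 ;;   # threshold from the table above
    *) echo "unknown group: $1" >&2; return 2 ;;
  esac
  now="${3:-$(date +%s)}"
  age_days=$(( (now - $2) / 86400 ))   # whole days, rounded down
  if [ "$age_days" -gt "$limit_days" ]; then
    echo "ALERT $1 age=${age_days}d limit=${limit_days}d"
    return 1
  fi
  echo "OK $1 age=${age_days}d limit=${limit_days}d"
}
```

The optional third argument exists only so the check can be exercised with fixed timestamps; in normal use the current time is taken from `date`.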
PV Reclaim Policy
All production PVs must use the Retain reclaim policy (persistentVolumeReclaimPolicy: Retain in the PV spec). If a PVC is deleted accidentally, Retain leaves the PV (and its Longhorn volume) intact for manual recovery.
Default is Delete
Longhorn PVs are provisioned with Delete by default. Any PVC that could contain irreplaceable data should have its PV reclaim policy patched immediately after provisioning.
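One way to apply the patch is the standard kubectl patch call on the PV's persistentVolumeReclaimPolicy field. The wrapper function names below are hypothetical, and the second helper assumes the PVC is already bound to a PV.

```shell
# Set the reclaim policy of PV $1 to Retain.
retain_pv() {
  kubectl patch pv "$1" -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'
}

# Look up the PV bound to PVC $2 in namespace $1, then patch it.
retain_pvc() {
  pv_name="$(kubectl get pvc "$2" -n "$1" -o jsonpath='{.spec.volumeName}')"
  retain_pv "$pv_name"
}
```

For example, `retain_pvc home-assistant <pvc-name>` would resolve the bound PV and switch it to Retain.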
The pv_reclaim_policy_analysis Windmill flow audits this weekly.
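The same audit can be approximated by hand. The sketch below is not the pv_reclaim_policy_analysis flow itself. It uses a hypothetical function name and simply lists every PV whose policy is not Retain, without distinguishing production volumes from disposable ones.

```shell
# List PVs whose reclaim policy is not Retain, one name per line.
pvs_not_retained() {
  kubectl get pv --no-headers \
    -o custom-columns=NAME:.metadata.name,POLICY:.spec.persistentVolumeReclaimPolicy |
    awk '$2 != "Retain" { print $1 }'
}
```

An empty result means every PV in the cluster is already set to Retain.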