GitOps with Fleet
Design: Git as the Source of Truth
Every Fleet-managed resource in the cluster (Helm releases, raw manifests, RBAC, ingress rules, ExternalSecrets) is declared in the mdapi/fleet Git repository at https://gitlab.mdapi.ch/mdapi/fleet (public mirror). Apart from the few exceptions listed under Non-Fleet Resources, kubectl apply is never run manually. This keeps the cluster state auditable, reproducible, and diffable.
Rancher Fleet watches the repo and continuously reconciles the cluster state to match.
Pipeline
flowchart TD
push["git push\n(main / test)"]
subgraph gitlab["GitLab — mdapi/fleet"]
main_br["main branch"]
test_br["test branch"]
end
subgraph rancher_mgmt["mdapi-rancher (management cluster)"]
fleet_ctrl["Fleet Controller"]
bd_prod["BundleDeployments\n(prod namespace)"]
bd_test["BundleDeployments\n(test namespace)"]
end
prod["mdapi-prod\nHelm releases + raw manifests"]
test_cl["mdapi-test (Rackspace)"]
push --> gitlab
main_br --> fleet_ctrl
test_br --> fleet_ctrl
fleet_ctrl --> bd_prod --> prod
fleet_ctrl --> bd_test --> test_cl
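The two-branch flow in the diagram could be expressed as a pair of Fleet GitRepo resources on the management cluster. This is a minimal sketch, not the repository's actual manifests: the fleet-default namespace, the mdapi-test GitRepo name, and the clusterName targets are assumptions for illustration.

```yaml
# Hypothetical GitRepo for production: follows the main branch.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: mdapi-prod
  namespace: fleet-default      # assumed workspace namespace
spec:
  repo: https://gitlab.mdapi.ch/mdapi/fleet
  branch: main
  targets:
    - clusterName: mdapi-prod   # assumed registered cluster name
---
# Hypothetical GitRepo for the test cluster: follows the test branch.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: mdapi-test              # assumed name
  namespace: fleet-default      # assumed workspace namespace
spec:
  repo: https://gitlab.mdapi.ch/mdapi/fleet
  branch: test
  targets:
    - clusterName: mdapi-test   # assumed registered cluster name
```

Keeping one GitRepo per branch means a push to test never produces BundleDeployments for the production cluster.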
Repository Structure
fleet/
├── bootstrap/ # GitLab (Helm)
├── windmill/ # Windmill (Helm)
├── keycloak/ # Keycloak
├── joplin/ # Joplin + MCP server
├── nameserver/ # BIND9 + Webmin
├── tv/ # Media stack
├── longhorn/ # Longhorn config
├── keel/ # Keel (Helm)
├── ... # 35+ more namespaces
└── README.md
Helm History and ErrApplied
Fleet validates that the Helm release history is intact before each reconcile. When Kubernetes GC removes old history secrets, Fleet enters ErrApplied.
Prevention: every fleet.yaml sets helm.maxHistory: 25. The key is maxHistory, not historyMax — the wrong key is silently ignored.
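As a rough illustration of where the key sits, a fleet.yaml might look like the sketch below. Only helm.maxHistory: 25 comes from the text above; the namespace and release name are hypothetical placeholders.

```yaml
# fleet.yaml (sketch)
defaultNamespace: example       # hypothetical namespace
helm:
  releaseName: example          # hypothetical release name
  maxHistory: 25                # correct key: keep up to 25 release revisions
  # historyMax: 25              # wrong key: silently ignored, as noted above
```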
Recovery: clear status.release on the BundleDeployment to force a fresh helm install:
kubectl --context mdapi-rancher -n <cluster-ns> patch bundledeployment <name> \
  --type=merge --subresource=status \
  -p '{"status":{"release":""}}'
Keel: Automated Image Updates
Keel polls container registries every 4 hours and compares image digests. If a digest has changed, it patches the Deployment and triggers a rolling update.
flowchart LR
reg["Container Registry\ndocker.io / ghcr.io\nregistry.mdapi.ch"]
keel["Keel\ndigest poll @every 4h"]
deploy["Deployment"]
pod["Rolling update"]
reg -->|"digest changed?"| keel -->|"patch image tag"| deploy --> pod
Helm provider requires explicit pollSchedule
The global polling.defaultSchedule only applies to the Kubernetes annotations provider. For Helm-managed releases, pollSchedule must be declared in the release's own values:
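A minimal sketch of such a values block, following the layout of Keel's Helm provider, is shown here. The image name and the force policy are assumptions for illustration; the 4-hour schedule mirrors the interval described above.

```yaml
# values.yaml of a Helm release that Keel should track (sketch)
image:
  repository: registry.mdapi.ch/example/app   # hypothetical image
  tag: latest                                 # hypothetical tag

keel:
  policy: force               # assumed: update even when the tag is not semver
  trigger: poll               # poll the registry instead of waiting for webhooks
  pollSchedule: "@every 4h"   # must be set per release for the Helm provider
  images:
    - repository: image.repository   # path to the repository value in this file
      tag: image.tag                 # path to the tag value in this file
```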
Reloader: Config-Change Rollouts
Stakater Reloader is the companion to Keel: where Keel rolls a workload when its image digest changes, Reloader rolls it when a ConfigMap or Secret it mounts changes. It is opt-in — only workloads annotated reloader.stakater.com/auto: "true" are watched — and covers the hand-managed config maps Fleet delivers that would otherwise need a manual kubectl rollout restart to take effect.
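Opting a workload in could look like the following sketch. The Deployment, image, and ConfigMap names are hypothetical; only the annotation itself is taken from the text above.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                        # hypothetical workload
  annotations:
    reloader.stakater.com/auto: "true"     # opt in to Reloader
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.mdapi.ch/example/app:latest   # hypothetical image
          envFrom:
            - configMapRef:
                name: example-config       # hypothetical ConfigMap; a change here triggers a rollout
```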
Reloader rollouts show as Modified in Fleet
When Reloader triggers a rollout it patches the live workload, so Fleet briefly reports that bundle as Modified. Fleet correctDrift is off, so it only reports the drift — it does not fight Reloader.
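For reference, drift correction is a per-GitRepo setting. The sketch below shows it switched off explicitly; the namespace is an assumption, and since Fleet treats drift correction as opt-in, leaving the block out should have the same effect.

```yaml
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: mdapi-prod
  namespace: fleet-default      # assumed workspace namespace
spec:
  repo: https://gitlab.mdapi.ch/mdapi/fleet
  branch: main
  correctDrift:
    enabled: false              # report Modified state, do not revert live changes
```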
cert-manager and DNS-01
cert-manager uses RFC 2136 dynamic updates to add _acme-challenge TXT records to BIND9. The target nameserver is 31.3.128.59:53 (external IP of ns.mdapi.ch, configured in each namespace's letsencrypt-prod Issuer).
The mdapi.ch zone requires allow-update { key "mdapi"; } in BIND9's named.conf. This is configured at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/nameserver (public mirror) and applied via Fleet.
mdapi.ch uses offline DNSSEC signing managed by Webmin. Dynamic updates land in a zone journal without DNSSEC signatures. This is acceptable — Let's Encrypt verifies with a non-validating resolver.
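A per-namespace letsencrypt-prod Issuer using the RFC 2136 solver described above might look roughly like this. The nameserver address and the TSIG key name mdapi come from the text; the namespace, email, Secret names, and TSIG algorithm are assumptions and would need to match the real BIND9 key definition.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
  namespace: example                       # hypothetical namespace
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com               # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account-key   # assumed Secret for the ACME account key
    solvers:
      - dns01:
          rfc2136:
            nameserver: 31.3.128.59:53     # external IP of ns.mdapi.ch
            tsigKeyName: mdapi             # key named in allow-update
            tsigAlgorithm: HMACSHA256      # assumed; must match the BIND9 key
            tsigSecretSecretRef:
              name: rfc2136-tsig           # assumed Secret holding the TSIG secret
              key: tsig-secret             # assumed key within that Secret
```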
external-dns and internal service discovery
The same RFC 2136 protocol is reused to keep the internal home.tillo.ch zone in sync with cluster state — but against a different nameserver, with a different TSIG key, and a much narrower update policy.
external-dns watches Service and Ingress objects and dynamically publishes A / AAAA / CNAME records into Technitium (the authoritative server for home.tillo.ch, see Hardware → DNS & DHCP Architecture). A dedicated TSIG key is registered in Technitium with an update policy scoped to the home.tillo.ch zone apex plus *.home.tillo.ch — the key cannot touch any other zone the server hosts.
The controller is opt-in by annotation: only resources explicitly tagged are published, so existing manual / DHCP records keep working while the system is burned in.
metadata:
  annotations:
    external-dns.alpha.kubernetes.io/manage: mdapi
    external-dns.alpha.kubernetes.io/target: 192.168.1.191 # optional VIP pin
    external-dns.alpha.kubernetes.io/ttl: "300"
The controller runs with policy=upsert-only (create + update, never delete) and a txt-prefix=extdns- registry so ownership is recorded on a sibling TXT record. Once the manual records are gone, it can be flipped to full sync.
RFC 2136 mode needs AXFR enabled
external-dns must read the current zone to know which records already exist. With --rfc2136-tsig-axfr it pulls the zone over a TSIG-signed AXFR; without it, it cannot see existing records and re-issues an ADD for every managed record on each reconcile, never converging.
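Put together, the settings described in this section correspond to container arguments along these lines. This is a sketch under assumptions: the Technitium host name, the TSIG key name and algorithm, the TSIG_SECRET environment variable, and the annotation filter are illustrative and not taken from the repository.

```yaml
# Excerpt from the external-dns container spec (sketch)
args:
  - --provider=rfc2136
  - --source=service
  - --source=ingress
  - --domain-filter=home.tillo.ch
  - --rfc2136-host=technitium.example.internal   # hypothetical Technitium address
  - --rfc2136-port=53
  - --rfc2136-zone=home.tillo.ch
  - --rfc2136-tsig-keyname=external-dns          # hypothetical TSIG key name
  - --rfc2136-tsig-secret-alg=hmac-sha256        # assumed algorithm
  - --rfc2136-tsig-secret=$(TSIG_SECRET)         # assumed env var backed by a Secret
  - --rfc2136-tsig-axfr                          # read existing records via signed AXFR
  - --policy=upsert-only                         # create and update, never delete
  - --registry=txt
  - --txt-prefix=extdns-                         # ownership kept on sibling TXT records
  - --annotation-filter=external-dns.alpha.kubernetes.io/manage=mdapi   # assumed opt-in filter
```

Switching to full sync later would mean changing only the --policy argument.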
Non-Fleet Resources
A small number of resources are applied directly with kubectl apply (against mdapi-rancher) and are not managed by Fleet:
- The mdapi/rancher-local repo: the ovpn-admin UI for mbptillo, plus the mdapi-rancher cluster's own cluster-issuer. Committed for traceability but not auto-reconciled.
The bootstrap namespace (GitLab) on mdapi-prod is bootstrapped by Helm directly, not by Fleet, but every adjacent resource that lives alongside it — docs.mdapi.ch ingress, certificate, the namespace's letsencrypt-prod issuer — is in the fleet/docs/ bundle at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/docs (public mirror), reconciled by the mdapi-prod GitRepo.