GitOps with Fleet

Design: Git as the Source of Truth

Every resource in the cluster (Helm releases, raw manifests, RBAC, ingress rules, ExternalSecrets) is declared in the mdapi/fleet Git repository at https://gitlab.mdapi.ch/mdapi/fleet (public mirror). Apart from the few exceptions listed under Non-Fleet Resources, nothing is applied with a manual kubectl apply. This keeps the cluster state auditable, reproducible, and diffable.

Rancher Fleet watches the repo and continuously reconciles the cluster state to match.

Pipeline

flowchart TD
    push["git push\n(main / test)"]

    subgraph gitlab["GitLab — mdapi/fleet"]
        main_br["main branch"]
        test_br["test branch"]
    end

    subgraph rancher_mgmt["mdapi-rancher (management cluster)"]
        fleet_ctrl["Fleet Controller"]
        bd_prod["BundleDeployments\n(prod namespace)"]
        bd_test["BundleDeployments\n(test namespace)"]
    end

    prod["mdapi-prod\nHelm releases + raw manifests"]
    test_cl["mdapi-test (Rackspace)"]

    push --> gitlab
    main_br --> fleet_ctrl
    test_br --> fleet_ctrl
    fleet_ctrl --> bd_prod --> prod
    fleet_ctrl --> bd_test --> test_cl
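
The branch-to-cluster mapping in the diagram is expressed in Fleet as a GitRepo resource on the management cluster. The following is a minimal sketch rather than the repository's actual manifest: the `fleet-default` namespace and the `clusterName` target are assumptions, while the repo URL, the branch, and the GitRepo name come from this page.

```yaml
# Sketch of a GitRepo that binds the main branch to mdapi-prod.
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: mdapi-prod            # GitRepo name as referenced on this page
  namespace: fleet-default    # assumed: Fleet's default workspace namespace
spec:
  repo: https://gitlab.mdapi.ch/mdapi/fleet
  branch: main                # a second GitRepo would track "test" for mdapi-test
  targets:
    - clusterName: mdapi-prod # assumed: the cluster is registered under this name
```

With no `paths` set, Fleet scans from the repository root, so each top-level directory that contains a fleet.yaml becomes its own bundle.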

Repository Structure

fleet/
├── bootstrap/        # GitLab (Helm)
├── windmill/         # Windmill (Helm)
├── keycloak/         # Keycloak
├── joplin/           # Joplin + MCP server
├── nameserver/       # BIND9 + Webmin
├── tv/               # Media stack
├── longhorn/         # Longhorn config
├── keel/             # Keel (Helm)
├── ...               # 35+ more namespaces
└── README.md

Helm History and ErrApplied

Fleet checks that the Helm release history is intact before each reconcile. When Kubernetes GC removes old history secrets, that check fails and the BundleDeployment enters the ErrApplied state.

Prevention: every fleet.yaml sets helm.maxHistory: 25. The key is maxHistory, not historyMax — the wrong key is silently ignored.
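
As an illustration of where the key sits, here is a hedged sketch of a fleet.yaml for a Helm-based bundle. Only `helm.maxHistory: 25` is taken from this page; the namespace, release name, and chart source are assumptions chosen for the example.

```yaml
# fleet.yaml (sketch): only helm.maxHistory reflects the documented setting.
defaultNamespace: keel          # assumed namespace
helm:
  releaseName: keel             # assumed release name
  repo: https://charts.keel.sh  # assumed chart repository
  chart: keel
  maxHistory: 25                # correct key; "historyMax" would be silently ignored
```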

Recovery: clear status.release on the BundleDeployment to force a fresh helm install:

kubectl --context mdapi-rancher -n <cluster-ns> patch bundledeployment <name> \
  --type=merge --subresource=status \
  -p '{"status":{"release":""}}'

Keel — Automated Image Updates

Keel polls container registries every 4 hours and compares image digests. If a digest has changed, it patches the Deployment and triggers a rolling update.

flowchart LR
    reg["Container Registry\ndocker.io / ghcr.io\nregistry.mdapi.ch"]
    keel["Keel\ndigest poll @every 4h"]
    deploy["Deployment"]
    pod["Rolling update"]

    reg -->|"digest changed?"| keel -->|"patch image tag"| deploy --> pod

Helm provider requires explicit pollSchedule

The global polling.defaultSchedule only applies to the Kubernetes annotations provider. For Helm-managed releases, pollSchedule must be declared in the release's own values:

keel:
  trigger: poll
  pollSchedule: "@every 4h"
  images:
    - repository: image.repository
      tag: image.tag
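
For contrast, a workload handled by the Kubernetes annotations provider could look like the fragment below. This is a sketch under assumptions: the workload name is hypothetical, and the `force` policy with `match-tag` is one common way to follow digest changes on a fixed tag, not necessarily the policy used in this repository.

```yaml
# Metadata fragment of a Deployment managed through Keel annotations.
metadata:
  name: example-app             # hypothetical workload
  annotations:
    keel.sh/policy: force       # assumed policy: update even if the tag is unchanged
    keel.sh/match-tag: "true"   # only react to new digests of the same tag
    keel.sh/trigger: poll
    # keel.sh/pollSchedule is omitted on purpose, so the global
    # polling.defaultSchedule applies to this workload.
```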

Reloader — Config-Change Rollouts

Stakater Reloader is the companion to Keel: where Keel rolls a workload when its image digest changes, Reloader rolls it when a ConfigMap or Secret it mounts changes. It is opt-in — only workloads annotated reloader.stakater.com/auto: "true" are watched — and covers the hand-managed config maps Fleet delivers that would otherwise need a manual kubectl rollout restart to take effect.
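
A minimal sketch of an opted-in workload follows. The annotation is the one named above; the Deployment name, image, and ConfigMap name are hypothetical placeholders.

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app                      # hypothetical
  annotations:
    reloader.stakater.com/auto: "true"   # opt in: Reloader watches referenced ConfigMaps and Secrets
spec:
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      containers:
        - name: app
          image: registry.mdapi.ch/example-app:latest  # hypothetical image
          volumeMounts:
            - name: config
              mountPath: /etc/example-app
      volumes:
        - name: config
          configMap:
            name: example-app-config     # a change to this ConfigMap triggers a rollout
```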

Reloader rollouts show as Modified in Fleet

When Reloader triggers a rollout it patches the live workload, so Fleet briefly reports that bundle as Modified. Fleet's correctDrift setting is off, so Fleet only reports the drift and does not revert Reloader's patch.
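
For reference, drift correction is a per-GitRepo setting. The fragment below is a sketch of how the off state can be written explicitly; whether this repository spells it out or simply relies on the default is an assumption.

```yaml
# Fragment of a GitRepo spec: drift is reported but not reverted.
spec:
  correctDrift:
    enabled: false
```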

cert-manager and DNS-01

cert-manager uses RFC 2136 dynamic updates to add _acme-challenge TXT records to BIND9. The target nameserver is 31.3.128.59:53 (external IP of ns.mdapi.ch, configured in each namespace's letsencrypt-prod Issuer).

The mdapi.ch zone requires allow-update { key "mdapi"; } in BIND9's named.conf. This is configured at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/nameserver (public mirror) and applied via Fleet.
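
Put together, a namespace's Issuer might look roughly like the sketch below. The nameserver address, the Issuer name, and the TSIG key name `mdapi` come from this page. The namespace, contact email, secret names, and the HMAC algorithm are assumptions for illustration.

```yaml
apiVersion: cert-manager.io/v1
kind: Issuer
metadata:
  name: letsencrypt-prod
  namespace: example                    # hypothetical namespace
spec:
  acme:
    server: https://acme-v02.api.letsencrypt.org/directory
    email: admin@example.com            # placeholder contact address
    privateKeySecretRef:
      name: letsencrypt-prod-account    # assumed name for the ACME account key
    solvers:
      - dns01:
          rfc2136:
            nameserver: 31.3.128.59:53  # external IP of ns.mdapi.ch
            tsigKeyName: mdapi          # must match the key in allow-update
            tsigAlgorithm: HMACSHA256   # assumed; must match the BIND9 key definition
            tsigSecretSecretRef:
              name: rfc2136-tsig        # assumed Secret holding the TSIG key
              key: tsig-secret-key      # assumed key within that Secret
```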

mdapi.ch uses offline DNSSEC signing managed by Webmin. Dynamic updates land in a zone journal without DNSSEC signatures. This is acceptable — Let's Encrypt verifies with a non-validating resolver.

external-dns and internal service discovery

The same RFC 2136 protocol is reused to keep the internal home.tillo.ch zone in sync with cluster state — but against a different nameserver, with a different TSIG key, and a much narrower update policy.

external-dns watches Service and Ingress objects and dynamically publishes A / AAAA / CNAME records into Technitium (the authoritative server for home.tillo.ch, see Hardware → DNS & DHCP Architecture). A dedicated TSIG key is registered in Technitium with an update policy scoped to the home.tillo.ch zone apex plus *.home.tillo.ch — the key cannot touch any other zone the server hosts.

The controller is opt-in by annotation: only resources explicitly tagged are published, so existing manual / DHCP records keep working while the system is burned in.

metadata:
  annotations:
    external-dns.alpha.kubernetes.io/manage: mdapi
    external-dns.alpha.kubernetes.io/target: 192.168.1.191    # optional VIP pin
    external-dns.alpha.kubernetes.io/ttl: "300"

The controller runs with policy=upsert-only (create + update, never delete) and a txt-prefix=extdns- registry, so ownership is recorded on a sibling TXT record. Once the manual records are gone, the policy can be switched to sync so that deletions are propagated as well.

RFC 2136 mode needs AXFR enabled

external-dns must read the current zone to know which records already exist. With --rfc2136-tsig-axfr it pulls the zone over a TSIG-signed AXFR; without it, it cannot see existing records and re-issues an ADD for every managed record on each reconcile, never converging.
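
The settings described in this section map onto external-dns container arguments roughly as sketched below. The zone, policy, TXT prefix, and AXFR flag follow the text. The host name, TSIG key name, algorithm, owner ID, and the use of an annotation filter to implement the opt-in are assumptions.

```yaml
# Fragment of the external-dns container spec (sketch).
args:
  - --source=service
  - --source=ingress
  - --provider=rfc2136
  - --rfc2136-host=technitium.example.internal  # hypothetical Technitium address
  - --rfc2136-port=53
  - --rfc2136-zone=home.tillo.ch
  - --rfc2136-tsig-keyname=externaldns-key      # hypothetical TSIG key name
  - --rfc2136-tsig-secret=$(TSIG_SECRET)        # assumed: injected from a Secret via env
  - --rfc2136-tsig-secret-alg=hmac-sha256       # assumed algorithm
  - --rfc2136-tsig-axfr                         # read existing records over signed AXFR
  - --domain-filter=home.tillo.ch
  - --annotation-filter=external-dns.alpha.kubernetes.io/manage=mdapi  # assumed opt-in mechanism
  - --registry=txt
  - --txt-prefix=extdns-
  - --txt-owner-id=mdapi                        # assumed owner ID
  - --policy=upsert-only                        # create and update, never delete
```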

Non-Fleet Resources

A small number of resources are applied directly with kubectl apply (against mdapi-rancher) and are not managed by Fleet:

  • The mdapi/rancher-local repo — ovpn-admin UI for mbptillo, plus the mdapi-rancher cluster's own cluster-issuer. Committed for traceability but not auto-reconciled.

The bootstrap namespace (GitLab) on mdapi-prod is installed by Helm directly, not by Fleet. The resources that live alongside it (the docs.mdapi.ch ingress, its certificate, and the namespace's letsencrypt-prod issuer) are in the fleet/docs/ bundle at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/docs (public mirror), which is reconciled by the mdapi-prod GitRepo.