GitOps with Fleet

Design: Git as the Source of Truth

Every resource in the Fleet-managed clusters (Helm releases, raw manifests, RBAC, ingress rules, ExternalSecrets) is declared in the mdapi/fleet Git repository. kubectl apply is never run manually against these clusters; the few exceptions on the management cluster are listed under Non-Fleet Resources. This makes the cluster state fully auditable, reproducible, and diffable.

Rancher Fleet watches the repo and continuously reconciles the cluster state to match.

Pipeline

```mermaid
flowchart TD
    dev["git push\n(main / test / dev)"]

    subgraph gitlab["GitLab — mdapi/fleet"]
        main_br["main branch"]
        test_br["test branch"]
        dev_br["dev branch"]
    end

    subgraph rancher_mgmt["mdapi-rancher (management cluster)"]
        fleet_ctrl["Fleet Controller"]
        bd_prod["BundleDeployments\n(prod namespace)"]
        bd_test["BundleDeployments\n(test namespace)"]
    end

    prod["mdapi-prod\nHelm releases + raw manifests"]
    test_cl["mdapi-test (Rackspace)"]
    dev_cl["mdapi-dev (Rackspace)"]

    dev --> gitlab
    main_br --> fleet_ctrl
    test_br --> fleet_ctrl
    dev_br --> fleet_ctrl
    fleet_ctrl --> bd_prod --> prod
    fleet_ctrl --> bd_test --> test_cl
    fleet_ctrl --> dev_cl
```
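
The branch-to-cluster mapping in the diagram is typically expressed as one Fleet GitRepo object per branch on the management cluster. The following is a minimal sketch for the main branch only, not the actual object in use: the GitLab host, the object name, the fleet-default namespace, the listed paths, and the assumption that the cluster is registered in Fleet under the name mdapi-prod are all illustrative.

```yaml
# Hypothetical GitRepo for the production branch (names and URL are assumptions).
apiVersion: fleet.cattle.io/v1alpha1
kind: GitRepo
metadata:
  name: fleet-prod              # assumed name
  namespace: fleet-default      # assumed workspace namespace on mdapi-rancher
spec:
  repo: https://gitlab.example.com/mdapi/fleet.git   # placeholder host
  branch: main                  # main branch drives production
  paths:                        # directories to scan for fleet.yaml bundles
    - keycloak
    - keel
  targets:
    - clusterName: mdapi-prod   # assumes the cluster is registered under this name
```

Under the same assumptions, the test and dev branches would each get an analogous GitRepo with a different `branch` and `targets` entry.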

Repository Structure

The repo is organized as one subdirectory per namespace. Each directory contains the Kubernetes manifests (raw YAML or Helm values) and a fleet.yaml or fleet.yml file that tells Fleet how to deploy them.

```text
fleet/
├── bootstrap/        # GitLab (Helm)
├── windmill/         # Windmill (Helm)
├── keycloak/         # Keycloak
├── joplin/           # Joplin + MCP server
├── mail/             # Full mail stack
├── tv/               # Media stack (16 services)
├── longhorn/         # Longhorn config
├── keel/             # Keel (Helm)
├── nameserver/       # BIND9
├── ...               # 35+ more namespaces
└── README.md
```
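
To illustrate the two directory flavors described above, here is a hedged sketch of what the fleet.yaml in a raw-manifest directory and in a Helm-based directory might look like. The file contents, the values.yaml filename, and the chart repository URL are assumptions for illustration and are not copied from the repository.

```yaml
# nameserver/fleet.yaml (hypothetical): raw manifests stored beside this file
# are deployed into the namespace named here.
defaultNamespace: nameserver
helm:
  maxHistory: 25
---
# keel/fleet.yaml (hypothetical): a Helm chart plus a values file kept in the
# same directory.
defaultNamespace: keel
helm:
  releaseName: keel
  chart: keel
  repo: https://charts.keel.sh   # assumed chart repository
  maxHistory: 25
  valuesFiles:
    - values.yaml                # assumed filename
```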

Helm History and ErrApplied

Fleet validates that the Helm release history is intact before each reconcile — specifically, it checks that current_release_version - maxHistory still exists in the Helm history secrets. When Kubernetes GC removes old secrets, Fleet enters ErrApplied and stops reconciling.

Prevention: every fleet.yaml sets helm.maxHistory: 25, giving a large enough window that GC never removes the oldest version Fleet needs.

```yaml
# fleet.yaml: standard pattern
defaultNamespace: my-namespace
helm:
  maxHistory: 25
  releaseName: my-release
  chart: my-chart
  repo: https://charts.example.com/
```

Recovery (if it still occurs): clear status.release on the BundleDeployment to force a fresh helm install, bypassing the history chain check entirely.

Keel: Automated Image Updates

Keel polls container registries every 4 hours and compares image digests. If a digest has changed, Keel patches the Deployment and triggers a rolling update. This keeps all workloads on the latest upstream releases without manual intervention.

```mermaid
flowchart LR
    reg["Container Registry\ndocker.io / ghcr.io\nregistry.mdapi.ch"]
    keel["Keel\ndigest poll @every 4h"]
    deploy["Deployment"]
    pod["Rolling update"]

    reg -->|"digest changed?"| keel -->|"patch image tag"| deploy --> pod
```

Helm provider requires explicit pollSchedule

The global polling.defaultSchedule in Keel's values only applies to the Kubernetes annotations provider. For Helm-managed releases, pollSchedule must be declared explicitly inside the release's own values:

```yaml
keel:
  trigger: poll
  pollSchedule: "@every 4h"
  images:
    - repository: image.repository
      tag: image.tag
```
Non-Fleet Resources

A small number of resources are deployed with kubectl apply directly against the mdapi-rancher management cluster and are not managed by Fleet — mainly because they manage Fleet itself or cross cluster boundaries:

~/rancher-local/: ovpn-admin UI and cert-manager cluster issuers.
These manifests are committed to the mdapi/rancher-local GitLab repo for traceability, but they are not reconciled automatically.