
Hardware & Network Topology

Physical Layout

Edge — Internet path

flowchart LR
    ext["External clients"]
    ont["FTTH SFP+ ONU\n8311 firmware mod\n(OpenWrt-based)"]
    bpir4["BPI-R4\nOpenWrt 25.12-tillo\nMT7988a, dnsmasq,\nhaproxy, WireGuard"]
    bridge["br-lan trunk\n→ LAN / DMZ / ADLAN /\nwansub"]

    ext -->|"ISP /28\n31.3.128.48/28"| ont
    ont -->|"VLAN 11\nFTTH PPPoE"| bpir4
    bpir4 -->|"VLAN 10 (WAN trunk),\nLAN native (PVID 1)"| bridge

Cluster on the LAN — 192.168.1.0/24

flowchart LR
    bpir4["BPI-R4\n192.168.1.254\nLAN gateway"]
    techni["Technitium\n.54 DNS · .55 DHCP\nhome.tillo.ch zone"]
    mbt["mbptillo\n192.168.1.246\nsslh / mosh / OpenVPN"]

    subgraph k8s["mdapi-prod — Harvester HCI / RKE2"]
        qui["qui — bare metal"]
        quo["quo — bare metal"]
        qua["qua — bare metal"]
        vip191["ingress-nginx\nkube-vip 192.168.1.191"]
    end

    subgraph mgmt["Management"]
        rancher["Rancher"]
        ns["BIND9 (ns.mdapi.ch)\n192.168.1.53"]
    end

    bpir4 --> techni
    bpir4 --> mbt
    techni --> k8s
    mbt --> vip191
    rancher --> k8s

Storage hosts

flowchart LR
    k8s["mdapi-prod\n(Longhorn + S3 clients)"]
    garage["Garage\ngarage.home.tillo.ch\n(3-node quorum)"]
    salt["salt — TrueNAS CORE\nGarage\n+ NFS/iSCSI (idle)"]
    pepper["pepper\nGarage"]
    witness["baremetal witness\nGarage quorum"]

    k8s -->|"S3 + Longhorn off-site"| garage
    garage --- salt
    garage --- pepper
    garage --- witness
    k8s -.->|"democratic-csi NFS/iSCSI\n(no live PVCs)"| salt

Auxiliary networks

flowchart LR
    bpir4["BPI-R4"]
    dmz["DMZ\n192.168.7.0/24\n(sfp-lan.20, VLAN 20)"]
    adlan["ADLAN\n192.168.77.0/24\n(sfp-lan.70, VLAN 70)"]
    wansub["wansub\n31.3.128.49/28\n(routed ISP /28)"]
    cm["CipherTrust Manager\ncm.home.tillo.ch"]
    oob["Out-of-band / admin hosts"]

    bpir4 --> dmz --> cm
    bpir4 --> adlan --> oob
    bpir4 --> wansub
    cm -.->|"customer-fragment\nfor Akeyless ESO"| bpir4

VLAN ID conventions

| VLAN | Tag | Trunk port | Network |
|---|---|---|---|
| LAN (native) | 1 (default PVID) | untagged on lan1–lan3; also tagged 10 on sfp-lan and wan | 192.168.1.0/24 (br-lan) |
| WAN (LAN-side) | 10 | sfp-lan.10, wan.10 | LAN traffic carried over the trunk to/from the WAN port group |
| FTTH PPPoE | 11 | sfp-wan.11 | ISP-side PPPoE encapsulation toward the ONT |
| DMZ | 20 | sfp-lan.20 | 192.168.7.0/24 |
| ADLAN | 70 | sfp-lan.70 | 192.168.77.0/24 |

External IP Map

ISP allocation: a /28 (31.3.128.48/28, usable .49–.62). The PPP endpoint sits on .49 (ftth-fixed); additional public IPs are configured as /32 aliases on pppoe-wan and DNAT'd (or haproxy-front-ended) into the cluster. The wansub interface (31.3.128.49/28) provides routed access to the rest of the /28 for hosts that need a real public IP rather than a DNAT target.
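The /28 arithmetic can be checked with Python's standard ipaddress module. This is only a sanity-check sketch; the variable names are illustrative and not taken from any router config.

```python
import ipaddress

# The ISP allocation and the wansub interface address quoted above.
block = ipaddress.ip_network("31.3.128.48/28")
wansub = ipaddress.ip_interface("31.3.128.49/28")

# hosts() excludes the network (.48) and broadcast (.63) addresses.
usable = list(block.hosts())

print(block.network_address, block.broadcast_address)
print(usable[0], usable[-1], len(usable))

# The wansub interface address belongs to the same /28.
print(wansub.network == block)
```

The first and last usable hosts come out as .49 and .62, matching the range stated above.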

The authoritative record of each public IP's purpose is the BPI-R4's dhcp.@host[*] reservation list (uci show dhcp), where every address carries a human-readable name.

| IP | DHCP name | Inbound mechanism | Inbound target | Outbound SNAT match | Purpose |
|---|---|---|---|---|---|
| 31.3.128.49 | ftth-fixed | wansub interface | | | PPP endpoint |
| 31.3.128.50 | mdapi-virtual-inbound | DNAT | 192.168.1.191:80,443 | | RKE2 ingress-nginx (all *.mdapi.ch) |
| 31.3.128.51 | cm-virtual-inbound | haproxy frontend | cm :443, TLS termination | | cm.mdapi.ch: CipherTrust Manager (DMZ), Akeyless customer-fragment endpoint. haproxy holds the LE cert because CM has no ACME client of its own. |
| 31.3.128.53 | cloud-envuassu-virtual-inbound | DNAT | 192.168.1.40:3443 (TCP), :4443 (QUIC) + 192.168.1.41:3478 | | Nextcloud AIO (envuassu) + Talk |
| 31.3.128.54 | mail-virtual | haproxy TCP-mode | :25/:465/:587/:993/:4190 with PROXY protocol to backend mail stack in K8s | dest port = SMTP (TCP/25) | docker-mailserver. The Service runs externalTrafficPolicy: Cluster (kube-proxy SNATs to a node-internal address); PROXY protocol restores the real client IP. Outbound SMTP also SNATs to .54 so PTR + SPF align with mail-virtual. |
| 31.3.128.55 | mirror-virtual-inbound | DNAT | .49 (mirror :21/:80/:443/:873/:40000-40050), .58:123 (ntppool), .44:53 (opennic) | dest port = mirror NTP / DNS / rsync | Shared mirror + NTP pool + OpenNIC |
| 31.3.128.56 | proxy-virtual-inbound | DNAT | 192.168.1.50:443 | | Squid HTTPS proxy |
| 31.3.128.57 | znc-virtual | DNAT | 192.168.1.51:113,443 | dest port = IRC | ZNC IRC bouncer |
| 31.3.128.58 | mbptillo-virtual-inbound | DNAT | 192.168.1.246:4443 | | sslh on mbptillo (TLS / SSH / OpenVPN demux) + WireGuard :51820 |
| 31.3.128.59 | ns-virtual-inbound | DNAT | 192.168.1.53:53 | dest port = authoritative DNS | BIND9 (ns.mdapi.ch) |
| 31.3.128.62 | vpn-virtual-inbound | DNAT | 192.168.1.157:1194 (Firewalla VPN) + 192.168.1.46:32400 (Plex) + WireGuard wg_iot / wg_s2s | fallback: every flow that doesn't match a more specific SNAT rule | pctillo.tillo.ch for IoT/S2S VPNs and egress.mdapi.ch, the cluster's default outbound public IP. |

Three edge mechanisms — DNAT, haproxy, SNAT

The /32 aliases on pppoe-wan serve traffic via three different patterns. They are not interchangeable — each exists because of a specific limitation on either side.

DNAT is the default. Most public IPs are plain destination-NAT rules: rewrite the destination to a LAN IP, forward, done. Used wherever the backend can speak the wire protocol directly (HTTP/2 with its own TLS, raw TCP, UDP). The cluster's RKE2 ingress on .50, the BIND9 nameserver on .59, sslh on .58, etc. all live here.

haproxy runs on the BPI-R4 (/etc/haproxy.cfg) and is used for two specific reasons:

  1. The backend cannot do ACME itself. The CipherTrust Manager appliance (cm.mdapi.ch, in the DMZ) has no Let's Encrypt client — haproxy on .51:443 terminates TLS with the LE cert and relays to CM in HTTP/2 mode.
  2. The backend needs the real client IP via PROXY protocol. docker-mailserver runs as a K8s workload behind a LoadBalancer Service with externalTrafficPolicy: Cluster, so kube-proxy SNATs every connection to a node-internal address. Without restoration, every connection looks like it comes from a cluster-internal address — which breaks per-IP rate-limiting, IP-based reputation, and fail2ban. haproxy on .54:25/465/587/993/4190 accepts the public connection, then opens the backend connection with PROXY protocol so docker-mailserver sees the original client.
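To make the PROXY protocol step concrete, here is a minimal sketch of the header a proxy prepends to the backend connection. It assumes the human-readable v1 format over TCP4; the text above does not say whether haproxy is configured for v1 or v2, and the function names are hypothetical.

```python
def build_proxy_v1(src_ip: str, dst_ip: str, src_port: int, dst_port: int) -> bytes:
    """Return the PROXY protocol v1 line a proxy sends before any payload."""
    return f"PROXY TCP4 {src_ip} {dst_ip} {src_port} {dst_port}\r\n".encode("ascii")


def parse_proxy_v1(line: bytes) -> tuple[str, int]:
    """Recover the original client (ip, port) from a v1 header line."""
    fields = line.decode("ascii").rstrip("\r\n").split(" ")
    if len(fields) != 6 or fields[0] != "PROXY" or fields[1] != "TCP4":
        raise ValueError("not a PROXY v1 TCP4 header")
    return fields[2], int(fields[4])


# 203.0.113.7 is a documentation-range address standing in for a real client.
header = build_proxy_v1("203.0.113.7", "31.3.128.54", 51234, 25)
client_ip, client_port = parse_proxy_v1(header)
print(header, client_ip, client_port)
```

The backend reads this one line first and then treats the rest of the stream as ordinary SMTP or IMAP, so per-IP rate-limiting and fail2ban act on the recovered client address rather than on the node-internal one.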

SNAT controls which public IP is used as the source for outbound traffic. The naive "SNAT by source LAN IP" approach does not work here: every K8s pod's outbound packet is already SNAT'd by the CNI to one of the node IPs (192.168.1.190, .192, .193), so the BPI-R4 cannot tell mail traffic apart from DNS or HTTP traffic by source alone. Instead, SNAT is keyed off destination port — the firewall marks each forwarded connection by its destination service, then chooses the public source IP per mark:

| Outbound service | Destination port(s) | SNAT source |
|---|---|---|
| docker-mailserver (SMTP) | TCP/25 | .54 (so PTR + SPF align with mail-virtual) |
| BIND9 authoritative replies | UDP/53 sourced from 192.168.1.53 | .59 |
| mirror NTP/DNS/rsync stack | per-port marks | .55 |
| ZNC bouncer | IRC | .57 |
| Everything else (LAN, DMZ, ADLAN, ONT, wansub) | | .62 (egress.mdapi.ch) |
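The selection logic can be modelled as a first-match rule list with a default. The sketch below is a toy model, not the actual firewall configuration: the Flow type and rule list are hypothetical, only the rules whose ports are stated above are included (the mirror and ZNC ports are not listed, so they are left out), and because the table does not say whether UDP/53 is the source or destination port for BIND9, the sketch accepts either.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Flow:
    proto: str      # "tcp" or "udp"
    src_ip: str     # LAN-side source address as seen by the router
    src_port: int
    dst_port: int


DEFAULT_SNAT = "31.3.128.62"  # egress.mdapi.ch, the fallback source

# (predicate, public source IP) pairs, evaluated in order.
RULES: List[Tuple[Callable[[Flow], bool], str]] = [
    # Outbound SMTP, whichever node IP the CNI rewrote it to.
    (lambda f: f.proto == "tcp" and f.dst_port == 25, "31.3.128.54"),
    # BIND9 traffic from 192.168.1.53 (assumption: port 53 on either side).
    (lambda f: f.proto == "udp" and f.src_ip == "192.168.1.53"
               and 53 in (f.src_port, f.dst_port), "31.3.128.59"),
]


def pick_snat_source(flow: Flow) -> str:
    """Return the public source IP for the first matching rule, else the default."""
    for matches, public_ip in RULES:
        if matches(flow):
            return public_ip
    return DEFAULT_SNAT


mail = Flow("tcp", "192.168.1.190", 40000, 25)
web = Flow("tcp", "192.168.1.190", 40001, 443)
print(pick_snat_source(mail), pick_snat_source(web))
```

Both example flows share the same node source IP, yet they leave with different public addresses, which is the point of keying on the destination port.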

Source of truth: uci show dhcp (host names), uci show firewall (DNAT, SNAT marks, zone-default SNAT), /etc/haproxy.cfg (TLS/TCP frontends), and ip -4 addr show pppoe-wan (assigned aliases).

Internal IP Reference

| Device | IP | Role |
|---|---|---|
| FTTH ONT | 192.168.11.1 | Fiber modem (integrated in BPI-R4) |
| BPI-R4 | 192.168.1.254 (and .1 alias) | LAN gateway, dnsmasq DNS, DHCP relay |
| mbptillo | 192.168.1.246 | Jump host / mosh / OpenVPN |
| qui iLO | 192.168.1.170 | BMC node 1 |
| quo iLO | 192.168.1.181 | BMC node 2 |
| qua iLO | 192.168.1.182 | BMC node 3 |
| ingress-nginx (RKE2 builtin) | 192.168.1.191 | All HTTPS: nginx ingressClass; serves *.mdapi.ch and GitLab Pages |
| BIND9 nameserver | 192.168.1.53 | Authoritative public DNS (mdapi.ch, tillo.ch, …) |
| Technitium DNS | 192.168.1.54 | Internal DNS for home.tillo.ch |
| Technitium DHCP | 192.168.1.55 | Authoritative DHCP server (LAN/DMZ/ADLAN) |
| Garage S3 | garage.home.tillo.ch | S3 object store (3-node quorum) |

VLANs

The BPI-R4 routes two internal VLANs (DMZ and ADLAN) in addition to br-lan (the trunk on the cabled LAN). Each network has the BPI-R4 as its gateway and gets DHCP from Technitium via relay.

| VLAN | Subnet | Gateway | Purpose |
|---|---|---|---|
| br-lan (untagged) | 192.168.1.0/24 | 192.168.1.254 | Trusted LAN: servers, infra, workstations |
| sfp-lan.20 (DMZ) | 192.168.7.0/24 | 192.168.7.254 | DMZ for limited-trust workloads |
| sfp-lan.70 (ADLAN) | 192.168.77.0/24 | 192.168.77.254 | Administrative / out-of-band |

Ingress Controller

The cluster runs a single ingress controller, the RKE2 builtin rke2-ingress-nginx, exposed on kube-vip VIP 192.168.1.191 with IngressClass nginx and the ModSecurity WAF enabled. It handles every *.mdapi.ch service, including GitLab itself (gitlab.mdapi.ch, registry.mdapi.ch, kas.mdapi.ch) and GitLab Pages (*.pages.mdapi.ch + custom domains like docs.mdapi.ch).

GitLab Pages custom domains require an explicit Ingress with ingressClassName: nginx targeting the gitlab-gitlab-pages service on port 8090. The docs.mdapi.ch ingress lives at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/docs (public mirror).

Load Balancer Architecture

The cluster uses two LB mechanisms in parallel, each scoped to its own set of Services:

  • kube-vip (Harvester-bundled, harvester-system/kube-vip DaemonSet) — owns ingress-expose at 192.168.1.191. Configured via the harvester-system/vip ConfigMap (mode: static, ip: 192.168.1.191) and Service annotations kube-vip.io/ignore-service-security: "true" + kube-vip.io/loadbalancerIPs: 192.168.1.191. Leader-elected across control-plane nodes via the harvester-system/plndr-svcs-lock Lease; the elected node binds the VIP /32 on mgmt-br and answers ARP. Failover is automatic on node loss.
  • MetalLB (metallb-system namespace) — owns every other LoadBalancer Service in the pool 192.168.1.40–99 (metallb-pools/prod-pool.yml in Fleet). Strict opt-in via --lb-class=metallb: only Services with spec.loadBalancerClass: metallb are claimed; everything else is ignored, so MetalLB never fights kube-vip or any other controller. Speakers run on every node; for a Service with externalTrafficPolicy: Local, the speaker on the node hosting the local pod is the only one that ARPs the VIP — preserving real source IP.

Each Service pins its IP via spec.loadBalancerIP: 192.168.1.<n>. The class field is immutable in Kubernetes, so moving a Service to a different class means deleting and re-applying it, with a brief interruption.

| VIP | Class | Namespace | Service | Policy |
|---|---|---|---|---|
| 192.168.1.40 | metallb | envuassu | Nextcloud AIO Apache | Local |
| 192.168.1.41 | metallb | envuassu | Nextcloud AIO Talk | Local |
| 192.168.1.42 | metallb | spider3 | SFTPGo | Local |
| 192.168.1.43 | metallb | mqtt | Eclipse Mosquitto | Local |
| 192.168.1.44 | metallb | opennic | OpenNIC tier-2 | Local |
| 192.168.1.45 | metallb | honeypot | Trapeye | Local |
| 192.168.1.46 | metallb | tv | Plex Media Server | Local |
| 192.168.1.48 | metallb | mail | docker-mailserver | Cluster (real source IP via PROXY protocol from BPI-R4 haproxy) |
| 192.168.1.49 | metallb | mirror | Package mirror | Local |
| 192.168.1.50 | metallb | squid | Squid proxy | Local |
| 192.168.1.51 | metallb | znc | ZNC IRC bouncer | Local |
| 192.168.1.52 | metallb | openldap | OpenLDAP | Local |
| 192.168.1.53 | metallb | nameserver | BIND9 (public authoritative DNS) | Local |
| 192.168.1.54 | metallb | technitium | Technitium DNS (internal home.tillo.ch) | Local |
| 192.168.1.55 | metallb | technitium | Technitium DHCP (authoritative for LAN/DMZ/ADLAN) | Local |
| 192.168.1.56 | metallb | tv | Rsync | Local |
| 192.168.1.57 | metallb | tv | SFTPGo | Local |
| 192.168.1.58 | metallb | ntppool | Chrony NTP | Local |
| 192.168.1.191 | kube-vip | kube-system | ingress-expose (RKE2 ingress-nginx) | Local |

Free in pool: .47, .59–.99.
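The free list can be re-derived from the assignments whenever a Service is added or removed. A small sketch, assuming the MetalLB pool is the last-octet range 40–99 as stated above and that the assigned set is copied by hand from the table (the helper name is illustrative):

```python
# Last octets of the MetalLB pool 192.168.1.40-99.
POOL = range(40, 100)

# Last octets currently pinned by MetalLB Services (.40-.46 and .48-.58).
assigned = set(range(40, 47)) | set(range(48, 59))


def to_ranges(octets):
    """Collapse a sorted list of ints into '.a' or '.a-.b' strings."""
    spans = []
    start = prev = None
    for n in octets:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            spans.append((start, prev))
            start = prev = n
    if start is not None:
        spans.append((start, prev))
    return [f".{a}" if a == b else f".{a}-.{b}" for a, b in spans]


free = [n for n in POOL if n not in assigned]
print(to_ranges(free))
```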

ingress-expose data path

BPI-R4 DNATs external HTTP and HTTPS (31.3.128.50:80,443) to 192.168.1.191:80,443. The kube-vip leader holds .191/32 on its mgmt-br. rke2-ingress-nginx runs as a DaemonSet binding hostPort: 80,443 on every node, so the DNATed packet arriving on the leader is captured by the local nginx pod's port mapping (CNI portmap iptables DNAT), which preserves the client source IP. ModSecurity therefore sees the real source.

DNS & DHCP Architecture

DNS is split across four components, each with a distinct role:

| Component | Endpoint | Authoritative for | Used by |
|---|---|---|---|
| BIND9 | 192.168.1.53 (ext: 31.3.128.59:53) | Public zones: mdapi.ch, tillo.ch, etc. | External clients, cert-manager (RFC 2136 dynamic updates) |
| BPI-R4 dnsmasq | 192.168.1.254 (per-VLAN: .1, 192.168.7.1, 192.168.77.1) | Split-horizon overrides for ~80 hostnames pinned to internal VIPs | All LAN/DMZ/ADLAN clients (DHCP option 6) |
| Technitium | 192.168.1.54 | home.tillo.ch zone; auto-registers DHCP-leased hosts | BPI-R4 dnsmasq delegation (server=/home.tillo.ch/192.168.1.54) |
| rke2-coredns | cluster-internal (kube-dns ClusterIP) | cluster.local and pod-side resolution | Every pod in the cluster |

Resolution flow for an internal client:

  1. Client queries BPI-R4 dnsmasq (advertised via DHCP option 6).
  2. If the hostname matches a split-horizon override (gitlab.mdapi.ch, notes.mdapi.ch, …) → dnsmasq returns the internal VIP directly.
  3. If the hostname is under home.tillo.ch → dnsmasq forwards to Technitium (192.168.1.54) which serves the live DHCP register.
  4. Otherwise → dnsmasq forwards upstream to the WAN-provided resolvers.
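The four steps above amount to a short decision function. The sketch below models that decision only and does not reproduce dnsmasq's implementation. The override map is illustrative: it assumes gitlab.mdapi.ch is pinned to the ingress VIP 192.168.1.191, and the function name is hypothetical.

```python
from typing import Dict, Tuple

TECHNITIUM = "192.168.1.54"

# Illustrative subset of the split-horizon overrides (assumed target: ingress VIP).
OVERRIDES: Dict[str, str] = {
    "gitlab.mdapi.ch": "192.168.1.191",
}


def route_query(qname: str, overrides: Dict[str, str] = OVERRIDES) -> Tuple[str, str]:
    """Return ("answer", ip) for an override, else ("forward", target)."""
    name = qname.rstrip(".").lower()
    if name in overrides:
        return ("answer", overrides[name])
    if name == "home.tillo.ch" or name.endswith(".home.tillo.ch"):
        return ("forward", TECHNITIUM)
    return ("forward", "upstream")  # WAN-provided resolvers


for q in ("gitlab.mdapi.ch", "garage.home.tillo.ch", "example.org"):
    print(q, route_query(q))
```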

Resolution flow for a cluster pod:

CoreDNS is overridden (via a HelmChartConfig on the rke2-coredns chart) to use two forward blocks rather than the default forward . /etc/resolv.conf:

  • The mdapi.ch zone still goes to the BPI-R4 (forward . 192.168.1.1) to preserve split-horizon answers for our owned hostnames.
  • Everything else falls through to a sequential chain: NextDNS (45.90.28.16 / 45.90.30.16, paid filtering profile linked on the WAN IP) first, then Cloudflare 1.1.1.1 and Quad9 9.9.9.9 as fallback. policy sequential + health_check 5s + max_fails 2 means NextDNS handles 99.9 % of queries and CoreDNS only fails over on a real outage of the previous resolver.

This split keeps a router blip from cascading into a cluster-wide outage on container image pulls, runner clones, or any pod that talks to an external API — exactly the kind of incident the default single-upstream Corefile invites.
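A simplified model of the pod-side behaviour: pick the upstream list by zone, then walk it in order and skip any resolver that has reached the failure threshold. This is a sketch of the idea behind policy sequential and max_fails, not CoreDNS's actual health-checking code; the function names and the fails counter are hypothetical.

```python
from typing import Dict, List, Optional

ROUTER = ["192.168.1.1"]                                          # mdapi.ch zone
EXTERNAL_CHAIN = ["45.90.28.16", "45.90.30.16", "1.1.1.1", "9.9.9.9"]
MAX_FAILS = 2


def upstreams_for(qname: str) -> List[str]:
    """mdapi.ch names go to the router; everything else uses the external chain."""
    name = qname.rstrip(".").lower()
    if name == "mdapi.ch" or name.endswith(".mdapi.ch"):
        return ROUTER
    return EXTERNAL_CHAIN


def pick_upstream(chain: List[str], fails: Dict[str, int],
                  max_fails: int = MAX_FAILS) -> Optional[str]:
    """Return the first upstream, in listed order, below the failure threshold."""
    for ip in chain:
        if fails.get(ip, 0) < max_fails:
            return ip
    return None  # simplification: real CoreDNS still tries an upstream here


healthy = pick_upstream(EXTERNAL_CHAIN, {})
nextdns_down = pick_upstream(EXTERNAL_CHAIN, {"45.90.28.16": 2, "45.90.30.16": 2})
print(healthy, nextdns_down)
```

With no recorded failures the first NextDNS address is always chosen; only when both NextDNS addresses hit the threshold does the sketch fall through to 1.1.1.1. A router outage affects only names under mdapi.ch, since those are the only ones routed to 192.168.1.1.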

DHCP runs centrally on Technitium (192.168.1.55). BPI-R4 disables its own DHCP server on every VLAN and acts purely as a DHCP relay so all leases are registered and resolvable from a single source.

Lesson learned: the dnsmasq address directive is wildcard-based

address=/example.com/192.168.1.1 overrides the entire domain and every subdomain; it cannot be scoped to A-records only. For public domains (mdapi.ch, tillo.ch, …) the BPI-R4 dnsmasq sync writes only third-level entries (gitlab.mdapi.ch, never bare mdapi.ch), because root-level entries would shadow MX/NS/TXT and break mail and registrar delegation.
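The matching rule behind this lesson can be sketched in a few lines: an address entry applies to the named domain and to anything beneath it. The function below is a hypothetical model of that suffix match, not dnsmasq code.

```python
def address_entry_matches(entry_domain: str, qname: str) -> bool:
    """True if an address=/entry_domain/... line would capture qname."""
    domain = entry_domain.strip(".").lower()
    name = qname.strip(".").lower()
    return name == domain or name.endswith("." + domain)


# A root-level entry captures the apex and every host under it.
print(address_entry_matches("mdapi.ch", "mdapi.ch"))
print(address_entry_matches("mdapi.ch", "mail.mdapi.ch"))

# A third-level entry leaves the apex and sibling names alone.
print(address_entry_matches("gitlab.mdapi.ch", "mdapi.ch"))
print(address_entry_matches("gitlab.mdapi.ch", "mail.mdapi.ch"))
```

The first two checks are true and the last two are false, which is why the sync restricts itself to third-level names.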

Why Bare Metal + Harvester?

Harvester HCI runs RKE2 with KubeVirt integrated. iLO access on each node enables remote power management and out-of-band console access. The three-node setup provides etcd quorum and allows Longhorn to replicate volumes across two nodes while a third can be taken offline for maintenance.
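The quorum claim follows from majority arithmetic, sketched here with hypothetical helper names: a cluster of n voting members needs floor(n/2) + 1 of them reachable, so three nodes tolerate the loss of one.

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with the given number of voting members."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can be offline while a majority remains."""
    return members - quorum(members)


for n in (1, 2, 3):
    print(n, quorum(n), tolerated_failures(n))
```

A two-node cluster tolerates no failures at all, so the third node is what makes single-node maintenance possible.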