Hardware & Network Topology
Physical Layout
Edge: Internet path
flowchart LR
ext["External clients"]
ont["FTTH SFP+ ONU\n8311 firmware mod\n(OpenWrt-based)"]
bpir4["BPI-R4\nOpenWrt 25.12-tillo\nMT7988a, dnsmasq,\nhaproxy, WireGuard"]
bridge["br-lan trunk\n→ LAN / DMZ / ADLAN /\nwansub"]
ext -->|"ISP /28\n31.3.128.48/28"| ont
ont -->|"VLAN 11\nFTTH PPPoE"| bpir4
bpir4 -->|"VLAN 10 (WAN trunk),\nLAN native (PVID 1)"| bridge
Cluster on the LAN: 192.168.1.0/24
flowchart LR
bpir4["BPI-R4\n192.168.1.254\nLAN gateway"]
techni["Technitium\n.54 DNS · .55 DHCP\nhome.tillo.ch zone"]
mbt["mbptillo\n192.168.1.246\nsslh / mosh / OpenVPN"]
subgraph k8s["mdapi-prod — Harvester HCI / RKE2"]
qui["qui — bare metal"]
quo["quo — bare metal"]
qua["qua — bare metal"]
vip191["ingress-nginx\nkube-vip 192.168.1.191"]
end
subgraph mgmt["Management"]
rancher["Rancher"]
ns["BIND9 (ns.mdapi.ch)\n192.168.1.53"]
end
bpir4 --> techni
bpir4 --> mbt
techni --> k8s
mbt --> vip191
rancher --> k8s
Storage hosts
flowchart LR
k8s["mdapi-prod\n(Longhorn + S3 clients)"]
garage["Garage\ngarage.home.tillo.ch\n(3-node quorum)"]
salt["salt — TrueNAS CORE\nGarage\n+ NFS/iSCSI (idle)"]
pepper["pepper\nGarage"]
witness["baremetal witness\nGarage quorum"]
k8s -->|"S3 + Longhorn off-site"| garage
garage --- salt
garage --- pepper
garage --- witness
k8s -.->|"democratic-csi NFS/iSCSI\n(no live PVCs)"| salt
Auxiliary networks
flowchart LR
bpir4["BPI-R4"]
dmz["DMZ\n192.168.7.0/24\n(sfp-lan.20, VLAN 20)"]
adlan["ADLAN\n192.168.77.0/24\n(sfp-lan.70, VLAN 70)"]
wansub["wansub\n31.3.128.49/28\n(routed ISP /28)"]
cm["CipherTrust Manager\ncm.home.tillo.ch"]
oob["Out-of-band / admin hosts"]
bpir4 --> dmz --> cm
bpir4 --> adlan --> oob
bpir4 --> wansub
cm -.->|"customer-fragment\nfor Akeyless ESO"| bpir4
VLAN ID conventions
| VLAN | Tag | Trunk port | Network |
|---|---|---|---|
| LAN (native) | 1 (default PVID); untagged on lan1–lan3 | also tagged 10 on sfp-lan and wan | 192.168.1.0/24 (br-lan) |
| WAN (LAN-side) | 10 | sfp-lan.10, wan.10 | LAN traffic carried over the trunk to/from the WAN port group |
| FTTH PPPoE | 11 | sfp-wan.11 | ISP-side PPPoE encapsulation toward the ONT |
| DMZ | 20 | sfp-lan.20 | 192.168.7.0/24 |
| ADLAN | 70 | sfp-lan.70 | 192.168.77.0/24 |
External IP Map
ISP allocation: a /28 (31.3.128.48/28, usable .49–.62). The PPP endpoint sits on .49 (ftth-fixed); additional public IPs are configured as /32 aliases on pppoe-wan and DNAT'd (or haproxy-front-ended) into the cluster. The wansub interface (31.3.128.49/28) provides routed access to the rest of the /28 for hosts that need a real public IP rather than a DNAT target.
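The address arithmetic behind the allocation can be checked with Python's standard `ipaddress` module. This is only a sanity-check sketch of the ranges quoted above, not part of the router configuration.

```python
import ipaddress

# The ISP allocation quoted above.
allocation = ipaddress.ip_network("31.3.128.48/28")

# hosts() skips the network address (.48) and the broadcast address (.63).
usable = list(allocation.hosts())

print(allocation.network_address)          # 31.3.128.48
print(allocation.broadcast_address)        # 31.3.128.63
print(usable[0], usable[-1], len(usable))  # 31.3.128.49 31.3.128.62 14

# The wansub interface address 31.3.128.49/28 is a host inside the same block.
wansub = ipaddress.ip_interface("31.3.128.49/28")
print(wansub.network == allocation)        # True
```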
The authoritative list of public IPs is the BPI-R4's dhcp.@host[*] reservation list (uci show dhcp), where each address carries a human-readable name.
| IP | DHCP name | Inbound mechanism | Inbound target | Outbound SNAT match | Purpose |
|---|---|---|---|---|---|
| 31.3.128.49 | ftth-fixed | wansub interface | - | - | PPP endpoint |
| 31.3.128.50 | mdapi-virtual-inbound | DNAT | 192.168.1.191:80,443 | - | RKE2 ingress-nginx (all *.mdapi.ch) |
| 31.3.128.51 | cm-virtual-inbound | haproxy frontend cm :443, TLS termination | cm.mdapi.ch (CipherTrust Manager, DMZ) | - | Akeyless customer-fragment endpoint. haproxy holds the LE cert because CM has no ACME client of its own. |
| 31.3.128.53 | cloud-envuassu-virtual-inbound | DNAT | 192.168.1.40:3443 (TCP), :4443 (QUIC) + 192.168.1.41:3478 | - | Nextcloud AIO (envuassu) + Talk |
| 31.3.128.54 | mail-virtual | haproxy TCP-mode :25/:465/:587/:993/:4190 with PROXY protocol to backend | mail stack in K8s | dest port = SMTP (TCP/25) | docker-mailserver. The Service runs externalTrafficPolicy: Cluster (kube-proxy SNATs to a node-internal address); PROXY protocol restores the real client IP. Outbound SMTP also SNATs to .54 so PTR + SPF align with mail-virtual. |
| 31.3.128.55 | mirror-virtual-inbound | DNAT | 192.168.1.49 (mirror :21/:80/:443/:873/:40000-40050), 192.168.1.58:123 (ntppool), 192.168.1.44:53 (opennic) | dest port = mirror NTP / DNS / rsync | Shared mirror + NTP pool + OpenNIC |
| 31.3.128.56 | proxy-virtual-inbound | DNAT | 192.168.1.50:443 | - | Squid HTTPS proxy |
| 31.3.128.57 | znc-virtual | DNAT | 192.168.1.51:113,443 | dest port = IRC | ZNC IRC bouncer |
| 31.3.128.58 | mbptillo-virtual-inbound | DNAT | 192.168.1.246:4443 | - | sslh on mbptillo (TLS / SSH / OpenVPN demux) + WireGuard :51820 |
| 31.3.128.59 | ns-virtual-inbound | DNAT | 192.168.1.53:53 | dest port = authoritative DNS | BIND9 (ns.mdapi.ch) |
| 31.3.128.62 | vpn-virtual-inbound | DNAT | 192.168.1.157:1194 (Firewalla VPN) + 192.168.1.46:32400 (Plex) + WireGuard wg_iot / wg_s2s | fallback: every flow that doesn't match a more specific SNAT rule | pctillo.tillo.ch for IoT/S2S VPNs and egress.mdapi.ch, the cluster's default outbound public IP. |
Three edge mechanisms: DNAT, haproxy, SNAT
The /32 aliases on pppoe-wan serve traffic via three different patterns. They are not interchangeable — each exists because of a specific limitation on either side.
DNAT is the default. Most public IPs are plain destination-NAT rules: rewrite the destination to a LAN IP, forward, done. Used wherever the backend can speak the wire protocol directly (HTTP/2 with its own TLS, raw TCP, UDP). The cluster's RKE2 ingress on .50, the BIND9 nameserver on .59, sslh on .58, etc. all live here.
haproxy runs on the BPI-R4 (/etc/haproxy.cfg) and is used for two specific reasons:
- The backend cannot do ACME itself. The CipherTrust Manager appliance (cm.mdapi.ch, in the DMZ) has no Let's Encrypt client, so haproxy on .51:443 terminates TLS with the LE cert and relays to CM in HTTP/2 mode.
- The backend needs the real client IP via PROXY protocol. docker-mailserver runs as a K8s workload behind a LoadBalancer Service with externalTrafficPolicy: Cluster, so kube-proxy SNATs every connection to a node-internal address. Without restoration, every connection looks like it comes from a cluster-internal address, which breaks per-IP rate-limiting, IP-based reputation, and fail2ban. haproxy on .54:25/465/587/993/4190 accepts the public connection, then opens the backend connection with PROXY protocol so docker-mailserver sees the original client.
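To make the PROXY protocol step concrete, the sketch below builds and parses a version 1 (text) header, the line a proxy sends ahead of the payload on the backend connection. Whether this setup uses v1 or the binary v2 format is not stated here, so treat the choice of v1 as an assumption; the client address 203.0.113.7 is a documentation-range placeholder.

```python
import ipaddress

def build_proxy_v1(src_ip, src_port, dst_ip, dst_port):
    """Return the PROXY protocol v1 header sent before any application data."""
    family = "TCP6" if ipaddress.ip_address(src_ip).version == 6 else "TCP4"
    line = f"PROXY {family} {src_ip} {dst_ip} {src_port} {dst_port}\r\n"
    return line.encode("ascii")

def parse_proxy_v1(header):
    """Recover the original client and server endpoints from a v1 header."""
    parts = header.decode("ascii").rstrip("\r\n").split(" ")
    if len(parts) != 6 or parts[0] != "PROXY":
        raise ValueError("not a PROXY v1 header")
    _, family, src_ip, dst_ip, src_port, dst_port = parts
    return {
        "family": family,
        "client": (src_ip, int(src_port)),
        "server": (dst_ip, int(dst_port)),
    }

# Hypothetical external client connecting to the public SMTP address.
header = build_proxy_v1("203.0.113.7", 40312, "31.3.128.54", 25)
print(header)                            # b'PROXY TCP4 203.0.113.7 31.3.128.54 40312 25\r\n'
print(parse_proxy_v1(header)["client"])  # ('203.0.113.7', 40312)
```

The backend reads this one line first and uses the client tuple for logging, rate-limiting, and fail2ban instead of the TCP peer address, which would otherwise be the SNAT'd node-internal one.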
SNAT controls which public IP is used as the source for outbound traffic. The naive "SNAT by source LAN IP" approach does not work here: every K8s pod's outbound packet is already SNAT'd by the CNI to one of the node IPs (192.168.1.190, .192, .193), so the BPI-R4 cannot tell mail traffic apart from DNS or HTTP traffic by source alone. Instead, SNAT is keyed off destination port — the firewall marks each forwarded connection by its destination service, then chooses the public source IP per mark:
| Outbound service | Destination port(s) | SNAT source |
|---|---|---|
| docker-mailserver (SMTP) | TCP/25 | .54 (so PTR + SPF align with mail-virtual) |
| BIND9 authoritative replies | UDP/53 sourced from 192.168.1.53 | .59 |
| mirror NTP/DNS/rsync stack | per-port marks | .55 |
| ZNC bouncer | IRC | .57 |
| Everything else (LAN, DMZ, ADLAN, ONT, wansub) | — | .62 (egress.mdapi.ch) |
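The selection logic in the table can be modelled as a first-match rule list. This is a simplified Python model, not the actual firewall configuration: the IRC ports are an assumption (the table only says "IRC"), the BIND9 row is read as "UDP to port 53 from source 192.168.1.53", and the mirror stack's per-port marks are left out because the ports are not listed.

```python
from typing import NamedTuple

class Flow(NamedTuple):
    proto: str     # "tcp" or "udp"
    src_ip: str    # source as seen by the router (often a node IP)
    dst_port: int

# Assumption: conventional IRC ports; the table above only says "IRC".
IRC_PORTS = {6667, 6697}

def snat_source(flow: Flow) -> str:
    """Return the public source IP for an outbound flow; first match wins."""
    if flow.proto == "tcp" and flow.dst_port == 25:
        return "31.3.128.54"  # mail-virtual, keeps PTR + SPF aligned
    if flow.proto == "udp" and flow.dst_port == 53 and flow.src_ip == "192.168.1.53":
        return "31.3.128.59"  # ns-virtual-inbound
    if flow.proto == "tcp" and flow.dst_port in IRC_PORTS:
        return "31.3.128.57"  # znc-virtual
    # Mirror per-port marks (-> .55) are omitted here: the ports are not listed.
    return "31.3.128.62"      # fallback, egress.mdapi.ch

print(snat_source(Flow("tcp", "192.168.1.190", 25)))   # 31.3.128.54
print(snat_source(Flow("tcp", "192.168.1.192", 443)))  # 31.3.128.62
```

Note that the SMTP rule never looks at the source address, which is what makes it work even though pod traffic arrives already SNAT'd to a node IP.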
Source of truth: uci show dhcp (host names), uci show firewall (DNAT, SNAT marks, zone-default SNAT), /etc/haproxy.cfg (TLS/TCP frontends), and ip -4 addr show pppoe-wan (assigned aliases).
Internal IP Reference
| Device | IP | Role |
|---|---|---|
| FTTH ONT | 192.168.11.1 | Fiber modem (integrated in BPI-R4) |
| BPI-R4 | 192.168.1.254 (and .1 alias) | LAN gateway, dnsmasq DNS, DHCP relay |
| mbptillo | 192.168.1.246 | Jump host / mosh / OpenVPN |
| qui iLO | 192.168.1.170 | BMC node 1 |
| quo iLO | 192.168.1.181 | BMC node 2 |
| qua iLO | 192.168.1.182 | BMC node 3 |
| ingress-nginx (RKE2 builtin) | 192.168.1.191 | All HTTPS — nginx ingressClass; serves *.mdapi.ch and GitLab Pages |
| BIND9 nameserver | 192.168.1.53 | Authoritative public DNS (mdapi.ch, tillo.ch, …) |
| Technitium DNS | 192.168.1.54 | Internal DNS for home.tillo.ch |
| Technitium DHCP | 192.168.1.55 | Authoritative DHCP server (LAN/DMZ/ADLAN) |
| Garage S3 | garage.home.tillo.ch | S3 object store (3-node quorum) |
VLANs
The BPI-R4 routes two internal VLANs (DMZ and ADLAN) in addition to br-lan (the trunk on the cabled LAN). Each network has the BPI-R4 as its gateway and gets DHCP from Technitium via relay.
| VLAN | Subnet | Gateway | Purpose |
|---|---|---|---|
| br-lan (untagged) | 192.168.1.0/24 | 192.168.1.254 | Trusted LAN: servers, infra, workstations |
| sfp-lan.20 (DMZ) | 192.168.7.0/24 | 192.168.7.254 | DMZ for limited-trust workloads |
| sfp-lan.70 (ADLAN) | 192.168.77.0/24 | 192.168.77.254 | Administrative / out-of-band |
Ingress Controller
The cluster runs a single nginx-ingress deployment — the RKE2 builtin rke2-ingress-nginx — exposed on kube-vip VIP 192.168.1.191 with IngressClass nginx and ModSecurity WAF enabled. It handles every *.mdapi.ch service including GitLab itself (gitlab.mdapi.ch, registry.mdapi.ch, kas.mdapi.ch) and GitLab Pages (*.pages.mdapi.ch + custom domains like docs.mdapi.ch).
GitLab Pages custom domains require an explicit Ingress with ingressClassName: nginx targeting the gitlab-gitlab-pages service on port 8090. The docs.mdapi.ch ingress lives at https://gitlab.mdapi.ch/mdapi/fleet/-/tree/main/docs (public mirror).
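As an illustration of what such an explicit Ingress contains, the sketch below assembles a manifest as a Python dict and prints it as JSON (which kubectl also accepts). The resource name and the `gitlab` namespace are assumptions for the example, and TLS settings are left out; only the ingress class, the gitlab-gitlab-pages backend, and port 8090 come from the description above.

```python
import json

def pages_ingress(host: str, namespace: str = "gitlab") -> dict:
    """Build a networking.k8s.io/v1 Ingress routing `host` to GitLab Pages."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            # Hypothetical naming scheme: pages-<host with dots replaced>.
            "name": "pages-" + host.replace(".", "-"),
            "namespace": namespace,  # assumption: namespace of the GitLab release
        },
        "spec": {
            "ingressClassName": "nginx",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": "gitlab-gitlab-pages",
                        "port": {"number": 8090},
                    }},
                }]},
            }],
        },
    }

manifest = pages_ingress("docs.mdapi.ch")
print(json.dumps(manifest, indent=2))
```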
Load Balancer Architecture
The cluster uses two LB mechanisms in parallel, each scoped to its own set of Services:
- kube-vip (Harvester-bundled, harvester-system/kube-vip DaemonSet): owns ingress-expose at 192.168.1.191. Configured via the harvester-system/vip ConfigMap (mode: static, ip: 192.168.1.191) and Service annotations kube-vip.io/ignore-service-security: "true" + kube-vip.io/loadbalancerIPs: 192.168.1.191. Leader-elected across control-plane nodes via the harvester-system/plndr-svcs-lock Lease; the elected node binds the VIP /32 on mgmt-br and answers ARP. Failover is automatic on node loss.
- MetalLB (metallb-system namespace): owns every other LoadBalancer Service in the pool 192.168.1.40–99 (metallb-pools/prod-pool.yml in Fleet). Strict opt-in via --lb-class=metallb: only Services with spec.loadBalancerClass: metallb are claimed; everything else is ignored, so MetalLB never fights kube-vip or any other controller. Speakers run on every node; for a Service with externalTrafficPolicy: Local, the speaker on the node hosting the local pod is the only one that ARPs the VIP, which preserves the real source IP.
Each Service pins its IP via spec.loadBalancerIP: 192.168.1.<n>. The class field is immutable in Kubernetes, so moving an existing Service to a different class means deleting and re-applying it, with a brief interruption.
| VIP | Class | Namespace | Service | Policy |
|---|---|---|---|---|
| 192.168.1.40 | metallb | envuassu | Nextcloud AIO Apache | Local |
| 192.168.1.41 | metallb | envuassu | Nextcloud AIO Talk | Local |
| 192.168.1.42 | metallb | spider3 | SFTPGo | Local |
| 192.168.1.43 | metallb | mqtt | Eclipse Mosquitto | Local |
| 192.168.1.44 | metallb | opennic | OpenNIC tier-2 | Local |
| 192.168.1.45 | metallb | honeypot | Trapeye | Local |
| 192.168.1.46 | metallb | tv | Plex Media Server | Local |
| 192.168.1.48 | metallb | mail | docker-mailserver | Cluster (real source IP via PROXY protocol from BPI-R4 haproxy) |
| 192.168.1.49 | metallb | mirror | Package mirror | Local |
| 192.168.1.50 | metallb | squid | Squid proxy | Local |
| 192.168.1.51 | metallb | znc | ZNC IRC bouncer | Local |
| 192.168.1.52 | metallb | openldap | OpenLDAP | Local |
| 192.168.1.53 | metallb | nameserver | BIND9 (public authoritative DNS) | Local |
| 192.168.1.54 | metallb | technitium | Technitium DNS (internal home.tillo.ch) | Local |
| 192.168.1.55 | metallb | technitium | Technitium DHCP (authoritative for LAN/DMZ/ADLAN) | Local |
| 192.168.1.56 | metallb | tv | Rsync | Local |
| 192.168.1.57 | metallb | tv | SFTPGo | Local |
| 192.168.1.58 | metallb | ntppool | Chrony NTP | Local |
| 192.168.1.191 | kube-vip | kube-system | ingress-expose (RKE2 ingress-nginx) | Local |
Free in pool: .47, .59–.99.
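The free list can be re-derived from the table whenever a Service is added. The sketch below is a small helper for that check, with the assigned last octets copied by hand from the MetalLB rows above; it does not query the cluster.

```python
# MetalLB pool 192.168.1.40-99 (inclusive), expressed as last octets.
POOL = range(40, 100)

# Last octets taken from the metallb rows of the VIP table above.
ASSIGNED = {40, 41, 42, 43, 44, 45, 46, 48, 49, 50,
            51, 52, 53, 54, 55, 56, 57, 58}

def free_ranges(pool, assigned):
    """Collapse unassigned octets into inclusive (start, end) runs."""
    runs = []
    for octet in pool:
        if octet in assigned:
            continue
        if runs and runs[-1][1] == octet - 1:
            runs[-1] = (runs[-1][0], octet)  # extend the current run
        else:
            runs.append((octet, octet))      # start a new run
    return runs

print(free_ranges(POOL, ASSIGNED))  # [(47, 47), (59, 99)]
```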
ingress-expose data path
BPI-R4 DNATs external HTTP and HTTPS (31.3.128.50:80,443) to 192.168.1.191:80,443. The kube-vip leader holds .191/32 on its mgmt-br. rke2-ingress-nginx runs as a DaemonSet binding hostPort: 80,443 on every node, so the DNATed packet arriving on the leader is picked up by the local nginx pod's port mapping (CNI portmap iptables DNAT), which preserves the client source IP. ModSecurity therefore sees the real source.
DNS & DHCP Architecture
DNS is split across four components, each with a distinct role:
| Component | Endpoint | Authoritative for | Used by |
|---|---|---|---|
| BIND9 | 192.168.1.53 (ext: 31.3.128.59:53) | Public zones: mdapi.ch, tillo.ch, etc. | External clients, cert-manager (RFC 2136 dynamic updates) |
| BPI-R4 dnsmasq | 192.168.1.254 (per-VLAN: .1, 192.168.7.1, 192.168.77.1) | Split-horizon overrides for ~80 hostnames pinned to internal VIPs | All LAN/DMZ/ADLAN clients (DHCP option 6) |
| Technitium | 192.168.1.54 | home.tillo.ch zone; auto-registers DHCP-leased hosts | BPI-R4 dnsmasq delegation (server=/home.tillo.ch/192.168.1.54) |
| rke2-coredns | cluster-internal (kube-dns ClusterIP) | cluster.local and pod-side resolution | Every pod in the cluster |
Resolution flow for an internal client:
- Client queries BPI-R4 dnsmasq (advertised via DHCP option 6).
- If the hostname matches a split-horizon override (gitlab.mdapi.ch, notes.mdapi.ch, …) → dnsmasq returns the internal VIP directly.
- If the hostname is under home.tillo.ch → dnsmasq forwards to Technitium (192.168.1.54), which serves the live DHCP register.
- Otherwise → dnsmasq forwards upstream to the WAN-provided resolvers.
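The decision order in the steps above can be sketched as a small Python function. It models only the routing decision, not dnsmasq itself; the two override entries are a hypothetical excerpt of the ~80 real ones, and the target VIP is assumed to be the ingress VIP 192.168.1.191.

```python
# Hypothetical excerpt of the split-horizon overrides (assumed to point
# at the ingress VIP); the real list is maintained on the BPI-R4.
OVERRIDES = {
    "gitlab.mdapi.ch": "192.168.1.191",
    "notes.mdapi.ch": "192.168.1.191",
}
TECHNITIUM = "192.168.1.54"

def in_domain(name: str, domain: str) -> bool:
    """True if name is the domain itself or any name below it."""
    return name == domain or name.endswith("." + domain)

def resolve_step(name: str):
    """Return (action, target) for a query, following the order above."""
    name = name.rstrip(".").lower()
    for domain, vip in OVERRIDES.items():
        if in_domain(name, domain):
            return ("answer", vip)        # split-horizon override
    if in_domain(name, "home.tillo.ch"):
        return ("forward", TECHNITIUM)    # delegated internal zone
    return ("forward", "upstream")        # WAN-provided resolvers

print(resolve_step("gitlab.mdapi.ch"))    # ('answer', '192.168.1.191')
print(resolve_step("nas.home.tillo.ch"))  # ('forward', '192.168.1.54')
print(resolve_step("example.org"))        # ('forward', 'upstream')
```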
Resolution flow for a cluster pod:
CoreDNS is overridden (HelmChartConfig on the rke2-coredns chart) to use two forward blocks rather than the default forward . /etc/resolv.conf:
- The mdapi.ch zone still goes to the BPI-R4 (forward . 192.168.1.1) to preserve split-horizon answers for our owned hostnames.
- Everything else falls through to a sequential chain: NextDNS (45.90.28.16 / 45.90.30.16, paid filtering profile linked on the WAN IP) first, then Cloudflare 1.1.1.1 and Quad9 9.9.9.9 as fallback. policy sequential + health_check 5s + max_fails 2 means NextDNS handles 99.9% of queries and CoreDNS only fails over on a real outage of the previous resolver.
This split keeps a router blip from cascading into a cluster-wide outage on container image pulls, runner clones, or any pod that talks to an external API — exactly the kind of incident the default single-upstream Corefile invites.
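The failover behaviour of the catch-all block can be approximated in a few lines of Python. This is a simplified model of policy sequential with max_fails 2, not CoreDNS code: it assumes an upstream is skipped once it has accumulated two consecutive failed health checks, and it does not model what CoreDNS does when every upstream is marked down.

```python
from typing import Dict, Optional

# Upstreams in the order listed above: NextDNS first, then the fallbacks.
UPSTREAMS = ["45.90.28.16", "45.90.30.16", "1.1.1.1", "9.9.9.9"]
MAX_FAILS = 2

def pick_upstream(failed_checks: Dict[str, int]) -> Optional[str]:
    """Return the first upstream with fewer than MAX_FAILS consecutive failures."""
    for ip in UPSTREAMS:
        if failed_checks.get(ip, 0) < MAX_FAILS:
            return ip
    return None  # all marked down; the real plugin's behaviour is not modelled

print(pick_upstream({}))                                      # 45.90.28.16
print(pick_upstream({"45.90.28.16": 1}))                      # 45.90.28.16
print(pick_upstream({"45.90.28.16": 2}))                      # 45.90.30.16
print(pick_upstream({"45.90.28.16": 2, "45.90.30.16": 3}))    # 1.1.1.1
```

A single failed check does not move traffic off NextDNS, which is why a brief blip stays invisible while a sustained outage shifts queries to the next resolver in the list.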
DHCP runs centrally on Technitium (192.168.1.55). BPI-R4 disables its own DHCP server on every VLAN and acts purely as a DHCP relay so all leases are registered and resolvable from a single source.
Lesson learned: the dnsmasq address directive is wildcard-based
address=/example.com/192.168.1.1 overrides the entire domain and every subdomain; it cannot be scoped to A-records only. For public domains (mdapi.ch, tillo.ch, …) the BPI-R4 dnsmasq sync therefore only writes third-level entries (gitlab.mdapi.ch, never bare mdapi.ch): an entry for the bare domain would shadow MX/NS/TXT and break mail and registrar delegation.
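The scoping rule is easy to see with a small matcher. The function below mimics only the domain-suffix matching described above (it is not dnsmasq code), and the query names other than gitlab.mdapi.ch and mdapi.ch are made up for the example.

```python
def address_matches(query: str, domain: str) -> bool:
    """Mimic the match scope of address=/<domain>/...: the domain and everything below it."""
    q = query.rstrip(".").lower()
    d = domain.rstrip(".").lower()
    return q == d or q.endswith("." + d)

# Hypothetical query names, used only to illustrate the scope.
queries = ["mdapi.ch", "gitlab.mdapi.ch", "mail.mdapi.ch", "ci.gitlab.mdapi.ch"]

# A bare-domain entry captures every name in the zone, including the apex
# that carries the MX/NS/TXT records.
print([q for q in queries if address_matches(q, "mdapi.ch")])
# ['mdapi.ch', 'gitlab.mdapi.ch', 'mail.mdapi.ch', 'ci.gitlab.mdapi.ch']

# A third-level entry only captures that host and names beneath it.
print([q for q in queries if address_matches(q, "gitlab.mdapi.ch")])
# ['gitlab.mdapi.ch', 'ci.gitlab.mdapi.ch']
```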
Why Bare Metal + Harvester?
Harvester HCI runs RKE2 with KubeVirt integrated. iLO access on each node enables remote power management and out-of-band console access. The three-node setup provides etcd quorum and allows Longhorn to replicate volumes across two nodes while a third can be taken offline for maintenance.
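The quorum claim follows from majority arithmetic: a cluster of n voting members needs floor(n/2) + 1 of them available. A minimal check in Python:

```python
def quorum(members: int) -> int:
    """Smallest majority of a cluster with `members` voting nodes."""
    return members // 2 + 1

def tolerated_failures(members: int) -> int:
    """How many members can be lost while a majority remains."""
    return members - quorum(members)

for n in (1, 2, 3, 5):
    print(n, quorum(n), tolerated_failures(n))
# 1 1 0
# 2 2 0
# 3 2 1
# 5 3 2
```

With three nodes, one can be offline for maintenance while the remaining two still form a majority; a two-node cluster would tolerate no failures at all.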