BPI-R4 OpenWrt Router

The edge router is a Banana Pi BPI-R4 running a custom fork of OpenWrt 25.12 targeting the MT7988a SoC. It is the single piece of equipment between the FTTH ONT and the LAN — it terminates PPPoE on the WAN, relays DHCP to the in-cluster Technitium server, and terminates two WireGuard VPNs. The build is fully automated via GitLab CI.

Build Pipeline

flowchart LR
    push["git push\nopenwrt-25.12-tillo branch"]
    ci["GitLab CI\nproject ID 33"]
    build["OpenWrt build system\ncross-compile MT7988a"]
    artifact["sysupgrade.itb\n(firmware artifact)"]
    flash["SSH to bpi-r4\ncat > /tmp/sysupgrade.itb\nsysupgrade -v"]

    push --> ci --> build --> artifact --> flash

CI is triggered on every push. The firmware artifact is stored in the GitLab generic packages registry, keyed by pipeline IID.

Kernel Customizations

Custom patches live in target/linux/mediatek/patches-6.12/, applied in lexicographic order via the quilt workflow. Key patches:

  • 197-dts-mt7988a-add-ramoops.patch — reserves 1 MiB at 0x42f00000 for ramoops/pstore (record-size 128 KiB ×5, 256 KiB console, 64 KiB pmsg, 64 KiB ftrace), enabling kernel crash dumps to survive reboots
  • 970-net-ethernet-mtk_eth_soc-increase-warm-reset-timeout.patch: raises the mtk_hw_warm_reset RSTCTRL_FE timeout 1ms → 100ms (avoids spurious "warm reset failed" on the SoC)
  • target/linux/generic/pending-6.12/989-net-sfp-prefix-match-quirks.patch + 990-add_sfp_quirks.patch — adds SFP_QUIRK_F_PREFIX and registers our XGS-PON ONT sticks (the production FS XGS-SFP-ONT-MACI and the OEM XGSPONST2001 clone kept as fallback) with sfp_fixup_potron. Masks SFP_F_TX_FAULT | SFP_F_LOS in state_hw_mask so the SFP state machine doesn't disable the module on spurious assertions, and bumps T_START_UP to 60 s for the slow PON bring-up. Prefix matching is required because the OEM clone fills vendor_pn past the legitimate string with non-printable garbage instead of SFF-8472 space padding. See SFP TX-fault storm for what this does and doesn't prevent.
  • Lockup detectors: SOFTLOCKUP_DETECTOR, HARDLOCKUP_DETECTOR, DETECT_HUNG_TASK, and watchdog pretimeout panic enabled for crash diagnostics
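
The sizes quoted for the ramoops patch add up exactly to the 1 MiB reservation. A quick arithmetic check, as a sketch (the variable names are illustrative and are not taken from the patch itself):

```python
KIB = 1024

# Sizes as quoted in the ramoops patch description above.
record_size = 128 * KIB      # one dmesg (oops/panic) record
record_count = 5
console_size = 256 * KIB
pmsg_size = 64 * KIB
ftrace_size = 64 * KIB

total = record_size * record_count + console_size + pmsg_size + ftrace_size
print(total // KIB, "KiB")   # 1024 KiB
assert total == 1024 * KIB   # exactly the 1 MiB reserved at 0x42f00000
```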

The matching u-boot side (which injects its own ramoops node into the kernel FDT before booting) also has to use 1 MiB at 0x42f00000:

  • package/boot/uboot-mediatek/patches/103-04-mt7988-enable-pstore.patch — u-boot's mt7988.dtsi
  • package/boot/uboot-mediatek/patches/450-add-bpi-r4.patch — six BPI-R4 per-variant defconfigs that set CONFIG_CMD_PSTORE_MEM_ADDR=0x42f00000

Lesson learned: kernel and u-boot ramoops must match

If the kernel DT and u-boot defconfig disagree on the ramoops region, the kernel logs OF: reserved mem: OVERLAP DETECTED! at boot and falls back to whichever node was registered first, usually the smaller one. pstore then captures only ~16 KiB per record instead of 128 KiB. Verify by counting occurrences of the stale address 0x42ff0000 (as little-endian bytes 00 00 ff 42) in the FIP after a build:

python3 -c "import sys; d=open('uboot.fip','rb').read(); print(d.count(bytes.fromhex('0000ff42')))"
The result must be 0. Anything else means a defconfig still has the old address.
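
The same check can be written as a small reusable function. This is a sketch that mirrors the one-liner rather than replacing it: the function name and return shape are hypothetical, the 4 MiB limit is the size of the fip partition, and the search assumes (as the one-liner does) that the address appears little-endian in the image.

```python
import struct

FIP_PARTITION_SIZE = 4 * 1024 * 1024   # size of the fip partition (mmcblk0p3)
STALE_RAMOOPS_ADDR = 0x42FF0000        # the address the one-liner searches for

def check_fip(blob: bytes) -> list:
    """Return a list of problems found in a FIP image (empty list means OK)."""
    problems = []
    if len(blob) > FIP_PARTITION_SIZE:
        problems.append(f"image is {len(blob)} bytes, partition holds {FIP_PARTITION_SIZE}")
    needle = struct.pack("<I", STALE_RAMOOPS_ADDR)   # b"\x00\x00\xff\x42"
    hits = blob.count(needle)
    if hits:
        problems.append(f"{hits} occurrence(s) of 0x{STALE_RAMOOPS_ADDR:08x}")
    return problems
```

Typical use would be `check_fip(open("uboot.fip", "rb").read())`, treating any non-empty result as a reason not to flash.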

Kernel config layering

Changes must go into target/linux/mediatek/filogic/Config-kernel.in, not directly into config-6.12. Config-kernel.in is processed last and overrides config-6.12. Edits to config-6.12 alone are silently reverted on the next build.

Flash Procedure

Dropbear (the SSH server in OpenWrt) has no sftp-server. File transfers must use stdin/stdout; scp fails silently or with a protocol error, so never use it.

Kernel + rootfs (sysupgrade)

ssh root@bpi-r4 'cat > /tmp/sysupgrade.itb' < /path/to/sysupgrade.itb
ssh root@bpi-r4 'sysupgrade -T /tmp/sysupgrade.itb'   # validate first
ssh root@bpi-r4 'sysupgrade /tmp/sysupgrade.itb'      # flash + reboot

sysupgrade writes only the production partition (/dev/mmcblk0p5) and reboots (~2-3 min).

Bootloader (FIP)

When the change is in u-boot itself (DT, defconfig, BL31), sysupgrade is not enough — u-boot lives in the fip partition (/dev/mmcblk0p3). Flash directly:

# Build artifact: openwrt-mediatek-filogic-bananapi_bpi-r4-emmc-bl31-uboot.fip
# Sanity check before flashing
python3 -c "import sys; d=open('uboot.fip','rb').read(); print('size', len(d), 'must <= 4194304'); print('0x42ff0000 hits', d.count(bytes.fromhex('0000ff42')), 'must be 0')"

ssh root@bpi-r4 'cat > /tmp/uboot.fip' < /path/to/uboot.fip
ssh root@bpi-r4 'dd if=/tmp/uboot.fip of=/dev/mmcblk0p3 bs=1M conv=fsync'
# verify the readback matches
ssh root@bpi-r4 'sha256sum /tmp/uboot.fip; SIZE=$(wc -c < /tmp/uboot.fip); dd if=/dev/mmcblk0p3 bs=1 count=$SIZE 2>/dev/null | sha256sum'
ssh root@bpi-r4 'reboot'   # u-boot only takes effect after a reboot

The BL2 preloader (emmc-preloader.bin → mmcblk0boot0/1) is rarely changed and isn't needed for u-boot DT or defconfig changes.

Recovery (TFTP)

If a flash bricks u-boot, the BPI-R4's u-boot environment has TFTP recovery macros baked in:

  • bootmenu_4 → "Load production system via TFTP then write to eMMC"
  • boot_tftp_write_bl2 → TFTP-load bootfile_bl2 (= openwrt-mediatek-filogic-bananapi_bpi-r4-emmc-preloader.bin) and write
  • boot_tftp_write_fip → TFTP-load bootfile_fip (= openwrt-mediatek-filogic-bananapi_bpi-r4-emmc-bl31-uboot.fip) and write

Requires serial console + a TFTP server with the correct file names.

eMMC Partition Layout

| Partition | Label | Size | Contents |
|---|---|---|---|
| mmcblk0boot0/1 | (eMMC HW boot) | 4 MiB each | BL2 preloader |
| mmcblk0p1 | ubootenv | 512 KiB | u-boot environment |
| mmcblk0p2 | factory | 2 MiB | factory data |
| mmcblk0p3 | fip | 4 MiB | BL31 + u-boot proper |
| mmcblk0p4 | recovery | 32 MiB | recovery FIT |
| mmcblk0p5 | production | 2 GiB | kernel + rootfs (target of sysupgrade) |

MT7988a Reserved Memory

| Address | Size | Purpose |
|---|---|---|
| 0x42f00000 | 1 MiB | ramoops / pstore (kernel crash dumps) |
| 0x43000000 | 320 KiB | ATF / secmon |
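
The two regions are adjacent: the ramoops reservation ends exactly where ATF / secmon begins, so there is no slack for growing ramoops upward. A small sketch of that check, using only the addresses and sizes listed above (the helper name is hypothetical):

```python
MIB = 1024 * 1024
KIB = 1024

# (start, size) pairs taken from the reserved-memory list above.
regions = {
    "ramoops": (0x42F00000, 1 * MIB),
    "atf_secmon": (0x43000000, 320 * KIB),
}

def overlaps(a, b):
    """True if two (start, size) regions share at least one byte."""
    a_start, a_size = a
    b_start, b_size = b
    return a_start < b_start + b_size and b_start < a_start + a_size

ramoops_end = regions["ramoops"][0] + regions["ramoops"][1]
print(hex(ramoops_end))   # 0x43000000
```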

Post-Crash Diagnosis

After a kernel crash and reboot, pstore holds the previous kernel's dmesg:

ls /sys/fs/pstore/                # look for dmesg-ramoops-N or console-ramoops-0
cat /sys/fs/pstore/dmesg-ramoops-0

/etc/rc.local mounts pstore on boot. The boot-reason logger (see Diagnostic Infrastructure) auto-summarises the previous boot's pstore state into /etc/last_boot.json and pushes a sensor to Home Assistant.

Diagnostic Infrastructure

A set of helper scripts lives in /usr/local/bin/ (all preserved across firmware upgrades via /etc/sysupgrade.conf):

| Script | Purpose |
|---|---|
| boot-reason.sh | Runs from rc.local at boot. Snapshots pstore record count, dmesg head, page_pool state, kernel version, build into /etc/last_boot.json and pushes sensors to HA (sensor.bpi_r4_boot_cause, sensor.bpi_r4_pstore_records, sensor.bpi_r4_uptime_at_capture, sensor.bpi_r4_page_pool_inflight_boot). |
| health-snapshot.sh | Cron * * * * *. Captures per-minute time-series of WAN/LAN/PPPoE counters, softnet, IRQ totals, page_pool, conntrack into /etc/last_health.jsonl (10 080 lines = 7 days at 1/min, persisted across sysupgrade). The forensic ring used to reconstruct outage timelines. Also pushes derived sensors (sensor.bpi_r4_wan_carrier, sensor.bpi_r4_wan_tx_bps, sensor.bpi_r4_wan_tx_pps, sensor.bpi_r4_softnet_drop_rate, sensor.bpi_r4_heartbeat, sensor.bpi_r4_pppoe_state) to HA. |
| page-pool-watch.sh | Cron * * * * *. Reads dmesg \| grep page_pool_release_retry \| tail -1, parses inflight Nsec, pushes sensor.bpi_r4_page_pool_age and sensor.bpi_r4_page_pool_inflight to HA. At 600 s of stall: triggers echo t > /proc/sysrq-trigger (task stack dump → netconsole). At 1800 s with /etc/page-pool-watch.reboot=1: triggers echo c > /proc/sysrq-trigger (controlled panic → captures pstore + clean reboot). |
| sfp-txfault-watch.sh | Cron * * * * *. Computes sfp1-tx-fault IRQ rate per minute from /proc/interrupts, pushes sensor.bpi_r4_sfp1_txfault_rate and sensor.bpi_r4_sfp1_txfault_event to HA. Baseline ~1–2/min; storm ~110 000/min (1 800/sec). At 3 consecutive minutes >1 000/min: dumps task stacks via sysrq-t into /etc/last_sfp_storm.txt. At 5 minutes with /etc/sfp-storm-watch.reboot=1: controlled panic via sysrq-c. Mirrors the page-pool-watch escalation pattern. |
| sfp-recover.sh | Soft-resets the WAN SFP without physically pulling the module. Three escalation levels: A) ip link set sfp-wan down/up, B) ifdown/ifup wan, C) unbind/bind the SFP platform driver via /sys/bus/platform/drivers/sfp/. Run sfp-recover.sh for auto-escalation, or pass 1/2/3 for a single step. |
| haproxy-watch.sh | Cron * * * * *. Restarts haproxy if it's down while vip is up for ≥120 s; pushes sensor.bpi_r4_haproxy_state, ..._down_age, ..._event to HA. Backstop for procd respawn. |
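
The per-minute rate that sfp-txfault-watch.sh derives from /proc/interrupts comes down to two steps: sum the per-CPU counters of the sfp1-tx-fault row, then difference two samples taken a minute apart. The sketch below illustrates that idea in Python. It is not the script itself (which is shell), and the row layout it parses is an assumption based on the usual /proc/interrupts format.

```python
def irq_count(interrupts_text: str, name: str) -> int:
    """Sum the per-CPU counters of the /proc/interrupts row whose last field is `name`."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if len(fields) > 2 and fields[-1] == name:
            # fields[0] is "NN:", followed by one counter per CPU, then the
            # controller/trigger columns; keep only the leading integers.
            total = 0
            for field in fields[1:]:
                if not field.isdigit():
                    break
                total += int(field)
            return total
    raise KeyError(name)

def storm_minutes(rates_per_min, threshold=1000):
    """Length of the current run of consecutive samples above `threshold`."""
    run = 0
    for rate in rates_per_min:
        run = run + 1 if rate > threshold else 0
    return run
```

With one sample per minute, the rate is `irq_count(now, "sfp1-tx-fault") - irq_count(previous, "sfp1-tx-fault")`, and `storm_minutes(...) >= 3` corresponds to the sysrq-t escalation step described above.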

Both crash detection and the MTK ethernet page_pool_release_retry leak are then surfaced in Home Assistant — see Home Assistant Integration.

Hardware watchdog + sysctls

The MT7988a hardware watchdog is configured to panic on starvation:

$ cat /sys/class/watchdog/watchdog0/{identity,pretimeout_governor,pretimeout,timeout}
mtk-wdt
panic
15
30

Combined with these sysctls (in /etc/sysctl.d/99-tillo-panic.conf and the OpenWrt defaults):

kernel.panic = 10
kernel.panic_on_oops = 1
kernel.sysrq = 1
kernel.hung_task_panic = 1
kernel.softlockup_panic = 1

If the kernel hangs for >15 s without kicking the watchdog, pretimeout fires panic() (which writes to ramoops console), and 10 s later the SoC resets. Pstore captures the panic on next boot.
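
The timing follows from the three values shown above. The sketch below assumes the standard Linux watchdog semantics, where pretimeout counts back from the timeout; the function and key names are illustrative.

```python
def hang_timeline(timeout_s: int, pretimeout_s: int, panic_s: int) -> dict:
    """Seconds after the last watchdog kick at which each event happens."""
    pretimeout_at = timeout_s - pretimeout_s   # pretimeout governor fires panic()
    reboot_at = pretimeout_at + panic_s        # kernel.panic delay, then reboot
    return {
        "panic": pretimeout_at,
        "panic_reboot": reboot_at,
        "hw_reset_backstop": timeout_s,        # watchdog expiry if the reboot never happens
    }

print(hang_timeline(timeout_s=30, pretimeout_s=15, panic_s=10))
# {'panic': 15, 'panic_reboot': 25, 'hw_reset_backstop': 30}
```

The hardware timeout at 30 s stays as a backstop in case the panic path itself hangs.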

Home Assistant Integration

The router-side scripts push to HA via /api/states/<sensor>. The automations built on top:

automation.bpi_r4_crash_captured_via_pstore

Triggers when sensor.bpi_r4_boot_cause becomes panic or reset_with_pstore (i.e., the previous boot crashed and pstore captured something). Sends Pushover priority+1 + persistent_notification with the pstore record count, uptime at capture, and the SSH command to dump the actual log.

automation.bpi_r4_page_pool_stall_mtk_leak

Triggers when sensor.bpi_r4_page_pool_age > 600 (10 min stall). Escalation:

  1. t=0 — Initial dashboard + Pushover; the router-side script has already triggered sysrq-t for task stacks.
  2. t+20min if still > 1800 s — fires shell_command.bpi_r4_reboot (defined in home-assistant-cm.yml, SSH-keyed via /config/.ssh/bpi_r4_ed25519 → /etc/dropbear/authorized_keys on the router), then waits 4 min and confirms recovery from the new boot_cause.
  3. Otherwise — dismisses the dashboard notification (recovery happened on its own).

The SSH key is HA-side only; the matching pubkey on the router is registered in /etc/sysupgrade.conf so it survives flashes.

automation.bpi_r4_sfp_tx_fault_storm_carrier_down_onu_power_cycle

First-line response for an XGS-PON wedge. Triggers on sensor.bpi_r4_sfp1_txfault_rate > 10000 sustained for 2 min with sensor.bpi_r4_wan_carrier = 0. Cycles the Hue plug 15 s, waits 4 min for boot + PLOAM re-registration. Recovery in ~5 min. The 2-min hysteresis filters single-sample blips (60 s carrier flaps where the rate spikes for one sample then returns to baseline). See SFP TX-fault storm.

automation.bpi_r4_sfp_tx_fault_storm_clean_reboot (carrier-up only)

Triggers on sensor.bpi_r4_sfp1_txfault_rate > 1000 for 5 min with sensor.bpi_r4_wan_carrier = 1. Runs shell_command.bpi_r4_reboot. Reserved for the kernel-IRQ-wedge case where the link is technically still up but the kernel is buried under IRQ load. Carrier-down storms route to the sibling automation above — rebooting the SoC during a PON wedge is a no-op (the laser is at the optical front-end, below the kernel's reach).

automation.onu_watchdog_power_cycle_if_ploam_not_in_o5

If sensor.onu_ploam_state != 51 (O5) for 7 min, tries a soft pon reboot via ubus first; falls back to a Hue-plug power cycle after 3 min. Last-resort safety net for PON state issues that don't manifest as a TX-fault storm.

automation.router_watchdog_power_cycle_on_unreachable (last resort)

If binary_sensor.192_168_1_254 stays off for 15 min, power-cycles the router via the Hue smart plug. Untouched by the new layer — it's the safety net for the case where everything else has failed.
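
The sensor pushes to /api/states/<sensor> that feed these automations are plain REST calls against Home Assistant. A minimal Python sketch of one such request follows. The base URL and token are placeholders, and the router-side scripts are shell, so they may well build the request differently.

```python
import json
import urllib.request

def build_state_request(base_url: str, token: str, entity_id: str, state, attributes=None):
    """Build (but do not send) a POST to Home Assistant's /api/states/<entity_id>."""
    body = json.dumps({"state": str(state), "attributes": attributes or {}}).encode()
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/states/{entity_id}",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",   # long-lived access token
            "Content-Type": "application/json",
        },
    )

# Placeholder host and token, for illustration only.
req = build_state_request("http://homeassistant.example:8123", "HYPOTHETICAL_TOKEN",
                          "sensor.bpi_r4_boot_cause", "panic")
# urllib.request.urlopen(req, timeout=5) would actually send it.
```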

Known Issue: sfp-lan Zero Traffic

If the SFP LAN port (GMAC1) shows link-up but zero hardware TX/RX bytes, the RSS/LRO patches (999-eth-08*, 999-eth-09*) have corrupted the GMAC1 GDMA data path. Removing those patch files resolves it.

Known Issue: MTK page_pool teardown leak

Symptom in dmesg (and /var/log/bpi-r4.log on mbptillo via netconsole):

page_pool_release_retry() stalled pool shutdown: id 12, 1 inflight Nsec

The line repeats every 60 s with Nsec ticking up forever. After a few hours of accumulation, the next pppoe-wan flap (e.g., ISP-side LCP timeout) wedges the SoC ethernet completely and the watchdog hard-resets — usually without leaving a useful pstore record because the kernel's already too sick to schedule the panic write.

Mitigations applied:

  • net: ethernet: mtk_eth_soc: initialize PPE per-tag-layer MTU registers — upstream commit 2dddb34dd0 (already in 6.12.85 stable). Fixes the actual root cause: PPE was punting PPPoE-encapsulated frames to the CPU because VLAN_MTU registers were uninitialised, which kept page_pool refs alive across teardown.
  • Removed 999-9907-2-mtk-use-net_prefetch-for-non-pagepool-path.patch — out-of-tree MTK SDK patch that operates on the same RX hot path Felix Fietkau reverted upstream (79d3db7447, "Revert: improve mtk_eth_soc performance — stability issues").
  • page-pool-watch.sh as a third-line defence: dumps task stacks at 10 min stall, optionally triggers controlled panic+reboot at 30 min if /etc/page-pool-watch.reboot=1. HA's bpi_r4_page_pool_stall_mtk_leak automation does the same via clean SSH reboot if the watcher's gate is off.

Known Issue: SFP TX-fault storm (XGS-PON wedge)

The WAN SFP is an FS XGS-SFP-ONT-MAC-I MAC-mode XGS-PON ONT (MaxLinear PRX126 silicon, accessible as ssh onu), which replaced an OEM XGSPONST2001 stick of the same silicon family — so the failure mode and the kernel quirk below are unchanged. Under certain optical-layer disturbances (OLT-side events, fiber bend, laser bias instability) the laser cycles rapidly. Observed pattern: TX-fault assertions hit roughly five orders of magnitude above baseline (baseline ~1–2/min, storms reach ~110 000/min ≈ 1 800/sec) and persist for minutes. The OLT loses our upstream signal, stops sending downstream, and our 10GBASE-R SerDes loses bit-sync → wan_carrier=0.

Layer model

PRX126 laser  →  cage TX_FAULT pin  →  GPIO IRQ (line 69)  →  SFP state machine  →  phylink/SerDes  →  HA sensors

The kernel quirk (SFP_QUIRK_F_PREFIX("FS", "XGS-SFP-ONT-MACI", sfp_fixup_potron)) masks SFP_F_TX_FAULT | SFP_F_LOS in state_hw_mask, so the SFP state machine doesn't disable the module on these signals. The IRQ still fires (the GPIO handler is wired regardless), which is why /proc/interrupts count is the canary that surfaces the storm. The carrier still drops when it does, because that's reported by phylink/SerDes from real loss of received frames — below the SFP code's reach.

Lesson learned: a kernel-side TX-fault quirk can't paper over a real optical event

sfp_fixup_potron does its job (no spurious module-offline), but the carrier still drops when the laser actually cycles. Don't add more aggressive host-side suppression — it would only hide the failure, not prevent it.

Recovery

The only effective recovery is power-cycling the ONU (the "Internet" Hue plug carries both router + ONU on one circuit). A SoC reboot doesn't clear the storm: the laser sits in the optical front-end, below the kernel's reach. The carrier-down automation (above) routes directly to ONU power-cycle for this reason; the carrier-up clean reboot is reserved for the kernel-IRQ-wedge case.

Lesson learned: don't daisy-chain watchdogs on a single failure mode

Three watchdogs serialised on a PON wedge (unreachable-watchdog at 15 min, then SoC reboot 5 min after that, then ONU power-cycle 7 min after that) produce ~10 min recovery, and the middle step is wasted (it does nothing for this fault mode). Disambiguate by carrier state (wan_carrier=0 vs =1) and route directly to the right action.
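
The routing rule can be condensed into one decision function. This is a sketch of the logic only: it assumes one tx-fault rate sample per minute (newest last) and reuses the thresholds of the two storm automations; the real automations are Home Assistant YAML, and the action names here are hypothetical.

```python
def route_storm(rates_per_min, carrier_up: bool) -> str:
    """Pick the recovery action for the most recent tx-fault samples (newest last)."""
    def sustained(threshold, minutes):
        recent = rates_per_min[-minutes:]
        return len(recent) == minutes and all(r > threshold for r in recent)

    if not carrier_up and sustained(10_000, 2):
        return "power_cycle_onu"   # PON wedge: only an ONU power cycle helps
    if carrier_up and sustained(1_000, 5):
        return "reboot_router"     # link still up, kernel buried under IRQ load
    return "none"
```

Because the carrier state selects exactly one branch, no step waits for another watchdog to time out first.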

ONU side

ssh onu lands on the module's stock FS firmware on the same MaxLinear PRX126 platform (the retired OEM stick ran LEDE plus the 8311 was-software community mod). The pon CLI and the laser/optic knobs in /etc/config/optic (laser timings, rogue_auto_en, tx_pup_mode) and /etc/config/gpon are the same platform tooling and should not be tweaked without a specific reason and a way to validate.

Forensic reconstruction

After a storm, replay the timeline from /etc/last_health.jsonl (health-snapshot.sh ring, 7 days, persisted across sysupgrade). Note: timestamps in the file are local time, while the HA logbook is UTC — easy to mistake for two different events.

ssh root@bpi-r4 'grep -E "<YYYY-MM-DD>T<HH>:" /etc/last_health.jsonl' | python3 -c "
import sys, json
prev=None
for l in sys.stdin:
    d=json.loads(l); irq=d['irq']['sfp1tx']
    delta=irq-prev if prev is not None else 0
    prev=irq
    print(f\"{d['ts']}  car={d['wan']['car']} op={d['wan']['op']:>7}  sfp1tx={irq}  delta={delta:>6}\")"

A delta of 100 000+ per minute with car=0 is the storm signature. A negative delta mid-window (the counter itself dropping back to a small value) means the IRQ counter was reset, i.e. the kernel rebooted that minute.
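
The same reading of the deltas can be automated. The sketch below labels each record. It assumes the field names used by the one-liner above (ts, irq.sfp1tx, wan.car) and that car is stored as an integer; the function name and labels are hypothetical.

```python
import json

def classify(lines, storm_delta=100_000):
    """Yield (ts, label) per JSONL record; label is 'storm', 'reboot', or 'ok'."""
    prev = None
    for line in lines:
        rec = json.loads(line)
        irq = rec["irq"]["sfp1tx"]
        delta = irq - prev if prev is not None else 0
        prev = irq
        if delta < 0:
            label = "reboot"   # counter went backwards: IRQ counters restarted
        elif delta >= storm_delta and rec["wan"]["car"] == 0:
            label = "storm"
        else:
            label = "ok"
        yield rec["ts"], label
```

Feeding it the grep output line by line (for example via sys.stdin) gives a compact timeline of storm and reboot minutes.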

Known Issue: dnsmasq early-boot crash loop

dnsmasq's init script is at S19dnsmasq, network is S20network. At early boot dnsmasq tries to bind on the WAN VIPs (e.g. 31.3.128.59:53) before pppoe-wan has finished negotiating, fails 6 times in <1 s with Address in use, and procd's default respawn 3600 5 5 circuit breaker gives up. The router then runs without dnsmasq until manual intervention.

Fix in /etc/rc.local (idempotent, applied on every boot, persisted via /etc/sysupgrade.conf):

sed -i 's/procd_set_param respawn\b.*/procd_set_param respawn 3600 10 30/' /etc/init.d/dnsmasq 2>/dev/null

This loosens the respawn budget to 30 retries × 10 s wait = ~5 min of patience, long enough for pppoe-wan to come up. Same pattern is already in place for haproxy.
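
As a rough worked example of the budget (a sketch that ignores how long each failed start itself takes; the function name is hypothetical), the patience is simply the retry count times the wait between attempts, i.e. the third and second respawn parameters:

```python
def respawn_patience(timeout_s: int, retries: int) -> int:
    """Approximate seconds procd keeps retrying: retries x wait between attempts."""
    return timeout_s * retries

print(respawn_patience(5, 5))     # 25  (default "3600 5 5")
print(respawn_patience(10, 30))   # 300 (patched "3600 10 30"), about 5 min
```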

Lesson learned: never signal $(cat /var/run/<svc>/<svc>.pid) for jailed services

OpenWrt's ujail puts each service in its own PID namespace, so the pidfile contains the jail-internal PID 1 — which on the host is procd. Signalling that PID (e.g. kill -USR1 $(cat /var/run/dnsmasq/dnsmasq.pid)) sends the signal to procd; on many builds that's enough to reboot the router. Always use pgrep -f <pattern> or pidof <binary> (from outside the jail) to get the real host PID before signalling.

Network Topology

| Interface | Subnet | Role |
|---|---|---|
| pppoe-wan | 31.3.128.50–.59, .62 (and transit) | PPPoE WAN with the ISP-allocated /28 |
| br-lan (untagged) | 192.168.1.0/24 — gateway 192.168.1.254 | Trusted LAN |
| sfp-lan.20 | 192.168.7.0/24 — gateway 192.168.7.254 | DMZ |
| sfp-lan.70 | 192.168.77.0/24 — gateway 192.168.77.254 | ADLAN (admin / OOB) |
| wg_iot | 10.8.0.0/24 — 10.8.0.1 | IoT-restricted WireGuard tunnel |
| wg_s2s | 10.8.1.0/24 — 10.8.1.1 | Site-to-site WireGuard |
| jool | 192.168.164.0/24 | NAT64 stateful translation in a dedicated netns; pool4 mark split (.2 = external v6, .3 = internal LAN v6) — see external traffic → NAT64 |
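
As a quick consistency check on the WAN row, all listed public addresses fall inside a single /28. The sketch below derives the enclosing block from one of the addresses; the block boundary is computed here, not quoted from the ISP allocation.

```python
import ipaddress

# .50 through .59 plus .62, as listed for pppoe-wan.
wan_ips = [ipaddress.ip_address(f"31.3.128.{h}") for h in [*range(50, 60), 62]]

# Derive the enclosing /28 from any one of the addresses.
block = ipaddress.ip_network("31.3.128.50/28", strict=False)
print(block)   # 31.3.128.48/28
assert all(ip in block for ip in wan_ips)
```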

Split-Horizon DNS

Split-horizon resolution — answering *.mdapi.ch on internal VIPs so LAN traffic skips the WAN-side hairpin — no longer runs on the router. It moved into an in-cluster unbound resolver that owns the well-known LAN DNS VIP 192.168.1.1 (a MetalLB L2 address advertised on mgmt-br). See External Traffic → Split-Horizon DNS.

The router's dnsmasq keeps no address= overrides of its own. It still answers DNS on the DMZ and ad-LAN segments, but purely as a forwarder — no-resolv with a single upstream of 192.168.1.1 — so those clients resolve against the same unbound override set as the LAN.

Lesson learned: never override a root domain

An override for a zone apex (mdapi.ch itself, rather than gitlab.mdapi.ch) shadows MX, NS, TXT, and SPF for the whole zone — mail and registrar delegation break for internal clients. The override list maps third-level hostnames only; the apex is always left to resolve normally.
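
A guard of this kind is easy to automate wherever the override list is generated. The sketch below is illustrative only: the function names are hypothetical and the addresses come from the documentation range, not from the real override set.

```python
def is_zone_apex(name: str, zone: str) -> bool:
    """True if `name` is the zone apex itself rather than a host inside the zone."""
    def norm(s):
        return s.rstrip(".").lower()
    return norm(name) == norm(zone)

def rejected_overrides(overrides: dict, zone: str) -> list:
    """Return the override names that must be refused because they are the apex."""
    return [name for name in overrides if is_zone_apex(name, zone)]

# Hypothetical override set using documentation addresses (192.0.2.0/24).
overrides = {"gitlab.mdapi.ch": "192.0.2.10", "mdapi.ch.": "192.0.2.11"}
print(rejected_overrides(overrides, "mdapi.ch"))   # ['mdapi.ch.']
```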

DHCP — Centralised on Technitium

Local DHCP servers are disabled on every BPI-R4 scope (option ignore '1' in /etc/config/dhcp). Three relay sections, shown here condensed to one line each rather than in literal UCI syntax, forward DISCOVER/REQUEST traffic to the cluster instead:

config relay 'relay_lan'   { interface 'lan';       local_addr '192.168.1.254';   server '192.168.1.55' }
config relay 'relay_dmz'   { interface 'sfp-lan-20'; local_addr '192.168.7.254';   server '192.168.1.55' }
config relay 'relay_adlan' { interface 'sfp-lan-70'; local_addr '192.168.77.254';  server '192.168.1.55' }

Technitium owns all reservations, registers leases into its home.tillo.ch zone, and hands out each segment's local DNS address as the DNS server in the OFFER — 192.168.1.1 on the LAN, 192.168.7.1 / 192.168.77.1 on the DMZ / ad-LAN. On the LAN that address is the in-cluster split-horizon resolver (see Split-Horizon DNS).

Lesson learned: K8s SNAT vs DHCP relay

Replies from Technitium come back with the K8s-node source IP after CNI SNAT, not the LB IP. A custom nftables rule in /etc/nftables.d/dhcp-relay-fix.nft rewrites the reply source from {node-IPs}:67 back to 192.168.1.55:67 so dnsmasq's relay socket accepts them. The file is registered in sysupgrade.conf so it survives firmware upgrades.

WireGuard

Two separate WireGuard interfaces on the BPI-R4 listen on the public WAN:

| Interface | Port | Subnet | Purpose |
|---|---|---|---|
| wg_iot | 51820 | 10.8.0.0/24 | Phones, laptops, and other roaming endpoints — full-tunnel into the LAN |
| wg_s2s | 51821 | 10.8.1.0/24 | Site-to-site to the gum peer |

The public endpoint is pctillo.tillo.ch (DNS pointing at 31.3.128.62, an alias on pppoe-wan). Peer config is managed in /etc/config/network (config wireguard_wg_iot and config wireguard_wg_s2s sections).

IPv6 TX Checksum Workaround

The MT7988a NAT engine produces incorrect IPv6 TX checksums on the SFP-LAN VLANs in some flows. Two ethtool -K calls in /etc/rc.local disable hardware TX checksumming on those interfaces and fall back to software checksums:

ethtool -K sfp-lan.20 tx-checksum-ip-generic off
ethtool -K sfp-lan.70 tx-checksum-ip-generic off

Without this, IPv6 traffic crossing the VLAN boundary is dropped by destinations that strictly verify checksums.