Stateful Failover in 5×9 vBNG

At 5×9 Networks, we spend a lot of time thinking about something users should never have to think about: what happens when things fail.

Because failures do happen: nodes crash, processes restart, links break. The real question isn’t whether something will fail, but what the user experiences when it does. And for a long time, in broadband networks, the answer was simple: users got disconnected.

How Failover Used to Work

In traditional BNG architectures, failover was fundamentally stateless.

When a BNG instance went down, it didn’t matter how many active subscribers were connected or what they were doing at that moment—the system simply lost all session information. The standby node would take over, but it had no memory of existing users. From the network’s perspective, the only option was to start fresh. Subscribers would get disconnected, their devices would attempt to reconnect, sessions would be re-established via PPPoE or IPoE, and the BNG would rebuild everything: IP assignments, QoS policies, NAT bindings, accounting sessions. Technically, the service would recover. But from the user’s perspective, that brief interruption could mean a dropped call, a frozen stream, or a broken VPN session.

This model worked in a time when occasional disruption was acceptable. Today, it’s increasingly out of place.

The Shift Toward Continuity

Modern applications assume persistent connectivity. A sub-second interruption is no longer “invisible”—it’s disruptive.

That’s where stateful failover comes in, and why we made it a core part of the 5×9 vBNG architecture. Instead of treating failover as a reset, stateful systems treat it as a continuation. The key idea is simple: if a standby system already knows everything about active subscribers, it doesn’t need to rebuild anything; it can just take over. In practice, this means continuously synchronizing subscriber state between active and standby nodes. That covers not just high-level session awareness, but the full operational context: addresses, policies, translations, counters, forwarding entries.

So when a failure happens, the standby node doesn’t start from zero. It steps in mid-stream. And from the user’s point of view, nothing happens at all.
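To make the idea concrete, here is a minimal sketch of an active/standby pair in which every state change is mirrored as it happens. The `Node` and `ReplicatedPair` classes and the fields in the state dictionary are hypothetical names chosen for illustration; this is not the 5×9 vBNG implementation, only a toy model of "the standby already knows everything".

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A BNG instance holding per-subscriber session state (hypothetical model)."""
    name: str
    sessions: dict = field(default_factory=dict)  # subscriber id -> state dict


class ReplicatedPair:
    """Active/standby pair in which every state change is mirrored to the standby."""

    def __init__(self, active: Node, standby: Node):
        self.active = active
        self.standby = standby

    def update(self, subscriber_id: str, state: dict) -> None:
        # Apply the change on the active node, then mirror a copy to the standby.
        self.active.sessions[subscriber_id] = dict(state)
        self.standby.sessions[subscriber_id] = dict(state)

    def remove(self, subscriber_id: str) -> None:
        # Session teardown is replicated too, so the standby never holds stale users.
        self.active.sessions.pop(subscriber_id, None)
        self.standby.sessions.pop(subscriber_id, None)

    def fail_over(self) -> Node:
        # The standby already holds the full state, so it simply becomes active.
        self.active, self.standby = self.standby, self.active
        return self.active


pair = ReplicatedPair(Node("bng-a"), Node("bng-b"))
pair.update("sub-1", {"ip": "10.0.0.10", "qos": "gold", "bytes_in": 0})
pair.update("sub-1", {"ip": "10.0.0.10", "qos": "gold", "bytes_in": 4096})
new_active = pair.fail_over()
print(new_active.name, new_active.sessions["sub-1"])
```

After `fail_over()`, the node that takes over already has the subscriber’s address, policy and counters, so nothing has to be rebuilt. A stateless standby would start with an empty `sessions` table instead.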

Control Plane Impact

With stateless failover, recovery depends on how quickly thousands (or millions) of devices reconnect and re-authenticate. That creates bursts of control-plane traffic, load on AAA systems, and variability in how long users are offline.

With stateful failover, none of that happens. There’s no mass reconnection, no signaling storm, no rebuilding of sessions. The system simply continues operating on another node. This has a ripple effect across the network. Control planes remain stable during failures. Backend systems aren’t suddenly flooded with requests. Recovery becomes predictable and almost instantaneous.
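A rough back-of-the-envelope calculation shows why the reconnection burst matters. The subscriber count and AAA rate below are hypothetical figures picked only to illustrate the arithmetic, and the function name is our own; real recovery times also depend on client retry timers and protocol details that this sketch ignores.

```python
def reconnect_time_seconds(subscribers: int, aaa_auth_per_second: float) -> float:
    """Lower bound on the time needed to re-authenticate every subscriber."""
    if aaa_auth_per_second <= 0:
        raise ValueError("AAA rate must be positive")
    return subscribers / aaa_auth_per_second


# Hypothetical figures, chosen only to illustrate the arithmetic.
subscribers = 100_000
aaa_rate = 2_000  # authentications per second the AAA backend can absorb

stateless_seconds = reconnect_time_seconds(subscribers, aaa_rate)
stateful_aaa_requests = 0  # sessions continue, so no re-authentication is triggered

print(f"stateless: {subscribers} AAA requests, "
      f"at least {stateless_seconds:.0f} s until the last subscriber is back")
print(f"stateful:  {stateful_aaa_requests} AAA requests")
```

With these assumed numbers, the last subscriber in a stateless recovery waits at least 50 seconds, while the stateful case sends no authentication traffic at all.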

Seamless Upgrades

One of the often overlooked advantages of stateful failover is that it’s not only useful when something breaks—it’s just as powerful when nothing is broken.

In traditional environments, upgrading a BNG node typically means disrupting users. Sessions must be torn down, software updated, and users reconnected afterward. Even with careful planning, maintenance windows inevitably translate into user impact. With stateful failover, upgrades become a controlled switchover. Traffic and subscriber state can be gracefully moved from one node to another, allowing the original node to be upgraded without disconnecting users. Once upgraded, it can rejoin the system and resume operation.

This effectively turns what used to be a disruptive maintenance activity into a hitless or near-hitless operation, significantly improving operational flexibility and reducing the need for maintenance windows.
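The upgrade flow described above can be sketched as a simple rolling procedure: drain a node to a peer, upgrade it, let it rejoin, and move on. The data layout, the `rolling_upgrade` name and the "least-loaded peer" policy are all assumptions made for this illustration, not a description of how 5×9 vBNG selects a target.

```python
def rolling_upgrade(nodes: dict, new_version: str) -> dict:
    """Upgrade every node in turn, moving its subscribers to a peer first.

    `nodes` maps node name -> {"version": str, "subscribers": set}.
    Returns the same mapping after all nodes run `new_version`.
    """
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to upgrade without disruption")
    for name, node in nodes.items():
        # Pick the least-loaded peer to take over (illustrative policy only).
        peer = min((n for n in nodes if n != name),
                   key=lambda n: len(nodes[n]["subscribers"]))
        # Controlled switchover: subscribers move with their state, no teardown.
        nodes[peer]["subscribers"] |= node["subscribers"]
        node["subscribers"] = set()
        # The drained node can now be upgraded and rejoin the system.
        node["version"] = new_version
    return nodes


cluster = {
    "node-1": {"version": "1.0", "subscribers": {"sub-a", "sub-b"}},
    "node-2": {"version": "1.0", "subscribers": {"sub-c"}},
}
before = set().union(*(n["subscribers"] for n in cluster.values()))
rolling_upgrade(cluster, "1.1")
after = set().union(*(n["subscribers"] for n in cluster.values()))
print(after == before, {n["version"] for n in cluster.values()})
```

The point of the sketch is the invariant: at every step each subscriber is served by some node, so the full set of subscribers is the same before and after the upgrade.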

Alignment with WT-474

Our approach to stateful failover is also aligned with industry direction, particularly Broadband Forum WT-474, which defines requirements and architecture for disaggregated BNG systems.

WT-474 emphasizes:

Separation of control and user plane
Resiliency and high availability mechanisms
Session continuity across failures and topology changes

Stateful failover is a natural fit within this framework. By preserving subscriber state and enabling seamless takeover, 5×9 vBNG directly supports the goal of maintaining uninterrupted service even as components fail or are upgraded.

In other words, this isn’t just a proprietary improvement—it’s part of a broader evolution of how broadband networks are designed.

Implementation Details

Under the hood, stateful failover in 5×9 vBNG is coordinated by the vBC (virtual BNG Controller), which maintains a global, real-time view of all subscriber sessions across the system. More details about the 5×9 vBNG architecture can be found at this link.

The vBC continuously tracks detailed per-subscriber state, including identifiers such as MAC address, assigned IP, the serving vBF (virtual BNG Forwarder), and the exact interface or attachment point. In other words, it doesn’t just know that a user is connected—it knows where and how that user is connected at any given moment. This centralized awareness is what enables fast and deterministic failover.
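As an illustration of what such a per-subscriber record and global view might look like, here is a small sketch. The class names, field names and the two indexes are assumptions made for this example; the actual vBC data model is not described in this article.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberSession:
    """Per-subscriber state of the kind listed above (field names are assumptions)."""
    mac: str          # subscriber MAC address
    ip: str           # assigned IP address
    vbf: str          # serving vBF instance
    attachment: str   # interface or attachment point on that vBF


class SessionRegistry:
    """Global view of sessions, indexed by MAC and by serving vBF."""

    def __init__(self):
        self._by_mac = {}
        self._by_vbf = defaultdict(set)

    def upsert(self, session: SubscriberSession) -> None:
        old = self._by_mac.get(session.mac)
        if old is not None:
            # Drop the stale index entry if the subscriber moved between vBFs.
            self._by_vbf[old.vbf].discard(old.mac)
        self._by_mac[session.mac] = session
        self._by_vbf[session.vbf].add(session.mac)

    def lookup(self, mac: str) -> SubscriberSession:
        return self._by_mac[mac]

    def sessions_on(self, vbf: str) -> list:
        # Answers "who is affected if this vBF goes away?" without a full scan.
        return sorted((self._by_mac[m] for m in self._by_vbf.get(vbf, ())),
                      key=lambda s: s.mac)


registry = SessionRegistry()
registry.upsert(SubscriberSession("aa:bb:cc:00:00:01", "100.64.0.10", "vbf-1", "eth1.100"))
registry.upsert(SubscriberSession("aa:bb:cc:00:00:02", "100.64.0.11", "vbf-2", "eth3.200"))
print([s.mac for s in registry.sessions_on("vbf-1")])
```

Indexing by serving vBF is the design choice that matters here: when a forwarder disappears, the controller can list the affected subscribers directly instead of searching every session.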

When a vBF becomes unavailable—whether due to failure or planned maintenance—the vBC detects the event immediately through its control-plane monitoring. Because it already holds the full subscriber context, it doesn’t need to reconstruct anything or wait for the user to reconnect. Instead, the vBC orchestrates a controlled migration:

Subscribers previously anchored to the failed vBF are reassigned to a backup vBF
Their IP address allocations and pool associations are preserved
MAC bindings and attachment information are migrated to the backup vBF
Forwarding state is re-established on the new vBF

All of this happens without involving the end user or triggering a new session establishment. The same migration can also be started manually: if you are planning an upgrade, you can use our GUI to move subscribers to another vBF instance of your choosing.
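The migration steps listed above can be sketched in a few lines. Everything here is a simplified assumption: the `Session` fields, the `migrate` function and the dictionary used as a stand-in for forwarding tables are hypothetical, and a real system would also have to handle partial failures and re-signal the data plane.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Session:
    mac: str
    ip: str
    pool: str
    vbf: str
    attachment: str


def migrate(sessions: dict, failed_vbf: str, backup_vbf: str, forwarding: dict) -> list:
    """Move every session anchored on `failed_vbf` to `backup_vbf`.

    `sessions` maps MAC -> Session; `forwarding` maps vBF name -> {ip: mac}.
    Returns the MACs that were migrated.
    """
    moved = []
    for mac, session in list(sessions.items()):
        if session.vbf != failed_vbf:
            continue
        # 1. Reassign the subscriber to the backup vBF.
        # 2. IP address and pool association are carried over unchanged.
        # 3. MAC binding and attachment information move with the session record.
        sessions[mac] = replace(session, vbf=backup_vbf)
        # 4. Re-establish forwarding state on the new vBF.
        forwarding.setdefault(backup_vbf, {})[session.ip] = mac
        moved.append(mac)
    # The failed vBF no longer forwards for anyone.
    forwarding.pop(failed_vbf, None)
    return moved


sessions = {
    "aa:bb:cc:00:00:01": Session("aa:bb:cc:00:00:01", "100.64.0.10", "pool-a", "vbf-1", "eth1.100"),
    "aa:bb:cc:00:00:02": Session("aa:bb:cc:00:00:02", "100.64.0.11", "pool-a", "vbf-2", "eth1.200"),
}
forwarding = {
    "vbf-1": {"100.64.0.10": "aa:bb:cc:00:00:01"},
    "vbf-2": {"100.64.0.11": "aa:bb:cc:00:00:02"},
}
moved = migrate(sessions, "vbf-1", "vbf-2", forwarding)
print(moved, sessions["aa:bb:cc:00:00:01"].ip)
```

Note that the subscriber keeps the same IP address and pool after the move; only the serving vBF and the forwarding entries change, which is why no new session establishment is needed.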

Building Stateful Failover the 5×9 Way

Implementing stateful failover in a virtualized, cloud-native BNG environment isn’t trivial. It requires careful handling of synchronization, consistency, and performance.

At 5×9, we approached this by designing state replication as a first-class concern, not an afterthought. Subscriber state is continuously synchronized in real time, with strict guarantees around ordering and consistency. At the same time, we’re careful to keep this efficient—replicating only what’s necessary, without introducing bottlenecks. Failover detection and switchover are designed to be fast and deterministic, ensuring that takeover happens within tight bounds. And because the system is built to scale horizontally, redundancy isn’t limited to rigid active/standby pairs.
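One common way to get ordering guarantees in a replication stream is to number every update and have the receiver apply them strictly in sequence. The sketch below shows that general technique; it is an assumption for illustration and not a statement about the mechanism 5×9 vBNG actually uses. The `OrderedReplica` class and its method names are hypothetical.

```python
class OrderedReplica:
    """Standby-side receiver that applies updates strictly in sequence order."""

    def __init__(self):
        self.state = {}        # subscriber id -> latest replicated value
        self.next_seq = 1      # sequence number expected next
        self._pending = {}     # out-of-order updates waiting for a gap to close

    def receive(self, seq: int, subscriber_id: str, value) -> None:
        if seq < self.next_seq:
            return  # duplicate or already applied; ignore
        self._pending[seq] = (subscriber_id, value)
        # Apply every update that is now contiguous with what was applied before.
        while self.next_seq in self._pending:
            sub, val = self._pending.pop(self.next_seq)
            if val is None:
                self.state.pop(sub, None)   # None marks a session teardown
            else:
                self.state[sub] = val
            self.next_seq += 1


replica = OrderedReplica()
replica.receive(2, "sub-1", {"qos": "gold"})     # arrives early, held back
replica.receive(1, "sub-1", {"qos": "silver"})   # closes the gap; 1 then 2 are applied
replica.receive(1, "sub-1", {"qos": "silver"})   # late duplicate, ignored
print(replica.state, replica.next_seq)
```

Even though the updates arrive out of order, the replica ends up with the value from the later update, which is the property a standby needs before it can safely take over.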

From Recovery to Non-Event

What’s interesting about stateful failover is that, when done right, it almost disappears as a concept. In traditional systems, failover is a visible event: something breaks, users disconnect, systems recover. In a stateful system, failover becomes a non-event. The system absorbs the failure and keeps going, without involving the user at all.

That shift—from recovery to continuity—is at the heart of why we built stateful failover into 5×9 vBNG. Because in modern networks, the best failure is the one your users never notice.
