
Introduction: The Challenge of Large-Scale Monitoring

Our Lightweight Monitoring System (LMS) and TWAMP deployments are expanding beyond IP/MPLS core, edge, and access routers to cover a much larger portion of the Access Node infrastructure. What started as monitoring hundreds of routers has grown to cover thousands or even tens of thousands of mobile and fixed Access Nodes.

However, this growth in scale introduces a critical challenge: each monitoring probe must now handle an increasingly high packet rate, exposing limitations in Linux kernel-based forwarding.

Initially, Linux kernel-based forwarding was sufficient. But as traffic increased, it became a bottleneck and reduced the precision of CPU-based timestamping, a crucial factor for accurate network performance measurement. Since the 5×9 LMS runs exclusively on COTS hardware, we needed a solution that delivers high performance without specialized hardware.

To tackle this, 5×9 Networks leverages the expertise of two highly specialized teams:
• The High-Performance vBNG Team: Focused on virtual Broadband Network Gateway (vBNG) technology, achieving 200Mpps and 1Tbps+ performance per 2RU server, specializing in high-speed packet processing and forwarding.
• The Monitoring Team: Develops an advanced active monitoring system, supporting 30+ measurement modules across multiple access technologies, running on both physical and virtual form factors, optimized for COTS hardware.

The 5×9 TWAMP Client – The Powerhouse of Network Monitoring

To measure network delay, packet loss, and jitter, the 5×9 LMS relies on our state-of-the-art, in-house developed TWAMP client measurement module. Designed for continuous (never-stop) mode, this module delivers precise, real-time insights:
✔ Round-trip delay, packet loss, and jitter
✔ One-way delay and jitter, when paired with a time-synchronized TWAMP reflector
✔ One-way loss, automatically triggered when two-way packet loss is detected, ensuring precise fault localization (requires a 5×9 TWAMP reflector)
✔ SLA breach detection and availability, based on predefined SLA thresholds
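
As an illustration of how the metrics above can be derived, here is a minimal sketch based on the four timestamps carried in a standard TWAMP test exchange (sender transmit, reflector receive, reflector transmit, sender receive). It is not the 5×9 implementation. The class and function names, the use of integer microseconds, and the jitter definition (mean absolute difference between consecutive delays) are assumptions made for this example.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class TwampSample:
    """Four timestamps of one TWAMP test exchange, in microseconds (assumed unit)."""
    t1: int  # sender transmit
    t2: int  # reflector receive
    t3: int  # reflector transmit
    t4: int  # sender receive


def round_trip_delay(s: TwampSample) -> int:
    # Total elapsed time at the sender minus the reflector's processing time.
    return (s.t4 - s.t1) - (s.t3 - s.t2)


def forward_delay(s: TwampSample) -> int:
    # Sender-to-reflector delay; only meaningful with synchronized clocks.
    return s.t2 - s.t1


def reverse_delay(s: TwampSample) -> int:
    # Reflector-to-sender delay; only meaningful with synchronized clocks.
    return s.t4 - s.t3


def mean_jitter(delays: list[int]) -> float:
    # Assumed definition: mean absolute difference between consecutive delays.
    if len(delays) < 2:
        return 0.0
    return mean(abs(b - a) for a, b in zip(delays, delays[1:]))


def loss_ratio(sent: int, received: int) -> float:
    # Fraction of test packets that never came back.
    return 0.0 if sent == 0 else (sent - received) / sent
```

Because every value is computed from timestamps taken on the probe and the reflector, any error in those timestamps flows directly into the reported delay and jitter.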

Additionally, the 5×9 TWAMP client and reflector support hundreds of randomized source and destination UDP ports, enabling diverse L4 hashing for effective performance monitoring across large ECMP link bundles.
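
To see why varying the UDP ports matters, the following sketch models an ECMP bundle that picks a member link by hashing the 5-tuple. Real routers use vendor-specific hash functions; CRC32 is only a stand-in here, and the function names, IP addresses, and port range are hypothetical values chosen for illustration.

```python
import random
import zlib
from collections import Counter


def ecmp_member(src_ip, dst_ip, src_port, dst_port, n_links, proto=17):
    # Stand-in for a router's hash: CRC32 over the 5-tuple, modulo link count.
    key = f"{src_ip}|{dst_ip}|{proto}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


def links_covered(n_flows, n_links, seed=1):
    # Count how many test flows land on each member link when both UDP
    # ports are randomized per flow (port range is an assumption).
    rng = random.Random(seed)
    used = Counter()
    for _ in range(n_flows):
        sport = rng.randint(49152, 65535)
        dport = rng.randint(49152, 65535)
        link = ecmp_member("192.0.2.1", "198.51.100.1", sport, dport, n_links)
        used[link] += 1
    return used
```

In this model, a fixed port pair always maps to the same member link, so a single flow can only ever test one path. Calling `links_covered(300, 8)` shows how a few hundred randomized port pairs spread the test traffic across the bundle.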

Identifying the Problem: Linux Kernel Bottlenecks

As packet rates increased, we observed a growing offset in CPU-based timestamping. However, despite the higher traffic volume, the application handling packet processing remained nearly idle in terms of CPU utilization.

This pointed to a clear issue – the Linux kernel was the bottleneck.

While the kernel successfully handled the required number of packets, it introduced forwarding jitter (delay variance), which negatively impacted timestamping accuracy and measurement precision.

Our lab tests confirmed this limitation. At a constant packet rate of 10kpps:
• 200 unique TWAMP destinations
• 5 QoS classes per destination
• 10pps per QoS class
• 150-byte packet size
• Max timestamp offset exceeding 2ms
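
The 10kpps figure follows from the per-destination parameters listed above. A minimal sketch of that arithmetic is shown below. The function name is hypothetical, and the throughput is only approximate because the source does not say whether the 150-byte size includes all lower-layer framing.

```python
def probe_load(destinations, qos_classes, pps_per_class, packet_bytes):
    # Aggregate packet rate: one stream per (destination, QoS class) pair.
    pps = destinations * qos_classes * pps_per_class
    # Approximate throughput in Mbit/s, counting only the stated packet size.
    mbps = pps * packet_bytes * 8 / 1_000_000
    return pps, mbps


pps, mbps = probe_load(destinations=200, qos_classes=5,
                       pps_per_class=10, packet_bytes=150)
print(f"{pps} pps, about {mbps} Mbit/s")
```

For this test the result is 10,000 packets per second at roughly 12 Mbit/s. The bandwidth is modest; the packet rate, not the bit rate, is what stresses the forwarding path.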

At this point, it became clear that kernel-based forwarding was no longer viable if we wanted precise, high-performance monitoring.

Overcoming the Bottleneck: Bypassing the Linux Kernel

To achieve precise CPU-based timestamping at high packet rates, we needed a solution that would completely bypass the Linux kernel.
The answer? A seamless collaboration between our vBNG and Monitoring teams.
What We Did:
✅ Step 1: The vBNG Team developed a DPDK-based forwarding and timestamping engine, which bypasses the Linux kernel to maximize performance.
✅ Step 2: The Monitoring Team modified the TWAMP client code, ensuring that packets are sent and received directly through the DPDK-based engine, avoiding kernel-induced delays.
✅ Step 3: The overall TWAMP client functionality and result processing remained unchanged, ensuring full backward compatibility for module configuration, alerting, and visualization.
This approach allowed us to dramatically reduce forwarding jitter, restoring the timestamping accuracy required for large-scale monitoring.

Test Results – A Massive Performance Boost!

Our initial lab tests with the DPDK engine confirmed a 20× performance boost on the same hardware, along with significantly improved timestamping precision. At a constant rate of 200kpps:
• 4,000 unique TWAMP destinations
• 5 QoS classes per destination
• 10pps per QoS class
• 150-byte packet size
• Max timestamp offset <50µs under full load

Before vs. After
🚀 Kernel-based forwarding: Max timestamp offset >2ms at just 10kpps
🚀 DPDK-based forwarding: Max timestamp offset <50µs at 200kpps

This 20× performance improvement, combined with increased timestamping precision, is a game-changer and shows that bypassing the kernel was the right solution.
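
The before/after figures can be compared directly. The sketch below uses the published numbers as bounds: the kernel offset was reported as greater than 2ms and the DPDK offset as less than 50µs, so the computed offset ratio is a lower bound. The dictionary keys and function name are hypothetical.

```python
# Published lab figures; the offsets are bounds, not exact measurements.
KERNEL = {"rate_pps": 10_000, "max_offset_us": 2_000}  # reported as ">2ms"
DPDK = {"rate_pps": 200_000, "max_offset_us": 50}      # reported as "<50µs"


def improvement(before, after):
    # How many times more packets per second the new setup handles.
    rate_gain = after["rate_pps"] / before["rate_pps"]
    # Minimum factor by which the maximum timestamp offset shrank.
    min_offset_gain = before["max_offset_us"] / after["max_offset_us"]
    return rate_gain, min_offset_gain


rate_gain, min_offset_gain = improvement(KERNEL, DPDK)
print(f"rate: {rate_gain:.0f}x, offset: at least {min_offset_gain:.0f}x lower")
```

With these inputs the packet rate is 20 times higher while the maximum timestamp offset is at least 40 times lower.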

The Best Part

This 200kpps monitoring setup runs on a sub-€600, 10×10 cm Single Board Computer (SBC) powered by an AMD Ryzen Embedded V1605B CPU with just 4 cores/8 threads!

Final Thoughts

By leveraging DPDK for high-performance packet handling, we’ve unlocked a new level of efficiency in our TWAMP deployments.

📌 Key Achievements:
✔ Eliminated the need for additional hardware
✔ Maintained high precision and accuracy
✔ Enabled ultra-high-speed monitoring on COTS hardware

At 5×9 Networks, we’re not just keeping up with demand—we’re pushing the limits of performance and redefining active network monitoring. 🚀

Author: Mario Jurcevic, 5×9 Networks