Characterizing the Impact of Active Queue Management on Speed Test Measurements
Source: arXiv:2511.19213 · Published 2025-11-24 · By Siddhant Ray, Taveesh Sharma, Jonatas Marques, Paul Schmitt, Francesco Bronzino, Nick Feamster
TL;DR
This paper studies how Active Queue Management (AQM) algorithms influence modern speed test measurements, specifically latency under load and throughput variability. While traditional speed tests primarily report peak throughput and average latency, newer tools have introduced latency-under-load metrics to better capture user-perceived network responsiveness. However, how these metrics behave in the presence of common AQMs such as CoDel, FQ-CoDel, and Stochastic Fairness Queueing (SFQ) is not well understood. Using a controlled lab testbed with configurable bottleneck bandwidth and competing traffic, the authors run repeated NDT speed tests under different AQMs and load conditions to measure detailed throughput distributions and latency dynamics.
The main finding is that AQM algorithms significantly affect speed test results, especially the variability and tails of latency and throughput distributions under load. For example, FQ-CoDel and SFQ produce more stable instantaneous throughput and lower latency spikes under competing cross traffic than the no-AQM (drop-tail) baseline. Burst shaping of traffic also strongly affects achievable throughput measurements. The work shows that the aggregate statistics reported by common speed test tools can mask these dynamics, and that careful calibration to AQM policies is needed for interpretable network performance measurements. It thereby bridges the gap between AQM deployment in real networks and how speed tests capture user experience, suggesting refinements for measurement methodologies.
Key findings
- Throughput measurements without burst shaping under-utilize link capacity; e.g., at 800 Mbps bandwidth, throughput maxes out below 700 Mbps across AQMs (Fig 1).
- With burst shaping enabled and no cross traffic, throughput closely matches configured bandwidth limits across AQM algorithms (Fig 3).
- Introducing competing TCP Cubic cross traffic reduces throughput significantly and increases variability, e.g., for FQ-CoDel at 900 Mbps bandwidth, throughput standard deviation rises from 1.51 Mbps without cross traffic to 138 Mbps with cross traffic (Fig 5).
- Latency measurements with competing cross traffic increase dramatically due to queueing delays from load (Fig 6), varying across AQM algorithms.
- Instantaneous throughput under low load scenarios is stable across AQMs but shows significant variability at high bandwidth under load, especially without efficient AQM (Figs 7, 8, 9).
- FQ-CoDel and SFQ maintain more stable throughput at lower bandwidth under competing load than No AQM or CoDel (Fig 8).
- Latency spikes and throughput variability during speed tests depend heavily on the deployed AQM scheme and load conditions; mean values alone do not capture these effects.
- Measurements indicate that traditional speed test metrics based on averaged values can mask significant dynamics relevant for real-time applications.
Methodology — deep read
The authors set up a controlled lab testbed with two System76 Meerkat 6 mini PCs connected via a Turris Omnia router with 1 Gbps Ethernet. The router runs Linux with one of several queue disciplines: no AQM (pfifo drop-tail), CoDel, FQ-CoDel, and SFQ. Rate limits are applied via Hierarchical Token Bucket (HTB) shaping, from 100 Mbps to 1 Gbps in 100 Mbps steps, with optional burst shaping to allow transient bursts.
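The paper does not publish the exact configuration scripts, but a setup of this kind maps naturally onto Linux `tc`. The helper below is an illustrative sketch under assumed interface names, handles, and burst sizes, not the authors' actual tooling:

```python
# Sketch: build the Linux `tc` commands for one testbed configuration
# (HTB rate limit + AQM leaf qdisc). Device name, handles, and burst
# size are illustrative assumptions; the paper's scripts are not released.

def tc_commands(dev, rate_mbit, aqm, burst_kb=None):
    """Return tc commands for an HTB-shaped link with a given AQM.

    aqm is one of "pfifo" (drop-tail / no AQM), "codel", "fq_codel", "sfq".
    """
    burst = f" burst {burst_kb}k" if burst_kb else ""
    return [
        f"tc qdisc del dev {dev} root",  # clear any existing root qdisc
        f"tc qdisc add dev {dev} root handle 1: htb default 10",
        f"tc class add dev {dev} parent 1: classid 1:10 htb rate {rate_mbit}mbit{burst}",
        f"tc qdisc add dev {dev} parent 1:10 handle 10: {aqm}",  # AQM on the leaf
    ]

for cmd in tc_commands("eth0", 900, "fq_codel", burst_kb=64):
    print(cmd)
```

The same four-command skeleton covers every configuration in the sweep: only the rate, the burst argument, and the leaf qdisc name change.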
They run repeated (10x) speed tests using NDT (Measurement Lab's Network Diagnostic Tool) and iperf3 TCP tests from one host to the other, logging full measurement data including pcap captures for throughput analysis. Latency is measured as round-trip time, including latency-under-load metrics. Cross traffic is introduced as TCP Cubic iperf3 flows to simulate competing bandwidth demand.
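The cross-traffic generation can be sketched as an iperf3 client invocation pinned to TCP Cubic (the `-C` congestion-control flag is Linux-only). Flow count and duration below are illustrative assumptions, not values from the paper:

```python
# Sketch: build an iperf3 command line for competing TCP Cubic cross
# traffic. The paper uses TCP Cubic iperf3 flows; the specific flow
# count and duration here are illustrative assumptions.
import shlex

def cross_traffic_cmd(server, duration_s=60, flows=1):
    """iperf3 client command for competing TCP Cubic flows with JSON output."""
    return (f"iperf3 -c {shlex.quote(server)} -t {duration_s} "
            f"-P {flows} -C cubic -J")

print(cross_traffic_cmd("10.0.0.2", duration_s=30, flows=4))
```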
Collected data includes aggregate throughput, latency, and instantaneous throughput time series over the test duration. They compare metrics across AQM algorithms under different bandwidth caps, burst shaping enabled or disabled, and presence or absence of cross traffic.
By sweeping these conditions systematically, they quantify how AQM schemes and network load affect both the magnitude and variability of throughput and latency measurements. The reproducible, isolated testbed removes external noise and lets the effect of each variable be isolated.
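The size of such a sweep is easy to enumerate. A minimal sketch, assuming the parameter grid described above (four queue disciplines, ten rate steps, burst shaping on/off, cross traffic on/off, ten runs each):

```python
# Sketch of the systematic sweep over AQM, bandwidth cap, burst shaping,
# and cross traffic. Variable names are illustrative, not the authors'
# actual harness.
from itertools import product

AQMS = ["pfifo", "codel", "fq_codel", "sfq"]  # pfifo = no AQM (drop-tail)
RATES_MBIT = range(100, 1001, 100)            # 100 Mbps .. 1 Gbps in 100 Mbps steps
BURST_SHAPING = [False, True]
CROSS_TRAFFIC = [False, True]
RUNS_PER_CONFIG = 10

configs = list(product(AQMS, RATES_MBIT, BURST_SHAPING, CROSS_TRAFFIC))
print(len(configs), "configurations,",
      len(configs) * RUNS_PER_CONFIG, "speed tests")
# → 160 configurations, 1600 speed tests
```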
A concrete example: at a 900 Mbps link rate limit with burst shaping and TCP Cubic cross traffic, the standard deviation of NDT throughput under FQ-CoDel was 138 Mbps, compared to 1.51 Mbps without cross traffic, showing significant load-induced instability that depends on the AQM.
Evaluation relies on statistical distributions, time series plots, and comparison of mean and tail percentiles for metrics, highlighting that simple averages do not reveal important variability. Their approach reveals nuanced performance dynamics relevant to user-perceived responsiveness.
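The point that simple averages hide variability can be illustrated with two synthetic instantaneous-throughput traces (the numbers below are invented for illustration, not from the paper): both average 900 Mbps, but their spreads differ by two orders of magnitude.

```python
# Sketch: two synthetic throughput traces (Mbps) with identical means
# but very different spread, summarized by mean / stdev / range.
from statistics import mean, stdev

def summarize(samples):
    """Contrast the mean with spread measures for a throughput time series."""
    return {"mean": mean(samples),
            "stdev": round(stdev(samples), 1),
            "range": max(samples) - min(samples)}

stable = [898, 900, 902, 899, 901, 900, 897, 903, 900, 900]
bursty = [1200, 600, 1300, 500, 1250, 550, 1100, 700, 900, 900]

print(summarize(stable))
print(summarize(bursty))
```

A tool that reported only the mean would call these two links identical; the stdev and range (or tail percentiles, as the paper uses) separate them immediately.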
Code and exact scripts are not explicitly confirmed as released. The testbed and software components used (NDT, Linux tc, standard AQMs) are publicly accessible, supporting potential reproducibility.
Technical innovations
- Empirical characterization of the impact of popular AQM algorithms on detailed statistical distributions of speed test throughput and latency measurements under varying network load.
- Systematic evaluation differentiating effects of burst shaping, cross traffic, and link bandwidth on speed test metric stability and accuracy.
- Highlighting the discrepancy between traditional mean throughput/latency metrics and the underlying variability and tail events relevant for user Quality of Experience.
- Use of instantaneous throughput time series at different bandwidth and loading conditions to reveal AQM-dependent measurement stability.
Datasets
- Lab testbed generated throughput and latency measurement datasets — ~10 runs per configuration with NDT and iperf3 — non-public
Baselines vs proposed
- No AQM (drop-tail) vs FQ-CoDel at 900 Mbps with cross traffic: throughput standard deviation in a similar range (~138 Mbps), but FQ-CoDel with better latency stability
- Bandwidth utilization at 800 Mbps: without burst shaping ranges below 700 Mbps vs with burst shaping close to configured limit (Fig 1 vs Fig 3)
- Latency at 900 Mbps with cross traffic rises to ~7-8 ms under AQM vs much lower under no load (Fig 6)
Limitations
- Experimental setup is a controlled lab testbed with two hosts and a single router, which may not capture all complexities of wide-area Internet paths.
- Cross traffic limited to TCP Cubic iperf3 flows; other transport protocols or traffic mixes not explored.
- Latency shaping was disabled in final results due to implementation instability, so effects of added fixed latency not quantified.
- Results focus on a limited set of AQM algorithms and default parameters; other AQMs or tuned configs may behave differently.
- No adversarial or malicious traffic patterns studied; the threat model assumes standard benign congestion scenarios.
- Code and exact data release status unclear, limiting straightforward reproduction.
Open questions / follow-ons
- How do additional AQM algorithms (e.g., CAKE, PIE, L4S) impact speed test metrics under similar conditions?
- What is the impact of mixed traffic types or adversarial congestion on speed test variability and metric reliability?
- Can speed test tools be redesigned to report more informative metrics that explicitly account for AQM effects and load variability?
- How do observed lab testbed results translate to measurements on real wide-area networks with complex routing and cross traffic?
Why it matters for bot defense
For bot-defense and CAPTCHA practitioners, this study highlights that network measurement tools can report significantly different latency and throughput characteristics depending on underlying network queue management policies. Emerging latency-under-load metrics, which better approximate user experience during interactive applications, are sensitive to AQM deployments such as FQ-CoDel and SFQ that reduce bufferbloat.
This implies that CAPTCHA systems or bot-defense mechanisms relying on speed or responsiveness measurements for user fingerprinting should consider the underlying network conditions and queue management algorithms to avoid misclassification due to natural variability. Interpreting speed test or network measurement data in isolation without accounting for AQM effects may produce unreliable signals about client network health or behavior. Designing detection systems that incorporate knowledge of AQM impact could improve robustness against false positives or negatives triggered by transient network queuing artifacts.
Cite
@article{arxiv2511_19213,
  title={Characterizing the Impact of Active Queue Management on Speed Test Measurements},
  author={Siddhant Ray and Taveesh Sharma and Jonatas Marques and Paul Schmitt and Francesco Bronzino and Nick Feamster},
  journal={arXiv preprint arXiv:2511.19213},
  year={2025},
  url={https://arxiv.org/abs/2511.19213}
}