How We Fixed a One-Core Packet Processing Bottleneck in Torrust Tracker

We investigated a one-core softirq hotspot in the Torrust Tracker demo, ran isolated HTTP/3 and RPS/RFS experiments, and documented why better CPU distribution did not fully solve overall host load.

Jose Celano - 06/05/2026

TL;DR

We found a production bottleneck where one CPU core was saturated by kernel packet work (softirq) while other cores still had spare capacity.

Current Load Snapshot

To keep this report actionable, we also captured a fresh live sample from the server while the service stayed healthy.

```bash
echo "=== uptime ===" && uptime && echo "=== cpu ===" && mpstat -P ALL 1 1 && echo "=== memory ===" && free -h
```

Latest observed values:

  • Load average: 11.71 / 11.91 / 11.48 (also sampled at 12.16 / 11.99 / 11.50)
  • CPU2 softirq: 48.48%
  • All-CPU softirq: 29.63%
  • All-CPU idle: 18.52%
  • Memory available: 27 GiB of 30 GiB

This reinforces the same conclusion: the packet-processing hotspot remains controlled and distributed, while total host load remains high enough to justify separate capacity follow-up.

We ran two isolated experiments:

  1. Disable HTTP/3 (QUIC) on Caddy by removing UDP 443.
  2. Enable RPS/RFS to spread packet receive processing across all CPUs.

Results:

  • Disabling HTTP/3 did not improve the one-core softirq hotspot.
  • Enabling RPS/RFS reduced CPU2 %soft from ~100% to ~48.51%.
  • The improvement persisted at T+1h and T+next-day checkpoints.
  • Global host load remained high (11.83 / 11.59 / 10.82), so we treated this as a distribution fix, not a full capacity fix.

What Problem We Detected

The tracker host was under sustained CPU pressure. Standard process-level checks showed high utilization, but one pattern stood out: kernel networking work was concentrated on CPU2.

  • mpstat repeatedly showed CPU2 pinned in %soft.
  • ksoftirqd/2 appeared near the top CPU consumers.
  • User-space load was real, but the single-core kernel hotspot was the main anomaly.

For context, softirq is deferred kernel work; inbound packets are processed in the NET_RX softirq. If this work is not distributed well, one core can saturate even while the machine still has idle cores.
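
You can observe this directly: the kernel exposes per-CPU softirq counters in /proc/softirqs, and the NET_RX row shows where receive processing is landing. A quick check using only standard interfaces:

```bash
# Per-CPU softirq counters; on a host with this hotspot, the NET_RX
# column for the affected core grows far faster than the others.
grep -E 'CPU|NET_RX' /proc/softirqs

# Two samples one second apart give a rough per-CPU NET_RX rate.
grep NET_RX /proc/softirqs; sleep 1; grep NET_RX /proc/softirqs
```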

Initial htop Snapshot (Before Patch)

This is the initial screenshot where CPU2 was pinned at 100% usage during the softirq hotspot.

[Screenshot: initial htop snapshot showing CPU2 pinned at 100% usage]

Before patch: CPU2 was saturated, indicating concentrated packet-processing load.

Method: One Variable at a Time

To avoid ambiguous conclusions, we changed one thing per phase and compared snapshots using the same metrics:

  • mpstat -P ALL 1 1 for per-CPU softirq distribution.
  • ps -eo pid,comm,%cpu,%mem,stat --sort=-%cpu | head -20 for top CPU processes.
  • docker stats --no-stream for container-level CPU snapshots.
  • Prometheus HTTP/UDP request rates to keep traffic context comparable.
  • External endpoint checks from newtrackon.com/raw.
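
To make the per-phase comparisons repeatable, the local checks can be wrapped in one small script; a minimal sketch (the filename is illustrative, and the Prometheus and newtrackon.com checks remain manual steps):

```bash
#!/usr/bin/env bash
# snapshot.sh - capture one comparable snapshot per tuning phase.
set -euo pipefail
ts=$(date -u +%Y-%m-%dT%H:%M:%SZ)
{
  echo "=== snapshot $ts ==="
  mpstat -P ALL 1 1
  ps -eo pid,comm,%cpu,%mem,stat --sort=-%cpu | head -20
  docker stats --no-stream
} | tee "snapshot-$ts.log"
```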

Phase 2: Disable HTTP/3 (QUIC)

Hypothesis: UDP 443 (HTTP/3) on Caddy might be adding packet-processing pressure. We removed only that UDP publish and restarted Caddy.

```diff
 ports:
   - "80:80"
   - "443:443"
-  - "443:443/udp"
```

```bash
docker compose up -d caddy
```
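
To verify the change took effect, it is enough to confirm that nothing listens on UDP 443 anymore; a quick check, assuming ss is available on the host:

```bash
# Should print only the header row: no UDP 443 listener remains.
sudo ss -ulpn 'sport = :443'

# Double-check the published ports on the Caddy service.
docker compose ps caddy
```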

Selected checkpoint after this phase:

```text
T+next-day (2026-05-05T06:16:14Z)
CPU2 %soft=98.02, %idle=1.98
caddy=308.89%, tracker=93.22%
HTTP1 rate=1909.11 req/s, UDP1 rate=2178.98 req/s
HTTP1/UDP1 external status: Working
```

Conclusion: disabling HTTP/3 was good hygiene, but it did not change the one-core softirq bottleneck.

Phase 3: Enable RPS/RFS

Next hypothesis: packet receive-side work was not being distributed across CPUs. Before the change, steering was effectively disabled.

```text
net.core.rps_sock_flow_entries = 0
/sys/class/net/eth0/queues/rx-0/rps_cpus = 00
/sys/class/net/eth0/queues/rx-0/rps_flow_cnt = 0
```
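
For reference, these values can be read back with standard tooling (assuming eth0 exposes a single RX queue, as on this host):

```bash
# All three report zero/disabled before the change.
sysctl net.core.rps_sock_flow_entries
cat /sys/class/net/eth0/queues/rx-0/rps_cpus
cat /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```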

We applied the live test change:

```bash
sudo sysctl -w net.core.rps_sock_flow_entries=32768
echo ff | sudo tee /sys/class/net/eth0/queues/rx-0/rps_cpus
echo 4096 | sudo tee /sys/class/net/eth0/queues/rx-0/rps_flow_cnt
```
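
The ff written to rps_cpus is a hexadecimal CPU bitmask, enabling CPUs 0-7 (which matches an 8-core host). Note that these sysfs writes do not survive a reboot; a sketch of how to derive the mask and persist the sysctl portion (the file path is an assumption, and the sysfs writes still need a boot-time hook such as a systemd oneshot unit or udev rule):

```bash
# Compute a hex mask covering all online CPUs; prints "ff" when nproc is 8.
ncpu=$(nproc)
printf '%x\n' $(( (1 << ncpu) - 1 ))

# Persist the flow-table size across reboots (path is illustrative).
echo 'net.core.rps_sock_flow_entries = 32768' | sudo tee /etc/sysctl.d/90-rps.conf
```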

Immediate post-change snapshot:

```text
Post-change (2026-05-05T06:55:40Z)
all CPUs: %soft=30.15
CPU2: %soft=48.51, %idle=9.90
other CPUs: %soft distributed across ~24-33%

caddy=411.88%, tracker=123.77%, mysql=9.67%
pre  HTTP=1912.99 req/s, UDP=2234.14 req/s
post HTTP=1926.15 req/s, UDP=2207.48 req/s
HTTP1/UDP1 external status: Working
```

At T+1h and T+next-day, CPU2 remained around 49.48-49.49% softirq and load stayed distributed. This confirms RPS/RFS removed the one-core packet hotspot.

htop Snapshot After Patch

After enabling RPS/RFS, load was distributed across CPUs instead of being concentrated on CPU2.

[Screenshot: htop after the RPS/RFS patch, showing load distributed among CPUs]

After patch: packet-processing pressure is spread across cores.

Why This Did Not Fully Solve Host Load

RPS/RFS solved distribution, not demand. The host still ran with high global load averages after the fix. In other words, the previous failure mode (single-core saturation) was mitigated, but total workload pressure remained near capacity.

Even with high server load, observed tracker availability on NewTrackon stayed above 99%. This is an important distinction: high load reduced headroom, but service quality remained strong during the observation window. Representative endpoint latencies from those checks:

  • udp://udp1.torrust-tracker-demo.com:6969/announce: latency 21 ms
  • https://http1.torrust-tracker-demo.com:443/announce: latency 29 ms

At peak, we are handling a little over 2200 UDP req/s and about 1900 HTTP req/s.

Traffic Over the Last 15 Days

This Grafana Generic Tracker Dashboard screenshot shows announce requests per second over the last 15 days for both UDP1 and HTTP1 trackers.

[Screenshot: Grafana dashboard showing announce requests per second for UDP1 and HTTP1 over the last 15 days]

Last 15 days: announce request rate trends for udp1.torrust-tracker-demo.com and http1.torrust-tracker-demo.com.

This is an important operational lesson: eliminating one bottleneck can improve stability and fairness across CPUs without creating enough long-term headroom for sustained growth.

Timeline Checkpoints

  • Before changes: CPU2 repeatedly near 100% softirq.
  • After HTTP/3 disable: no meaningful improvement in CPU2 softirq.
  • Immediate after RPS/RFS: CPU2 softirq dropped to 48.51%.
  • T+1h: distribution remained stable, CPU2 around 49.48%.
  • T+next-day: distribution persisted, CPU2 around 49.49%.

Final Operational Decision

  1. Keep RPS/RFS enabled permanently on the current host.
  2. Close the tuning scope in ISSUE-29 as completed.
  3. Track capacity follow-up separately (scale-up planning in ISSUE-30).

The practical outcome is clear: packet-path tuning fixed the single-core bottleneck, and capacity planning becomes the next lever for sustained growth.
