Ubuntu Server Performance Tuning Guide – Deep Technical Focus
This guide emphasizes in-depth kernel, subsystem, and workload-specific optimizations for Ubuntu Server 24.04 LTS and newer point releases. It assumes you are already familiar with basic tools (tuned, sysctl, cpupower, perf) and prioritizes understanding underlying mechanisms over copy-paste snippets.
1. Scheduler & Preemption Behavior
The default PREEMPT_VOLUNTARY kernel balances responsiveness and throughput. For latency-critical workloads (databases, message queues, soft real-time), switch to full preemption:
- Kernel command line: preempt=full
- Effect: voluntary preemption points are supplemented with forced preemption at nearly every scheduling opportunity
- Trade-off: ~2–8% higher context-switch rate, measurable increase in scheduler overhead under very low load
- Complementary: threadirqs (run hardirq handlers in kernel threads so interrupt work becomes schedulable and can be prioritized) reduces tail latency spikes during network or disk bursts
On large-core-count systems (AMD EPYC, Intel Sapphire Rapids / Emerald Rapids), also evaluate:
- skew_tick=1 → desynchronizes timer ticks → reduces cacheline bouncing in idle polling paths
- nohz_full=1-63 (example for cores 1–63) → completely tickless operation on isolated cores
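The options above all land on the kernel command line. A sketch of a GRUB drop-in combining them — the filename and the 1–63 CPU list are illustrative assumptions; adjust the list to your topology and confirm your kernel was built with PREEMPT_DYNAMIC before relying on preempt=full:

```shell
# /etc/default/grub.d/99-latency.cfg (illustrative; requires PREEMPT_DYNAMIC
# for preempt=full to take effect at boot)
GRUB_CMDLINE_LINUX_DEFAULT="$GRUB_CMDLINE_LINUX_DEFAULT preempt=full threadirqs skew_tick=1 nohz_full=1-63"
```

Apply with sudo update-grub, reboot, and verify via cat /proc/cmdline.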
2. Transparent Huge Pages (THP) Strategy
THP behavior directly affects TLB miss rate and memory allocation latency.
Three modes:
- always Kernel aggressively collapses 4 KiB pages into 2 MiB huge pages (THP does not provide 1 GiB pages; those require explicit hugetlbfs reservation). Best for: large sequential access patterns, in-memory databases (PostgreSQL shared_buffers, Redis), JVM heaps with G1/ZGC/Shenandoah. Risk: allocation stalls during collapse, especially under memory pressure.
- madvise (the Ubuntu default, increasingly recommended in enterprise distros) Huge pages are used only where the application explicitly requests them via madvise(MADV_HUGEPAGE). PostgreSQL ≥15, Redis (with THP defrag disabled), and many Java runtimes already do this correctly. Safest compromise for mixed workloads.
- never Disable completely. Use only when measuring consistent regression with madvise/always.
Practical tuning path:
- Set transparent_hugepage=madvise at boot
- For applications that benefit: ensure they set MADV_HUGEPAGE (most do)
- Monitor: grep -i AnonHugePages /proc/meminfo and the thp_fault_alloc / thp_collapse_alloc counters in /proc/vmstat
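The tuning path above can be verified with a few reads of standard kernel interfaces (paths are standard on Ubuntu kernels):

```shell
# Active THP mode is shown in brackets, e.g.: always [madvise] never
cat /sys/kernel/mm/transparent_hugepage/enabled
# Anonymous memory currently backed by 2 MiB huge pages
grep -i AnonHugePages /proc/meminfo
# Fault-time vs. khugepaged-collapse allocations; thp_collapse_alloc rising
# fast under memory pressure hints at collapse-induced stalls
grep -E 'thp_fault_alloc|thp_collapse_alloc' /proc/vmstat
```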
3. Memory Reclaim & Dirty Page Writeback Dynamics
Server-oriented memory pressure handling differs significantly from desktop defaults.
Key tunables and their mechanical effects:
- vm.swappiness = 1–10 Controls how aggressively the kernel swaps anonymous memory vs. page cache reclaim. Very low values protect application working sets at the cost of potentially evicting more cache.
- vm.dirty_background_ratio / vm.dirty_ratio Percentage of total memory that may be dirty before background writeback (per-device flusher threads; the old pdflush daemon is long gone) or blocking direct writeback starts. Lower values → more frequent but smaller write bursts → better tail latency on flash. Higher values → larger coalesced writes → better sequential write throughput.
- vm.dirty_expire_centisecs & vm.dirty_writeback_centisecs Control aging of dirty pages. Tightening these reduces risk of sudden I/O storms when memory pressure triggers massive writeback.
Modern recommendation for SSD/NVMe servers with ≥64 GiB RAM:
- swappiness = 5–10
- dirty_background_ratio = 2–5
- dirty_ratio = 8–15
- dirty_expire_centisecs = 1000–2000
- dirty_writeback_centisecs = 500
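The ranges above can be pinned via a sysctl drop-in; a sketch with mid-range values, assuming local NVMe, ≥64 GiB RAM, and no conflicting vendor guidance (filename and exact numbers are illustrative):

```shell
# /etc/sysctl.d/90-writeback.conf (illustrative starting point; validate
# under production-representative load before committing)
vm.swappiness = 5
vm.dirty_background_ratio = 3
vm.dirty_ratio = 10
vm.dirty_expire_centisecs = 1500
vm.dirty_writeback_centisecs = 500
```

Apply with sudo sysctl --system and re-measure tail latency before and after.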
4. TCP Stack Deep Tuning (High Concurrency + Bandwidth-Delay Product)
Critical subsystems that determine sustained connections and goodput:
- Congestion control: BBR (mainline since kernel 4.9; BBRv2/BBRv3 remain out-of-tree) is dominant for most internet-facing and intra-DC traffic. Advantages over CUBIC: faster recovery from random loss, lower queueing delay, good throughput even with shallow drop-tail buffers
- Queue disciplines: fq (Fair Queue) with pacing prevents bufferbloat and provides per-flow fairness even without ECN
- SYN backlog & listen queue: tcp_max_syn_backlog should be sized alongside somaxconn, which caps every listener's accept() backlog. When the accept() queue fills, the kernel by default silently drops the handshake-completing ACKs (no RST unless tcp_abort_on_overflow=1), so clients see timeouts rather than refusals
- Ephemeral port exhaustion avoidance: widen ip_local_port_range, enable tcp_tw_reuse (affects outbound connections only), and lower tcp_fin_timeout
- Memory pressure handling: autotuning (tcp_rmem / tcp_wmem) works well up to ~16–32 MiB per socket; beyond that, raise the static maximum (third) values so high-BDP flows don't stall mid-connection
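As a sanity check on socket-memory ceilings, the bandwidth-delay product sets the floor: a 10 Gbit/s path at 50 ms RTT carries 10⁹ × 0.05 / 8 ≈ 62.5 MB in flight, so ~64 MiB maxima cover most single-flow cases. A hedged sysctl sketch pulling the bullets above together (filename and every value are illustrative, not universal):

```shell
# /etc/sysctl.d/91-tcp.conf (illustrative high-concurrency frontend profile;
# the 64 MiB ceilings assume long fat pipes — shrink them on memory-tight hosts)
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr
net.core.somaxconn = 8192
net.ipv4.tcp_max_syn_backlog = 32768
net.ipv4.ip_local_port_range = 10240 65535
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_fin_timeout = 15
net.ipv4.tcp_rmem = 4096 131072 67108864
net.ipv4.tcp_wmem = 4096 131072 67108864
```

Remember that listening applications must also request a large backlog in listen(); somaxconn only sets the upper bound.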
5. Block Layer & NVMe Specifics
Modern NVMe behavior is governed by:
- mq-deadline → kyber → none progression (none usually wins on pure NVMe since ~5.15)
- Very high nr_requests (1024–4096) allows deeper internal command queues
- Read-ahead tuning: lower (32–128 KiB) for random I/O dominant workloads, higher (512–2048 KiB) for streaming
- I/O priority & cgroup v2 io.weight (blkio.weight on legacy cgroup v1): increasingly relevant in container-dense environments
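A persistent way to apply the per-device settings above is a udev rule; a sketch, assuming your namespaces match nvme*n* and that none is the right scheduler for your drives (filename and values are illustrative):

```shell
# /etc/udev/rules.d/60-nvme-io.rules (illustrative)
# "none" bypasses the elevator entirely on fast NVMe; switch to
# "mq-deadline" or "kyber" if you need elevator-level queue shaping,
# since nr_requests is only meaningful with an elevator active
ACTION=="add|change", KERNEL=="nvme[0-9]*n[0-9]*", \
  ATTR{queue/scheduler}="none", \
  ATTR{queue/read_ahead_kb}="128"
```

Reload with sudo udevadm control --reload && sudo udevadm trigger so the rule applies without a reboot.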
6. Workload-Tailored Patterns (Summary)
- Web frontends (NGINX / Envoy / Traefik): epoll, multi_accept, large accept queue, BBR + fq, THP madvise
- Relational databases (PostgreSQL, MariaDB): THP always or madvise with application huge page support, very low swappiness, O_DIRECT where supported (e.g. InnoDB) plus generous dirty ratios for remaining buffered I/O, tuned wal_buffers & checkpoint_completion_target
- In-memory stores (Redis, Memcached, KeyDB): THP never/madvise with transparent defrag disabled, overcommit_memory=1, very low swappiness
- Container / Kubernetes nodes: cgroup v2 memory + cpu + io controllers, br_netfilter bridge sysctls enabled (bridge-nf-call-iptables=1), hugepages-2Mi allocation via kubelet
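For Kubernetes nodes specifically, note that the bridge-netfilter sysctls must be enabled: kube-proxy and most CNI plugins expect bridged pod traffic to traverse iptables. A typical node drop-in (filename illustrative):

```shell
# /etc/sysctl.d/92-k8s.conf — requires the br_netfilter module to be loaded
# first (e.g. listed in a file under /etc/modules-load.d/)
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward = 1
```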
Measurement & Validation Layers
Use layered observability:
- perf stat -d -a sleep 10 → broad counters
- bpftrace scripts for softirq latency, TCP retransmit, block I/O completion histograms
- sysdig, bcc tools (biolatency, runqlat, offcputime)
- Application-specific: pg_stat_statements, Redis INFO latency, NGINX stub_status + Prometheus
Apply one category of change at a time, run production-representative load tests (wrk2, locust, sysbench, fio --randrepeat=0), compare p50/p95/p99 latencies, throughput, and CPU efficiency.
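When comparing runs, extracting percentiles from raw latency samples avoids the distortions of averaging. A minimal nearest-rank sketch, assuming one latency value per line in the input file (the helper name and file path are illustrative):

```shell
# Print p50/p95/p99 from a file containing one latency sample per line.
# awk buffers the sorted samples and indexes the nearest-rank positions.
percentiles() {
    sort -n "$1" | awk '{ a[NR] = $1 }
        END { printf "p50=%s p95=%s p99=%s\n",
              a[int(NR*0.50+0.5)], a[int(NR*0.95+0.5)], a[int(NR*0.99+0.5)] }'
}

# Demo with a synthetic 1..100 distribution:
seq 1 100 > /tmp/lat.txt
percentiles /tmp/lat.txt   # p50=50 p95=95 p99=99
```

Run this against the before and after sample files from the same load generator so the comparison is apples to apples.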
If you can describe your dominant workload, CPU generation, memory size, storage type (local NVMe vs networked), and primary bottleneck metric (latency tail, throughput plateau, CPU saturation, etc.), significantly more precise recommendations become possible.