
Ubuntu Server Network Troubleshooting – Deep Technical Focus
Network issues on Ubuntu Server often stem from subtle interactions between the kernel networking stack, systemd-resolved, Netplan's declarative model, the systemd-networkd backend, and modern hardening defaults. This guide prioritizes conceptual understanding and diagnostic reasoning over configuration snippets, targeting experienced administrators working on 24.04 LTS and later releases.
1. Understanding the Modern Ubuntu Networking Stack Layers
- Kernel netdev layer: physical/link state, carrier detection, ethtool-negotiated speed/duplex, offload features (TSO, GSO, GRO, checksum), RSS/indirection table, interrupt coalescing and NAPI weight. Misbehavior here usually manifests as NO-CARRIER, link flapping, or extremely poor throughput despite correct IP configuration.
- systemd-networkd: manages link configuration, the DHCPv4/v6 client, static addressing, routes (including policy routing), neighbor tables, and link-local addressing. It operates asynchronously and can fail silently if carrier never appears or the DHCP server is unreachable during critical boot phases.
- Netplan: purely a frontend; it translates YAML into backend-specific files (/run/systemd/network/*.network, *.netdev). Critical behaviors:
  – renderer mismatch (networkd vs NetworkManager)
  – optional: true vs false impacting systemd-networkd-wait-online
  – match: clauses using predictable interface names (enpXsY, ens3) vs legacy eth0
  – activation-mode (manual, off) controlling when links are brought up
- systemd-resolved: local stub resolver (127.0.0.53:53), per-link DNS configuration, DNSSEC validation, split DNS via search domains and routing domains, and DNS-over-TLS (DoT) support. Most name-resolution failures trace back to this daemon rather than /etc/resolv.conf (which is now a controlled symlink). A quick per-layer inventory is sketched after this list.
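The following sketch walks those layers top to bottom on a networkd-rendered machine; enp1s0 is a placeholder interface name, so substitute your own.

```bash
# Per-layer inventory sketch (enp1s0 is a placeholder interface name)

# Kernel/link layer: carrier, negotiated speed/duplex, offload flags
ip -br link show
ethtool enp1s0
ethtool -k enp1s0 | grep -E 'tcp-segmentation|generic-(segmentation|receive)'

# systemd-networkd's view of the link: state, DHCP lease, DNS it pushed
networkctl status enp1s0

# What Netplan actually generated for the backend
sudo netplan get
ls -l /run/systemd/network/

# systemd-resolved: per-link DNS servers, DNSSEC and DoT state
resolvectl status
```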
2. Systematic Layered Diagnosis Approach
Layer 1 – Physical & Data-Link
- Carrier sense failure: check kernel link transitions, e.g. dmesg | grep -i 'link is'
- Speed/duplex negotiation problems: ethtool reports different speed/duplex on the two link partners → autoneg disabled on one side or cable category mismatch
- Offload conflicts: TSO/GSO/GRO bugs with certain NIC drivers (igc, r8169, mlx5) → disable via ethtool -K
- Multi-queue / RSS imbalance: single flow pinned to one queue → poor multi-core scaling (triage sketch below)
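A minimal Layer 1 triage sketch, again assuming enp1s0; note that the ethtool -K change is live-only and does not persist across reboots.

```bash
# Carrier state and kernel link transitions
ip -br link show enp1s0          # NO-CARRIER vs. LOWER_UP
dmesg | grep -i 'link is'        # driver "Link is Up/Down" messages

# Negotiated speed/duplex and autonegotiation state
ethtool enp1s0

# Rule out offload bugs by disabling TSO/GSO/GRO temporarily
sudo ethtool -K enp1s0 tso off gso off gro off

# Queue layout and IRQ spread across CPUs (RSS imbalance shows up here)
ethtool -l enp1s0
grep enp1s0 /proc/interrupts
```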
Layer 2/3 – Addressing & Routing
- DHCP transaction visibility: journalctl -u systemd-networkd -g DHCP. Look for the DISCOVER → OFFER → REQUEST → ACK sequence, lease-renewal failures, and NAK responses (sketch after this list)
- Static IP misapplication: conflicting addresses from cloud-init, old ifupdown configs, or duplicate netplan files
- Route priority & metric conflicts: multiple default routes with equal metrics → unpredictable forwarding
- Neighbor (ARP/ND) resolution failures: INCOMPLETE or FAILED entries in ip neigh show → L2 reachability problems, proxy-ARP misconfiguration, or a firewall dropping ARP/ND traffic
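The addressing and routing checks condense into a short sketch; paths shown are examples.

```bash
# DHCP conversation as networkd saw it this boot (DISCOVER/OFFER/REQUEST/ACK, NAKs, renewals)
journalctl -b -u systemd-networkd -g DHCP

# Addresses actually applied vs. what you expected
ip -c addr show

# Competing default routes and their metrics
ip route show default
ip -6 route show default

# Neighbor table: INCOMPLETE/FAILED entries point at L2 or firewall problems
ip neigh show

# Overlapping configuration sources (cloud-init, leftover ifupdown, duplicate netplan files)
ls -l /etc/netplan/ /run/netplan/ 2>/dev/null
```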
Layer 3.5 – Firewall & NAT
- nftables (default since 22.04) vs legacy iptables: rules may drop packets early in the input/forward chains
- ufw status verbose shows the effective policy but not the actual nftables rules; nft list ruleset is authoritative (see the sketch below)
- conntrack table exhaustion under high connection rate → nf_conntrack_count near nf_conntrack_max
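A short firewall/conntrack sketch; the sysctl value at the end is an illustrative number, not a recommendation.

```bash
# The kernel's authoritative ruleset is nftables, regardless of what ufw prints
sudo nft list ruleset | less

# conntrack pressure (these files appear once the nf_conntrack module is loaded)
cat /proc/sys/net/netfilter/nf_conntrack_count
cat /proc/sys/net/netfilter/nf_conntrack_max

# If count keeps brushing against max, raise the ceiling (example value)
sudo sysctl -w net.netfilter.nf_conntrack_max=262144
```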
Layer 4+ – Transport & Application
- TCP congestion control mismatch (BBR vs CUBIC) on high BDP paths
- ECN blackholing → fallback to non-ECN slows recovery
- SYN cookies triggered → indicates listen backlog overflow or spoofed traffic
- Socket buffer pressure → autotuning hits its ceiling → poor goodput (see the sketch below)
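A transport-layer sketch; 192.0.2.10 is a placeholder peer address.

```bash
# Congestion control in use and what the kernel has available
sysctl net.ipv4.tcp_congestion_control net.ipv4.tcp_available_congestion_control

# Per-connection TCP internals: cwnd, rtt, retransmits, pacing rate
ss -tin dst 192.0.2.10

# Kernel-wide counters: SYN cookies, retransmissions, ECN behaviour
nstat -az | grep -Ei 'syncookie|retrans|ecn'

# Socket buffer autotuning ceilings
sysctl net.ipv4.tcp_rmem net.ipv4.tcp_wmem net.core.rmem_max net.core.wmem_max
```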
3. High-Impact Diagnostic Techniques
- Socket-level visibility without packet capture: ss -t dst 8.8.8.8 → list TCP sockets talking to a given destination (note that ss -K forcibly closes matching sockets, so don't use it for inspection); ss -s → summary of socket states (many TIME_WAIT = ephemeral-port exhaustion risk)
- Per-interface statistics deep dive: ip -s -s link show dev enp1s0 → RX/TX errors, drops, overruns, collisions; ethtool -S enp1s0 | grep -i drop → driver-level drops (very common on virtio-net)
- DNS debugging without dig/nslookup: resolvectl query --cache=no google.com → bypass the cache; resolvectl flush-caches and resolvectl statistics → cache hit rate and upstream failures; systemctl kill --signal=SIGUSR1 systemd-resolved → dump the caches and per-server feature state to the journal
- Boot-time network ordering issues: systemd-analyze critical-chain systemd-networkd-wait-online.service. Long delays almost always trace to the wait-online timeout when an interface that may legitimately stay down is not marked optional: true (sketch below)
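For the boot-ordering case specifically, a minimal sketch:

```bash
# Which unit is holding the boot up, and for how long
systemd-analyze critical-chain systemd-networkd-wait-online.service

# Which links are still "configuring" (those are what wait-online blocks on)
networkctl list

# How wait-online is invoked (timeout, --any, interface restrictions)
systemctl cat systemd-networkd-wait-online.service

# Netplan's "optional: true" sets RequiredForOnline in the generated .network files
grep -r RequiredForOnline /run/systemd/network/ 2>/dev/null
```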
4. Frequent Root Causes in Production (2025–2026 Observations)
- Cloud-init + Netplan race → stale /etc/netplan/50-cloud-init.yaml overrides user config (see the sketch after this list)
- systemd-resolved DoT fallback failure on captive portals or broken upstream resolvers
- Predictable interface naming mismatch → configs reference eth0 while udev names the device enpXsY (or the reverse after net.ifnames=0 is added or removed)
- Kernel driver regression after point release (common with igc, r8125, ixgbe)
- MTU mismatch on VXLAN/Geneve/GRE tunnels or jumbo-frame enabled switches
- nf_conntrack_buckets too small for NAT-heavy workloads → long hash chains → lookup overhead, plus dropped new connections once nf_conntrack_max is reached
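Two of these have well-known mitigations; a sketch, assuming a cloud image (the disable-file name follows cloud-init's documented convention, and 198.51.100.1 is a placeholder remote for the MTU probe).

```bash
# Stop cloud-init from regenerating 50-cloud-init.yaml on every boot,
# then manage addressing in your own netplan file
sudo tee /etc/cloud/cloud.cfg.d/99-disable-network-config.cfg <<'EOF'
network: {config: disabled}
EOF
sudo netplan get        # confirm which configuration now wins

# MTU sanity check across a tunnel/jumbo path:
# 1472 bytes of payload + 28 bytes of ICMP/IP headers = 1500; lower -s until it passes
ping -M do -s 1472 -c 3 198.51.100.1
```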
5. Resolution Patterns
- Prefer netplan try over netplan apply during debugging
- Use a match: stanza keyed on macaddress: in netplan (or PermanentMACAddress= in a systemd .link file) when renaming interfaces
- For DNS issues: set global fallback DNS in resolved.conf, disable DNSSEC if upstream strips signatures
- For carrier detection problems: shorten the systemd-networkd-wait-online timeout or mark the link RequiredForOnline=no (optional: true in netplan)
- When performance is poor but connectivity exists: switch congestion control to bbr, raise net.core.somaxconn, and lift the tcp_rmem/tcp_wmem autotuning ceilings (sketches below)
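Minimal sketches for these patterns; file names, DNS servers, and sysctl values are illustrative choices, not recommendations.

```bash
# Test a netplan change with automatic rollback (reverts after 120 s unless confirmed)
sudo netplan try

# Global fallback DNS plus relaxed DNSSEC for upstreams that strip signatures
sudo mkdir -p /etc/systemd/resolved.conf.d
sudo tee /etc/systemd/resolved.conf.d/90-fallback.conf <<'EOF'
[Resolve]
FallbackDNS=9.9.9.9 1.1.1.1
DNSSEC=allow-downgrade
EOF
sudo systemctl restart systemd-resolved

# BBR plus a larger accept backlog and higher buffer ceilings
sudo tee /etc/sysctl.d/90-tcp-tuning.conf <<'EOF'
net.ipv4.tcp_congestion_control = bbr
net.core.somaxconn = 4096
net.ipv4.tcp_rmem = 4096 131072 16777216
net.ipv4.tcp_wmem = 4096 16384 16777216
EOF
sudo sysctl --system
```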
Start diagnosis at the lowest layer that shows abnormality (link state → addressing → routing → resolution → transport). Once you identify which layer fails, 80% of problems become trivial to resolve.
If you can describe the exact symptom pattern (e.g. “DHCP never completes”, “DNS resolves intermittently”, “link up but no traffic passes”, “boot hangs 2 minutes on network”) and share the output of ip -c addr, resolvectl status, and journalctl -u systemd-networkd -b, along with your environment type (bare metal, KVM, cloud provider), far more targeted reasoning can be applied.