Master Linux Disk Usage: A Practical Guide to du and df

Stop guessing what's eating your disk. This practical guide to du and df explains how each tool measures storage, why their numbers sometimes differ, and how to diagnose and reclaim space on your servers.

Managing disk usage is a routine yet critical task for system administrators, developers, and site owners. Two of the most fundamental utilities on Linux—du and df—provide complementary views into storage usage. Understanding how they work, when to use each, and how to diagnose discrepancies will save time, prevent outages, and help you optimize storage on VPS or dedicated servers. This practical guide dives into technical details, real-world scenarios, and best practices for using du and df effectively.

How du and df work: underlying principles

du (disk usage) reports the space used by files and directories by walking the filesystem tree and adding up the usage of every file it finds. By default it reports the disk blocks actually allocated to each file; the --apparent-size flag switches it to the logical file size recorded in the file's metadata.

df (disk free) asks the kernel for filesystem-level usage statistics (through the statfs/statvfs system calls), which the filesystem answers from its superblock and in-memory counters. It reports total, used, and available space for each mount point. Because df never walks the file tree, it reflects allocation from the filesystem's perspective rather than the sum of individual files.
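The two data sources can be inspected directly. The following is a rough sketch, assuming GNU coreutils on Linux: stat exposes the per-file block count that du adds up, and stat -f exposes the filesystem counters that df reports. The function names file_allocated_bytes and fs_used_bytes are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU stat (coreutils) on Linux.

# Bytes allocated to one file: the block count (%b) times the size of
# each counted block (%B). This is the per-file figure du sums up.
file_allocated_bytes() {
    stat -c '%b %B' -- "$1" | {
        read -r blocks unit
        echo $(( blocks * unit ))
    }
}

# Bytes used on the filesystem that holds PATH, computed from the
# filesystem counters: (total blocks - free blocks) * block size.
fs_used_bytes() {
    stat -f -c '%b %f %S' -- "$1" | {
        read -r total free unit
        echo $(( (total - free) * unit ))
    }
}

# Usage (paths are examples):
#   file_allocated_bytes /etc/hostname   # compare with: du -B1 /etc/hostname
#   fs_used_bytes /                      # compare with: df -B1 --output=used /
```

On a busy system the second figure can drift slightly between two calls, so expect df and fs_used_bytes to agree closely rather than exactly.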

Key differences at a glance

Scope: du works on file trees; df reports per-filesystem totals.
Source: du sums individual file sizes; df reads filesystem counters from the kernel.
Performance: du can be slow on large trees because it traverses every file; df is fast.
Accuracy nuances: du may miss space held by deleted-but-open files or hidden metadata; df includes all allocations managed by the filesystem.

Practical options and examples for du

du has many flags that change what it measures and how results are presented. Important options include:

-h human-readable sizes (e.g. 1.2G).
-s summarize (show only a total for the argument).
--apparent-size report logical file size (what users see) rather than disk blocks allocated.
-b display apparent size in bytes (GNU du treats it as --apparent-size --block-size=1); useful for scripting with exact numbers.
--block-size=SIZE control the unit (e.g. --block-size=1M).
--max-depth=N limit recursion depth; great for quick tree overviews.
--exclude=PATTERN skip paths that match a pattern (handy for excluding virtual filesystems or cache directories).

Examples:

Show top-level directory sizes in readable format: du -h --max-depth=1 /var
Get the exact apparent size in bytes for scripting: du -sb /home/user (use du -s -B1 /home/user for allocated bytes instead)
Report apparent sizes (e.g., sparse files): du --apparent-size -h large-sparse-file
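The gap between apparent and allocated size is easiest to see with a sparse file. Here is a minimal sketch, assuming GNU coreutils (truncate and du -B1) and a filesystem that supports sparse files; size_report is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: assumes GNU du and truncate.

# Print "<apparent bytes> <allocated bytes>" for a file or directory.
size_report() (
    apparent=$(du -s --apparent-size -B1 -- "$1" | cut -f1)
    allocated=$(du -s -B1 -- "$1" | cut -f1)
    echo "$apparent $allocated"
)

# Demo: a 100 MiB file with no data written to it.
demo=$(mktemp)
truncate -s 100M "$demo"
size_report "$demo"
rm -f "$demo"
```

The first number is the full 104857600 bytes; the second depends on the filesystem but should be far smaller, because no data blocks were ever written.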

Performance tips for du

Use --max-depth to limit traversal when you only need top-level summaries.
Run du on separate filesystems to avoid traversing network or big mount points unintentionally (or use -x to stay on one filesystem).
Use parallel tools (like GNU parallel) carefully: du is IO-bound; too much parallelism can worsen performance.
Consider specialized tools like ncdu or dust for interactive exploration; they use du-like traversal but provide fast, user-friendly interfaces.
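The depth-limiting and single-filesystem tips combine naturally into a "what is biggest here" helper. This is a sketch assuming GNU du; the name largest_dirs and its default of ten rows are illustrative choices.

```shell
#!/bin/sh
# Sketch only: assumes GNU du (for --max-depth).

# Print the COUNT largest entries among DIR and its immediate
# subdirectories, biggest first. Sizes are in KiB so that sort -n can
# compare them numerically; -x keeps du on a single filesystem.
largest_dirs() (
    dir=${1:-.}
    count=${2:-10}
    du -xk --max-depth=1 -- "$dir" 2>/dev/null | sort -rn | head -n "$count"
)

# Usage (path is an example):
#   largest_dirs /var 5
```

The first row is always the total for DIR itself, so the real candidates start on the second row.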

Practical options and examples for df

df gives a quick view of filesystem usage and is essential for checking free space across mounts. Key options include:

-h human-readable.
-T show filesystem type (useful to spot tmpfs, overlayfs, or NFS mounts).
-i show inode usage (critical when you run out of inodes, not blocks).
--output=FIELDLIST customize columns for parsing in scripts.
--total show a grand total across all listed filesystems.

Examples:

Quick overview of all mounts: df -hT
Check inode exhaustion: df -i /var/www
Script-friendly output: df --output=source,fstype,size,used,avail,pcent,target
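The --output form is what makes df practical in monitoring scripts. Below is a small sketch, assuming GNU df and awk, that lists filesystems at or above a usage threshold; the function name fs_over_limit and the 90% default are assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: assumes GNU df (for --output).

# Print df's "Use% Mounted on" row for every filesystem whose usage is
# at or above LIMIT percent. Rows that report "-" instead of a
# percentage (some pseudo-filesystems) are skipped.
fs_over_limit() (
    limit=${1:-90}
    df --output=pcent,target |
        awk -v limit="$limit" '
            NR > 1 {
                pct = $1
                sub(/%$/, "", pct)
                if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0) print
            }'
)

# Usage:
#   fs_over_limit 80
```

Run from cron or a monitoring agent, an empty result means nothing has crossed the threshold.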

Common discrepancies between du and df and how to troubleshoot

It’s common to see du reporting significantly less usage than df. Typical causes include:

Deleted-but-open files: A process holds a file descriptor to a file that was deleted; df still counts the allocated blocks while du cannot see the file. Detect with lsof +L1 or lsof | grep deleted, then restart the process or truncate the file (: > /proc/PID/fd/FD).
Filesystem overhead and reserved blocks: Filesystems reserve blocks (e.g., ext4 reserves 5% by default) for root and performance. Reserved blocks are not counted as used, but they are subtracted from df's available figure, so used plus available comes to less than the total size. You can adjust the reservation with tune2fs -m (for ext-family) or use filesystem-specific tools for XFS.
Metadata and journal size: df includes space used by journals, metadata, and allocation bitmaps that du won’t show because they aren’t regular files.
Bind mounts and mount namespaces: du traversing a path under a bind mount might double-count directories or skip allocations depending on how mounts are arranged. Use mount or findmnt to inspect mount topology.
Sparse files and apparent size: du by default reports the actual disk blocks allocated; use --apparent-size if you want logical sizes. df shows actual blocks used on disk.
Overlay and union filesystems (Docker, containers): Copy-on-write layers and overlay storage can produce confusing output because image layers are stored outside container-visible paths. Use container storage tools (e.g., docker system df, or inspect the overlay directories) to reconcile.
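When lsof is not installed, deleted-but-open files can still be found through /proc. The following sketch assumes Linux with /proc mounted and enough privileges to read the target process's descriptor directory; deleted_fds is a hypothetical helper, and the PID and descriptor numbers in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch only: assumes Linux with /proc mounted. Run as root to inspect
# processes owned by other users.

# List file descriptors of process PID whose target file was deleted.
# The kernel marks such links by appending " (deleted)" to the path.
deleted_fds() (
    pid=$1
    for fd in /proc/"$pid"/fd/*; do
        target=$(readlink "$fd" 2>/dev/null) || continue
        case $target in
            *' (deleted)') printf '%s -> %s\n' "$fd" "$target" ;;
        esac
    done
)

# Usage (1234 and 7 are placeholder values):
#   deleted_fds 1234
#   : > /proc/1234/fd/7    # truncate in place to free blocks without a restart
```

Truncating frees the space immediately but discards the file's contents, so it suits runaway logs better than data a process still needs.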

Debugging checklist

Run df -hT to see mountpoints and filesystem types.
Run du -x --max-depth=1 -h / to avoid crossing filesystem boundaries.
Use lsof to find deleted-but-open files and the owning process.
Inspect inode usage with df -i if writes fail while blocks remain available.
Check for large or overlooked files in directories such as /var/log, /tmp, or container storage directories.
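The first two checklist steps can be folded into one measurement of how far df and du disagree. This is a sketch under the assumption of GNU df and du; usage_gap is an invented name, and du needs root to see every file on a system mount.

```shell
#!/bin/sh
# Sketch only: assumes GNU df and du.

# Print df's used bytes, the bytes du can account for, and the gap
# between them for the filesystem mounted at MNT. -x keeps du from
# crossing into other filesystems mounted below MNT.
usage_gap() (
    mnt=${1:-/}
    df_used=$(df -B1 --output=used -- "$mnt" | tail -n 1 | tr -d ' ')
    du_used=$(du -sx -B1 -- "$mnt" 2>/dev/null | cut -f1)
    echo "df_used=$df_used du_used=$du_used gap=$(( df_used - du_used ))"
)

# Usage (mount point is an example):
#   usage_gap /var
```

A small gap is normal filesystem overhead; a gap of gigabytes is a good reason to move on to the deleted-file check.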

Application scenarios and recommended workflows

Different environments require different approaches. Here are some practical workflows:

Shared hosting and web server maintenance

Regularly run du -h --max-depth=2 /var/www to detect runaway logs, backups, or user uploads.
Monitor inode usage (df -i) — small files can exhaust inodes long before space runs out.
Exclude cache or session directories in du reports with --exclude or separate them onto different partitions to prevent a single tenant from filling the entire filesystem.
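When df -i shows inodes running low, the next question is which directory holds all the small files. Here is a sketch assuming a GNU du recent enough to support --inodes; inode_hogs is a hypothetical name.

```shell
#!/bin/sh
# Sketch only: assumes GNU du with --inodes support.

# Rank DIR and its immediate subdirectories by inode count (files plus
# directories), largest first.
inode_hogs() (
    dir=${1:-.}
    count=${2:-10}
    du --inodes -x --max-depth=1 -- "$dir" 2>/dev/null | sort -rn | head -n "$count"
)

# Usage (path is an example):
#   inode_hogs /var/www 5
```

As with size-based listings, the first row is the total for DIR itself; session and cache directories tend to show up right below it.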

Containerized environments (Docker, Kubernetes)

Use df -hT on host to understand where overlay or device-mapper storage resides.
Monitor Docker’s disk use with docker system df and prune unused images and volumes periodically.
Consider dedicated volumes or separate block devices for heavy-write services (databases, caches) to isolate I/O and capacity.

Enterprise backups and large datasets

Prefer tools that report apparent sizes when planning transfers or backups (du --apparent-size), because sparse files can be much smaller on disk than their logical size.
When migrating volumes, use filesystem-aware tools (rsync with --sparse, or filesystem-level snapshots) to preserve sparse and reflink optimizations.
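Before a migration it helps to know which files are sparse in the first place. The sketch below assumes GNU find, whose %S format prints the ratio of allocated to apparent size; sparse_files is an invented helper. Compression or inline storage can also push the ratio below 1, so treat the output as a list of candidates.

```shell
#!/bin/sh
# Sketch only: assumes GNU find (for -printf and the %S directive).

# List non-empty regular files under DIR that occupy less disk space
# than their apparent size.
# Output columns (tab-separated): sparseness ratio, apparent bytes, path.
sparse_files() (
    dir=${1:-.}
    find "$dir" -xdev -type f -size +0 -printf '%S\t%s\t%p\n' |
        awk -F '\t' '$1 + 0 < 1'
)

# Usage (path is an example):
#   sparse_files /var/lib/libvirt/images
```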

Advantages comparison and complementary tools

du and df are complementary:

Use df for a fast, global snapshot of free/used space and to check filesystem types and inode usage.
Use du for detailed per-directory analysis to find the biggest consumers of space.

Complementary tools worth knowing:

ncdu: Interactive, fast du-like explorer optimized for human use.
lsof: Find deleted-but-open files.
iotop/iostat: I/O performance monitoring; high I/O can indicate heavy write workloads that will consume space.
tune2fs/xfs_admin: Filesystem tuning (reserved blocks, behavior).
find + xargs: Script-based cleanup (e.g., find /var/log -type f -mtime +30 -print0 | xargs -0 rm).
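Because a find and rm pipeline is destructive, it is worth previewing what it would match before deleting anything. Here is one possible wrapper, assuming GNU find and xargs; the name old_files and its list/delete argument are illustrative, not an established convention.

```shell
#!/bin/sh
# Sketch only: assumes GNU find and xargs.

# Act on regular files under DIR last modified more than DAYS days ago.
# By default the files are only listed; pass "delete" as the third
# argument to remove them.
old_files() (
    dir=$1
    days=$2
    action=${3:-list}
    if [ "$action" = delete ]; then
        find "$dir" -xdev -type f -mtime +"$days" -print0 | xargs -0 -r rm -f --
    else
        find "$dir" -xdev -type f -mtime +"$days" -print
    fi
)

# Usage (path and age are examples):
#   old_files /var/log 30           # preview
#   old_files /var/log 30 delete    # remove what the preview showed
```

If a service still has one of these files open, deleting it will not free space until the service is restarted or reopens its logs.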

Choosing the right VPS storage configuration

When selecting a VPS (virtual private server) or configuring storage, consider these factors:

Disk type: SSDs provide much better IO performance and lower latency than HDDs; for databases and high-traffic sites prefer SSD-backed plans.
Dedicated vs shared block devices: Dedicated volumes (or local NVMe) isolate noisy neighbors and improve consistency.
Filesystem and features: Modern filesystems (ext4, XFS, btrfs, ZFS) offer trade-offs. Choose based on snapshot needs, compression, or deduplication requirements.
IOPS and throughput: Match storage IOPS to expected workload; high-concurrency sites need higher IOPS quotas.
Backup and snapshot strategy: Regular snapshots and off-site backups avoid recovery headaches from full disks. Snapshots are space-efficient but still consume space over time as changes accumulate.

For users looking for reliable VPS options in the US, consider providers with transparent storage specs, SSD-backed disks, and snapshot features to simplify capacity management. See hosting examples and plans at VPS.DO and explore the specific USA VPS offerings at USA VPS for SSD options and clear disk allocations.

Conclusion

Mastering du and df is essential for any administrator or developer managing Linux systems. Use df for quick filesystem-level assessments and du for detailed file-tree analysis. Be aware of pitfalls like deleted-but-open files, sparse files, and filesystem reservations that cause discrepancies. Combine these tools with lsof, ncdu, and monitoring to create a robust workflow that prevents surprises and ensures capacity planning stays ahead of demand.

For environments hosted on VPS platforms, picking a provider with predictable, SSD-backed storage and flexible snapshot/backup options reduces the operational burden of disk management. Learn more about available plans and features at VPS.DO and check specific SSD-backed US locations at USA VPS.
