Keep Your VPS Healthy: Monitor Disk Usage & Clean Logs Efficiently
Keep your VPS running smoothly by learning how to monitor disk usage and stop runaway logs from degrading performance. This practical guide helps site owners and ops identify growth culprits and set up efficient rotation and cleanup so your server stays responsive.
Introduction
Keeping a VPS responsive and reliable depends heavily on disk health and disciplined log management. Disk full events and runaway logs are some of the most common causes of service degradation for websites, databases, and containerized applications. This article gives a practical, technical guide for site owners, developers, and operations engineers on how to monitor disk usage, identify root causes of growth, and implement efficient log rotation and cleanup strategies to keep a VPS healthy.
Understanding Disk Usage and Why It Matters
Disk usage is more than a percentage on a dashboard. It affects system performance, processes' ability to write temporary files, database operations, and even boot behavior. Two concepts are key:
- Capacity usage: the percentage of total available space used (what df reports).
- Inode usage: the count of filesystem metadata entries used (what df -i reports).
A disk can be “full” in one of these dimensions while the other still looks fine. For example, a filesystem with millions of tiny files can exhaust inodes and make further file creation impossible even if there is free space left. Monitoring both is essential.
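Both dimensions can be read in one quick check. A minimal sketch, assuming GNU coreutils `df` (which provides the `--output` flag):

```shell
#!/bin/sh
# Show capacity and inode usage side by side for a filesystem (default: /).
# Either dimension can be exhausted independently of the other.
target="${1:-/}"
echo "Capacity:"
df -h --output=target,pcent "$target"
echo "Inodes:"
df --output=target,ipcent "$target"
```

Running it against `/var` or `/home` on a schedule is an easy first step before wiring these numbers into a monitoring stack.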
Common causes of disk growth
- Accidental log verbosity or debug-level logging left enabled in production.
- Unrotated logs accumulating over time.
- Application-level caches or temp files that are not cleaned.
- Database binary logs, backup snapshots, or container image layers building up.
- Large file uploads or user content stored without quotas.
Tools and Commands to Inspect Disk Usage
On any Linux VPS, use the following commands to get immediate visibility:
- df -h : reports mounted filesystems and percent used in human-readable form.
- df -i : reports inode utilization per filesystem.
- du -sh /path/to/dir : gives a summary size for a given directory.
- du -h --max-depth=1 /var | sort -h : helps discover which subdirectories are largest.
- find /var/log -type f -printf '%s %p\n' | sort -nr | head -n 20 : finds largest files under /var/log.
- lsof +L1 : lists deleted files that are still held open by processes (common cause of “used” space with no visible file).
These commands are quick diagnostics; integrate them into scripts or monitoring checks for continuous observability.
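These one-liners can be bundled into a small triage script. A sketch, assuming GNU `du`/`find` and that `lsof` is installed (the target directory is a parameter, defaulting to /var):

```shell
#!/bin/sh
# Quick disk triage: largest subdirectories, largest log files,
# and deleted-but-open files that still pin disk space.
dir="${1:-/var}"

echo "== Largest subdirectories of $dir =="
du -h --max-depth=1 "$dir" 2>/dev/null | sort -hr | head -n 10

echo "== 20 largest files under /var/log =="
find /var/log -type f -printf '%s %p\n' 2>/dev/null | sort -nr | head -n 20

echo "== Deleted files still held open =="
lsof +L1 2>/dev/null | head -n 20
```

Redirecting stderr hides permission-denied noise when the script is run as an unprivileged user; run it as root for complete results.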
Strategies for Efficient Log Management
Logs are invaluable but grow without bounds if unmanaged. Implementing robust rotation, compression, and retention policies reduces disk pressure while preserving necessary forensic data.
Use logrotate effectively
logrotate is a standard utility on most distributions. Key configuration knobs:
- rotate N — keep N rotations.
- daily / weekly / monthly — rotation frequency.
- compress / delaycompress — enable gzip compression to save space.
- missingok — don’t error if a file is missing.
- notifempty — skip rotation for empty logs.
- postrotate / endscript — wrap commands that reload the service after rotation (e.g., systemctl kill -s HUP nginx).
Example approach: rotate most web and app logs daily and keep 7 compressed rotations; rotate audit logs weekly and keep 12. For high-volume logs (access logs of hundreds of MB/day), rotate hourly or pipe them to a log aggregator.
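A sketch of what the daily web-log policy above might look like as a drop-in under /etc/logrotate.d/ (the path and service name are illustrative):

```
/var/log/nginx/*.log {
    daily
    rotate 7
    compress
    delaycompress
    missingok
    notifempty
    postrotate
        systemctl kill -s HUP nginx
    endscript
}
```

delaycompress leaves the most recent rotation uncompressed so a process that briefly keeps the old file open can finish writing to it.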
Centralized logging and offloading
Move log retention off the VPS by shipping logs to a central system (ELK/EFK, Graylog, Papertrail, hosted SaaS). This reduces local disk usage and improves searchability. Use lightweight forwarders like rsyslog, fluent-bit, or filebeat with backpressure handling and TLS to protect delivery.
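As a minimal example, rsyslog can forward everything to a central collector with a single action line (the hostname and port are placeholders; a production setup should add TLS and disk-assisted queueing for backpressure):

```
# /etc/rsyslog.d/90-forward.conf
# @@ = TCP, a single @ = UDP
*.* @@logs.example.com:514
```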
Managing systemd journal
If your VPS uses systemd, the journal can consume significant space. Configure /etc/systemd/journald.conf with sensible limits:
- SystemMaxUse=50M or 100M — cap total journal size.
- MaxRetentionSec=1month — limit retention by time.
- SystemKeepFree=50M — ensure free space is preserved for other uses.
Use journalctl --vacuum-size=100M or --vacuum-time=3d for on-demand cleanup.
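Putting the limits above together, /etc/systemd/journald.conf (or a drop-in under journald.conf.d/) might look like this; restart systemd-journald afterwards for the caps to take effect:

```
[Journal]
SystemMaxUse=100M
SystemKeepFree=50M
MaxRetentionSec=1month
```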
Automating cleanup for temporary and cache directories
Implement scheduled cleanup for /tmp, /var/tmp, and application caches. Two approaches:
- tmpreaper or tmpwatch: remove files older than a configurable age.
- Cron scripts: find /tmp -type f -mtime +3 -delete to remove files older than 3 days (adjust to fit application needs).
Be careful to avoid removing files used by long-running processes; prefer age-based deletion and test scripts on staging systems first.
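The age-based approach can be wrapped in a small cron-friendly function. A sketch that takes the directory and age as parameters instead of hard-coding /tmp, so it can be exercised safely on a scratch directory first (GNU `touch -d` is assumed for the demo):

```shell
#!/bin/sh
# Age-based cleanup: delete regular files older than N days under a directory.
# Deleting by age avoids removing files that active processes just wrote.
cleanup_old() {
    dir="$1"
    days="${2:-3}"
    # -mtime +N matches files last modified more than N*24h ago
    find "$dir" -type f -mtime "+$days" -delete
}

# Demo on a throwaway directory rather than a live /tmp:
demo=$(mktemp -d)
touch -d '10 days ago' "$demo/stale.tmp"   # backdate mtime (GNU touch)
cleanup_old "$demo" 3
ls "$demo"          # stale.tmp has been removed
rmdir "$demo"
```

Swapping `-delete` for `-print` turns the same function into a dry run, which is worth doing before scheduling it.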
Preventative Monitoring and Alerting
Proactive alerts are better than reactive firefighting. Define actionable thresholds and integrate them with an alerting route (email, Slack, PagerDuty).
Key metrics to monitor
- Filesystem percent used and free space (warn at 70–80%, critical at 90–95%).
- Inode usage percent (critical if >90%).
- Rate of increase (MB/hour or files/hour) for critical partitions like /var, /home, /var/log.
- Number of large files created recently and their owners/processes.
Monitoring tools and approaches
- Prometheus node_exporter exposing node_filesystem_avail_bytes and node_filesystem_files_free; pair with Grafana dashboards to visualize trends.
- Traditional monitoring like Nagios, Zabbix, or Icinga for simple threshold-based alerts.
- Agentless scripts via cron that run the aforementioned du/find commands and push results to a central collector or alert system.
Use alert rules that consider both absolute thresholds and the slope of change to catch sudden growth (e.g., a log spike due to an application error). Include the offending process or log path in the alert payload to speed remediation.
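A minimal cron-driven check along these lines might look as follows (the 80/90 thresholds are hard-coded for illustration; a real deployment should also track the growth rate and route output to an alerting channel):

```shell
#!/bin/sh
# Warn/critical check on capacity and inode usage (GNU df assumed).
check_fs() {
    fs="$1"
    used=$(df --output=pcent "$fs" | tail -1 | tr -dc '0-9')
    iused=$(df --output=ipcent "$fs" | tail -1 | tr -dc '0-9')
    for v in "capacity:$used" "inodes:$iused"; do
        n=${v#*:}
        [ -z "$n" ] && continue   # some filesystems report '-' for inodes
        if [ "$n" -ge 90 ]; then
            echo "CRITICAL $fs ${v%%:*} at ${n}%"
        elif [ "$n" -ge 80 ]; then
            echo "WARN $fs ${v%%:*} at ${n}%"
        fi
    done
}

check_fs /
check_fs /var
```

Silence means both dimensions are below the warning threshold, which makes the script easy to wire into cron's mail-on-output behavior.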
Troubleshooting Full Disk Scenarios
When a disk is full, follow an ordered, low-risk approach to recover quickly without causing service disruptions.
Quick triage steps
- Run df -h and df -i to confirm the problem dimension.
- Check for large deleted but open files with lsof +L1; restart the owning service to release space if safe.
- Inspect /var/log and /tmp for the largest files using find and du; remove or truncate logs only after confirming they are rotated or not needed.
- Temporarily move less critical files to another mount or remote server if immediate space is needed (mv to an NFS mount or scp to another host).
When truncating a log, use : > /var/log/huge.log (or truncate -s 0 /var/log/huge.log) rather than deleting it: truncation keeps the same inode, so the writing process does not lose its file descriptor. If the daemon opened the file without append mode it may continue writing at its old offset, leaving a sparse file, so signal or restart it after truncation when in doubt.
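The effect of truncation can be demonstrated safely on a scratch file (the temp file stands in for a real log like /var/log/huge.log):

```shell
#!/bin/sh
# Truncation zeroes the file in place, keeping the same inode, so a
# writer holding it open keeps its descriptor; deleting the file instead
# would leave the space pinned until the writer exits.
log=$(mktemp)                   # stand-in for a large log file
printf 'old noisy data\n' > "$log"
: > "$log"                      # truncate to zero bytes in place
wc -c < "$log"                  # prints 0
rm -f "$log"
```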
Filesystem and Partition Considerations
Choosing the right filesystem and partition layout affects how you respond to disk pressure:
- Use separate partitions for /, /var, and /home so logs and user data cannot fill the root partition and prevent critical system functions.
- Consider LVM for flexible resizing—grow logical volumes and filesystem online when more space is needed (ext4 and xfs both support online grow; xfs requires xfs_growfs).
- For performance and scale, evaluate filesystems like XFS for large files and ext4 for general purpose. Keep inode allocation in mind at filesystem creation time (mkfs options specify inode ratio).
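An online grow with LVM typically looks like the following sketch (the volume group, logical volume, and mount point are placeholders; the commands require root and should be preceded by a snapshot or backup):

```
# Extend the logical volume by 10 GiB
lvextend -L +10G /dev/vg0/var

# Then grow the filesystem online:
resize2fs /dev/vg0/var      # ext4 (takes the device)
xfs_growfs /var             # xfs (takes the mount point)
```

lvextend's -r flag can combine both steps by resizing the filesystem automatically after extending the volume.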
Advantages of a Disciplined Disk & Log Policy
Implementing monitoring and disciplined cleanup yields multiple benefits:
- Higher uptime: fewer service outages caused by “disk full” errors.
- Faster incident response: alerts that include root-cause indicators reduce mean time to resolution.
- Lower storage costs: compressing logs and offloading to central systems reduces local disk requirements.
- Better security and compliance: retention policies ensure logs are kept as long as needed for audits and then removed.
Practical Implementation Checklist
- Configure logrotate for all application and system logs with compression and retention policies.
- Cap systemd journal size and/or implement vacuuming periodically.
- Automate tmp and cache pruning for non-essential directories with careful age checks.
- Instrument disk and inode metrics in your monitoring stack and set slope-based alerts.
- Partition sensibly or use LVM for future-proof flexibility.
- Plan offsite log shipping if long-term retention or analysis is required.
Choosing a VPS with Disk Reliability in Mind
When selecting a VPS provider, consider disk performance, snapshot and backup capabilities, and the ability to resize volumes or add additional volumes easily. Look for:
- SSD-backed storage for predictable I/O performance.
- Ability to attach additional block storage or increase disk size without downtime (LVM-friendly).
- Automated backup options and snapshot frequency configurable from the control panel.
- Transparent IOPS and bandwidth limits—these affect how quickly logs and database data can be written.
Summary
Maintaining a healthy VPS requires more than periodic checks. Monitor both capacity and inode usage, enforce thoughtful log rotation and retention, cap systemd journal growth, and automate temporary file cleanup. Integrate these practices into your monitoring and alerting to detect growth trends early and avoid service-impacting incidents. With partitioning and volume flexibility, plus centralized log shipping for long-term analysis, you can keep your environment resilient and performant.
If you’re evaluating VPS providers that make it easy to manage storage and scale, consider checking out VPS.DO’s offerings — particularly their USA VPS options for SSD-backed instances and flexible block storage that simplify disk management for busy sites and applications.