How to Monitor Your VPS: CPU, RAM, Disk, and Uptime Alerting with Netdata and UptimeRobot
Running a VPS without monitoring is flying blind. You will not know that CPU usage is pinned at 90%, that disk space is running out, or that your site is down until a user or client tells you. Monitoring catches these problems early, often before they affect users. This guide sets up a complete monitoring stack using two free tools: Netdata for real-time server metrics and UptimeRobot for external uptime monitoring with alerting.
What You Need to Monitor
Effective VPS monitoring covers four categories:
- Uptime: Is the server reachable? Is the web application responding correctly?
- Resource utilization: CPU, RAM, swap, and disk usage trends over time
- I/O performance: Disk read/write rates, I/O wait, network throughput
- Application health: Is Nginx running? Is MySQL responding? Are error rates elevated?
The two tools in this guide cover all four categories: UptimeRobot handles external uptime checks, and Netdata handles everything on the server itself.
Part 1: UptimeRobot — External Uptime Monitoring
Why External Monitoring Matters
Server-side monitoring tools cannot tell you if your server is unreachable from the outside world — they are running on the very server that might be down. External monitoring from a third-party network provides the definitive answer: is your site accessible to users right now?
Setting Up UptimeRobot (Free Tier)
UptimeRobot’s free tier provides:
- 50 monitors
- 5-minute check intervals
- Email, Slack, Discord, webhook, and SMS alerts
- Public status pages
To add a monitor:
- Create a free account at uptimerobot.com
- Click “Add New Monitor”
- Select monitor type: HTTP(S) for websites, Port for specific services, Ping for basic connectivity
- Enter the URL or IP you want to monitor
- Configure alert contacts (email is configured automatically; add Slack or other integrations in Settings → Alert Contacts)
Recommended UptimeRobot Monitors
Add these monitors for a typical VPS deployment:
- HTTPS monitor for your main domain (https://yourdomain.com): checks that Nginx and your application are responding with HTTP 200
- Ping monitor for your VPS IP: checks that the server is reachable at the network level (distinguishes server outages from application crashes)
- Port monitor for SSH (port 22 or your custom port): confirms the SSH daemon is running
- Port monitor for SMTP (port 25): if running a mail server, confirms Postfix is accepting connections
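The reason for pairing a ping monitor with an HTTPS monitor can be sketched as a small decision table. This is only an illustration of the reasoning, not something UptimeRobot runs; the function name `diagnose` and its "up"/"down" arguments are hypothetical.

```shell
# Hypothetical helper: maps two probe results to a rough diagnosis.
# $1 = ping monitor state, $2 = HTTPS monitor state ("up" or "down").
diagnose() {
  case "$1:$2" in
    up:up)     echo "healthy" ;;
    up:down)   echo "application or web server problem" ;;
    down:up)   echo "ping blocked or filtered, site still served" ;;
    down:down) echo "server or network outage" ;;
    *)         echo "unknown input"; return 2 ;;
  esac
}
```

For example, `diagnose up down` points at Nginx or the application rather than the VPS itself, which is the case where logging in over SSH is still likely to work.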
Keyword Monitoring for Application Health
UptimeRobot’s “Keyword” monitor type checks not just whether a page returns HTTP 200, but whether it contains a specific text string. This detects cases where the server is running but the application has an error state:
- Add a health endpoint to your application (e.g., /health that returns “OK”)
- Create an UptimeRobot Keyword monitor for that URL, checking for the keyword “OK”
- If your application crashes and the page returns an error, the keyword won’t be found and you’ll receive an alert
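Before creating the Keyword monitor, the same check can be approximated from any machine with curl. This is a minimal sketch, assuming a `/health` endpoint that returns the text "OK"; the function names `contains_keyword` and `check_health` are hypothetical, and the URL is a placeholder.

```shell
# Succeeds (exit 0) if the text on stdin contains the keyword as a fixed string.
contains_keyword() {
  grep -qF -- "$1"
}

# Fetches a URL and applies the keyword check to the response body.
# curl -f fails on HTTP errors (>= 400), -sS hides progress but keeps errors,
# and --max-time bounds the whole request to 10 seconds.
# $1 = URL, $2 = keyword
check_health() {
  curl -fsS --max-time 10 "$1" | contains_keyword "$2"
}

# Example (placeholder URL):
#   check_health "https://yourdomain.com/health" "OK" && echo up || echo down
```

If the application returns an error page or curl cannot connect, the body is empty or lacks the keyword, so the check fails in the same situations in which the Keyword monitor would alert.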
Part 2: Netdata — Real-Time Server Monitoring
Installing Netdata
Netdata provides hundreds of pre-configured metrics collectors that auto-detect running services. Download the kickstart installer and run it:
wget -O /tmp/netdata-install.sh https://my-netdata.io/kickstart.sh
bash /tmp/netdata-install.sh --stable-channel --disable-telemetry
Netdata starts automatically and listens on port 19999. Access the dashboard temporarily by allowing port 19999 through the firewall:
sudo ufw allow 19999/tcp
Visit http://YOUR_VPS_IP:19999 to see the real-time dashboard. Once you have verified it works, close port 19999 and access Netdata via SSH tunnel instead:
# Close port 19999 to the internet
sudo ufw delete allow 19999/tcp
# Access Netdata securely via SSH tunnel from your local machine
ssh -L 19999:localhost:19999 user@YOUR_VPS_IP
Then visit http://localhost:19999 in your local browser.
What Netdata Monitors Automatically
Netdata auto-detects and collects metrics for:
- CPU (per-core usage, interrupts, context switches)
- Memory (RAM, swap, page faults)
- Disk I/O (per-disk throughput, IOPS, utilization, latency)
- Network (per-interface throughput, packets, errors)
- Nginx (requests/second, active connections, response codes)
- MySQL/MariaDB (queries/second, slow queries, connections, InnoDB metrics)
- PHP-FPM (active workers, requests per second, queue length)
- Redis (operations/second, memory usage, hit rate)
- Docker containers (per-container CPU, RAM, network, I/O)
- System processes (CPU and memory per process)
Configuring Netdata Alerts
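Netdata ships with hundreds of pre-configured alert rules. On a typical package install the stock rule files live under /usr/lib/netdata/conf.d/health.d/ (the path can differ for static installs), while /etc/netdata/health.d/ holds your own overrides. List the stock rules:
ls /usr/lib/netdata/conf.d/health.d/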
Customize alert thresholds by creating override files. For example, to alert when disk usage exceeds 80%:
sudo nano /etc/netdata/health.d/disk-custom.conf
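# "template" applies the rule to every chart in the disk.space context
template: disk_usage_warning
on: disk.space
# Convert the used and avail dimensions into a percentage
calc: $used * 100 / ($avail + $used)
units: %
every: 1m
warn: $this > 80
crit: $this > 90
info: disk space utilization
to: sysadmin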
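To sanity-check the thresholds against what the server reports today, the percentage can be reproduced by hand. This is a minimal sketch, assuming used space is measured against used plus available (the formula Netdata's stock disk alert uses); the function names are hypothetical.

```shell
# Prints used space as an integer percent of used + available.
# $1 = used, $2 = available, both in the same unit (for example 1K blocks).
disk_used_percent() {
  echo $(( $1 * 100 / ($1 + $2) ))
}

# Reads the same figure for one mount point from df.
# df -P prints the POSIX layout: filesystem, blocks, used, available, ...
mount_used_percent() {
  df -P "$1" | awk 'NR == 2 && ($3 + $4) > 0 { printf "%d\n", $3 * 100 / ($3 + $4) }'
}

# Example: mount_used_percent /
```

After saving a new or changed file in /etc/netdata/health.d/, running `sudo netdatacli reload-health` makes Netdata pick it up without a full restart.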
Configuring Email Alerts from Netdata
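cd /etc/netdata && sudo ./edit-config health_alarm_notify.conf
The edit-config script copies the stock file into /etc/netdata if it is not there yet and opens it in your editor. Find and configure the email section: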
SEND_EMAIL="YES"
DEFAULT_RECIPIENT_EMAIL="admin@yourcompany.com"
EMAIL_SENDER="netdata@yourdomain.com"
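Netdata sends email through the local sendmail command. If the server has no mail transfer agent, install msmtp, a lightweight SMTP client whose msmtp-mta package provides a sendmail-compatible wrapper: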
sudo apt install msmtp msmtp-mta -y
Configure msmtp to relay through your email provider (Gmail, SendGrid, Postmark, or your own mail server).
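A relay configuration for msmtp might look like the sketch below, wrapped in a shell function so it can be written to any path. Everything in it is an assumption to replace: the host, port, sender address, and credentials are placeholders, the function name `write_msmtprc` is hypothetical, and your provider's documentation has the real SMTP settings.

```shell
# Writes a minimal msmtp configuration to the path given as $1.
write_msmtprc() {
  cat > "$1" <<'EOF'
# Placeholder values: replace host, from, user, and password.
defaults
auth           on
tls            on
tls_trust_file /etc/ssl/certs/ca-certificates.crt

account        default
host           smtp.example.com
port           587
from           netdata@yourdomain.com
user           smtp-username
password       smtp-password
EOF
  # The file holds a password, so restrict it to its owner.
  chmod 600 "$1"
}

# Example: write_msmtprc /tmp/msmtprc.draft
```

Netdata sends mail as the netdata user, so the final file has to be readable by that user. Where it lives and who owns it is left to the reader.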
Configuring Slack Alerts from Netdata
Create an incoming webhook in your Slack workspace, then configure Netdata:
cd /etc/netdata && sudo ./edit-config health_alarm_notify.conf
SEND_SLACK="YES"
SLACK_WEBHOOK_URL="https://hooks.slack.com/services/YOUR/WEBHOOK/URL"
DEFAULT_RECIPIENT_SLACK="#alerts"
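Test the Slack integration. The argument that can follow test is a recipient role, not a delivery method, so a plain test sends to every enabled method, Slack included:
sudo -u netdata /usr/libexec/netdata/plugins.d/alarm-notify.sh test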
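If no test message arrives, it helps to rule out the webhook itself by posting to it directly. Slack incoming webhooks accept a JSON body with a `text` field. The sketch below assumes that shape; the function names are hypothetical, and the escaping only covers backslashes and double quotes, not newlines or other control characters.

```shell
# Builds a minimal Slack payload from the message in $1.
# Backslashes and double quotes are escaped so the JSON stays valid.
slack_payload() {
  escaped=$(printf '%s' "$1" | sed 's/\\/\\\\/g; s/"/\\"/g')
  printf '{"text":"%s"}\n' "$escaped"
}

# Posts a message to an incoming webhook.
# $1 = webhook URL, $2 = message text
send_slack() {
  curl -fsS -X POST -H 'Content-type: application/json' \
    --data "$(slack_payload "$2")" "$1"
}

# Example (placeholder URL):
#   send_slack "https://hooks.slack.com/services/YOUR/WEBHOOK/URL" "test from VPS"
```

If the direct post succeeds but Netdata's test does not, the problem is more likely in health_alarm_notify.conf than in the Slack workspace.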
Key Metrics to Watch and Their Alert Thresholds
| Metric | Warning | Critical | Action |
|---|---|---|---|
| CPU utilization | >70% for 10 min | >90% for 5 min | Identify process, optimize or scale |
| RAM utilization | >80% | >95% | Check for memory leaks, add RAM |
| Swap usage | >20% | >50% | Immediate: add RAM or reduce memory usage |
| Disk usage | >80% | >90% | Clean logs/cache, expand storage |
| I/O wait | >10% | >20% | Optimize queries, add caching, check storage |
| Disk I/O utilization | >70% | >90% | Optimize or move to faster storage |
| Network error rate | >0.1% | >1% | Check network configuration and hardware |
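The warning and critical columns can be applied to a live value with a few lines of shell. This is a minimal sketch using the RAM row as the example; the function names `classify` and `ram_used_percent` are hypothetical, and measuring "used" as MemTotal minus MemAvailable is an assumption about how to read /proc/meminfo.

```shell
# Classifies an integer value against warning and critical thresholds.
# $1 = value, $2 = warning threshold, $3 = critical threshold
classify() {
  if [ "$1" -gt "$3" ]; then
    echo critical
  elif [ "$1" -gt "$2" ]; then
    echo warning
  else
    echo ok
  fi
}

# Prints RAM utilization as an integer percent from /proc/meminfo-style
# input on stdin: (MemTotal - MemAvailable) * 100 / MemTotal.
ram_used_percent() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }'
}

# Example: classify "$(ram_used_percent < /proc/meminfo)" 80 95
```

The same `classify` call works for any row of the table once the metric is expressed as an integer percentage.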
Reading Netdata Charts Effectively
Identifying Traffic Spikes
Correlate the Nginx “requests/second” chart with CPU and RAM usage. A traffic spike that pushes CPU to 90% while RAM stays stable indicates a CPU-bound workload: add caching or scale vertically. A traffic spike that consumes all available RAM points to insufficient object caching or an over-allocated PHP-FPM pool.
Database Performance Analysis
The MySQL/MariaDB section shows slow queries per second. If slow queries increase during traffic spikes, check the slow query log to identify which queries need optimization or indexing:
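# Enable slow query log in MariaDB
sudo nano /etc/mysql/mariadb.conf.d/50-server.cnf
# Add these lines under the [mysqld] section
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
# Log queries that take longer than 2 seconds
long_query_time = 2
# Restart MariaDB to apply the change
sudo systemctl restart mariadb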
# Analyze slow queries
sudo mysqldumpslow -t 10 /var/log/mysql/slow.log
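For a quick look before running a full analysis, the log can be summarized with grep and awk. This is a minimal sketch, assuming each entry in the slow log carries a header line that starts with `# Query_time:`; the function names are hypothetical.

```shell
# Prints the number of entries in a slow query log.
# $1 = path to the log file
count_slow_queries() {
  grep -c '^# Query_time:' "$1"
}

# Prints the longest Query_time (in seconds) found in the log.
# The value is the third whitespace-separated field of the header line.
slowest_query_time() {
  awk '/^# Query_time:/ { if ($3 > max) max = $3 } END { print max + 0 }' "$1"
}

# Example: count_slow_queries /var/log/mysql/slow.log
```

Both functions need read access to the log, so run them as a user who can read /var/log/mysql/slow.log.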
Setting Up a Public Status Page
UptimeRobot’s free status page feature lets you create a public-facing page showing your service uptime history. Share the URL with clients so they can self-check service status during incidents rather than contacting support:
- In the UptimeRobot dashboard, go to “Status Pages”
- Create a new status page and add your monitors
- Optionally configure a custom domain (e.g., status.yourdomain.com)
Getting Started
Both Netdata and UptimeRobot work as soon as they are set up on any Linux VPS, whatever its region. Netdata requires approximately 100–200 MB RAM for its collector processes, so factor this into your VPS sizing. The UptimeRobot free tier is sufficient for most single-server deployments; the paid tier reduces check intervals to 1 minute if faster detection is required.
Conclusion
Complete VPS monitoring is a two-layer problem: external uptime checks that confirm your service is reachable from the outside world, and internal server metrics that show what is happening inside the box. UptimeRobot solves the external layer in five minutes with zero ongoing cost. Netdata solves the internal layer with automatic detection of hundreds of metrics and configurable alerts. Together, they give you the visibility to diagnose problems proactively rather than reactively — the foundation of reliable VPS operations.