When you run serious projects on a VPS, guessing is the most expensive performance strategy. If a site feels slow, a queue falls behind, or a cron job runs forever, the root cause almost always shows up clearly in resource metrics: CPU, RAM, disk I/O and network. The challenge is not whether the system is telling you something, but whether you are collecting and reading those signals correctly. In our day-to-day work at dchost.com, we see the same pattern across e‑commerce stores, SaaS apps and content sites: teams that monitor their VPS properly spend less time firefighting and more time planning.
In this guide, we will walk through a practical monitoring toolkit for Linux VPS servers: htop for quick CPU and RAM inspection, iotop for disk I/O, Netdata for real-time visual dashboards, and Prometheus for long-term metrics and alerts. We will focus on how to interpret what you see, which thresholds matter, and how to turn raw numbers into concrete actions like optimizing code, tuning databases or resizing your VPS at dchost.com when it is truly needed.
Table of Contents
- 1 Why VPS Resource Monitoring Matters More Than Ever
- 2 Reading the Core VPS Metrics: CPU, RAM, IO and Network
- 3 htop: Fast, Interactive View of CPU and RAM
- 4 iotop: Finding the Processes That Abuse Disk I/O
- 5 Netdata: Real-Time Visual Monitoring on a Single VPS
- 6 Prometheus: Long-Term Metrics and Alerts for VPS Fleets
- 7 Designing a Practical Monitoring Workflow
- 8 Where dchost.com Fits Into Your Monitoring Strategy
Why VPS Resource Monitoring Matters More Than Ever
Every VPS has four fundamental resource pillars:
- CPU: how many instructions your applications can execute in parallel.
- RAM: how much working memory is available for processes, caches and buffers.
- Disk I/O: how quickly the system can read and write data to storage.
- Network: how fast and reliably data moves in and out of your server.
When one of these pillars is overloaded, everything built on top of it shakes. High CPU usage leads to slow PHP responses or Node.js event loop delays. Memory pressure triggers swapping, where the kernel starts pushing data to disk, making even simple tasks sluggish. Saturated disk I/O makes databases crawl, and a congested network turns quick API calls into multi-second waits.
We have covered symptoms such as websites that are slow only at certain hours due to CPU, I/O or MySQL bottlenecks. Behind all these issues, there is one common theme: insufficient visibility. Once you can see CPU, RAM, I/O and network usage over time, capacity planning becomes much easier and cheaper than trial-and-error upgrades.
Reading the Core VPS Metrics: CPU, RAM, IO and Network
Before diving into tools, align on what you want to observe. These are the metrics we always start with when onboarding a new VPS at dchost.com:
CPU Metrics
- Per-core usage (%): Helps detect a single-threaded bottleneck (one core at 100% while others are idle).
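- Load average: Shows how many processes are actively running or waiting for CPU; on Linux it also counts processes blocked in uninterruptible sleep, which usually means waiting on disk I/O. Rough rule: sustained load equal to or higher than your vCPU count (e.g., 4.0 load on a 4 vCPU VPS) is a sign of saturation, either of the CPU or of the storage behind it.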
- CPU steal time: On virtualized environments, this is the time your VPS wanted CPU but the hypervisor could not immediately provide it. Persistently high steal is a red flag.
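To make these CPU metrics concrete, here is a minimal Python sketch of how user, IOwait and steal percentages can be derived from two samples of the aggregate `cpu` line in /proc/stat. The function names are our own illustration, not part of any tool mentioned in this guide, and the sketch assumes the standard Linux field order (user, nice, system, idle, iowait, irq, softirq, steal).

```python
# Field order of the aggregate "cpu" line in /proc/stat (values are clock ticks).
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate CPU counters from /proc/stat text as a dict."""
    for line in stat_text.splitlines():
        parts = line.split()
        # "cpu" is the sum over all cores; "cpu0", "cpu1", ... are per-core lines.
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Convert two counter snapshots into the share of time spent in each mode."""
    deltas = {name: after[name] - before[name] for name in FIELDS}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples must be taken at different times")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}
```

On a live VPS you could read /proc/stat twice, a few seconds apart, and pass both texts through `parse_cpu_line` before calling `cpu_percentages`. A persistently high `steal` share in the result corresponds to the red flag described above.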
Memory Metrics
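- Used vs free RAM: Straightforward, but remember that Linux fills otherwise unused RAM with page cache, so a low "free" figure is usually good, not bad. The "available" figure is the better indicator of how much memory applications can still claim.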
- Cached/buffered memory: Data the kernel keeps in RAM to speed up disk operations.
- Swap usage: If swap is heavily used and you see swap-in/out activity, your processes want more RAM than the VPS has.
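As a rough illustration of how these memory figures relate, the following Python sketch parses /proc/meminfo text and reports available memory, cache size and swap usage. The function names and the choice of summary fields are our own assumptions; the field names (`MemTotal`, `MemAvailable`, `Buffers`, `Cached`, `SwapTotal`, `SwapFree`) are the ones the Linux kernel exposes.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo text into a dict of integers (kB for most fields)."""
    info = {}
    for line in text.splitlines():
        key, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            info[key.strip()] = int(fields[0])
    return info


def memory_summary(info):
    """Summarize the values that matter most when judging memory pressure."""
    swap_total = info.get("SwapTotal", 0)
    swap_used = swap_total - info.get("SwapFree", 0)
    return {
        # Memory that applications can still claim, as a share of total RAM.
        "available_pct": 100.0 * info["MemAvailable"] / info["MemTotal"],
        # Buffers plus page cache: RAM the kernel uses to speed up disk access.
        "cache_mb": (info.get("Buffers", 0) + info.get("Cached", 0)) / 1024,
        # Share of swap space in use; 0.0 when no swap is configured.
        "swap_used_pct": 100.0 * swap_used / swap_total if swap_total else 0.0,
    }
```

A server can show very little "free" memory and still have a healthy `available_pct`, which is why the summary deliberately ignores `MemFree`.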
Disk I/O Metrics
- Read/write throughput (MB/s): How much data is moving to and from disk.
- IOPS: Number of read/write operations per second, key for databases and small random reads.
- IOwait: Portion of CPU time spent waiting for disk I/O; if this is high, your CPU looks idle but is actually blocked on the disk.
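The disk metrics above can be derived from the cumulative counters in /proc/diskstats. The Python sketch below is one possible way to do that from two snapshots; the function names are hypothetical, and it assumes the standard column layout in which sector counts are always expressed in 512-byte units.

```python
SECTOR_BYTES = 512  # /proc/diskstats counts sectors in 512-byte units


def parse_diskstats(text):
    """Return cumulative I/O counters per block device from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 14:
            continue  # skip blank or unexpectedly short lines
        stats[parts[2]] = {
            "reads": int(parts[3]),             # read operations completed
            "sectors_read": int(parts[5]),
            "writes": int(parts[7]),            # write operations completed
            "sectors_written": int(parts[9]),
            "io_ms": int(parts[12]),            # milliseconds the device was busy
        }
    return stats


def disk_rates(before, after, seconds):
    """Turn two snapshots of one device into IOPS, throughput and busy time."""
    delta = {key: after[key] - before[key] for key in before}
    return {
        "iops": (delta["reads"] + delta["writes"]) / seconds,
        "read_mb_s": delta["sectors_read"] * SECTOR_BYTES / seconds / 1024**2,
        "write_mb_s": delta["sectors_written"] * SECTOR_BYTES / seconds / 1024**2,
        "busy_pct": 100.0 * delta["io_ms"] / (seconds * 1000),
    }
```

For example, `disk_rates(first["vda"], second["vda"], 10)` would describe a device named `vda` over a 10-second window; the device name depends on your VPS and is only an example here.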
Network Metrics
- Bandwidth (in/out MB/s): How much traffic the server is sending and receiving.
- Packets per second: Useful for detecting DDoS or noisy traffic patterns.
- Connections and errors: Helps catch SYN floods, connection spikes, or misconfigurations.
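Bandwidth, packet rates and interface errors all come from per-interface counters in /proc/net/dev. As a minimal sketch, assuming the usual layout of eight receive columns followed by eight transmit columns, the following Python functions (names are our own) compute rates from two snapshots.

```python
def parse_net_dev(text):
    """Return cumulative counters per interface from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines():
        if ":" not in line:
            continue  # the two header lines contain no colon
        name, _, rest = line.partition(":")
        fields = [int(value) for value in rest.split()]
        counters[name.strip()] = {
            "rx_bytes": fields[0], "rx_packets": fields[1], "rx_errs": fields[2],
            "tx_bytes": fields[8], "tx_packets": fields[9], "tx_errs": fields[10],
        }
    return counters


def net_rates(before, after, seconds):
    """Compute per-interface bandwidth, packet rate and new errors between snapshots."""
    rates = {}
    for name, now in after.items():
        prev = before.get(name)
        if prev is None:
            continue  # interface appeared between the two samples
        rates[name] = {
            "rx_mb_s": (now["rx_bytes"] - prev["rx_bytes"]) / seconds / 1024**2,
            "tx_mb_s": (now["tx_bytes"] - prev["tx_bytes"]) / seconds / 1024**2,
            "rx_pps": (now["rx_packets"] - prev["rx_packets"]) / seconds,
            "tx_pps": (now["tx_packets"] - prev["tx_packets"]) / seconds,
            "new_errors": (now["rx_errs"] - prev["rx_errs"])
                          + (now["tx_errs"] - prev["tx_errs"]),
        }
    return rates
```

A sudden jump in `rx_pps` without a matching rise in `rx_mb_s` means many small packets, which is the kind of pattern worth checking when you suspect a flood rather than legitimate traffic.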
If these metrics sound abstract, do not worry. Tools like htop, iotop, Netdata and Prometheus expose them in ways that are much easier to understand than raw /proc files. Let us start from the quickest, most hands-on ones and work up to a full monitoring stack.
htop: Fast, Interactive View of CPU and RAM
htop is usually the first tool we reach for when logging into a VPS to understand what is happening right now. Compared to the classic top command, htop offers colors, a scrollable process list, easy filtering and per-core graphs. It covers CPU, memory and swap in one compact view.
Installing htop
On most Linux distributions, installation is straightforward:
- Debian/Ubuntu: apt install htop
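- AlmaLinux/Rocky Linux/CentOS: yum install htop or dnf install htop (on these distributions htop is typically packaged in the EPEL repository, so enable EPEL first if the package is not found)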
Run it with:
htop
Reading the htop Interface
The top part of the screen shows system-wide metrics:
- CPU bars per core, often color-coded for user, system, nice, IOwait, etc.
- Memory and swap bars with total and used values.
- Load average (1, 5, 15 minutes) and uptime.
The main table lists processes with columns like PID, user, CPU%, MEM%, time, and command. You can sort by a column (for example CPU% or MEM%) by using the function keys or the mouse, depending on your terminal.
Using htop to Diagnose CPU Bottlenecks
Typical workflow when a VPS feels slow:
- Run htop and check the load average and CPU bars.
- If you see one or more cores pegged near 100%, sort processes by CPU%.
- Identify the culprit process (PHP-FPM worker, Node.js, a background job, etc.).
- Check whether it is a legitimate workload (e.g., an image export task) or a runaway script or attack.
- Use htop to renice or kill a misbehaving process if necessary, then implement a long-term fix.
For example, we frequently see WooCommerce and Laravel apps where a single heavy SQL query in a background job keeps one CPU core fully busy. In htop, you would see one core maxed out, a single process at 100% CPU and relatively normal RAM usage. The next step is to profile or optimize that code, not necessarily to buy more CPU immediately. Our article on MySQL/InnoDB tuning for WooCommerce dives deeper into this side.
Using htop to Understand Memory and Swap
Memory issues are more subtle. When RAM is tight, you might notice:
- The swap bar in htop steadily increasing.
- Processes with large RES (resident memory) footprints.
- Overall system sluggishness, even though CPU usage looks moderate.
If swap usage slowly rises throughout the day and resets only when you reboot or restart big services, your application stack is asking for more RAM than the VPS has. This can come from:
- Too many PHP-FPM or worker processes in parallel.
- Database buffer pools set too high.
- Background tasks caching a lot of data in memory.
In such cases, either tune services down or plan a RAM upgrade. Our guide on estimating CPU, RAM and bandwidth needs for new websites is a good complement when you decide whether optimization or scaling is the right next move.
Practical htop Thresholds
Every workload is different, but we often use these rough thresholds on VPS servers:
- Load average: if 5-minute load is consistently above vCPU count × 1.0, investigate CPU and I/O.
- IOwait: CPU bars dominated by the IOwait color, or low CPU usage on a system that still feels slow, suggest a disk bottleneck. If htop does not show IOwait separately, enable detailed CPU time in its setup screen.
- Swap: occasional small swap usage is fine; continuous growth with high swap-in/out is a problem.
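The load-average rule above is simple enough to script. The following Python sketch reads the 5-minute value from /proc/loadavg text and compares it with the vCPU count; the function names and the `factor` parameter are our own illustration of the rule, not a feature of htop.

```python
def five_minute_load(loadavg_text):
    """Return the 5-minute load average (second field of /proc/loadavg)."""
    return float(loadavg_text.split()[1])


def load_is_saturated(loadavg_text, vcpus, factor=1.0):
    """Apply the rough rule: 5-minute load above vCPU count x factor."""
    return five_minute_load(loadavg_text) > vcpus * factor
```

On a live server you could pass in the contents of /proc/loadavg together with `os.cpu_count()`. A single `True` is only a hint; the rule is about load that stays above the line consistently.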
When htop suggests a disk-related issue, we move to iotop.
iotop: Finding the Processes That Abuse Disk I/O
While htop can hint that the system is blocked on disk, iotop tells you exactly which processes are doing the heaviest I/O. This is crucial for diagnosing slow databases, overloaded log directories, or misconfigured backups.
Installing iotop
- Debian/Ubuntu: apt install iotop
- AlmaLinux/Rocky Linux/CentOS: yum install iotop or dnf install iotop
Then run:
sudo iotop
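iotop needs root privileges and a kernel with per-task I/O accounting enabled. On most modern VPS images this works out of the box; if the IO and SWAPIN columns show as unavailable, newer kernels may need task delay accounting switched on first (for example through the kernel.task_delayacct sysctl).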
Key iotop Columns
When you start iotop, focus on these columns:
- DISK READ/DISK WRITE: current I/O throughput per process.
- IO (displayed as IO> when it is the sort column): percentage of time the process spends waiting on I/O.
- SWAPIN: percentage of time spent reading from swap; high here plus disk activity often means RAM pressure.
- COMMAND: process name and arguments, so you can map I/O usage to services.
A very useful variant is:
sudo iotop -oPa
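which shows only processes actually doing I/O (-o), lists whole processes rather than individual threads (-P), and displays accumulated totals since iotop started instead of current bandwidth (-a). For logging via cron, add -b for non-interactive batch mode and -n to limit the number of iterations.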
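If you want similar accumulated numbers in a script, the kernel exposes cumulative per-process counters in /proc/<pid>/io. The Python sketch below parses that format and ranks processes by bytes written. It is a simplified stand-in for what iotop reports, not a reimplementation of it; the function names are hypothetical, and reading another user's /proc/<pid>/io requires root.

```python
def parse_proc_io(text):
    """Parse the contents of /proc/<pid>/io into a dict of integer counters."""
    counters = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            counters[key.strip()] = int(value)
    return counters


def rank_by_io(io_by_process, field="write_bytes", limit=3):
    """Return the processes with the highest value of one counter, largest first."""
    ranked = sorted(
        io_by_process.items(),
        key=lambda item: item[1].get(field, 0),
        reverse=True,
    )
    return [(name, counters.get(field, 0)) for name, counters in ranked[:limit]]
```

The `read_bytes` and `write_bytes` fields count data that actually reached the storage layer, so they are the closest match to the DISK READ and DISK WRITE columns.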
Typical Disk Bottleneck Scenarios
On real VPS workloads, iotop often reveals one of these patterns:
- Database writes dominating disk: mysqld or postgres consuming a lot of IO%, especially during heavy imports or poorly indexed queries.
- Log files exploding: web server or application logs doing constant writes, often combined with no rotation.
- Backup jobs saturating disk: tar, rsync or backup agents reading/writing large volumes during peak hours.
When logs are the problem, combining iotop with proper rotation is powerful. Our article on VPS disk usage and logrotate to prevent “No space left on device” errors shows how to keep both disk usage and I/O under control through smart log policies.
If backups are saturating I/O, moving them to off-peak hours or using incremental strategies can radically improve daytime performance without any hardware changes.
Netdata: Real-Time Visual Monitoring on a Single VPS
Netdata is a lightweight monitoring agent with a built-in web dashboard. It provides beautiful, high-resolution charts for CPU, memory, disk, network and many services with near real-time granularity (seconds, not minutes). For a single VPS or a small handful of servers, Netdata can give you 90% of the visibility you need with minimal setup.
Installing Netdata
Netdata provides a one-line installation script that auto-detects your distribution. In production, we prefer to review their documentation and run the recommended command for our OS, but the process is typically:
bash <(curl -Ss https://my-netdata.io/kickstart.sh)
After installation, Netdata usually listens on a port like 19999. Make sure to restrict access using a firewall (for example UFW, nftables or security groups) or a reverse proxy with authentication. Exposing raw monitoring dashboards directly to the internet is rarely a good idea.
What You See in Netdata
Open the Netdata dashboard in your browser and you will find:
- System overview: CPU, load, RAM, swap at a glance.
- Disk charts: per-disk read/write throughput, IOPS and latency information.
- Network charts: inbound/outbound bandwidth, packets, errors.
- Per-application charts: web servers, databases, caching layers, depending on what is installed.
The power of Netdata is in correlation. Imagine you get reports that your store slows down around 21:00 every evening. In Netdata, you can zoom into that time window and see if CPU spikes, disk I/O climbs, or network traffic jumps. This visual correlation often provides an immediate “aha” moment.
This complements the kind of step-by-step diagnosis we cover in our guide for websites that are slow only at certain hours due to CPU, IO and MySQL issues.
Netdata Alarms
Netdata ships with a set of default alarms. You can customize thresholds and notifications—for example:
- Trigger a warning when CPU > 80% for 5 minutes.
- Trigger a critical alert when swap usage exceeds 20%.
- Alert if disk utilization crosses a certain level for a sustained period.
For teams that want a simple “something is wrong” signal without building a full observability stack, Netdata is often enough. When you start adding more VPSs or need long-term history, Prometheus starts to shine.
Prometheus: Long-Term Metrics and Alerts for VPS Fleets
For more advanced monitoring, especially when you run multiple VPSs, application servers and databases, we recommend building a metrics stack around Prometheus. Prometheus is a time-series database and metrics scraper: it periodically fetches metrics over HTTP from its targets (typically small agents called exporters) and stores them with labels for flexible querying.
Key Components
- Prometheus server: scrapes metrics from exporters, stores them and evaluates alert rules.
- Node Exporter: a small agent on each VPS that exposes CPU, RAM, disk and network metrics.
- Grafana: an optional but very popular dashboarding layer on top of Prometheus.
We use this trio heavily in our own infrastructure. If you want a gentle introduction to the overall stack, our post on VPS monitoring and alerts with Prometheus, Grafana and Uptime Kuma walks through a full beginner-friendly setup. For a deeper, VPS-focused perspective, see the playbook we use with Prometheus, Grafana and Node Exporter.
Installing Node Exporter on a VPS
The minimal Prometheus setup for VPS resource monitoring looks like this:
- Install a Prometheus server (on a dedicated monitoring VPS, for example).
- Install Node Exporter on each VPS.
- Configure Prometheus to scrape each Node Exporter endpoint.
On the monitored VPS, you would typically:
- Download the Node Exporter binary from the official release page.
- Create a systemd service to run it on boot.
- Optionally put it behind a firewall or reverse proxy if needed.
Once running, Node Exporter exposes metrics like node_cpu_seconds_total, node_memory_MemAvailable_bytes, node_disk_io_time_seconds_total and node_network_receive_bytes_total.
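Node Exporter serves these metrics as plain text, one sample per line, with `#` lines carrying help and type information. As a minimal sketch of what that format looks like to a consumer, the Python function below (our own helper, not part of Prometheus) reads the samples into a dictionary. It assumes lines carry no trailing timestamp.

```python
def parse_metrics(text):
    """Map each series (metric name plus labels) to its float value."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        # The value is the last whitespace-separated token on the line
        # (assumption: no trailing timestamp is present).
        series, value = line.rsplit(None, 1)
        samples[series] = float(value)
    return samples
```

In practice Prometheus does this parsing for you; the sketch is only meant to show that the exporter output is simple enough to inspect by hand when debugging a scrape.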
Useful Prometheus Metrics for VPS Resources
Prometheus lets you query and combine metrics using PromQL. Some common queries we use (simplified to illustrate ideas):
- CPU usage per VPS (percentage over 5 minutes):
100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
- Available memory in MB:
(node_memory_MemAvailable_bytes) / 1024 / 1024
- Disk I/O time percentage:
avg by (instance) (rate(node_disk_io_time_seconds_total[5m])) * 100
- Outbound network throughput (MB/s; use node_network_receive_bytes_total for inbound):
sum by (instance) (rate(node_network_transmit_bytes_total[5m])) / 1024 / 1024
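Queries like these can also be run from a script through the Prometheus HTTP API, whose instant-query endpoint is `/api/v1/query`. The Python sketch below builds the request URL and extracts per-instance values from a response body. The helper names and the example address `http://localhost:9090` are assumptions for illustration; fetching the URL is left to whichever HTTP client you prefer.

```python
import json
from urllib.parse import urlencode

# The CPU usage query from the list above, kept as a reusable constant.
CPU_USAGE = '100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'


def query_url(base_url, promql):
    """Build the URL for an instant query against the Prometheus HTTP API."""
    return base_url.rstrip("/") + "/api/v1/query?" + urlencode({"query": promql})


def instant_values(response_text):
    """Extract {instance: value} from the JSON body of an instant vector query."""
    body = json.loads(response_text)
    if body.get("status") != "success":
        raise RuntimeError("query failed: " + str(body.get("error", "unknown error")))
    return {
        # Each result carries its labels in "metric" and [timestamp, "value"] in "value".
        result["metric"].get("instance", ""): float(result["value"][1])
        for result in body["data"]["result"]
    }
```

For example, `query_url("http://localhost:9090", CPU_USAGE)` yields a URL whose JSON response can be passed to `instant_values` to get one CPU percentage per VPS.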
With a few Grafana dashboards, you get a clear picture of which VPSs are close to CPU or RAM limits, which disks show high I/O time, and which servers push the most bandwidth. Over weeks and months, this is invaluable for capacity planning.
Prometheus Alerts
One of Prometheus’s strengths is alerting. You define alert rules like:
- CPU usage > 85% on average for 10 minutes.
- Available memory < 500 MB for 5 minutes.
- Disk I/O time > 50% for 10 minutes.
- Network errors > 0 for more than 2 minutes.
These alerts are sent via Alertmanager to channels like email, Slack, or other tools. The key is to pick thresholds that reflect business impact. A short CPU burst may be fine, while 20 minutes of constant high IOwait during checkout hours is unacceptable for an online store.
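The "for N minutes" part of each rule is what separates a short burst from a real problem. The Python sketch below approximates that idea: an alert fires only when every sample has stayed above the threshold for the whole hold period. It is a simplified illustration of the concept with names of our own choosing, not how Prometheus evaluates rules internally.

```python
def is_firing(samples, threshold, hold_seconds):
    """Return True if the value has stayed above threshold for hold_seconds.

    samples is a list of (timestamp_in_seconds, value) pairs sorted by time.
    """
    breach_start = None
    for timestamp, value in samples:
        if value > threshold:
            if breach_start is None:
                breach_start = timestamp  # the breach begins here
        else:
            breach_start = None  # any dip below the threshold resets the clock
    if breach_start is None:
        return False
    return samples[-1][0] - breach_start >= hold_seconds
```

With one sample per minute, `is_firing(cpu_samples, 85, 600)` mirrors the first rule above: CPU above 85% for 10 minutes. A single sample below the threshold resets the timer, which is why short bursts do not page anyone.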
For production workloads, pairing Prometheus metrics with a good backup and disaster recovery strategy is equally important. Our article on designing a backup strategy with clear RPO/RTO goals explains how to connect monitoring with realistic recovery objectives.
Designing a Practical Monitoring Workflow
It is easy to get lost in tools, dashboards and alerts. The real value comes from a simple, repeatable workflow that your team actually uses. Here is how we usually structure monitoring for customers on dchost.com VPS, dedicated and colocation servers.
1. Establish a Baseline
Right after deploying a new VPS, we like to:
- Run baseline benchmarks and tests (CPU, disk and network) to understand expected performance. Our new VPS checklist with CPU, disk and network benchmarks shows this process step-by-step.
- Deploy Node Exporter and hook it into a Prometheus/Grafana stack.
- Install htop and iotop for on-the-spot debugging.
This gives you a baseline: how much CPU and RAM the app uses on a quiet day, typical disk I/O levels, and normal network traffic patterns.
2. Use htop and iotop for Live Firefighting
When users report slowness or you see an alert, SSH into the VPS and start with:
- htop to check CPU, RAM, load average and swap.
- iotop to confirm whether disk I/O is the bottleneck.
Often, this is enough to identify whether the problem is:
- A spike in application processes (e.g., too many PHP-FPM workers).
- A single heavy background job.
- A database workload or backup saturating disk.
3. Use Netdata for Visual Correlation
Once the immediate fire is under control, open Netdata to review the last few hours visually. Correlate CPU, disk and network charts around the incident time. Look for repeated patterns: does the same spike happen every day at the same time? That often points to scheduled jobs, search index rebuilds or third-party sync tasks.
4. Use Prometheus for History and Capacity Planning
Over weeks and months, Prometheus data becomes your best tool for deciding whether to:
- Optimize configuration/code: e.g., if CPU spikes map clearly to a single API endpoint.
- Re-architect: e.g., move database to its own VPS when CPU or I/O on a single node becomes the limiting factor.
- Resize the VPS: when metrics show that even an optimized stack runs close to limits.
We always recommend starting with measurement and optimization before scaling blindly. Our guide on cutting hosting costs by right-sizing VPS, bandwidth and storage explains how to align metrics with budgets.
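One way to turn weeks of history into a decision is to look at a high percentile of usage rather than the average or the single worst spike. The Python sketch below uses the nearest-rank method; the function names and the 80% limit are assumptions chosen for illustration, not thresholds recommended by Prometheus or any other tool.

```python
import math


def percentile(values, pct):
    """Nearest-rank percentile: the smallest value covering pct% of the samples."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def runs_close_to_limit(usage_pct_samples, limit_pct=80, pct=95):
    """True when the chosen percentile of usage exceeds limit_pct (assumed limit)."""
    return percentile(usage_pct_samples, pct) > limit_pct
```

If the 95th percentile of CPU or RAM usage sits above your chosen limit even after optimization, the stack is running close to its limits for a meaningful share of the time, which is the situation where resizing makes sense.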
5. Keep the Setup Lightweight but Reliable
A common concern is monitoring overhead on small VPSs. htop and iotop are on-demand tools and have negligible impact. Netdata and Node Exporter are efficient, but you should still:
- Disable collectors you do not need in Netdata.
- Adjust scrape intervals in Prometheus (e.g., every 15s or 30s instead of every 1s).
- Rotate and compress logs to avoid filling disks.
With sensible settings, monitoring should consume only a small fraction of CPU, RAM and disk while giving you a substantial reliability boost.
Where dchost.com Fits Into Your Monitoring Strategy
Good monitoring multiplies the value of good infrastructure. At dchost.com, we provide the VPS, dedicated server and colocation platforms where this tooling can run reliably. From our side, we focus on stable hardware, fast storage (including NVMe options), robust network connectivity and data center resilience. From your side, tools like htop, iotop, Netdata and Prometheus help you squeeze the most value out of those resources.
When you already monitor CPU, RAM, IO and network properly, moving between plans becomes a strategic decision instead of a guess. Metrics make it clear when it is time to upgrade a VPS, add a separate database node or move a growing workload to a dedicated server or colocation. Our team is happy to review your existing graphs and logs, then recommend whether tuning, scaling up or scaling out is the best next step.
Combine the short-term clarity of htop and iotop, the real-time insight of Netdata and the long-term power of Prometheus, and you get a monitoring setup that truly supports your business. If you want an environment designed with these practices in mind, explore our VPS and server options at dchost.com, and build a calm, observable hosting stack instead of flying blind.
