{"id":4803,"date":"2026-02-08T19:18:42","date_gmt":"2026-02-08T16:18:42","guid":{"rendered":"https:\/\/www.dchost.com\/blog\/centralized-server-monitoring-and-alerting-with-prometheus-grafana-and-zabbix\/"},"modified":"2026-02-08T19:18:42","modified_gmt":"2026-02-08T16:18:42","slug":"centralized-server-monitoring-and-alerting-with-prometheus-grafana-and-zabbix","status":"publish","type":"post","link":"https:\/\/www.dchost.com\/blog\/en\/centralized-server-monitoring-and-alerting-with-prometheus-grafana-and-zabbix\/","title":{"rendered":"Centralized Server Monitoring and Alerting with Prometheus, Grafana and Zabbix"},"content":{"rendered":"<div class=\"dchost-blog-content-wrapper\"><p>When you manage more than a handful of servers, &#8220;logging in and checking top&#8221; stops being a monitoring strategy. You need a single, reliable place where CPU, RAM, disk, network, database, HTTP checks and hardware metrics come together; where alerts are consistent; and where teams see the same truth. In real hosting environments, that usually means combining pull-based metrics (Prometheus), rich dashboards (Grafana) and agent\/SNMP\u2011driven monitoring (Zabbix) into one centralized architecture. In this article, we will walk through how we at dchost.com design such a stack for <a href=\"https:\/\/www.dchost.com\/vps\">VPS<\/a>, <a href=\"https:\/\/www.dchost.com\/dedicated-server\">dedicated server<\/a> and colocation infrastructures.<\/p>\n<p>We will focus on practical architecture: how Prometheus scrapes exporters, how Zabbix agents and proxies fit in, how Grafana sits on top as a shared observability layer, and how to avoid common traps like noisy alerts or under\u2011sizing your monitoring server. 
Whether you run a few production VPS servers or a mixed fleet of physical nodes, switches and firewalls in a rack, this guide will help you build a centralized monitoring and alerting platform that scales cleanly.<\/p>\n<div id=\"toc_container\" class=\"toc_transparent no_bullets\"><p class=\"toc_title\">Contents<\/p><ul class=\"toc_list\"><li><a href=\"#Why_Centralized_Server_Monitoring_Matters\"><span class=\"toc_number toc_depth_1\">1<\/span> Why Centralized Server Monitoring Matters<\/a><ul><li><a href=\"#One_place_for_all_signals\"><span class=\"toc_number toc_depth_2\">1.1<\/span> One place for all signals<\/a><\/li><li><a href=\"#Faster_incident_response_and_fewer_blind_spots\"><span class=\"toc_number toc_depth_2\">1.2<\/span> Faster incident response and fewer blind spots<\/a><\/li><li><a href=\"#Capacity_planning_and_cost_control\"><span class=\"toc_number toc_depth_2\">1.3<\/span> Capacity planning and cost control<\/a><\/li><\/ul><\/li><li><a href=\"#The_Roles_of_Prometheus_Grafana_and_Zabbix\"><span class=\"toc_number toc_depth_1\">2<\/span> The Roles of Prometheus, Grafana and Zabbix<\/a><ul><li><a href=\"#Prometheus_timeseries_metrics_and_alerting\"><span class=\"toc_number toc_depth_2\">2.1<\/span> Prometheus: time\u2011series metrics and alerting<\/a><\/li><li><a href=\"#Grafana_dashboards_and_crosssource_visualization\"><span class=\"toc_number toc_depth_2\">2.2<\/span> Grafana: dashboards and cross\u2011source visualization<\/a><\/li><li><a href=\"#Zabbix_agentSNMP_monitoring_and_autodiscovery\"><span class=\"toc_number toc_depth_2\">2.3<\/span> Zabbix: agent\/SNMP monitoring and auto\u2011discovery<\/a><\/li><li><a href=\"#Why_combine_them_instead_of_choosing_one\"><span class=\"toc_number toc_depth_2\">2.4<\/span> Why combine them instead of choosing one?<\/a><\/li><\/ul><\/li><li><a href=\"#Reference_Architecture_for_Centralized_Monitoring\"><span class=\"toc_number toc_depth_1\">3<\/span> Reference Architecture for Centralized 
Monitoring<\/a><ul><li><a href=\"#Highlevel_overview\"><span class=\"toc_number toc_depth_2\">3.1<\/span> High\u2011level overview<\/a><\/li><li><a href=\"#Network_layout_and_connectivity\"><span class=\"toc_number toc_depth_2\">3.2<\/span> Network layout and connectivity<\/a><\/li><li><a href=\"#Component_sizing\"><span class=\"toc_number toc_depth_2\">3.3<\/span> Component sizing<\/a><\/li><\/ul><\/li><li><a href=\"#Onboarding_Servers_and_Services\"><span class=\"toc_number toc_depth_1\">4<\/span> Onboarding Servers and Services<\/a><ul><li><a href=\"#Installing_exporters_for_Prometheus\"><span class=\"toc_number toc_depth_2\">4.1<\/span> Installing exporters for Prometheus<\/a><\/li><li><a href=\"#Deploying_Zabbix_agents_and_proxies\"><span class=\"toc_number toc_depth_2\">4.2<\/span> Deploying Zabbix agents and proxies<\/a><\/li><li><a href=\"#Monitoring_network_devices_and_hardware\"><span class=\"toc_number toc_depth_2\">4.3<\/span> Monitoring network devices and hardware<\/a><\/li><li><a href=\"#Combining_uptime_checks_with_deeper_metrics\"><span class=\"toc_number toc_depth_2\">4.4<\/span> Combining uptime checks with deeper metrics<\/a><\/li><\/ul><\/li><li><a href=\"#Designing_Useful_Dashboards_and_Alerts\"><span class=\"toc_number toc_depth_1\">5<\/span> Designing Useful Dashboards and Alerts<\/a><ul><li><a href=\"#Grafana_as_the_shared_observability_layer\"><span class=\"toc_number toc_depth_2\">5.1<\/span> Grafana as the shared observability layer<\/a><\/li><li><a href=\"#Where_to_put_alert_logic_Prometheus_Zabbix_or_Grafana\"><span class=\"toc_number toc_depth_2\">5.2<\/span> Where to put alert logic: Prometheus, Zabbix or Grafana?<\/a><\/li><li><a href=\"#Avoiding_alert_fatigue\"><span class=\"toc_number toc_depth_2\">5.3<\/span> Avoiding alert fatigue<\/a><\/li><\/ul><\/li><li><a href=\"#Integrating_Logs_Metrics_and_Uptime\"><span class=\"toc_number toc_depth_1\">6<\/span> Integrating Logs, Metrics and Uptime<\/a><ul><li><a 
href=\"#Why_logs_still_matter\"><span class=\"toc_number toc_depth_2\">6.1<\/span> Why logs still matter<\/a><\/li><li><a href=\"#Endtoend_flow_during_an_incident\"><span class=\"toc_number toc_depth_2\">6.2<\/span> End\u2011to\u2011end flow during an incident<\/a><\/li><\/ul><\/li><li><a href=\"#Practical_Implementation_Steps_on_VPS_or_Dedicated_Servers\"><span class=\"toc_number toc_depth_1\">7<\/span> Practical Implementation Steps on VPS or Dedicated Servers<\/a><ul><li><a href=\"#1_Choose_and_prepare_the_monitoring_host\"><span class=\"toc_number toc_depth_2\">7.1<\/span> 1. Choose and prepare the monitoring host<\/a><\/li><li><a href=\"#2_Install_Prometheus_Alertmanager_and_Grafana\"><span class=\"toc_number toc_depth_2\">7.2<\/span> 2. Install Prometheus, Alertmanager and Grafana<\/a><\/li><li><a href=\"#3_Install_Zabbix_server_and_connect_it_to_Grafana\"><span class=\"toc_number toc_depth_2\">7.3<\/span> 3. Install Zabbix server and connect it to Grafana<\/a><\/li><li><a href=\"#4_Roll_out_exporters_and_agents_across_your_fleet\"><span class=\"toc_number toc_depth_2\">7.4<\/span> 4. Roll out exporters and agents across your fleet<\/a><\/li><li><a href=\"#5_Build_and_iterate_dashboards_and_alert_rules\"><span class=\"toc_number toc_depth_2\">7.5<\/span> 5. 
Build and iterate dashboards and alert rules<\/a><\/li><\/ul><\/li><li><a href=\"#Security_MultiTenancy_and_Access_Control\"><span class=\"toc_number toc_depth_1\">8<\/span> Security, Multi\u2011Tenancy and Access Control<\/a><ul><li><a href=\"#Securing_data_paths\"><span class=\"toc_number toc_depth_2\">8.1<\/span> Securing data paths<\/a><\/li><li><a href=\"#Agency_and_multitenant_scenarios\"><span class=\"toc_number toc_depth_2\">8.2<\/span> Agency and multi\u2011tenant scenarios<\/a><\/li><\/ul><\/li><li><a href=\"#How_We_Apply_This_Stack_at_dchostcom\"><span class=\"toc_number toc_depth_1\">9<\/span> How We Apply This Stack at dchost.com<\/a><ul><li><a href=\"#Typical_realworld_scenario\"><span class=\"toc_number toc_depth_2\">9.1<\/span> Typical real\u2011world scenario<\/a><\/li><li><a href=\"#Why_host_monitoring_on_separate_infrastructure\"><span class=\"toc_number toc_depth_2\">9.2<\/span> Why host monitoring on separate infrastructure?<\/a><\/li><\/ul><\/li><li><a href=\"#Conclusion_Building_a_Monitoring_Foundation_You_Can_Trust\"><span class=\"toc_number toc_depth_1\">10<\/span> Conclusion: Building a Monitoring Foundation You Can Trust<\/a><\/li><\/ul><\/div>\n<h2><span id=\"Why_Centralized_Server_Monitoring_Matters\">Why Centralized Server Monitoring Matters<\/span><\/h2>\n<h3><span id=\"One_place_for_all_signals\">One place for all signals<\/span><\/h3>\n<p>In most environments we see, monitoring starts with a mix of ad\u2011hoc tools: a simple uptime checker here, a panel resource graph there, maybe a local script sending emails on high load. It works until it doesn\u2019t. The moment you have multiple VPS, dedicated servers or on\u2011prem machines, fragmented tools become a problem:<\/p>\n<ul>\n<li>You cannot see <strong>correlations<\/strong> (e.g. load on one database node vs. 
queue length on an app node).<\/li>\n<li>You lose time switching between dashboards when diagnosing issues.<\/li>\n<li>Each team creates their own monitoring &#8220;island&#8221; with different thresholds and alert styles.<\/li>\n<\/ul>\n<p>A centralized architecture fixes this by pulling metrics from every server and device into a single platform, applying <strong>consistent alert rules<\/strong> and offering shared dashboards for operations, developers and management.<\/p>\n<h3><span id=\"Faster_incident_response_and_fewer_blind_spots\">Faster incident response and fewer blind spots<\/span><\/h3>\n<p>With centralized monitoring, you can answer questions quickly:<\/p>\n<ul>\n<li>&#8220;Is slow checkout caused by the web layer, the database, the cache server, or the payment API latency?&#8221;<\/li>\n<li>&#8220;Is this spike in 5xx errors a one\u2011off or part of a trend over the last 30 days?&#8221;<\/li>\n<li>&#8220;Which servers are close to running out of disk or inodes in the next week?&#8221;<\/li>\n<\/ul>\n<p>By combining <strong>time\u2011series metrics<\/strong> (Prometheus), <strong>agent\/SNMP checks<\/strong> (Zabbix) and <strong>visual analysis<\/strong> (Grafana), you no longer guess; you see. For example, you can correlate MySQL query latency, PHP\u2011FPM pool saturation and HTTP response codes on the same Grafana panel.<\/p>\n<h3><span id=\"Capacity_planning_and_cost_control\">Capacity planning and cost control<\/span><\/h3>\n<p>Monitoring is not only about catching errors. When you observe resource usage over weeks and months, you can right\u2011size VPS and dedicated servers instead of over\u2011provisioning everything &#8220;just in case&#8221;. 
We routinely use centralized metrics to decide:<\/p>\n<ul>\n<li>When to move a busy WooCommerce store from shared hosting to a VPS or from a single VPS to a small cluster.<\/li>\n<li>Whether extra RAM or faster NVMe storage will yield more benefit for a specific workload.<\/li>\n<li>Which nodes are consistently under\u2011used and can be consolidated to save budget.<\/li>\n<\/ul>\n<p>If you want a deeper dive into capacity planning, we cover sizing decisions in our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/woocommerce-kapasite-planlama-rehberi-vcpu-ram-iops-nasil-hesaplanir\/\">WooCommerce capacity planning for vCPU, RAM and IOPS<\/a>, and similar principles apply to generic application servers.<\/p>\n<h2><span id=\"The_Roles_of_Prometheus_Grafana_and_Zabbix\">The Roles of Prometheus, Grafana and Zabbix<\/span><\/h2>\n<h3><span id=\"Prometheus_timeseries_metrics_and_alerting\">Prometheus: time\u2011series metrics and alerting<\/span><\/h3>\n<p><strong>Prometheus<\/strong> is optimized for collecting and querying numerical time\u2011series data (metrics). It is pull\u2011based: Prometheus servers regularly &#8220;scrape&#8221; HTTP endpoints (exporters) that expose metrics in a specific text format. Key benefits:<\/p>\n<ul>\n<li><strong>High\u2011resolution metrics<\/strong> (e.g. every 15\u201330 seconds) with efficient on\u2011disk storage.<\/li>\n<li>Powerful query language (PromQL) for aggregations, rates, histograms and more.<\/li>\n<li>Easy integration with modern software via exporters (Node Exporter, Blackbox, MySQL, Nginx, etc.).<\/li>\n<li>Built\u2011in integration with Alertmanager for rule\u2011based alerts.<\/li>\n<\/ul>\n<p>For VPS environments, we often deploy Node Exporter on each Linux server to collect CPU, memory, disk, filesystem, network and basic system metrics, then scrape them from a central Prometheus instance. 
We\u2019ve published a detailed step\u2011by\u2011step playbook for this in our article on <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-izleme-ve-uyari-nasil-kurulur-prometheus-grafana-ve-node-exporter-ile-sessiz-alarmlari-konusturmak\/\">building a calm VPS monitoring stack with Prometheus, Grafana and Node Exporter<\/a>.<\/p>\n<h3><span id=\"Grafana_dashboards_and_crosssource_visualization\">Grafana: dashboards and cross\u2011source visualization<\/span><\/h3>\n<p><strong>Grafana<\/strong> is the visualization and dashboard layer of the stack. It doesn\u2019t store data itself; instead, it connects to multiple data sources:<\/p>\n<ul>\n<li>Prometheus for time\u2011series metrics.<\/li>\n<li>Zabbix via the official Grafana Zabbix data source plugin.<\/li>\n<li>Other systems like Loki (logs), MySQL, Elasticsearch and more.<\/li>\n<\/ul>\n<p>With Grafana you can build shared dashboards that mix, for example, Prometheus metrics for application performance, Zabbix metrics for hardware health and network devices, and logs visualized via Loki. 
This &#8220;single pane of glass&#8221; makes on\u2011call work and capacity reviews far easier.<\/p>\n<h3><span id=\"Zabbix_agentSNMP_monitoring_and_autodiscovery\">Zabbix: agent\/SNMP monitoring and auto\u2011discovery<\/span><\/h3>\n<p><strong>Zabbix<\/strong> covers use cases that Prometheus alone doesn\u2019t handle as elegantly, particularly in mixed environments with a lot of legacy or network equipment:<\/p>\n<ul>\n<li><strong>Agent\u2011based monitoring<\/strong> for Windows and Linux servers, including OS\u2011level checks, services and log patterns.<\/li>\n<li><strong>SNMP monitoring<\/strong> for switches, routers, firewalls and UPS\/PDU devices.<\/li>\n<li><strong>Auto\u2011discovery<\/strong> and low\u2011level discovery (LLD) to find interfaces, disks, sensors and create items\/triggers automatically.<\/li>\n<li>Enterprise\u2011grade features like proxies for distributed setups, escalation steps, maintenance windows and built\u2011in alerting.<\/li>\n<\/ul>\n<p>In many dchost.com projects, Zabbix is our &#8220;inventory\u2011aware&#8221; system: it knows all hosts, groups, templates and dependencies, while Prometheus focuses on high\u2011resolution metrics from exporters.<\/p>\n<h3><span id=\"Why_combine_them_instead_of_choosing_one\">Why combine them instead of choosing one?<\/span><\/h3>\n<p>Prometheus and Zabbix overlap in some areas but shine in different ones. 
A combined architecture lets you:<\/p>\n<ul>\n<li>Use <strong>Prometheus<\/strong> where exporters and time\u2011series analytics matter (applications, databases, HTTP checks).<\/li>\n<li>Use <strong>Zabbix<\/strong> for inventory, SNMP network gear, Windows agents, and classic IT monitoring workflows.<\/li>\n<li>Use <strong>Grafana<\/strong> on top of both as the central visualization and (optionally) alerting console.<\/li>\n<\/ul>\n<p>From an operational standpoint, teams see one familiar interface (Grafana) while you retain the strengths of each backend.<\/p>\n<h2><span id=\"Reference_Architecture_for_Centralized_Monitoring\">Reference Architecture for Centralized Monitoring<\/span><\/h2>\n<h3><span id=\"Highlevel_overview\">High\u2011level overview<\/span><\/h3>\n<p>A typical centralized monitoring and alerting architecture we deploy for customers looks like this:<\/p>\n<ul>\n<li><strong>Monitoring core<\/strong> (usually on a dedicated VPS or server):\n<ul>\n<li>Prometheus server (+ optional Alertmanager).<\/li>\n<li>Zabbix server (with its database, usually MariaDB\/PostgreSQL).<\/li>\n<li>Grafana instance, connected to both as data sources.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Monitored infrastructure<\/strong>:\n<ul>\n<li>Linux and Windows servers (VPS, bare metal, on\u2011prem) with exporters and\/or Zabbix agents.<\/li>\n<li>Network devices (switches, routers, firewalls, load balancers) via SNMP and ICMP.<\/li>\n<li>Applications and databases via specialized exporters and Zabbix templates.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Notification channels<\/strong>:\n<ul>\n<li>Alertmanager routing to email, chat, webhooks.<\/li>\n<li>Zabbix media types for email, chat, SMS or ticketing.<\/li>\n<\/ul>\n<\/li>\n<\/ul>\n<p>All monitored nodes send or expose metrics to the core; no local dashboards are needed on each server. 
We usually place the monitoring core in a <strong>separate project or VLAN<\/strong> so that a problem on production servers does not immediately take down the monitoring system.<\/p>\n<h3><span id=\"Network_layout_and_connectivity\">Network layout and connectivity<\/span><\/h3>\n<p>For security and reliability, we recommend:<\/p>\n<ul>\n<li><strong>One dedicated monitoring VPS or dedicated server<\/strong> per environment (e.g. production vs staging), or a single powerful node with strict RBAC for multi\u2011tenant setups.<\/li>\n<li>Restricting access to <strong>Prometheus scrape ports and Zabbix agents<\/strong> via firewalls or VPN, so they are never open to the whole internet.<\/li>\n<li>Using <strong>private IPs<\/strong> between monitoring core and monitored nodes whenever possible.<\/li>\n<li>Serving all web UIs (Grafana, Zabbix front\u2011end) over HTTPS with strong TLS settings; our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/ssl-tls-protokol-guncellemeleri-surum-kapatma-tls-1-3-ve-modern-sifreler\/\">modern TLS protocol updates<\/a> covers the recommended ciphers and versions.<\/li>\n<\/ul>\n<p>On dchost.com infrastructure, we often place the monitoring VPS in the same region as the servers it monitors to minimize latency, but keep it isolated enough that a misconfiguration or resource spike in production does not instantly affect monitoring.<\/p>\n<h3><span id=\"Component_sizing\">Component sizing<\/span><\/h3>\n<p>Sizing depends heavily on scrape intervals and the number of time series, but for reference:<\/p>\n<ul>\n<li>For <strong>10\u201350 servers<\/strong> with basic Node Exporter + a few application exporters, a 2\u20134 vCPU VPS with 8\u201316 GB RAM and fast SSD\/NVMe storage is usually enough for Prometheus, Grafana and a small Zabbix instance.<\/li>\n<li>For <strong>50\u2013200 servers<\/strong> plus network gear, separate your stack:\n<ul>\n<li>One node for Prometheus (+ Alertmanager + Grafana).<\/li>\n<li>One node for Zabbix server and its 
database.<\/li>\n<\/ul>\n<\/li>\n<li>For <strong>200+ servers<\/strong>, consider Prometheus federation, multiple Zabbix proxies and possibly dedicated database servers for Zabbix.<\/li>\n<\/ul>\n<p>Centralized monitoring traffic is usually modest compared to application traffic, but make sure the monitoring node has enough disk IOPS to handle Prometheus and Zabbix writes. Our <a href=\"https:\/\/www.dchost.com\/blog\/en\/nvme-ssd-sata-ssd-ve-hdd-karsilastirmasi-web-hosting-yedek-ve-arsiv-icin-dogru-disk-secimi\/\">NVMe vs SSD vs HDD guide for hosting<\/a> explains how storage choices impact metrics and log workloads.<\/p>\n<h2><span id=\"Onboarding_Servers_and_Services\">Onboarding Servers and Services<\/span><\/h2>\n<h3><span id=\"Installing_exporters_for_Prometheus\">Installing exporters for Prometheus<\/span><\/h3>\n<p>For Linux VPS and dedicated servers, a typical Prometheus exporter set includes:<\/p>\n<ul>\n<li><strong>Node Exporter<\/strong>: OS metrics (CPU, RAM, disk, filesystem, network, load averages).<\/li>\n<li><strong>Process\/service exporters<\/strong>: e.g. MySQL exporter, PostgreSQL exporter, Redis exporter, Nginx\/Apache exporters.<\/li>\n<li><strong>Blackbox Exporter<\/strong>: HTTP, TCP, ICMP, DNS checks from the monitoring node\u2019s perspective.<\/li>\n<\/ul>\n<p>Each exporter listens on a local TCP port (often 9100 for Node Exporter, 9115 for Blackbox, etc.) and Prometheus is configured with a <code>scrape_config<\/code> listing the targets and labels. 
We recommend building <strong>service discovery based on host groups or naming conventions<\/strong> so you don\u2019t edit config files every time you add a server.<\/p>\n<h3><span id=\"Deploying_Zabbix_agents_and_proxies\">Deploying Zabbix agents and proxies<\/span><\/h3>\n<p>Zabbix offers two main connection patterns:<\/p>\n<ul>\n<li><strong>Agent active\/passive checks<\/strong>: the agent runs on the host and either connects to the server (active) or listens for requests (passive).<\/li>\n<li><strong>Zabbix proxies<\/strong>: intermediate nodes that collect data from agents and SNMP devices, then relay it to the main Zabbix server.<\/li>\n<\/ul>\n<p>For distributed environments with multiple locations or restricted networks, proxies simplify firewall rules and reduce load on the central server. Typical use cases:<\/p>\n<ul>\n<li>A Zabbix proxy in each data center \/ rack collecting SNMP from switches and agents from local servers.<\/li>\n<li>One proxy per customer network in agency scenarios, reporting back to a central Zabbix server at dchost.com.<\/li>\n<\/ul>\n<p>Templates in Zabbix (for Linux, Windows, MySQL, Nginx, etc.) 
make onboarding faster; they create items, triggers and graphs automatically when you add a host.<\/p>\n<h3><span id=\"Monitoring_network_devices_and_hardware\">Monitoring network devices and hardware<\/span><\/h3>\n<p>Prometheus is excellent for applications, but classic network and hardware monitoring is still more convenient with Zabbix:<\/p>\n<ul>\n<li>Use <strong>SNMP<\/strong> templates for switches, routers, firewalls, load balancers, UPS and PDU units.<\/li>\n<li>Monitor <strong>interfaces, errors, dropped packets, bandwidth usage<\/strong> and hardware sensors (temperature, fans, power).<\/li>\n<li>Use <strong>ICMP ping checks<\/strong> (with dependencies) so that one failed upstream router doesn\u2019t generate hundreds of downstream host alerts.<\/li>\n<\/ul>\n<p>For physical servers in colocation racks, we often use Zabbix IPMI or vendor\u2011specific agents (where available) to track hardware alerts that don\u2019t surface at the OS level.<\/p>\n<h3><span id=\"Combining_uptime_checks_with_deeper_metrics\">Combining uptime checks with deeper metrics<\/span><\/h3>\n<p>Uptime checks (is port 443 answering?) are useful but not enough. A page may be &#8220;up&#8221; while database queries are timing out. We usually combine:<\/p>\n<ul>\n<li><strong>HTTP\/HTTPS probes<\/strong> via Blackbox Exporter (Prometheus) and\/or simple Zabbix web scenarios.<\/li>\n<li><strong>Application metrics<\/strong> like request rate, error rate, latency histograms.<\/li>\n<li><strong>Resource metrics<\/strong> like CPU saturation, cache hit ratios, DB connections.<\/li>\n<\/ul>\n<p>If you need a lightweight external uptime monitor for public status pages, we covered that separately in our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/kendi-status-pageinizi-kurun-uptime-kuma-ile-uptime-izleme-ve-kesinti-iletisimi\/\">setting up your own status page with Uptime Kuma<\/a>. 
In this article, we focus on the deeper internal metrics layer.<\/p>\n<h2><span id=\"Designing_Useful_Dashboards_and_Alerts\">Designing Useful Dashboards and Alerts<\/span><\/h2>\n<h3><span id=\"Grafana_as_the_shared_observability_layer\">Grafana as the shared observability layer<\/span><\/h3>\n<p>Once Prometheus and Zabbix are collecting data, Grafana becomes your shared window into the system. We recommend:<\/p>\n<ul>\n<li>Creating <strong>role\u2011based dashboards<\/strong>:\n<ul>\n<li>&#8220;Ops overview&#8221;: infrastructure health across all regions and services.<\/li>\n<li>&#8220;Application team&#8221; dashboards: metrics tied to a specific product or microservice.<\/li>\n<li>&#8220;Management&#8221; views: high\u2011level uptime, SLA compliance and capacity trends.<\/li>\n<\/ul>\n<\/li>\n<li>Using <strong>variables<\/strong> (drop\u2011downs) for selecting environments, clusters, hosts and time ranges.<\/li>\n<li>Mixing <strong>Prometheus and Zabbix panels<\/strong> in the same dashboard where appropriate (e.g. application metrics from Prometheus, interface health from Zabbix).<\/li>\n<\/ul>\n<p>Grafana also supports annotations; you can mark deployments, configuration changes or incidents on the timeline to correlate with metric changes.<\/p>\n<h3><span id=\"Where_to_put_alert_logic_Prometheus_Zabbix_or_Grafana\">Where to put alert logic: Prometheus, Zabbix or Grafana?<\/span><\/h3>\n<p>There are three main options for alerting in this architecture:<\/p>\n<ol>\n<li><strong>Prometheus + Alertmanager<\/strong> for time\u2011series alerting:\n<ul>\n<li>Use PromQL alert rules (e.g. 
high CPU over 5 minutes, error rate spikes, SLO violations).<\/li>\n<li>Route alerts by labels (service, severity, team) to email, chat or webhooks.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Zabbix triggers<\/strong> for SNMP\/agent\u2011based alerts:\n<ul>\n<li>Use templates to define host\u2011class\u2011specific thresholds.<\/li>\n<li>Use escalations and dependencies for more advanced flows.<\/li>\n<\/ul>\n<\/li>\n<li><strong>Grafana alerts<\/strong> (optional):\n<ul>\n<li>Useful when you want alert rules that span multiple data sources.<\/li>\n<li>Can be configured directly from existing dashboard panels.<\/li>\n<\/ul>\n<\/li>\n<\/ol>\n<p>Our usual pattern:<\/p>\n<ul>\n<li>Keep <strong>infrastructure and application SLO alerts<\/strong> in Prometheus\/Alertmanager.<\/li>\n<li>Keep <strong>device and inventory\u2011centric alerts<\/strong> (SNMP, agent checks, disks, power, temperature) in Zabbix.<\/li>\n<li>Use Grafana alerts sparingly, usually for cross\u2011source checks or business\u2011level indicators.<\/li>\n<\/ul>\n<h3><span id=\"Avoiding_alert_fatigue\">Avoiding alert fatigue<\/span><\/h3>\n<p>A noisy alert system is as bad as no monitoring at all. Concrete tips:<\/p>\n<ul>\n<li>Start with <strong>a small, high\u2011value set<\/strong> of alerts: host down, disk almost full, HTTP 5xx spike, DB latency, Redis saturation.<\/li>\n<li>Use <strong>for<\/strong> durations (e.g. CPU &gt; 90% for 5 minutes) instead of alerting on every spike.<\/li>\n<li>Implement <strong>silences and maintenance windows<\/strong> during planned work.<\/li>\n<li>Group alerts by service or cluster to avoid a flood when an upstream dependency fails.<\/li>\n<\/ul>\n<p>When you integrate log\u2011based alerts later (e.g. with Loki), make sure they complement, not duplicate, metric alerts. 
Our article on <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-log-yonetimi-nasil-rayina-oturur-grafana-loki-promtail-ile-merkezi-loglama-tutma-sureleri-ve-alarm-kurallari\/\">centralized VPS log management with Grafana Loki and Promtail<\/a> shows how we approach log alerts without overwhelming teams.<\/p>\n<h2><span id=\"Integrating_Logs_Metrics_and_Uptime\">Integrating Logs, Metrics and Uptime<\/span><\/h2>\n<h3><span id=\"Why_logs_still_matter\">Why logs still matter<\/span><\/h3>\n<p>Metrics tell you <strong>that<\/strong> something is wrong; logs often tell you <strong>why<\/strong>. A mature observability stack typically includes:<\/p>\n<ul>\n<li><strong>Metrics<\/strong>: Prometheus (system and application metrics).<\/li>\n<li><strong>Events\/alerts<\/strong>: Prometheus Alertmanager + Zabbix triggers.<\/li>\n<li><strong>Logs<\/strong>: Loki, ELK or similar, often visualized in Grafana.<\/li>\n<li><strong>Uptime checks<\/strong>: external and internal HTTP\/TCP checks.<\/li>\n<\/ul>\n<p>We frequently pair the Prometheus + Zabbix + Grafana stack with either Loki or ELK for logs. 
For hosting environments with many VPS and sites, we summarized patterns in our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/birden-fazla-sunucuda-log-yonetimi-elk-ve-loki-stack-ile-merkezi-hosting-loglama\/\">centralizing logs from multiple servers using ELK and Loki<\/a>.<\/p>\n<h3><span id=\"Endtoend_flow_during_an_incident\">End\u2011to\u2011end flow during an incident<\/span><\/h3>\n<p>In a well\u2011designed centralized architecture, a typical production issue looks like this from the operator\u2019s perspective:<\/p>\n<ol>\n<li>Alertmanager sends a <strong>high error\u2011rate alert<\/strong> for the checkout service, pointing to a Grafana dashboard.<\/li>\n<li>In Grafana, you see\n<ul>\n<li>HTTP 5xx rate increased, response time jumped.<\/li>\n<li>DB latency increased at the same time.<\/li>\n<li>CPU and RAM are normal, but disk I\/O is high.<\/li>\n<\/ul>\n<\/li>\n<li>You jump to the <strong>logs panel<\/strong> (same Grafana, Loki data source) filtered for that service and time range.<\/li>\n<li>Log traces show lock wait timeouts on a specific table; a recent deployment added a heavy query.<\/li>\n<li>You roll back the change, confirm that error rates and DB latency return to normal.<\/li>\n<\/ol>\n<p>Because all signals are centralized and linked, the incident is more about reading a story than hunting across five tools.<\/p>\n<h2><span id=\"Practical_Implementation_Steps_on_VPS_or_Dedicated_Servers\">Practical Implementation Steps on VPS or Dedicated Servers<\/span><\/h2>\n<h3><span id=\"1_Choose_and_prepare_the_monitoring_host\">1. Choose and prepare the monitoring host<\/span><\/h3>\n<p>Start with a <strong>dedicated monitoring VPS or server<\/strong> at dchost.com, sized according to your fleet (see sizing notes above). On this host:<\/p>\n<ul>\n<li>Harden the OS (updates, firewall, non\u2011root SSH). 
Our general <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-sunucu-guvenligi-pratik-olceklenebilir-ve-dogrulanabilir-yaklasimlar\/\">VPS security hardening checklist<\/a> is a good baseline.<\/li>\n<li>Ensure correct <strong>timezone and NTP sync<\/strong> so metrics and logs align; our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/sunucu-saat-dilimi-ve-ntp-ayarlari-loglar-cron-joblar-ve-cok-bolgeli-hosting-icin-pratik-rehber\/\">server timezone and NTP configuration<\/a> explains why this matters for reliable monitoring.<\/li>\n<li>Plan disks with enough space for Prometheus TSDB and Zabbix database retention.<\/li>\n<\/ul>\n<h3><span id=\"2_Install_Prometheus_Alertmanager_and_Grafana\">2. Install Prometheus, Alertmanager and Grafana<\/span><\/h3>\n<p>On the monitoring host:<\/p>\n<ul>\n<li>Install <strong>Prometheus<\/strong> and configure basic scrape jobs (self\u2011monitoring plus a couple of test hosts).<\/li>\n<li>Install <strong>Alertmanager<\/strong> and set up a minimal alert route to email or chat.<\/li>\n<li>Install <strong>Grafana<\/strong>, secure it with strong admin credentials and TLS, and add Prometheus as a data source.<\/li>\n<li>Import or create initial dashboards (e.g. &#8220;Node overview&#8221;, &#8220;MySQL overview&#8221;).<\/li>\n<\/ul>\n<p>If you prefer a more guided first setup, our article on <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-izleme-ve-alarm-kurulumu-prometheus-grafana-ve-uptime-kuma-ile-baslangic\/\">getting started with Prometheus and Grafana for VPS monitoring<\/a> walks through a minimal but production\u2011friendly configuration.<\/p>\n<h3><span id=\"3_Install_Zabbix_server_and_connect_it_to_Grafana\">3. Install Zabbix server and connect it to Grafana<\/span><\/h3>\n<p>Next, install <strong>Zabbix server<\/strong> (and a database) on the same host or a separate one, depending on your scale. 
Then:<\/p>\n<ul>\n<li>Set up the Zabbix front\u2011end over HTTPS.<\/li>\n<li>Create host groups reflecting your environment (e.g. &#8220;web\u2011prod&#8221;, &#8220;db\u2011prod&#8221;, &#8220;network\u2011core&#8221;).<\/li>\n<li>Deploy Zabbix agents to a few test servers and link them to appropriate templates.<\/li>\n<li>In Grafana, install and configure the <strong>Zabbix data source plugin<\/strong> so Zabbix metrics are available alongside Prometheus.<\/li>\n<\/ul>\n<h3><span id=\"4_Roll_out_exporters_and_agents_across_your_fleet\">4. Roll out exporters and agents across your fleet<\/span><\/h3>\n<p>Once core components work, onboard the rest of your infrastructure:<\/p>\n<ul>\n<li>Automate Node Exporter and other exporters deployment via Ansible, scripts or images.<\/li>\n<li>Define Prometheus <code>scrape_config<\/code> blocks per role (web, db, cache, worker) using labels, not hard\u2011coded hostnames where possible.<\/li>\n<li>Roll out Zabbix agents and\/or SNMP templates to servers and network devices.<\/li>\n<li>Gradually enable templates and alerts, starting with non\u2011critical warnings to avoid noise.<\/li>\n<\/ul>\n<h3><span id=\"5_Build_and_iterate_dashboards_and_alert_rules\">5. Build and iterate dashboards and alert rules<\/span><\/h3>\n<p>With data flowing, sit down with operations and development teams to design dashboards and alerts that match how you actually work:<\/p>\n<ul>\n<li>Start from real incidents you\u2019ve had in the past and design <strong>signals that would have revealed them early<\/strong>.<\/li>\n<li>Define SLOs\/SLAs where relevant (e.g. 
99.9% uptime, 95th percentile latency) and create corresponding Prometheus alerts.<\/li>\n<li>Review alert noise after a few weeks; tune thresholds, groupings and durations.<\/li>\n<\/ul>\n<p>Monitoring is not &#8220;set and forget&#8221;; it\u2019s an evolving part of your hosting architecture, just like backups and security.<\/p>\n<h2><span id=\"Security_MultiTenancy_and_Access_Control\">Security, Multi\u2011Tenancy and Access Control<\/span><\/h2>\n<h3><span id=\"Securing_data_paths\">Securing data paths<\/span><\/h3>\n<p>Monitoring systems hold a lot of sensitive information: IPs, hostnames, internal URLs, sometimes even business metrics. Protect them by:<\/p>\n<ul>\n<li>Restricting exporter and agent ports via host\u2011level firewalls or network ACLs.<\/li>\n<li>Using <strong>mutual TLS (mTLS) or a VPN<\/strong> for connections across untrusted networks.<\/li>\n<li>Enabling role\u2011based access in Grafana and Zabbix, so each team sees only what it should.<\/li>\n<li>Backing up configuration and dashboards securely, along with the rest of your hosting backups.<\/li>\n<\/ul>\n<h3><span id=\"Agency_and_multitenant_scenarios\">Agency and multi\u2011tenant scenarios<\/span><\/h3>\n<p>If you are an agency or a team managing multiple client environments on dchost.com, a centralized monitoring stack is especially valuable:<\/p>\n<ul>\n<li>Group clients by folders\/teams in Grafana and host groups in Zabbix.<\/li>\n<li>Use labels in Prometheus (e.g. 
<code>tenant=\"client-a\"<\/code>) to filter dashboards and alerts.<\/li>\n<li>Expose read\u2011only Grafana dashboards per client if needed, while keeping write access internal.<\/li>\n<\/ul>\n<p>This model lines up well with the way we design <a href=\"https:\/\/www.dchost.com\/blog\/en\/ajanslar-icin-musteri-sitelerini-izleme-mimarisi-uptime-ssl-ve-domain-alarm-sistemi\/\">monitoring for client websites at scale for agencies<\/a>, where SSL expiry, domain renewal and uptime checks are also centralized.<\/p>\n<h2><span id=\"How_We_Apply_This_Stack_at_dchostcom\">How We Apply This Stack at dchost.com<\/span><\/h2>\n<h3><span id=\"Typical_realworld_scenario\">Typical real\u2011world scenario<\/span><\/h3>\n<p>Let\u2019s take a common example we see with customers:<\/p>\n<ul>\n<li>5\u201310 production VPS for web, app and database roles.<\/li>\n<li>1\u20132 dedicated servers as storage or high\u2011traffic database nodes.<\/li>\n<li>A rack or colocation setup with switches, firewalls and a few physical servers.<\/li>\n<\/ul>\n<p>We usually deploy:<\/p>\n<ul>\n<li>One <strong>central monitoring VPS<\/strong> in the same region with Prometheus, Alertmanager, Grafana and (for this size) Zabbix server.<\/li>\n<li>Node Exporter + service exporters on all Linux servers; Zabbix agents on both Linux and Windows where needed.<\/li>\n<li>SNMP monitoring for the network devices in colocation.<\/li>\n<li>Grafana dashboards organized by environment (prod\/stage) and system type (web, db, network).<\/li>\n<li>Alert rules focused on host down, disk thresholds, HTTP 5xx spikes, DB saturation and SSL expiry.<\/li>\n<\/ul>\n<p>From there, we iterate: add more application\u2011specific metrics, refine SLOs, integrate log data, and adjust retention as data and teams grow.<\/p>\n<h3><span id=\"Why_host_monitoring_on_separate_infrastructure\">Why host monitoring on separate infrastructure?<\/span><\/h3>\n<p>We highly recommend running your centralized monitoring <strong>on its own VPS or 
server<\/strong> instead of mixing it into an application node. Advantages:<\/p>\n<ul>\n<li>Monitoring stays up while production servers are being rebooted, migrated or scaled.<\/li>\n<li>Resource spikes on your apps don\u2019t starve Prometheus or Zabbix.<\/li>\n<li>Security boundaries are clearer: you can lock down monitoring access separately.<\/li>\n<\/ul>\n<p>At dchost.com we size and place monitoring VPSs specifically for this role, whether your main workloads are on our shared hosting, VPS, dedicated or colocation platforms.<\/p>\n<h2><span id=\"Conclusion_Building_a_Monitoring_Foundation_You_Can_Trust\">Conclusion: Building a Monitoring Foundation You Can Trust<\/span><\/h2>\n<p>A robust centralized server monitoring and alerting architecture is not a luxury; it is part of the foundation of reliable hosting. By combining <strong>Prometheus<\/strong> for time\u2011series metrics, <strong>Zabbix<\/strong> for agent\/SNMP and inventory\u2011centric monitoring, and <strong>Grafana<\/strong> as a unified visualization and optional alerting layer, you get the best of each tool without locking yourself into a single mindset or workflow.<\/p>\n<p>Start small: a dedicated monitoring VPS, Node Exporter on a few servers, a Zabbix server with basic templates, and a handful of meaningful alerts. Then grow deliberately: add exporters, proxies, log integration and more sophisticated SLO\u2011based rules as your environment expands. 
If you\u2019d like help designing or hosting such a stack\u2014whether you run a handful of VPS, several dedicated servers or a full colocation footprint\u2014our team at dchost.com can size the right monitoring host, configure Prometheus, Grafana and Zabbix, and integrate them with your existing infrastructure so you have a monitoring platform you can rely on for years.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>When you manage more than a handful of servers, &#8220;logging in and checking top&#8221; stops being a monitoring strategy. You need a single, reliable place where CPU, RAM, disk, network, database, HTTP checks and hardware metrics come together; where alerts are consistent; and where teams see the same truth. In real hosting environments, that usually [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4804,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-4803","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-teknoloji"],"_links":{"self":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/4803","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/comments?post=4803"}],"version-history":[{"count":0,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/4803\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media\/4804"}],"wp:attachment":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media?parent=4803"}],"wp:term":[{"taxonomy":"
category","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/categories?post=4803"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/tags?post=4803"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}