{"id":3457,"date":"2025-12-26T22:45:49","date_gmt":"2025-12-26T19:45:49","guid":{"rendered":"https:\/\/www.dchost.com\/blog\/rate-limiting-strategies-for-apis-and-microservices-with-nginx-cloudflare-and-redis\/"},"modified":"2025-12-26T22:45:49","modified_gmt":"2025-12-26T19:45:49","slug":"rate-limiting-strategies-for-apis-and-microservices-with-nginx-cloudflare-and-redis","status":"publish","type":"post","link":"https:\/\/www.dchost.com\/blog\/en\/rate-limiting-strategies-for-apis-and-microservices-with-nginx-cloudflare-and-redis\/","title":{"rendered":"Rate Limiting Strategies for APIs and Microservices with Nginx, Cloudflare and Redis"},"content":{"rendered":"<div class=\"dchost-blog-content-wrapper\"><p>When you start exposing APIs or microservices to the outside world, one of the first real-world problems you hit is traffic that doesn\u2019t behave nicely. Sometimes it is honest heavy usage from one customer, sometimes an integration bug that loops requests, and sometimes outright malicious scanning or credential stuffing. In all of these cases, you need a way to say \u201cenough for now\u201d without breaking the rest of your platform. That is exactly what <strong>rate limiting<\/strong> does.<\/p>\n<p>In this article, we will walk through how we design and implement rate limiting for API and microservice architectures using three battle\u2011tested building blocks: <strong>Nginx<\/strong> as the gateway, <strong>Cloudflare<\/strong> at the edge, and <strong>Redis<\/strong> as a fast shared counter store. 
We will focus on practical patterns that we actually use in production\u2011style environments on <a href=\"https:\/\/www.dchost.com\/vps\">VPS<\/a> and <a href=\"https:\/\/www.dchost.com\/dedicated-server\">dedicated server<\/a>s at dchost.com: what to limit, where to enforce it, sample Nginx configs, Redis scripts, and how to tune thresholds without constantly fighting false positives.<\/p>\n<p>We will also look at how rate limiting sits next to other pieces you might already be running: WAF rules, DDoS mitigation, caching, and firewalls. The goal is to help you build a layered but understandable traffic control strategy that protects your APIs and microservices while still feeling fast and predictable for legitimate users.<\/p>\n<div id=\"toc_container\" class=\"toc_transparent no_bullets\"><p class=\"toc_title\">Contents<\/p><ul class=\"toc_list\"><li><a href=\"#Why_Rate_Limiting_Matters_for_APIs_and_Microservices\"><span class=\"toc_number toc_depth_1\">1<\/span> Why Rate Limiting Matters for APIs and Microservices<\/a><ul><li><a href=\"#Key_goals_of_rate_limiting\"><span class=\"toc_number toc_depth_2\">1.1<\/span> Key goals of rate limiting<\/a><\/li><li><a href=\"#How_it_fits_with_WAF_DDoS_and_firewalls\"><span class=\"toc_number toc_depth_2\">1.2<\/span> How it fits with WAF, DDoS and firewalls<\/a><\/li><\/ul><\/li><li><a href=\"#Core_Rate_Limiting_Concepts_and_Algorithms\"><span class=\"toc_number toc_depth_1\">2<\/span> Core Rate Limiting Concepts and Algorithms<\/a><ul><li><a href=\"#Basic_terminology\"><span class=\"toc_number toc_depth_2\">2.1<\/span> Basic terminology<\/a><\/li><li><a href=\"#Main_rate_limiting_algorithms\"><span class=\"toc_number toc_depth_2\">2.2<\/span> Main rate limiting algorithms<\/a><\/li><\/ul><\/li><li><a href=\"#Designing_a_Rate_Limiting_Strategy_for_Your_Architecture\"><span class=\"toc_number toc_depth_1\">3<\/span> Designing a Rate Limiting Strategy for Your Architecture<\/a><ul><li><a 
href=\"#Choosing_the_right_key_IP_user_or_tenant\"><span class=\"toc_number toc_depth_2\">3.1<\/span> Choosing the right key: IP, user or tenant?<\/a><\/li><li><a href=\"#Where_to_enforce_limits_in_a_microservices_stack\"><span class=\"toc_number toc_depth_2\">3.2<\/span> Where to enforce limits in a microservices stack<\/a><\/li><li><a href=\"#How_to_respond_429_and_RetryAfter\"><span class=\"toc_number toc_depth_2\">3.3<\/span> How to respond: 429 and Retry\u2011After<\/a><\/li><\/ul><\/li><li><a href=\"#Implementing_Rate_Limiting_with_Nginx\"><span class=\"toc_number toc_depth_1\">4<\/span> Implementing Rate Limiting with Nginx<\/a><ul><li><a href=\"#Basic_perIP_rate_limiting\"><span class=\"toc_number toc_depth_2\">4.1<\/span> Basic per\u2011IP rate limiting<\/a><\/li><li><a href=\"#Rate_limiting_by_API_key_or_user_ID\"><span class=\"toc_number toc_depth_2\">4.2<\/span> Rate limiting by API key or user ID<\/a><\/li><li><a href=\"#Different_limits_per_endpoint\"><span class=\"toc_number toc_depth_2\">4.3<\/span> Different limits per endpoint<\/a><\/li><li><a href=\"#Protecting_login_and_XMLRPC_style_endpoints\"><span class=\"toc_number toc_depth_2\">4.4<\/span> Protecting login and XML\u2011RPC style endpoints<\/a><\/li><li><a href=\"#Moving_beyond_a_single_node_Nginx_Redis\"><span class=\"toc_number toc_depth_2\">4.5<\/span> Moving beyond a single node: Nginx + Redis<\/a><\/li><\/ul><\/li><li><a href=\"#Cloudflare_Rate_Limiting_and_WAF_Rules_at_the_Edge\"><span class=\"toc_number toc_depth_1\">5<\/span> Cloudflare Rate Limiting and WAF Rules at the Edge<\/a><ul><li><a href=\"#Layering_edge_rules_in_front_of_your_origin\"><span class=\"toc_number toc_depth_2\">5.1<\/span> Layering edge rules in front of your origin<\/a><\/li><li><a href=\"#Typical_Cloudflare_rules_for_APIs\"><span class=\"toc_number toc_depth_2\">5.2<\/span> Typical Cloudflare rules for APIs<\/a><\/li><li><a href=\"#Working_with_429s_and_client_behavior\"><span class=\"toc_number 
toc_depth_2\">5.3<\/span> Working with 429s and client behavior<\/a><\/li><\/ul><\/li><li><a href=\"#Redis-Based_Distributed_Rate_Limiting_for_APIs_and_Microservices\"><span class=\"toc_number toc_depth_1\">6<\/span> Redis-Based Distributed Rate Limiting for APIs and Microservices<\/a><ul><li><a href=\"#Why_Redis_works_so_well_for_rate_limiting\"><span class=\"toc_number toc_depth_2\">6.1<\/span> Why Redis works so well for rate limiting<\/a><\/li><li><a href=\"#Simple_fixed-window_limit_with_INCR_EXPIRE\"><span class=\"toc_number toc_depth_2\">6.2<\/span> Simple fixed-window limit with INCR + EXPIRE<\/a><\/li><li><a href=\"#Token_bucket_with_Lua_scripting\"><span class=\"toc_number toc_depth_2\">6.3<\/span> Token bucket with Lua scripting<\/a><\/li><li><a href=\"#Concurrency_limits_to_protect_slow_dependencies\"><span class=\"toc_number toc_depth_2\">6.4<\/span> Concurrency limits to protect slow dependencies<\/a><\/li><\/ul><\/li><li><a href=\"#Observability_Testing_and_Tuning_Your_Limits\"><span class=\"toc_number toc_depth_1\">7<\/span> Observability, Testing and Tuning Your Limits<\/a><ul><li><a href=\"#Logging_and_metrics_for_rate_limits\"><span class=\"toc_number toc_depth_2\">7.1<\/span> Logging and metrics for rate limits<\/a><\/li><li><a href=\"#Dry_runs_and_soft_enforcement\"><span class=\"toc_number toc_depth_2\">7.2<\/span> Dry runs and soft enforcement<\/a><\/li><li><a href=\"#Choosing_sensible_numbers\"><span class=\"toc_number toc_depth_2\">7.3<\/span> Choosing sensible numbers<\/a><\/li><\/ul><\/li><li><a href=\"#Network-Level_Rate_Limiting_with_Firewalls\"><span class=\"toc_number toc_depth_1\">8<\/span> Network-Level Rate Limiting with Firewalls<\/a><\/li><li><a href=\"#Putting_It_All_Together_on_dchostcom_Infrastructure\"><span class=\"toc_number toc_depth_1\">9<\/span> Putting It All Together on dchost.com Infrastructure<\/a><ul><li><a href=\"#A_reference_architecture_for_API_rate_limiting\"><span class=\"toc_number toc_depth_2\">9.1<\/span> A 
reference architecture for API rate limiting<\/a><\/li><li><a href=\"#Scaling_up_VPS_dedicated_and_colocation_options\"><span class=\"toc_number toc_depth_2\">9.2<\/span> Scaling up: VPS, dedicated and colocation options<\/a><\/li><li><a href=\"#Next_steps\"><span class=\"toc_number toc_depth_2\">9.3<\/span> Next steps<\/a><\/li><\/ul><\/li><\/ul><\/div>\n<h2><span id=\"Why_Rate_Limiting_Matters_for_APIs_and_Microservices\">Why Rate Limiting Matters for APIs and Microservices<\/span><\/h2>\n<p>Rate limiting is simply controlling <strong>how many requests<\/strong> a given actor (IP, user, API key, tenant, etc.) can make over a specific period of time. Done right, it acts like a safety valve for your infrastructure and your business logic.<\/p>\n<h3><span id=\"Key_goals_of_rate_limiting\">Key goals of rate limiting<\/span><\/h3>\n<ul>\n<li><strong>Protect backend capacity:<\/strong> Databases, queues and external providers all have limits. Rate limiting keeps noisy neighbors from exhausting them.<\/li>\n<li><strong>Fairness between tenants:<\/strong> In multi\u2011tenant SaaS, you do not want one large customer to slow down everyone else.<\/li>\n<li><strong>Abuse and fraud prevention:<\/strong> Limiting login attempts, password reset flows, search endpoints and pricing APIs slows down attackers and scrapers.<\/li>\n<li><strong>Cost control:<\/strong> If you pay per external API call or per CPU second, rate limits cap the financial blast radius of bugs and misuse.<\/li>\n<li><strong>SLO and SLA protection:<\/strong> Keeping tail latencies under control is much easier if you cap the worst offenders before they cause a chain reaction.<\/li>\n<\/ul>\n<h3><span id=\"How_it_fits_with_WAF_DDoS_and_firewalls\">How it fits with WAF, DDoS and firewalls<\/span><\/h3>\n<p>Rate limiting is one layer in a broader defense\u2011in\u2011depth strategy:<\/p>\n<ul>\n<li><strong>Network\u2011level protection<\/strong> (firewalls, basic DDoS filtering) blocks obvious floods or port 
scans.<\/li>\n<li><strong>WAF rules<\/strong> block malicious patterns in payloads (SQL injection, XSS signatures, etc.). For a practical overview, our <a href=\"https:\/\/www.dchost.com\/blog\/en\/cloudflare-guvenlik-ayarlari-rehberi-kucuk-isletme-siteleri-icin-waf-rate-limit-ve-bot-korumasi\/\">Cloudflare security settings guide on WAF, rate limiting and bot protection<\/a> shows how we layer these at the edge.<\/li>\n<li><strong>Rate limiting<\/strong> focuses on <em>volume<\/em> and <em>frequency<\/em> of otherwise valid\u2011looking requests.<\/li>\n<\/ul>\n<p>In microservice environments, rate limiting is often what keeps a noisy client from pushing a shared dependency (like a database or a payment gateway) over the edge and taking other services down with it.<\/p>\n<h2><span id=\"Core_Rate_Limiting_Concepts_and_Algorithms\">Core Rate Limiting Concepts and Algorithms<\/span><\/h2>\n<p>Before we touch Nginx, Cloudflare or Redis, it helps to have a clear vocabulary and mental model.<\/p>\n<h3><span id=\"Basic_terminology\">Basic terminology<\/span><\/h3>\n<ul>\n<li><strong>Limit:<\/strong> Maximum number of requests allowed within a time window (for example, 100 requests per minute).<\/li>\n<li><strong>Window:<\/strong> The time period for the limit (per second, per minute, per hour, per day).<\/li>\n<li><strong>Key:<\/strong> The identifier you are limiting: IP, user ID, API key, tenant ID, or a combination.<\/li>\n<li><strong>Burst:<\/strong> Extra requests temporarily allowed above the limit to smooth out short spikes.<\/li>\n<li><strong>Quota vs rate:<\/strong> Some APIs control <em>total<\/em> requests per day\/month (quota) as well as requests per second\/minute (rate).<\/li>\n<\/ul>\n<h3><span id=\"Main_rate_limiting_algorithms\">Main rate limiting algorithms<\/span><\/h3>\n<p>Most implementations are variants of a few well\u2011known algorithms:<\/p>\n<ul>\n<li><strong>Fixed window:<\/strong> Count requests in discrete windows (e.g. 
00:00\u201300:59, 01:00\u201301:59). Simple but can have edge\u2011of\u2011window bursts.<\/li>\n<li><strong>Sliding window:<\/strong> Keep timestamps of recent requests and count those within the last N seconds. Smoother but needs more storage.<\/li>\n<li><strong>Token bucket:<\/strong> Tokens accumulate over time; each request consumes one. Allows bursts up to bucket size while enforcing a long\u2011term average rate.<\/li>\n<li><strong>Leaky bucket:<\/strong> Like a bucket that leaks at a constant rate; if more water (requests) enters than can leak out, extra is dropped.<\/li>\n<li><strong>Concurrency limiting:<\/strong> Limit how many <em>in\u2011flight<\/em> requests or jobs a key can have at the same time, protecting slow dependencies.<\/li>\n<\/ul>\n<p>Nginx\u2019s built\u2011in <code>limit_req<\/code> uses the leaky bucket method, with an optional <code>burst<\/code> queue. Cloudflare offers rules based on request rate. Redis lets you implement any of the above by combining counters, TTLs and Lua scripts.<\/p>\n<h2><span id=\"Designing_a_Rate_Limiting_Strategy_for_Your_Architecture\">Designing a Rate Limiting Strategy for Your Architecture<\/span><\/h2>\n<p>Good rate limiting is more than just picking a number and copying a config snippet. You need to think about <strong>who<\/strong> you\u2019re protecting, <strong>what<\/strong> you\u2019re protecting, and <strong>where<\/strong> you enforce limits.<\/p>\n<h3><span id=\"Choosing_the_right_key_IP_user_or_tenant\">Choosing the right key: IP, user or tenant?<\/span><\/h3>\n<p>Each key type has trade\u2011offs:<\/p>\n<ul>\n<li><strong>IP address:<\/strong> Easy to implement at the edge (Nginx, Cloudflare). Works well for anonymous endpoints and brute\u2011force protection. Weak for mobile users behind NAT or large corporate proxies.<\/li>\n<li><strong>API key \/ client ID:<\/strong> Best for authenticated APIs. Fair per\u2011client limits; not affected by NAT. 
Requires that authentication happens before rate limiting.<\/li>\n<li><strong>User ID:<\/strong> Useful for consumer\u2011facing apps where many users may share an IP. Often combined with a coarser IP\u2011based limit as a backup.<\/li>\n<li><strong>Tenant \/ organization ID:<\/strong> Critical in B2B SaaS to enforce plan\u2011based quotas (for example, 10k requests\/day for the Basic plan, 1M for Enterprise).<\/li>\n<\/ul>\n<p>In practice, we often layer them. For example:<\/p>\n<ul>\n<li>Edge rate limiting at Cloudflare by IP to block obvious floods.<\/li>\n<li>Gateway rate limiting at Nginx by <code>$api_client_id<\/code> or <code>$jwt_sub<\/code>.<\/li>\n<li>Business\u2011level quotas enforced inside the application using Redis counters.<\/li>\n<\/ul>\n<h3><span id=\"Where_to_enforce_limits_in_a_microservices_stack\">Where to enforce limits in a microservices stack<\/span><\/h3>\n<p>A typical dchost.com\u2011style API stack might look like this:<\/p>\n<ol>\n<li><strong>Client \u2192 Cloudflare:<\/strong> Global edge network, WAF and edge rate limiting.<\/li>\n<li><strong>Cloudflare \u2192 Nginx gateway on your VPS\/dedicated server:<\/strong> TLS termination, routing, Nginx rate limiting and microcaching.<\/li>\n<li><strong>Nginx \u2192 backend microservices:<\/strong> RPC\/HTTP calls to internal services, each of which may also have local or centralized rate limits.<\/li>\n<\/ol>\n<p>Rate limits can live in all three layers:<\/p>\n<ul>\n<li><strong>At the edge (Cloudflare):<\/strong> Protects origin bandwidth and CPU; cheapest place to drop junk traffic.<\/li>\n<li><strong>At the gateway (Nginx):<\/strong> More context\u2011aware (paths, headers, auth) and closer to your app logic.<\/li>\n<li><strong>Inside services (Redis\u2011backed):<\/strong> Fine\u2011grained quotas per user, per feature, per tenant; shared across instances.<\/li>\n<\/ul>\n<h3><span id=\"How_to_respond_429_and_RetryAfter\">How to respond: 429 and Retry\u2011After<\/span><\/h3>\n<p>For 
HTTP APIs, the standard way to signal rate limiting is:<\/p>\n<ul>\n<li><strong>Status code:<\/strong> <code>429 Too Many Requests<\/code><\/li>\n<li><strong>Header:<\/strong> <code>Retry-After: &lt;seconds&gt;<\/code> or an HTTP date<\/li>\n<\/ul>\n<p>Many SDKs and API clients know how to react to 429 automatically (backoff and retry). If you return 500 or 503 instead, clients cannot distinguish rate limiting from real errors, and they may retry too aggressively.<\/p>\n<h2><span id=\"Implementing_Rate_Limiting_with_Nginx\">Implementing Rate Limiting with Nginx<\/span><\/h2>\n<p>Nginx is our go\u2011to gateway and reverse proxy for APIs and microservices. Its native rate limiting is efficient and, because its counters live in a shared memory zone, limits are enforced consistently across all worker processes on a single server.<\/p>\n<h3><span id=\"Basic_perIP_rate_limiting\">Basic per\u2011IP rate limiting<\/span><\/h3>\n<p>Here is a minimal example that limits each IP to 10 requests per second with a small burst:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">http {\n    # Define a shared memory zone &quot;api_limit&quot; with 10MB of storage\n    limit_req_zone $binary_remote_addr zone=api_limit:10m rate=10r\/s;\n\n    server {\n        listen 80;\n        server_name api.example.com;\n\n        location \/v1\/ {\n            limit_req zone=api_limit burst=20 nodelay;\n\n            proxy_pass http:\/\/backend_api;\n        }\n    }\n}\n<\/code><\/pre>\n<p>Notes:<\/p>\n<ul>\n<li><strong><code>$binary_remote_addr<\/code><\/strong> is a compact representation of the client IP.<\/li>\n<li><strong><code>rate=10r\/s<\/code><\/strong> sets a base rate of 10 requests per second.<\/li>\n<li><strong><code>burst=20<\/code><\/strong> lets a client briefly exceed the rate, which is often necessary for legitimate short spikes.<\/li>\n<li><strong><code>nodelay<\/code><\/strong> tells Nginx to serve requests within the burst allowance immediately instead of delaying 
them to match the configured rate; requests beyond the burst are rejected.<\/li>\n<\/ul>\n<p>By default, Nginx will return <code>503 Service Unavailable<\/code> when the limit is exceeded. For APIs, you should explicitly configure a 429:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">http {\n    limit_req_status 429;\n}\n<\/code><\/pre>\n<h3><span id=\"Rate_limiting_by_API_key_or_user_ID\">Rate limiting by API key or user ID<\/span><\/h3>\n<p>Per\u2011IP limits are crude. For authenticated APIs, it is far better to limit per client. You can expose a header such as <code>X-API-Client-ID<\/code> from your auth layer (for example, after validating a JWT) and build the zone key from that:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">map $http_x_api_client_id $api_client_id {\n    # Note: limit_req does not count requests whose key is empty\n    default $http_x_api_client_id;\n}\n\nlimit_req_zone $api_client_id zone=client_limit:20m rate=100r\/m;\n\nserver {\n    listen 80;\n    server_name api.example.com;\n\n    location \/v1\/ {\n        # Per-client limit: 100 requests per minute\n        limit_req zone=client_limit burst=50 nodelay;\n\n        proxy_pass http:\/\/backend_api;\n    }\n}\n<\/code><\/pre>\n<p>This way, customers behind carrier NAT or shared corporate IPs are not penalized for each other\u2019s traffic.<\/p>\n<h3><span id=\"Different_limits_per_endpoint\">Different limits per endpoint<\/span><\/h3>\n<p>Some endpoints are much more sensitive than others. 
For example:<\/p>\n<ul>\n<li><code>POST \/v1\/auth\/login<\/code> and <code>POST \/v1\/password-reset<\/code> should have very low limits.<\/li>\n<li><code>GET \/v1\/products<\/code> can tolerate higher rates.<\/li>\n<li><code>GET \/v1\/reports\/export<\/code> might be limited per hour or per day.<\/li>\n<\/ul>\n<p>With Nginx you can define multiple zones and apply them selectively:<\/p>\n<pre class=\"language-nginx line-numbers\"><code class=\"language-nginx\">limit_req_zone $binary_remote_addr zone=login_limit:5m rate=5r\/m;\nlimit_req_zone $api_client_id      zone=read_limit:10m rate=50r\/s;\n\nserver {\n    listen 80;\n\n    location = \/v1\/auth\/login {\n        limit_req zone=login_limit burst=5 nodelay;\n        proxy_pass http:\/\/auth_service;\n    }\n\n    location \/v1\/ {\n        limit_req zone=read_limit burst=100 nodelay;\n        proxy_pass http:\/\/backend_api;\n    }\n}\n<\/code><\/pre>\n<h3><span id=\"Protecting_login_and_XMLRPC_style_endpoints\">Protecting login and XML\u2011RPC style endpoints<\/span><\/h3>\n<p>For web applications like WordPress, we have seen great results combining Nginx rate limiting with Fail2ban to tame brute\u2011force attacks against login and XML\u2011RPC endpoints. The same pattern works for custom APIs: rate limit at Nginx, log offenders, and optionally ban repeat abusers at the firewall level. We describe this in detail in our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/nginx-rate-limiting-ve-fail2ban-ile-wp%e2%80%91login-php-ve-xml%e2%80%91rpc-brute%e2%80%91force-saldirilarini-nasil-saksiya-alirsin\/\">the calm way to stop wp-login.php and XML-RPC brute force with Nginx rate limiting + Fail2ban<\/a>.<\/p>\n<h3><span id=\"Moving_beyond_a_single_node_Nginx_Redis\">Moving beyond a single node: Nginx + Redis<\/span><\/h3>\n<p>Classic Nginx <code>limit_req<\/code> uses shared memory local to one server. 
In a horizontally scaled microservice architecture with multiple API gateways, you may want limits to be <strong>global<\/strong> across all nodes.<\/p>\n<p>For that, you can:<\/p>\n<ul>\n<li>Introduce a dedicated <strong>rate limiting service<\/strong> that speaks HTTP\/gRPC and uses Redis under the hood.<\/li>\n<li>Or use Nginx modules (such as <code>lua-resty-limit-traffic<\/code> with OpenResty) to store counters in Redis directly from Nginx.<\/li>\n<\/ul>\n<p>The pattern is the same: Nginx extracts a key (client ID, user, tenant), calls out to Redis to check\/update a counter, and based on the response either proxies to upstream or returns 429. Redis gives you atomic increments and expirations that work reliably across multiple API gateway instances.<\/p>\n<h2><span id=\"Cloudflare_Rate_Limiting_and_WAF_Rules_at_the_Edge\">Cloudflare Rate Limiting and WAF Rules at the Edge<\/span><\/h2>\n<p>If you already run your domains through Cloudflare, it\u2019s often the most cost\u2011effective place to start rate limiting. 
It reduces load on your servers, saves bandwidth, and can be tuned per path and per HTTP method.<\/p>\n<h3><span id=\"Layering_edge_rules_in_front_of_your_origin\">Layering edge rules in front of your origin<\/span><\/h3>\n<p>Cloudflare offers two main tools useful for APIs and microservices:<\/p>\n<ul>\n<li><strong>Rate limiting rules:<\/strong> Define thresholds such as \u201cif a single IP makes more than 100 requests to <code>\/api\/<\/code> in 10 seconds, then block or challenge for 1 minute\u201d.<\/li>\n<li><strong>WAF rules:<\/strong> Block obvious attack patterns, bad bots, or known vulnerability exploits before they reach your origin.<\/li>\n<\/ul>\n<p>For a step\u2011by\u2011step walk\u2011through, including small business use cases and WordPress protection scenarios, see our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/cloudflare-guvenlik-ayarlari-rehberi-kucuk-isletme-siteleri-icin-waf-rate-limit-ve-bot-korumasi\/\">Cloudflare security settings guide for WAF, rate limiting and bot protection<\/a>.<\/p>\n<h3><span id=\"Typical_Cloudflare_rules_for_APIs\">Typical Cloudflare rules for APIs<\/span><\/h3>\n<p>For API traffic, we commonly configure rules like:<\/p>\n<ul>\n<li><strong>Global per\u2011IP rate limit<\/strong> on <code>\/api\/<\/code> paths to absorb obvious floods.<\/li>\n<li><strong>Stricter limits<\/strong> on <code>\/api\/auth\/*<\/code> endpoints to slow down credential stuffing.<\/li>\n<li><strong>Soft limits (JS challenge)<\/strong> on expensive search or listing endpoints to throttle scrapers.<\/li>\n<\/ul>\n<p>Unlike Nginx\u2019s native limiter, Cloudflare can also combine several dimensions: country, ASN, user\u2011agent, bot score, and more. 
It is excellent at filtering out low\u2011quality or obviously automated traffic before it reaches more expensive layers.<\/p>\n<h3><span id=\"Working_with_429s_and_client_behavior\">Working with 429s and client behavior<\/span><\/h3>\n<p>Cloudflare lets you choose the action for each rule: block, challenge, or serve a custom response. For machine\u2011to\u2011machine APIs, we usually prefer a straightforward <code>429<\/code> at the edge that clients can understand. For browser\u2011based traffic, a CAPTCHA or JavaScript challenge may be more appropriate for suspected bots.<\/p>\n<p>If you pair Cloudflare with Nginx and Redis behind it, think of Cloudflare as the outer guard: it handles the worst spikes, while your origin\u2011side rate limiting is more application\u2011aware and focused on fairness between legitimate clients.<\/p>\n<h2><span id=\"Redis-Based_Distributed_Rate_Limiting_for_APIs_and_Microservices\">Redis-Based Distributed Rate Limiting for APIs and Microservices<\/span><\/h2>\n<p>For truly flexible and global rate limits, especially in a microservice architecture, Redis is the workhorse of choice. 
It combines low latency, atomic operations and rich data structures, which makes implementing various algorithms straightforward.<\/p>\n<h3><span id=\"Why_Redis_works_so_well_for_rate_limiting\">Why Redis works so well for rate limiting<\/span><\/h3>\n<ul>\n<li><strong>Atomic counters:<\/strong> The <code>INCR<\/code> and <code>INCRBY<\/code> commands let you update counters safely even under heavy concurrency.<\/li>\n<li><strong>Key expiration (TTL):<\/strong> With <code>EXPIRE<\/code> or <code>SETEX<\/code>, you can automatically reset counters after a window ends.<\/li>\n<li><strong>Lua scripting:<\/strong> Run small scripts server\u2011side for token bucket or sliding window logic in a single atomic step.<\/li>\n<li><strong>Data structures:<\/strong> Sorted sets, hashes and bitmaps open the door to complex per\u2011feature, per\u2011user tracking.<\/li>\n<\/ul>\n<p>For production use, you will want Redis to be highly available so that your rate limiting is not a single point of failure. 
Our guide <a href=\"https:\/\/www.dchost.com\/blog\/en\/wordpress-nesne-onbelleginde-redisi-ayaga-kaldirmanin-sirri-sentinel-aof-rdb-ve-failover-ne-zaman-devreye-girer\/\">Never Lose Your Cache Again: High\u2011Availability Redis for WordPress<\/a> describes how we use Sentinel, AOF\/RDB and failover, and the same principles apply to a rate limiting Redis cluster.<\/p>\n<h3><span id=\"Simple_fixed-window_limit_with_INCR_EXPIRE\">Simple fixed-window limit with INCR + EXPIRE<\/span><\/h3>\n<p>A straightforward per\u2011minute limit could look like this in Python with the redis\u2011py client (a sketch; the key format and the default limit are illustrative):<\/p>\n<pre class=\"language-python line-numbers\"><code class=\"language-python\">import time\nimport redis\n\nr = redis.Redis()  # assumes a local Redis; adjust host\/port as needed\n\ndef check_rate_limit(client_id, limit=1000, window_seconds=60):\n    now = int(time.time())\n    window = now \/\/ window_seconds  # index of the current fixed window\n    key = f&quot;rate:{client_id}:{window}&quot;\n\n    count = r.incr(key)\n    if count == 1:\n        # First request in this window: let the key expire with the window\n        r.expire(key, window_seconds)\n\n    if count &gt; limit:\n        # Seconds left until the next window starts\n        retry_after = window_seconds - (now % window_seconds)\n        return {&quot;allowed&quot;: False, &quot;retry_after&quot;: retry_after}\n\n    return {&quot;allowed&quot;: True}\n<\/code><\/pre>\n<p>Here:<\/p>\n<ul>\n<li>Each client has a fresh key every minute (for example, <code>rate:client123:29123456<\/code>).<\/li>\n<li>The first request sets the TTL; subsequent ones simply increment.<\/li>\n<li>Once the count exceeds the limit (1000 here), your service returns 429 to the caller, with <code>Retry-After<\/code> set to the seconds left in the current window.<\/li>\n<\/ul>\n<h3><span id=\"Token_bucket_with_Lua_scripting\">Token bucket with Lua scripting<\/span><\/h3>\n<p>To allow brief bursts while maintaining a long\u2011term limit, a token bucket usually feels more natural. 
A minimal Redis Lua script might:<\/p>\n<ol>\n<li>Read the last refill timestamp and token count.<\/li>\n<li>Calculate how many tokens to add based on elapsed time and refill rate.<\/li>\n<li>Deduct one token for the current request if available.<\/li>\n<li>Return whether the request is allowed plus the current token count.<\/li>\n<\/ol>\n<p>Example (simplified Lua):<\/p>\n<pre class=\"language-lua line-numbers\"><code class=\"language-lua\">-- KEYS[1] = bucket key\n-- ARGV[1] = capacity\n-- ARGV[2] = refill_rate (tokens per second)\n-- ARGV[3] = now (current timestamp in seconds)\n\nlocal capacity   = tonumber(ARGV[1])\nlocal refillRate = tonumber(ARGV[2])\nlocal now        = tonumber(ARGV[3])\n\nlocal data = redis.call(&quot;HMGET&quot;, KEYS[1], &quot;tokens&quot;, &quot;timestamp&quot;)\nlocal tokens    = tonumber(data[1]) or capacity\nlocal timestamp = tonumber(data[2]) or now\n\n-- Refill tokens based on time elapsed\nlocal delta = math.max(0, now - timestamp)\nlocal refill = delta * refillRate\n\ntokens = math.min(capacity, tokens + refill)\n\nlocal allowed = 0\nif tokens &gt;= 1 then\n  tokens = tokens - 1\n  allowed = 1\nend\n\n-- HSET accepts multiple field\/value pairs (Redis 4.0+); HMSET is deprecated\nredis.call(&quot;HSET&quot;, KEYS[1], &quot;tokens&quot;, tokens, &quot;timestamp&quot;, now)\nredis.call(&quot;EXPIRE&quot;, KEYS[1], 3600)\n\nreturn { allowed, tokens }\n<\/code><\/pre>\n<p>Your application calls this script via <code>EVALSHA<\/code> for each request. A response of <code>allowed = 1<\/code> means proceed; <code>0<\/code> means return a 429 with an appropriate <code>Retry-After<\/code>.<\/p>\n<h3><span id=\"Concurrency_limits_to_protect_slow_dependencies\">Concurrency limits to protect slow dependencies<\/span><\/h3>\n<p>Sometimes the problem isn\u2019t the total number of requests per minute, but how many slow operations are happening <em>at the same moment<\/em>. 
A classic example is generating a large report or calling a slow external payment API.<\/p>\n<p>A simple pattern:<\/p>\n<ul>\n<li>On request start, try to <code>INCR<\/code> a key like <code>concurrent:client123:report<\/code>.<\/li>\n<li>If the result is more than the allowed maximum (for example, 3), immediately <code>DECR<\/code> again and return 429.<\/li>\n<li>On request completion (success or failure), <code>DECR<\/code> the key.<\/li>\n<\/ul>\n<p>To protect against stuck counts (for example, if a process crashes), you can combine <code>INCR<\/code> with a TTL (or use a Lua script that sets a TTL when the counter becomes non\u2011zero) so that orphaned counters eventually disappear.<\/p>\n<h2><span id=\"Observability_Testing_and_Tuning_Your_Limits\">Observability, Testing and Tuning Your Limits<\/span><\/h2>\n<p>Rate limiting that you cannot see or measure will eventually hurt you. Observability is just as important as the rules themselves.<\/p>\n<h3><span id=\"Logging_and_metrics_for_rate_limits\">Logging and metrics for rate limits<\/span><\/h3>\n<p>At minimum, you should:<\/p>\n<ul>\n<li>Log every 429 response from Nginx and your application, including the key that triggered it (IP, user, client ID).<\/li>\n<li>Expose metrics such as \u201crequests allowed vs limited\u201d per endpoint and per plan.<\/li>\n<li>Alert when the percentage of 429s spikes unexpectedly; it might indicate an attack or a misconfigured limit.<\/li>\n<\/ul>\n<p>For centralizing logs across multiple VPS instances and keeping them queryable, we like stacks such as Loki + Promtail + Grafana. 
Our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/vps-log-yonetimi-nasil-rayina-oturur-grafana-loki-promtail-ile-merkezi-loglama-tutma-sureleri-ve-alarm-kurallari\/\">VPS log management without the drama: centralized logging with Loki and Promtail<\/a> shows how we set this up in hosting environments very similar to API clusters.<\/p>\n<h3><span id=\"Dry_runs_and_soft_enforcement\">Dry runs and soft enforcement<\/span><\/h3>\n<p>When rolling out new limits, a useful approach is:<\/p>\n<ol>\n<li>Implement the counters and logging, but still allow all requests (soft mode).<\/li>\n<li>Observe who would have been rate\u2011limited for a few days: which endpoints, which keys, at what times.<\/li>\n<li>Adjust thresholds and key selection based on real observed traffic.<\/li>\n<li>Switch to hard enforcement with 429s once you are confident.<\/li>\n<\/ol>\n<p>This is especially important for B2B APIs where you may have clients doing legitimate high\u2011volume batch imports or overnight sync jobs.<\/p>\n<h3><span id=\"Choosing_sensible_numbers\">Choosing sensible numbers<\/span><\/h3>\n<p>Capacity planning is a whole topic on its own, but a quick rule of thumb:<\/p>\n<ul>\n<li>Estimate your backend\u2019s safe sustained throughput (requests per second) without breaching CPU, database or external API limits.<\/li>\n<li>Reserve a margin (for example, 30\u201340%) for unexpected spikes.<\/li>\n<li>Distribute the remaining capacity across tenants according to their plans.<\/li>\n<\/ul>\n<p>Our guide on <a href=\"https:\/\/www.dchost.com\/blog\/en\/shared-hosting-ve-vps-icin-trafik-ve-bant-genisligi-ihtiyaci-nasil-hesaplanir\/\">how to estimate traffic and bandwidth needs on shared hosting and VPS<\/a> walks through similar calculations from a hosting perspective; the same thinking applies when you translate those numbers into per\u2011client and per\u2011endpoint API limits.<\/p>\n<h2><span id=\"Network-Level_Rate_Limiting_with_Firewalls\">Network-Level Rate Limiting 
with Firewalls<\/span><\/h2>\n<p>While Nginx, Cloudflare and Redis handle application\u2011layer limits, it is sometimes useful to add a coarse network\u2011layer limit for obvious abuse (for example, blocking an IP that is opening thousands of TCP connections per second).<\/p>\n<p>On Linux VPS and dedicated servers, modern setups increasingly use <strong>nftables<\/strong> for this purpose. You can define rules that:<\/p>\n<ul>\n<li>Limit new connections per IP per second to your API ports.<\/li>\n<li>Temporarily drop or rate limit offenders at the packet level.<\/li>\n<li>Combine with port knocking or connection tracking for sensitive internal services.<\/li>\n<\/ul>\n<p>If you want practical examples, including rate limiting and IPv6\u2011aware rules, our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/nftables-ile-vps-guvenlik-duvari-rehberi-rate-limit-port-knocking-ve-ipv6-kurallari-nasil-tatli-tatli-kurulur\/\">The nftables firewall cookbook for VPS: rate limiting, port knocking and IPv6 rules<\/a> goes step\u2011by\u2011step through real configurations.<\/p>\n<p>Network\u2011level rate limiting will never be as precise as Redis\u2011backed application\u2011layer limits, but it is an excellent coarse filter and backstop if higher layers misbehave or are overwhelmed.<\/p>\n<h2><span id=\"Putting_It_All_Together_on_dchostcom_Infrastructure\">Putting It All Together on dchost.com Infrastructure<\/span><\/h2>\n<p>Let\u2019s put all of this into a realistic architecture that we frequently see from our customers running APIs and microservices on VPS and dedicated servers at dchost.com.<\/p>\n<h3><span id=\"A_reference_architecture_for_API_rate_limiting\">A reference architecture for API rate limiting<\/span><\/h3>\n<ol>\n<li><strong>DNS and edge:<\/strong> Your API domain points to Cloudflare. 
At the edge, you configure WAF rules and a few broad rate limiting rules (per\u2011IP caps on <code>\/api\/<\/code> and tighter caps on <code>\/api\/auth\/*<\/code>).<\/li>\n<li><strong>Gateway server:<\/strong> On a high\u2011performance VPS with NVMe SSD (we discuss why NVMe matters in our article <a href=\"https:\/\/www.dchost.com\/blog\/en\/nvme-vps-hosting-rehberi-hizin-nereden-geldigini-nasil-olculdugunu-ve-gercek-sonuclari-beraber-gorelim\/\">NVMe VPS hosting guide<\/a>), you run Nginx as the API gateway. Here you enforce per\u2011client limits with <code>limit_req<\/code> and return 429 (set <code>limit_req_status 429<\/code>, because Nginx answers rejected requests with 503 by default) with <code>Retry-After<\/code> when clients exceed their plan.<\/li>\n<li><strong>Redis tier:<\/strong> On the same server (for smaller projects) or on a separate VPS\/dedicated node (for larger ones), you run a highly available (HA) Redis deployment. This stores global per\u2011tenant quotas, token buckets for critical operations, and concurrency counters for slow jobs.<\/li>\n<li><strong>Backend microservices:<\/strong> Each service trusts the gateway for coarse limits but also performs its own fine\u2011grained checks via Redis for business\u2011specific constraints (for example, a maximum of 10 weekly exports per team).<\/li>\n<li><strong>Observability:<\/strong> Logs and metrics are shipped from all nodes into a centralized stack (for example, Loki + Grafana) so you can see 429 rates, top offenders and path\u2011level statistics.<\/li>\n<\/ol>\n<h3><span id=\"Scaling_up_VPS_dedicated_and_colocation_options\">Scaling up: VPS, dedicated and colocation options<\/span><\/h3>\n<p>Because we run our own data centers and infrastructure at dchost.com, this layered approach to rate limiting fits naturally into our hosting portfolio:<\/p>\n<ul>\n<li><strong>Small and medium APIs:<\/strong> One or two well\u2011sized NVMe VPS servers are usually enough to host Nginx, Redis and microservices together, with Cloudflare at the edge.<\/li>\n<li><strong>High\u2011traffic APIs and SaaS platforms:<\/strong> 
Separate gateway, Redis and backend tiers across multiple VPS or dedicated servers, often with dedicated database nodes and object storage in the mix.<\/li>\n<li><strong>Very large or regulated workloads:<\/strong> Customers bring their own hardware via colocation and still apply the same Nginx + Cloudflare + Redis patterns, just on physically dedicated racks and network segments.<\/li>\n<\/ul>\n<h3><span id=\"Next_steps\">Next steps<\/span><\/h3>\n<p>If you\u2019re planning a new API or microservice platform\u2014or if you\u2019re already hitting limits and noisy neighbors on your current environment\u2014taking a day to design proper rate limiting often saves weeks of firefighting later. Start with a simple edge rule at Cloudflare, add Nginx limits for the riskiest endpoints, and then introduce Redis\u2011backed quotas where you need cross\u2011node and per\u2011tenant control.<\/p>\n<p>At dchost.com we implement these patterns daily on our VPS, dedicated and colocation platforms. If you\u2019re not sure how to size the servers or where to draw the boundaries between gateway, Redis and your services, our team can help you choose a realistic architecture and grow it over time without surprises.<\/p>\n<\/div>","protected":false},"excerpt":{"rendered":"<p>When you start exposing APIs or microservices to the outside world, one of the first real-world problems you hit is traffic that doesn\u2019t behave nicely. Sometimes it is honest heavy usage from one customer, sometimes an integration bug that loops requests, and sometimes outright malicious scanning or credential stuffing. 
In all of these cases, you [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":3458,"comment_status":"","ping_status":"","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[26],"tags":[],"class_list":["post-3457","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-teknoloji"],"_links":{"self":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/3457","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/comments?post=3457"}],"version-history":[{"count":0,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/posts\/3457\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media\/3458"}],"wp:attachment":[{"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/media?parent=3457"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/categories?post=3457"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.dchost.com\/blog\/en\/wp-json\/wp\/v2\/tags?post=3457"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}