API Gateway Monitoring: Complete Guide 2026
Your API gateway is the front door to your backend services. Every request goes through it. When it degrades, every service degrades. This guide covers the metrics that matter, how to monitor the major gateways (Kong, AWS, Nginx), and how to build alerting that catches issues before your users do.
What is an API Gateway?
An API gateway is a reverse proxy that sits between clients and your backend services. It handles request routing, authentication, rate limiting, SSL termination, caching, and observability. Common gateways include AWS API Gateway, Kong, Nginx, Traefik, Envoy, Azure API Management, and Apigee.
The 8 Essential API Gateway Metrics
Request Rate
Must track: Total requests per second (RPS) passing through the gateway
Alert threshold: Alert on sudden drops (>50% decrease) or unexpected spikes (>3x baseline)
Why it matters: Sudden drops indicate upstream failures or routing issues; spikes indicate traffic surge or DDoS
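The drop and spike thresholds above can be sketched as a small check. This is a minimal illustration, not a production detector: the function name `classify_request_rate` is hypothetical, and how you compute the baseline (for example, the same hour last week) is left as an assumption.

```python
def classify_request_rate(current_rps: float, baseline_rps: float) -> str:
    """Return 'drop', 'spike', or 'ok' for the current request rate."""
    if baseline_rps <= 0:
        # No meaningful baseline yet, so there is nothing to compare against.
        return "ok"
    if current_rps < 0.5 * baseline_rps:
        # More than a 50% decrease from baseline.
        return "drop"
    if current_rps > 3.0 * baseline_rps:
        # More than 3x baseline.
        return "spike"
    return "ok"


print(classify_request_rate(40, 100))   # drop
print(classify_request_rate(350, 100))  # spike
```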
Error Rate (5xx)
Must track: Percentage of requests returning 5xx server errors
Alert threshold: Alert when 5xx rate exceeds 1% over 5 minutes
Why it matters: Gateway-level 5xx responses mean upstream services are failing or the gateway itself is misconfigured
Error Rate (4xx)
Must track: Percentage of requests returning 4xx client errors
Alert threshold: Alert when 4xx rate exceeds 10% (may indicate auth issues or client bugs)
Why it matters: High 401/403 rates suggest auth service problems; high 429 suggests rate limiting is too aggressive
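As a rough sketch of how the 4xx and 5xx rates are derived, the function below takes the status codes seen in one window (for example, parsed from access logs) and returns both rates. The name `error_rates` and the sample window are illustrative assumptions.

```python
from collections import Counter


def error_rates(status_codes):
    """Return (rate_4xx, rate_5xx) as fractions of all requests in the window."""
    total = len(status_codes)
    if total == 0:
        return 0.0, 0.0
    # Group codes by their class: 2 for 2xx, 4 for 4xx, 5 for 5xx, and so on.
    classes = Counter(code // 100 for code in status_codes)
    return classes[4] / total, classes[5] / total


# Hypothetical 5-minute window: 95 successes, 3 client errors, 2 server errors.
window = [200] * 95 + [404] * 3 + [502] * 2
rate_4xx, rate_5xx = error_rates(window)
print(rate_5xx > 0.01)  # True: above the 1% threshold for 5xx
print(rate_4xx > 0.10)  # False: below the 10% threshold for 4xx
```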
Latency (P95/P99)
Must track: Response time at the 95th and 99th percentile
Alert threshold: Alert when P95 exceeds your SLA target (e.g., >500ms for most APIs)
Why it matters: P99 catches tail latency that averages hide — this is what your worst-case users experience
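To see why percentiles matter more than averages, here is a small sketch using the nearest-rank method (one of several common percentile definitions; your metrics backend may interpolate differently). The sample latencies are made up for illustration.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical window: 97 fast requests and 3 slow outliers (milliseconds).
latencies_ms = [80] * 97 + [900, 1500, 4000]

mean_ms = sum(latencies_ms) / len(latencies_ms)
print(mean_ms)                       # 141.6
print(percentile(latencies_ms, 95))  # 80
print(percentile(latencies_ms, 99))  # 1500
```

In this sample both the mean and P95 stay well under a 500ms target, while P99 exposes the slow tail.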
Upstream Latency
Must track: Time spent waiting for the upstream backend to respond
Alert threshold: Alert when upstream latency consistently runs more than 200% above its normal baseline
Why it matters: Isolates whether slowness is in the gateway layer or the backend service
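A minimal sketch of that isolation step, assuming your gateway reports both total request latency and upstream latency for the same request or window (Kong and AWS API Gateway both expose such a pair). The function name and sample numbers are hypothetical.

```python
def latency_breakdown(total_ms: float, upstream_ms: float) -> dict:
    """Split total latency into gateway overhead and upstream time."""
    # Clamp at zero in case the two measurements are slightly out of sync.
    gateway_ms = max(0.0, total_ms - upstream_ms)
    upstream_share = upstream_ms / total_ms if total_ms > 0 else 0.0
    return {
        "gateway_ms": gateway_ms,
        "upstream_ms": upstream_ms,
        "upstream_share": upstream_share,
    }


# 450 of 500 ms spent upstream: the backend, not the gateway, is the slow part.
print(latency_breakdown(500, 450))
```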
Active Connections
Must track: Number of concurrent connections being handled
Alert threshold: Alert at 80% of configured max connections
Why it matters: Connection pool exhaustion causes new requests to fail immediately with connection refused
Cache Hit Rate
Must track: Percentage of responses served from gateway cache
Alert threshold: Alert if cache hit rate drops >20% from baseline
Why it matters: Sudden cache miss spikes increase load on backend services and increase latency
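The hit rate and the baseline comparison can be sketched as follows. Note the assumption: "drops >20% from baseline" is read here as a relative drop (0.80 falling below 0.64), not a drop of 20 percentage points. Both function names are hypothetical.

```python
def cache_hit_rate(hits: int, misses: int) -> float:
    """Fraction of cacheable requests served from the gateway cache."""
    total = hits + misses
    return hits / total if total else 0.0


def hit_rate_dropped(current: float, baseline: float, max_drop: float = 0.20) -> bool:
    """True if the hit rate fell by more than max_drop relative to baseline."""
    if baseline <= 0:
        return False
    return (baseline - current) / baseline > max_drop


baseline = cache_hit_rate(800, 200)  # 0.8
current = cache_hit_rate(600, 400)   # 0.6
print(hit_rate_dropped(current, baseline))  # True: a 25% relative drop
```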
Rate Limit Violations
Must track: Count of requests rejected due to rate limiting (429 responses)
Alert threshold: Alert on unexpected spikes in rate limiting
Why it matters: Excessive rate limiting may indicate misconfigured limits or a client with a bug making too many calls
Monitoring by Gateway Type
AWS API Gateway
AWS API Gateway automatically sends metrics to CloudWatch. Enable detailed CloudWatch metrics in your API settings for per-resource and per-method granularity (the default is aggregate only).
Key CloudWatch metrics:
Count, Latency, IntegrationLatency
4XXError, 5XXError
CacheHitCount, CacheMissCount
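As a hedged sketch of turning one of these metrics into an alarm, the helper below builds the keyword arguments for CloudWatch's `put_metric_alarm` call. It assumes a REST API (which uses the `ApiName` dimension; HTTP APIs use different metric and dimension names), and the helper name, alarm name, and `orders-api` value are all hypothetical. The code only builds a dictionary, so it runs without AWS credentials.

```python
def build_5xx_alarm(api_name: str, threshold: float = 0.01, minutes: int = 5) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm on the 5XXError rate."""
    return {
        "AlarmName": f"{api_name}-5xx-rate",  # hypothetical naming scheme
        "Namespace": "AWS/ApiGateway",
        "MetricName": "5XXError",
        "Dimensions": [{"Name": "ApiName", "Value": api_name}],
        # For 5XXError, the Average statistic is the error rate (Sum is the count).
        "Statistic": "Average",
        "Period": 60,                  # one-minute datapoints
        "EvaluationPeriods": minutes,  # must breach for this many periods
        "Threshold": threshold,        # 0.01 means 1%
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
    }


params = build_5xx_alarm("orders-api")
print(params["MetricName"], params["Threshold"])

# With boto3 installed and credentials configured, this could be applied as:
#   import boto3
#   boto3.client("cloudwatch").put_metric_alarm(**params)
```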
Kong Gateway
Kong exposes Prometheus metrics via the Prometheus plugin. Install it globally to get metrics for all services and routes.
# Enable Prometheus plugin globally
curl -X POST http://localhost:8001/plugins \
  --data "name=prometheus"
# Scrape endpoint
curl http://localhost:8001/metrics
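If you want to inspect the scrape output without a full Prometheus setup, the text exposition format is simple enough to parse by hand. The sketch below is a simplified parser (it does not handle escaped quotes in label values), and the sample text is illustrative: the exact metric and label names Kong emits vary by version, so treat `kong_http_requests_total` and its labels as assumptions.

```python
import re

# Illustrative sample only; real Kong output has more metrics and labels.
SAMPLE = '''\
# HELP kong_http_requests_total HTTP status codes per service
# TYPE kong_http_requests_total counter
kong_http_requests_total{service="orders",code="200"} 980
kong_http_requests_total{service="orders",code="502"} 20
'''

LINE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)')
LABEL = re.compile(r'(\w+)="([^"]*)"')


def parse_metrics(text):
    """Parse Prometheus text format into (name, labels, value) tuples."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = LINE.match(line)
        if match:
            labels = dict(LABEL.findall(match.group("labels") or ""))
            samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def five_xx_share(samples, metric="kong_http_requests_total"):
    """Share of 5xx responses among all counted requests for one metric."""
    total = sum(v for name, _, v in samples if name == metric)
    errors = sum(v for name, labels, v in samples
                 if name == metric and labels.get("code", "").startswith("5"))
    return errors / total if total else 0.0


print(five_xx_share(parse_metrics(SAMPLE)))  # 0.02
```

Counters are cumulative since the process started, so for a windowed error rate you would compare two scrapes rather than read a single one.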
Nginx / Nginx Plus
Open-source Nginx provides basic metrics via stub_status. Nginx Plus adds a full JSON metrics API at /api/ with per-upstream metrics.
# nginx.conf — enable stub_status
location /nginx_status {
    stub_status;
    allow 127.0.0.1;
    deny all;
}
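The stub_status page returns a short plain-text report. A minimal sketch of parsing it and checking connection usage is shown below; the sample values follow the layout in the Nginx documentation, and `max_connections` is an assumption you would derive from your own `worker_processes` and `worker_connections` settings.

```python
import re

# Sample output in the documented stub_status layout.
SAMPLE = """Active connections: 291
server accepts handled requests
 16630948 16630948 31070465
Reading: 6 Writing: 179 Waiting: 106
"""


def parse_stub_status(text: str) -> dict:
    """Extract the counters from an Nginx stub_status response body."""
    active = int(re.search(r"Active connections:\s*(\d+)", text).group(1))
    accepts, handled, requests = map(
        int, re.search(r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups())
    reading, writing, waiting = map(
        int, re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)", text).groups())
    return {
        "active": active, "accepts": accepts, "handled": handled,
        "requests": requests, "reading": reading, "writing": writing,
        "waiting": waiting,
    }


def connection_usage(stats: dict, max_connections: int) -> float:
    """Active connections as a fraction of the configured maximum."""
    return stats["active"] / max_connections


stats = parse_stub_status(SAMPLE)
# 1024 is a hypothetical limit; alert when usage reaches 0.8.
print(connection_usage(stats, 1024) >= 0.8)  # False
```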
Alerting Strategy: What to Alert On
Most teams either under-alert (miss real incidents) or over-alert (alert fatigue kills response times). Use this tiered approach:
P0 — Page immediately
- 5xx error rate > 5% for 2+ consecutive minutes
- Gateway uptime check failing (gateway unreachable)
- P99 latency > 5s for 5+ minutes
- Active connections > 95% of configured max
P1 — Alert during business hours
- 5xx error rate > 1% sustained over 10 minutes
- P95 latency > your SLA threshold (e.g., 500ms)
- Authentication failure rate > 20%
- Rate limit violations > 2x baseline
P2 — Daily digest / dashboard review
- Cache hit rate trending down week-over-week
- 4xx rate gradually increasing (could indicate client-side bug)
- Traffic volume anomalies (unexpected drops in off-hours)
- Upstream latency creeping up (early warning of backend degradation)
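The P0 and P1 rules above can be sketched as a single classification function. This is an illustration under stated assumptions: the `Snapshot` fields are hypothetical names for values your monitoring pipeline would supply, the rate-limit rule is omitted because it needs a baseline, and the P2 items are trends that belong in dashboards rather than in a point-in-time check.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Snapshot:
    reachable: bool = True
    rate_5xx: float = 0.0           # lowest 5xx rate seen over the last minutes_5xx minutes
    minutes_5xx: int = 0
    p95_ms: float = 0.0
    p99_ms: float = 0.0             # lowest P99 seen over the last minutes_p99 minutes
    minutes_p99: int = 0
    connection_usage: float = 0.0   # active connections / configured max
    auth_failure_rate: float = 0.0  # fraction of requests failing authentication


def severity(s: Snapshot, sla_p95_ms: float = 500.0) -> Optional[str]:
    """Return 'P0', 'P1', or None according to the tiered thresholds."""
    if (not s.reachable
            or (s.rate_5xx > 0.05 and s.minutes_5xx >= 2)
            or (s.p99_ms > 5000 and s.minutes_p99 >= 5)
            or s.connection_usage > 0.95):
        return "P0"
    if ((s.rate_5xx > 0.01 and s.minutes_5xx >= 10)
            or s.p95_ms > sla_p95_ms
            or s.auth_failure_rate > 0.20):
        return "P1"
    return None


print(severity(Snapshot(rate_5xx=0.06, minutes_5xx=2)))   # P0
print(severity(Snapshot(rate_5xx=0.02, minutes_5xx=10)))  # P1
print(severity(Snapshot()))                               # None
```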
API Gateway Monitoring Tools Comparison
| Tool | Best For | Gateway Support | Starting Price |
|---|---|---|---|
| Datadog | Enterprise full-stack observability | AWS, Kong, Nginx, Envoy, HAProxy | $15/host/mo |
| Better Stack | Uptime + log monitoring combo | Any (external health checks) | $24/mo |
| Grafana + Prometheus | Open-source, self-hosted | Kong, Nginx, Envoy, Traefik | Free (infra costs) |
| New Relic | APM + infrastructure monitoring | AWS API GW, Nginx, Kong | Free tier available |
| AWS CloudWatch | AWS API Gateway native monitoring | AWS API Gateway only | Pay per metric/alarm |
| Dynatrace | AI-powered anomaly detection | Nginx, Kong, AWS, Traefik | $69/host/mo |
Frequently Asked Questions
What metrics should I monitor for an API gateway?
The essentials: request rate, 5xx error rate, 4xx error rate, P95/P99 latency, upstream latency, active connections, cache hit rate, and rate limit violations. Start with the 5xx error rate and P99 latency, the two metrics that most directly affect user experience.
How do I monitor AWS API Gateway?
Enable detailed CloudWatch metrics in API Gateway settings. Set up alarms on 5XXError (alert >1%) and Latency P99 (alert when exceeding your SLA). Enable X-Ray tracing to see full request traces through to Lambda/backends. Use API Gateway access logs for per-request detail.
What is the difference between API gateway monitoring and API monitoring?
Gateway monitoring covers infrastructure concerns: request routing, rate limiting, auth, and traffic management. API monitoring covers business logic: does the API return correct data, are responses valid, is the service doing what it should? Both are essential — use synthetic API tests (Better Stack, Checkly) alongside gateway metrics.
How do I reduce alert noise in API gateway monitoring?
Use anomaly detection instead of static thresholds for traffic-based alerts. Require 2+ consecutive breach minutes before alerting to filter transient spikes. Tier your alerts (P0/P1/P2) so only critical thresholds page on-call. Group related alerts to prevent alert storms during cascade failures.
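The "consecutive breach" rule is easy to sketch. The function below assumes one sample per minute, newest last; the name `should_alert` is hypothetical.

```python
def should_alert(values, threshold, consecutive=2):
    """True only if the most recent `consecutive` samples all exceed the threshold."""
    if consecutive <= 0 or len(values) < consecutive:
        return False
    return all(v > threshold for v in values[-consecutive:])


# One-minute 5xx percentages, newest last.
print(should_alert([0.2, 6.0, 0.3], threshold=5.0))  # False: a single transient spike
print(should_alert([0.2, 6.0, 7.0], threshold=5.0))  # True: two breaches in a row
```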
Should I monitor my API gateway externally as well as internally?
Yes, always. Internal metrics tell you what's happening inside the gateway. External synthetic checks (from a monitoring service outside your network) tell you what your users actually experience. A gateway can report healthy internally while users see timeouts caused by network or DNS problems between them and the gateway.
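A bare-bones external check can be sketched with the Python standard library. This is only an illustration of the idea, not a replacement for a monitoring service: it must run from a machine outside your network to be meaningful, the URL is a placeholder, and the `opener` parameter is an assumption added so the function can be exercised without real network access.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout_s=5.0, opener=urllib.request.urlopen):
    """Return (ok, status, elapsed_ms) for one GET request to the gateway."""
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout_s) as resp:
            status = resp.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # urlopen raises HTTPError for 4xx/5xx responses
    except (urllib.error.URLError, OSError):
        status = None      # DNS failure, refused connection, or timeout
    elapsed_ms = (time.monotonic() - start) * 1000
    ok = status is not None and 200 <= status < 400
    return ok, status, elapsed_ms


# Example usage against a placeholder URL (requires network access):
#   ok, status, elapsed_ms = check_endpoint("https://api.example.com/health")
```

A `status` of `None` is the case internal metrics cannot show you: the request never reached the gateway at all.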