Nginx Monitoring Guide: Key Metrics, Alerts & Tools (2026)
Nginx powers over 34% of the world's websites. When it breaks, everything downstream breaks too. This guide covers the metrics that matter, how to expose them with stub_status and Prometheus, and how to set alerts before users notice problems.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
TL;DR — Nginx Monitoring Checklist
- ✅ Enable stub_status module to expose basic connection metrics
- ✅ Deploy nginx-prometheus-exporter to scrape metrics into Prometheus
- ✅ Track: active connections, requests/sec, 5xx rate, upstream latency
- ✅ Alert when 5xx error rate > 1% for 5+ minutes
- ✅ Alert when active connections > 80% of worker capacity
- ✅ Use access logs for p95/p99 latency via log parsing (or Nginx Plus API)
Why Nginx Monitoring Is Different
Nginx is a reverse proxy, load balancer, and web server all at once. Unlike application-level monitoring (APM), Nginx monitoring sits at the infrastructure layer — it sees every request before your app code runs. This means:
- → A 504 Gateway Timeout in Nginx doesn't mean Nginx is broken — your upstream (app server, backend) is slow or dead
- → A 502 Bad Gateway means Nginx got a bad response from upstream, not from the client
- → High active connections with normal RPS means requests are taking longer — check upstream latency
- → Worker processes near capacity (approaching the worker_connections limit) will cause connection refused errors
Good Nginx monitoring separates Nginx's own health (worker processes, connection limits) from upstream health (your app servers). Both matter.
Core Nginx Metrics
Connection Metrics
| Metric | Source | What It Means | Alert Threshold |
|---|---|---|---|
| Active connections | stub_status | Workers currently handling requests + keep-alive + waiting | > 80% of worker_connections × workers |
| Accepted connections | stub_status | Total connections accepted since start (monotonic) | Rate drop > 50% vs baseline |
| Handled connections | stub_status | Should equal accepted; gap means connection drops | accepted − handled > 0 |
| Waiting connections | stub_status | Keep-alive connections idle — high count is normal with long keepalive_timeout | Context-dependent |
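For a concrete sense of that capacity threshold, here is a minimal sketch with illustrative values (4 workers × 2048 connections; tune both to your hardware):
# nginx.conf: illustrative capacity math, not universal defaults
worker_processes 4;          # or "auto" to match CPU cores
events {
    worker_connections 2048; # per-worker connection limit
}
# capacity = worker_processes × worker_connections = 8192
# 80% alert threshold ≈ 6553 active connections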
HTTP Error Metrics
These come from access log parsing or Nginx Plus. They're the most actionable indicators of user-visible problems.
| Status Code | Meaning | Root Cause |
|---|---|---|
| 502 | Bad Gateway | Upstream returned an invalid response (crash, restart, invalid HTTP) |
| 503 | Service Unavailable | No upstream servers healthy, Nginx over worker limit, rate limiting |
| 504 | Gateway Timeout | Upstream too slow, proxy_read_timeout too short |
| 499 | Client Closed Request | Client gave up before response — usually slow upstream |
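If you have not wired up a metrics pipeline yet, a quick spot-check of the 5xx rate straight from the access log works in a pinch (this assumes the default combined log format, where the status code is field 9):
# Percentage of 5xx responses in the current access log (combined format)
awk '{ total++ } $9 ~ /^5/ { errors++ } END { if (total) printf "5xx rate: %.2f%% (%d of %d)\n", 100*errors/total, errors, total }' /var/log/nginx/access.log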
Monitor your Nginx endpoints with Better Stack
Better Stack runs synthetic checks from 30+ global locations. HTTP, TCP, and keyword monitors — with on-call alerting when Nginx returns errors.
Try Better Stack Free →
Step 1 — Enable stub_status
The ngx_http_stub_status_module exposes basic connection metrics at an HTTP endpoint. It's included in most Nginx builds (verify with nginx -V 2>&1 | grep stub_status; the -V output goes to stderr, hence the redirect).
# /etc/nginx/conf.d/status.conf
server {
listen 127.0.0.1:8080;
server_name _;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
}
# Test and reload
nginx -t && nginx -s reload
# Output at http://localhost:8080/nginx_status:
# Active connections: 42
# server accepts handled requests
# 15234 15234 78901
# Reading: 1 Writing: 8 Waiting: 33
Security: Always restrict stub_status to 127.0.0.1 or your monitoring subnet. Never expose it on a public IP — it leaks connection metadata.
Reading the output: accepts = total connections since start; handled = connections that were actually processed (should match accepts); the difference signals dropped connections (usually a worker_connections limit issue).
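You can watch for that accepted/handled gap without a full metrics stack; a one-liner against the endpoint configured above (port 8080, field positions matching the sample output) does the diff:
# Dropped connections = accepts - handled, from line 3 of stub_status
curl -s http://127.0.0.1:8080/nginx_status | awk 'NR==3 { print "dropped:", $1 - $2 }'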
Step 2 — Prometheus Exporter Setup
The official nginx-prometheus-exporter (by F5/Nginx Inc.) converts stub_status output to Prometheus metrics and exposes them on port 9113.
# Docker Compose example
version: '3'
services:
nginx-exporter:
image: nginx/nginx-prometheus-exporter:1.1
command:
- --nginx.scrape-uri=http://nginx:8080/nginx_status
ports:
- "9113:9113"
depends_on:
- nginx
# Prometheus scrape config
scrape_configs:
- job_name: 'nginx'
static_configs:
- targets: ['localhost:9113']
scrape_interval: 15s
# Key metrics exposed:
# nginx_connections_active
# nginx_connections_accepted_total
# nginx_connections_handled_total
# nginx_http_requests_total
# nginx_up (1 = healthy, 0 = can't reach nginx)
Essential Prometheus Alert Rules
groups:
- name: nginx
rules:
# Nginx is down
- alert: NginxDown
expr: nginx_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Nginx is not responding"
# Connection capacity warning (>80% of worker limit)
# PromQL can't read nginx.conf values; hardcode your own
# worker_processes × worker_connections product (8192 shown here)
- alert: NginxHighConnectionCount
expr: nginx_connections_active > (8192 * 0.8)
for: 5m
labels:
severity: warning
annotations:
summary: "Nginx connections approaching limit"
# 5xx spike: stub_status carries no status codes, so this expr
# assumes a log-derived metric or Nginx Plus
- alert: NginxHigh5xxRate
expr: |
rate(nginx_http_requests_total{status=~"5.."}[5m]) /
rate(nginx_http_requests_total[5m]) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "Nginx 5xx error rate > 1%"Step 3 — Access Log Parsing for Latency
stub_status doesn't expose per-request latency. The best open-source approach is parsing access logs with the $request_time variable using a log shipper like Promtail or Vector.
# nginx.conf — add request_time to log format
log_format monitoring '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time uct=$upstream_connect_time '
'uht=$upstream_header_time urt=$upstream_response_time';
access_log /var/log/nginx/access.log monitoring;
# Key variables:
# $request_time — total time from first byte to last byte sent to client
# $upstream_response_time — time your backend took (excludes Nginx overhead)
# $upstream_connect_time — TCP connect time to upstream
With upstream_response_time in logs, a log aggregation tool (Loki, Better Stack Logs, Elastic) can compute p95 latency per endpoint. Alert when p95 exceeds your SLO threshold (e.g., 2 seconds for web pages, 500ms for APIs).
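Before a full aggregation stack is in place, you can sanity-check p95 with a one-liner (assumes GNU grep for the -P flag and the rt= field defined in the log format above):
# Rough p95 of $request_time from the monitoring log format
grep -oP 'rt=\K[0-9.]+' /var/log/nginx/access.log \
  | sort -n \
  | awk '{ a[NR]=$1 } END { i=int(NR*0.95); if (i<1) i=1; print "p95:", a[i] "s" }'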
Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time one of your Nginx-fronted services goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your Nginx-fronted services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Upstream Health Monitoring
When Nginx acts as a reverse proxy, your upstream servers are your real targets. Use passive health checks (built-in) or active health checks (Nginx Plus or the open-source nginx_upstream_check_module).
# Passive health check (open source)
# Marks upstream unhealthy after N consecutive failures
upstream app_servers {
server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
server 10.0.0.3:8000 backup;
}
# Active health check (Nginx Plus)
upstream app_servers {
zone app_zone 64k;
server 10.0.0.1:8000;
server 10.0.0.2:8000;
}
server {
location / {
proxy_pass http://app_servers;
health_check interval=5s fails=3 passes=2 uri=/health;
}
}
For open-source Nginx, external uptime monitoring (like API Status Check or Better Stack) is the most reliable way to detect when specific upstream endpoints are failing. These tools probe your endpoints from outside and alert immediately when they return 5xx or go unresponsive.
Nginx Monitoring Tool Comparison
| Tool | Best For | Nginx Support | Pricing |
|---|---|---|---|
| Better Stack | Uptime + log monitoring combined | Uptime checks + log ingestion | Free tier + $20/mo |
| Prometheus + Grafana | Full observability stack | nginx-prometheus-exporter | Open source (self-hosted) |
| Datadog | Enterprise full-stack APM | Native Nginx integration | $15+/host/month |
| New Relic | APM + infrastructure in one | Nginx integration via agent | Free 100GB/mo + usage billing |
| Grafana Cloud | Managed Prometheus + Loki | Exporter + log parsing | Free tier + usage-based |
Common Nginx Performance Problems & Fixes
Too many open files error
Nginx hits the OS file descriptor limit under high load.
Fix: Set worker_rlimit_nofile 65535; in nginx.conf and raise the system limit in the systemd unit file with LimitNOFILE=65535 (a ulimit command in a shell only affects that shell, not the service).
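A minimal sketch of both changes (paths and unit name assume a standard systemd install):
# /etc/nginx/nginx.conf
worker_rlimit_nofile 65535;

# systemd override: sudo systemctl edit nginx
[Service]
LimitNOFILE=65535

# apply: sudo systemctl daemon-reload && sudo systemctl restart nginx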
Worker connections exhausted
All worker slots full — new connections are rejected with 503.
Fix: Increase worker_connections (up to worker_rlimit_nofile) or scale horizontally. Set worker_processes auto; to use all CPU cores.
upstream timed out (110: Connection timed out)
Backend is slow. Default proxy_read_timeout is 60 seconds.
Fix: First diagnose why the backend is slow (slow query, memory pressure, CPU spike). Increasing proxy_read_timeout masks the problem — fix the root cause.
High memory usage
Each worker process holds memory; large proxy buffers multiply this.
Fix: Tune proxy_buffer_size and proxy_buffers down to minimum needed. Monitor worker RSS per process, not just total Nginx memory.
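As an illustration, a conservative buffer setup might look like the sketch below; the sizes are starting points to measure against, not one-size-fits-all recommendations:
# Conservative proxy buffering: measure worker RSS before and after
proxy_buffer_size 4k;         # holds the upstream response headers
proxy_buffers 8 4k;           # per-connection buffers for the body
proxy_busy_buffers_size 8k;   # cap on buffers busy sending to the client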
FAQ
What are the most important Nginx metrics to monitor?
The six most critical are: active connections, requests per second, HTTP 5xx error rate, request latency (p95), upstream response time, and worker process health. Track them together — a spike in 5xx with no change in RPS usually points to a backend problem, not Nginx itself.
How do I enable the Nginx status page?
Add a location block with "stub_status on;" inside a server block restricted to 127.0.0.1. After nginx -t && nginx -s reload, curl http://localhost:8080/nginx_status (matching the port in the config above) returns connection counts. Never expose this endpoint publicly.
How do I set up the Nginx Prometheus exporter?
Run nginx/nginx-prometheus-exporter with --nginx.scrape-uri pointing at your stub_status URL. It exposes metrics on port 9113. Add a Prometheus scrape job targeting localhost:9113, then build Grafana dashboards from nginx_connections_active, nginx_http_requests_total, and nginx_up metrics.
What Nginx alert thresholds should I set?
Starting thresholds: 5xx rate > 1% for 5 minutes (critical), active connections > 80% of capacity (warning), Nginx process down (critical), upstream p95 latency > 2 seconds (warning). Tune to your traffic baseline — high-traffic sites may tolerate higher error rates during flash sales.
What is the difference between Nginx open source and Nginx Plus monitoring?
Open source provides basic connection counts via stub_status. Nginx Plus adds a /api endpoint with per-upstream metrics, health check results, per-zone traffic stats, and a live dashboard. For most teams, the Prometheus exporter + log parsing fills the open-source monitoring gap at no cost.
🛠 Tools We Use & Recommend
Tested across our own infrastructure monitoring 200+ APIs daily
Uptime Monitoring & Incident Management
Used by 100,000+ websites
Monitors your APIs every 30 seconds. Instant alerts via Slack, email, SMS, and phone calls when something goes down.
“We use Better Stack to monitor every API on this site. It caught 23 outages last month before users reported them.”
Secrets Management & Developer Security
Trusted by 150,000+ businesses
Manage API keys, database passwords, and service tokens with CLI integration and automatic rotation.
“After covering dozens of outages caused by leaked credentials, we recommend every team use a secrets manager.”
Automated Personal Data Removal
Removes data from 350+ brokers
Removes your personal data from 350+ data broker sites. Protects against phishing and social engineering attacks.
“Service outages sometimes involve data breaches. Optery keeps your personal info off the sites attackers use first.”
AI Voice & Audio Generation
Used by 1M+ developers
Text-to-speech, voice cloning, and audio AI for developers. Build voice features into your apps with a simple API.
“The best AI voice API we've tested — natural-sounding speech with low latency. Essential for any app adding voice features.”
SEO & Site Performance Monitoring
Used by 10M+ marketers
Track your site health, uptime, search rankings, and competitor movements from one dashboard.
“We use SEMrush to track how our API status pages rank and catch site health issues early.”
Related Guides
Infrastructure Monitoring Guide
Complete server and infrastructure monitoring coverage.
Network Monitoring Guide
Monitor DNS, latency, packet loss, and connectivity.
Database Monitoring Guide
PostgreSQL, MySQL, Redis, and MongoDB monitoring.
Best APM Tools 2026
Compare the top application performance monitoring tools.