
Nginx Monitoring Guide: Key Metrics, Alerts & Tools (2026)

Nginx powers over 34% of the world's websites. When it breaks, everything downstream breaks too. This guide covers the metrics that matter, how to expose them with stub_status and Prometheus, and how to set alerts before users notice problems.

Updated April 2026 · 12 min read · SRE / DevOps · Staff Pick

📡 Monitor your APIs — know when they go down before your users do

Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.

Start Free →

Affiliate link — we may earn a commission at no extra cost to you

TL;DR — Nginx Monitoring Checklist

  • ✅ Enable stub_status module to expose basic connection metrics
  • ✅ Deploy nginx-prometheus-exporter to scrape metrics into Prometheus
  • ✅ Track: active connections, requests/sec, 5xx rate, upstream latency
  • ✅ Alert when 5xx error rate > 1% for 5+ minutes
  • ✅ Alert when active connections > 80% of worker capacity
  • ✅ Use access logs for p95/p99 latency via log parsing (or Nginx Plus API)

Why Nginx Monitoring Is Different

Nginx is a reverse proxy, load balancer, and web server all at once. Unlike application-level monitoring (APM), Nginx monitoring sits at the infrastructure layer — it sees every request before your app code runs. This means:

  • A 504 Gateway Timeout in Nginx doesn't mean Nginx is broken — your upstream (app server, backend) is slow or dead
  • A 502 Bad Gateway means Nginx got a bad response from upstream, not from the client
  • High active connections with normal RPS means requests are taking longer — check upstream latency
  • Worker processes near capacity (approaching worker_connections limit) will cause connection refused errors

Good Nginx monitoring separates Nginx's own health (worker processes, connection limits) from upstream health (your app servers). Both matter.

Core Nginx Metrics

Connection Metrics

| Metric | Source | What It Means | Alert Threshold |
|---|---|---|---|
| Active connections | stub_status | Connections currently being handled + keep-alive + waiting | > 80% of worker_connections × worker_processes |
| Accepted connections | stub_status | Total connections accepted since start (monotonic) | Rate drop > 50% vs baseline |
| Handled connections | stub_status | Should equal accepted; a gap means dropped connections | accepted − handled > 0 |
| Waiting connections | stub_status | Idle keep-alive connections; a high count is normal with a long keepalive_timeout | Context-dependent |
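The capacity threshold in the first row is simple arithmetic over two nginx.conf values. A minimal sketch (the worker counts below are illustrative, not Nginx defaults):

```python
def connection_capacity(worker_processes: int, worker_connections: int) -> int:
    """Maximum simultaneous connections Nginx can hold across all workers."""
    return worker_processes * worker_connections

def alert_threshold(worker_processes: int, worker_connections: int,
                    fraction: float = 0.8) -> int:
    """Connection count at which the 80%-of-capacity warning should fire."""
    return int(connection_capacity(worker_processes, worker_connections) * fraction)

# Example: 4 workers × 1024 connections each → capacity 4096, warn at 3276
print(alert_threshold(4, 1024))  # → 3276
```

Plug in your own worker_processes and worker_connections values; the 0.8 fraction matches the checklist's 80% rule of thumb.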

HTTP Error Metrics

These come from access log parsing or Nginx Plus. They're the most actionable indicators of user-visible problems.

| Status Code | Meaning | Root Cause |
|---|---|---|
| 502 | Bad Gateway | Upstream returned an invalid response (crash, restart, invalid HTTP) |
| 503 | Service Unavailable | No healthy upstream servers, Nginx over worker limit, rate limiting |
| 504 | Gateway Timeout | Upstream too slow, proxy_read_timeout too short |
| 499 | Client Closed Request | Client gave up before the response — usually a slow upstream |
📡 Recommended

Monitor your Nginx endpoints with Better Stack

Better Stack runs synthetic checks from 30+ global locations. HTTP, TCP, and keyword monitors — with on-call alerting when Nginx returns errors.

Try Better Stack Free →

Step 1 — Enable stub_status

The ngx_http_stub_status_module exposes basic connection metrics at an HTTP endpoint. It's included in most Nginx builds (verify with nginx -V 2>&1 | grep stub_status — nginx -V writes to stderr, so the redirect is needed for grep to see it).

# /etc/nginx/conf.d/status.conf
server {
    listen 127.0.0.1:8080;
    server_name _;

    location /nginx_status {
        stub_status on;
        access_log off;
        allow 127.0.0.1;
        deny all;
    }
}

# Test and reload
nginx -t && nginx -s reload

# Output at http://localhost:8080/nginx_status:
# Active connections: 42
# server accepts handled requests
#  15234 15234 78901
# Reading: 1 Writing: 8 Waiting: 33

Security: Always restrict stub_status to 127.0.0.1 or your monitoring subnet. Never expose it on a public IP — it leaks connection metadata.

Reading the output: accepts = total connections since start; handled = connections that were actually processed (should match accepts); the difference signals dropped connections (usually a worker_connections limit issue).
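The four-line format is stable enough to parse with a few regexes. A small sketch, using the sample output shown above:

```python
import re

def parse_stub_status(text: str) -> dict:
    """Parse stub_status output into a metrics dict."""
    active = int(re.search(r"Active connections:\s+(\d+)", text).group(1))
    # The counters line: "accepts handled requests"
    accepts, handled, requests = map(
        int, re.search(r"\n\s*(\d+)\s+(\d+)\s+(\d+)", text).groups()
    )
    reading, writing, waiting = map(
        int,
        re.search(r"Reading:\s*(\d+)\s+Writing:\s*(\d+)\s+Waiting:\s*(\d+)",
                  text).groups(),
    )
    return {
        "active": active,
        "accepts": accepts,
        "handled": handled,
        "requests": requests,
        "dropped": accepts - handled,  # non-zero → likely worker_connections limit
        "reading": reading,
        "writing": writing,
        "waiting": waiting,
    }

sample = """Active connections: 42
server accepts handled requests
 15234 15234 78901
Reading: 1 Writing: 8 Waiting: 33
"""
print(parse_stub_status(sample)["dropped"])  # → 0
```

In production you would fetch the text with an HTTP client instead of a hardcoded sample; the derived "dropped" value is the accepted-minus-handled gap discussed above.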

Step 2 — Prometheus Exporter Setup

The official nginx-prometheus-exporter (by F5/Nginx Inc.) converts stub_status output to Prometheus metrics and exposes them on port 9113.

# Docker Compose example
# Note: across containers, the status server must listen on the container
# interface (not 127.0.0.1 only) and `allow` the exporter's network —
# the localhost-only config from Step 1 is not reachable from another container.
version: '3'
services:
  nginx-exporter:
    image: nginx/nginx-prometheus-exporter:1.1
    command:
      - --nginx.scrape-uri=http://nginx:8080/nginx_status
    ports:
      - "9113:9113"
    depends_on:
      - nginx

# Prometheus scrape config
scrape_configs:
  - job_name: 'nginx'
    static_configs:
      - targets: ['localhost:9113']
    scrape_interval: 15s

# Key metrics exposed:
# nginx_connections_active
# nginx_connections_accepted_total
# nginx_connections_handled_total
# nginx_http_requests_total
# nginx_up (1 = healthy, 0 = can't reach nginx)
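To sanity-check the exporter without a full Prometheus stack, you can read its /metrics endpoint directly: it's plain text in the Prometheus exposition format. A hedged sketch with an illustrative payload (real output includes more metrics and labels):

```python
def parse_prom_metrics(text: str) -> dict:
    """Parse simple 'name value' lines from Prometheus text exposition format."""
    metrics = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip HELP/TYPE comment lines
            continue
        name, _, value = line.partition(" ")
        metrics[name] = float(value)
    return metrics

payload = """# HELP nginx_up Status of the last metric scrape
# TYPE nginx_up gauge
nginx_up 1
nginx_connections_active 42
nginx_connections_accepted_total 15234
"""
m = parse_prom_metrics(payload)
if m.get("nginx_up") != 1:
    print("ALERT: exporter cannot reach nginx")
```

This only handles bare, unlabeled metrics, which is enough for a quick nginx_up check; labeled series need a real Prometheus client.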

Essential Prometheus Alert Rules

groups:
  - name: nginx
    rules:
      # Nginx is down
      - alert: NginxDown
        expr: nginx_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Nginx is not responding"

      # Connection capacity warning (>80% of worker limit).
      # worker_connections and worker_processes are nginx.conf settings, not
      # Prometheus metrics: hardcode your own capacity here
      # (e.g. 4 workers × 1024 connections × 0.8 = 3276)
      - alert: NginxHighConnectionCount
        expr: nginx_connections_active > 3276
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Nginx connections approaching limit"

      # 5xx spike. Requires a status-labeled counter from log parsing
      # (e.g. Vector, mtail) or Nginx Plus; the OSS exporter's
      # nginx_http_requests_total has no status label
      - alert: NginxHigh5xxRate
        expr: |
          rate(nginx_http_requests_total{status=~"5.."}[5m]) /
          rate(nginx_http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Nginx 5xx error rate > 1%"
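The 5xx-rate expression is just a ratio of counter increases over a window. The arithmetic can be sketched as follows (the counter samples are made up):

```python
def error_ratio(errors_start: int, errors_end: int,
                total_start: int, total_end: int) -> float:
    """Ratio of 5xx counter increase to total-request counter increase over a
    window, mirroring rate(...{status=~"5.."}[5m]) / rate(...[5m])."""
    total_delta = total_end - total_start
    if total_delta == 0:
        return 0.0  # no traffic in the window → nothing to alert on
    return (errors_end - errors_start) / total_delta

# 120 new 5xx out of 10,000 new requests in the window → 1.2%, above 1%
ratio = error_ratio(500, 620, 90_000, 100_000)
print(f"{ratio:.1%}, alert={ratio > 0.01}")  # → 1.2%, alert=True
```

Note that PromQL's rate() divides both deltas by the same window length, so the time term cancels out of the ratio.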

Step 3 — Access Log Parsing for Latency

stub_status doesn't expose per-request latency. The best open-source approach is parsing access logs with the $request_time variable using a log shipper like Promtail or Vector.

# nginx.conf — add request_time to log format
log_format monitoring '$remote_addr - $remote_user [$time_local] '
    '"$request" $status $body_bytes_sent '
    '"$http_referer" "$http_user_agent" '
    'rt=$request_time uct=$upstream_connect_time '
    'uht=$upstream_header_time urt=$upstream_response_time';

access_log /var/log/nginx/access.log monitoring;

# Key variables:
# $request_time       — total time from reading the first client bytes to sending the last response byte
# $upstream_response_time — time your backend took (excludes Nginx overhead)
# $upstream_connect_time  — TCP connect time to upstream

With upstream_response_time in logs, a log aggregation tool (Loki, Better Stack Logs, Elastic) can compute p95 latency per endpoint. Alert when p95 exceeds your SLO threshold (e.g., 2 seconds for web pages, 500ms for APIs).
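As a sketch of what the log shipper computes, here is a nearest-rank p95 over the rt= field from the format above (the log lines are synthetic):

```python
import math
import re

def p95_request_time(lines: list[str]) -> float:
    """Extract rt=<seconds> from each access-log line, return the p95
    using the nearest-rank percentile method."""
    times = sorted(
        float(m.group(1)) for line in lines
        if (m := re.search(r"\brt=([\d.]+)", line))
    )
    rank = math.ceil(0.95 * len(times))  # nearest-rank: ceil(p × N)
    return times[rank - 1]

# Synthetic lines in the monitoring log format, rt=0.001 … rt=0.100
logs = [f'10.0.0.9 - - [t] "GET /api HTTP/1.1" 200 512 "-" "-" rt=0.{i:03d} uct=0.001'
        for i in range(1, 101)]
print(p95_request_time(logs))  # → 0.095
```

A log aggregation tool does the same computation continuously and per endpoint; the \b boundary in the regex keeps urt= (upstream response time) from matching as rt=.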

Alert Pro

14-day free trial

Stop checking — get alerted instantly

Next time one of your Nginx-fronted services goes down, you'll know in under 60 seconds — not when your users start complaining.

  • Email alerts for your Nginx-fronted services + 9 more APIs
  • $0 due today for trial
  • Cancel anytime — $9/mo after trial

Upstream Health Monitoring

When Nginx acts as a reverse proxy, your upstream servers are your real targets. Use passive health checks (built-in) or active health checks (Nginx Plus or the open-source nginx_upstream_check_module).

# Passive health check (open source)
# A server is marked unavailable for fail_timeout after max_fails
# failed requests within the fail_timeout window
upstream app_servers {
    server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
    server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
    server 10.0.0.3:8000 backup;
}

# Active health check (Nginx Plus)
upstream app_servers {
    zone app_zone 64k;
    server 10.0.0.1:8000;
    server 10.0.0.2:8000;
}

server {
    location / {
        proxy_pass http://app_servers;
        health_check interval=5s fails=3 passes=2 uri=/health;
    }
}
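The fails=3 passes=2 semantics amount to a small state machine: consecutive contradicting probes flip the server's state. A hedged sketch of that logic (not how Nginx Plus implements it internally):

```python
class UpstreamHealth:
    """Track a server's healthy/unhealthy state with consecutive-probe
    thresholds, mirroring health_check fails=N passes=M semantics."""

    def __init__(self, fails: int = 3, passes: int = 2):
        self.fails, self.passes = fails, passes
        self.healthy = True
        self.streak = 0  # consecutive probes contradicting the current state

    def probe(self, ok: bool) -> bool:
        if ok == self.healthy:
            self.streak = 0           # probe agrees with current state
        else:
            self.streak += 1
            limit = self.fails if self.healthy else self.passes
            if self.streak >= limit:  # enough contradicting probes: flip
                self.healthy = not self.healthy
                self.streak = 0
        return self.healthy

hc = UpstreamHealth(fails=3, passes=2)
results = [hc.probe(ok) for ok in [False, False, False, True, True]]
print(results)  # → [True, True, False, False, True]
```

Three failures take the server down; two subsequent passes bring it back, which matches the interval=5s fails=3 passes=2 directive above.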

For open-source Nginx, external uptime monitoring (like API Status Check or Better Stack) is the most reliable way to detect when specific upstream endpoints are failing. These tools probe your endpoints from outside and alert immediately when they return 5xx or go unresponsive.

Nginx Monitoring Tool Comparison

| Tool | Best For | Nginx Support | Pricing |
|---|---|---|---|
| Better Stack | Uptime + log monitoring combined | Uptime checks + log ingestion | Free tier + $20/mo |
| Prometheus + Grafana | Full observability stack | nginx-prometheus-exporter | Open source (self-hosted) |
| Datadog | Enterprise full-stack APM | Native Nginx integration | $15+/host/month |
| New Relic | APM + infrastructure in one | Nginx integration via agent | Free 100 GB/mo + usage billing |
| Grafana Cloud | Managed Prometheus + Loki | Exporter + log parsing | Free tier + usage-based |

Common Nginx Performance Problems & Fixes

Too many open files error

Nginx hits the OS file descriptor limit under high load.

Fix: Set worker_rlimit_nofile 65535; in nginx.conf and raise the OS limit to match: LimitNOFILE=65535 in the systemd unit file (ulimit -n 65535 only affects processes started from that shell).

Worker connections exhausted

All worker slots full — new connections are rejected with 503.

Fix: Increase worker_connections (up to worker_rlimit_nofile) or scale horizontally. Set worker_processes auto; to use all CPU cores.

upstream timed out (110: Connection timed out)

Backend is slow. Default proxy_read_timeout is 60 seconds.

Fix: First diagnose why the backend is slow (slow query, memory pressure, CPU spike). Increasing proxy_read_timeout masks the problem — fix the root cause.

High memory usage

Each worker process holds memory; large proxy buffers multiply this.

Fix: Tune proxy_buffer_size and proxy_buffers down to minimum needed. Monitor worker RSS per process, not just total Nginx memory.

FAQ

What are the most important Nginx metrics to monitor?

The six most critical are: active connections, requests per second, HTTP 5xx error rate, request latency (p95), upstream response time, and worker process health. Track them together — a spike in 5xx with no change in RPS usually points to a backend problem, not Nginx itself.

How do I enable the Nginx status page?

Add a location block with "stub_status on;" inside a server block restricted to 127.0.0.1. After nginx -t && nginx -s reload, curl the status URL (e.g. http://127.0.0.1:8080/nginx_status with the Step 1 config) to get connection counts. Never expose this endpoint publicly.

How do I set up the Nginx Prometheus exporter?

Run nginx/nginx-prometheus-exporter with --nginx.scrape-uri pointing at your stub_status URL. It exposes metrics on port 9113. Add a Prometheus scrape job targeting localhost:9113, then build Grafana dashboards from nginx_connections_active, nginx_http_requests_total, and nginx_up metrics.

What Nginx alert thresholds should I set?

Starting thresholds: 5xx rate > 1% for 5 minutes (critical), active connections > 80% of capacity (warning), Nginx process down (critical), upstream p95 latency > 2 seconds (warning). Tune to your traffic baseline — high-traffic sites may tolerate higher error rates during flash sales.

What is the difference between Nginx open source and Nginx Plus monitoring?

Open source provides basic connection counts via stub_status. Nginx Plus adds a /api endpoint with per-upstream metrics, health check results, per-zone traffic stats, and a live dashboard. For most teams, the Prometheus exporter + log parsing fills the open-source monitoring gap at no cost.

