Nginx Monitoring Guide: Key Metrics, Alerts & Tools (2026)
Nginx powers over 34% of the world's websites. When it breaks, everything downstream breaks too. This guide covers the metrics that matter, how to expose them with stub_status and Prometheus, and how to set alerts before users notice problems.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
TL;DR — Nginx Monitoring Checklist
- ✅ Enable stub_status module to expose basic connection metrics
- ✅ Deploy nginx-prometheus-exporter to scrape metrics into Prometheus
- ✅ Track: active connections, requests/sec, 5xx rate, upstream latency
- ✅ Alert when 5xx error rate > 1% for 5+ minutes
- ✅ Alert when active connections > 80% of worker capacity
- ✅ Use access logs for p95/p99 latency via log parsing (or Nginx Plus API)
Why Nginx Monitoring Is Different
Nginx is a reverse proxy, load balancer, and web server all at once. Unlike application-level monitoring (APM), Nginx monitoring sits at the infrastructure layer — it sees every request before your app code runs. This means:
- → A 504 Gateway Timeout in Nginx doesn't mean Nginx is broken — your upstream (app server, backend) is slow or dead
- → A 502 Bad Gateway means Nginx got a bad response from upstream, not from the client
- → High active connections with normal RPS means requests are taking longer — check upstream latency
- → Worker processes near capacity (approaching the worker_connections limit) will cause connection refused errors
Good Nginx monitoring separates Nginx's own health (worker processes, connection limits) from upstream health (your app servers). Both matter.
Core Nginx Metrics
Connection Metrics
| Metric | Source | What It Means | Alert Threshold |
|---|---|---|---|
| Active connections | stub_status | Workers currently handling requests + keep-alive + waiting | > 80% of worker_connections × workers |
| Accepted connections | stub_status | Total connections accepted since start (monotonic) | Rate drop > 50% vs baseline |
| Handled connections | stub_status | Should equal accepted; gap means connection drops | accepted − handled > 0 |
| Waiting connections | stub_status | Keep-alive connections idle — high count is normal with long keepalive_timeout | Context-dependent |
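For a concrete sense of that capacity threshold, here is a minimal sketch with illustrative values (4 workers × 2048 connections; tune both to your hardware):
# nginx.conf: illustrative capacity math, not universal defaults
worker_processes 4;          # or "auto" to match CPU cores
events {
    worker_connections 2048; # per-worker connection limit
}
# capacity = worker_processes × worker_connections = 8192
# 80% alert threshold ≈ 6553 active connections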
HTTP Error Metrics
These come from access log parsing or Nginx Plus. They're the most actionable indicators of user-visible problems.
| Status Code | Meaning | Root Cause |
|---|---|---|
| 502 | Bad Gateway | Upstream returned an invalid response (crash, restart, invalid HTTP) |
| 503 | Service Unavailable | No upstream servers healthy, Nginx over worker limit, rate limiting |
| 504 | Gateway Timeout | Upstream too slow, proxy_read_timeout too short |
| 499 | Client Closed Request | Client gave up before response — usually slow upstream |
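If you have not wired up a metrics pipeline yet, a quick spot-check of the 5xx rate straight from the access log works in a pinch (this assumes the default combined log format, where the status code is field 9):
# Percentage of 5xx responses in the current access log (combined format)
awk '{ total++ } $9 ~ /^5/ { errors++ } END { if (total) printf "5xx rate: %.2f%% (%d of %d)\n", 100*errors/total, errors, total }' /var/log/nginx/access.log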
Monitor your Nginx endpoints with Better Stack
Better Stack runs synthetic checks from 30+ global locations. HTTP, TCP, and keyword monitors — with on-call alerting when Nginx returns errors.
Try Better Stack Free →
Step 1 — Enable stub_status
The ngx_http_stub_status_module exposes basic connection metrics at an HTTP endpoint. It's included in most Nginx builds (verify with nginx -V 2>&1 | grep stub_status; the -V output goes to stderr, hence the redirect).
# /etc/nginx/conf.d/status.conf
server {
listen 127.0.0.1:8080;
server_name _;
location /nginx_status {
stub_status on;
access_log off;
allow 127.0.0.1;
deny all;
}
}
# Test and reload
nginx -t && nginx -s reload
# Output at http://localhost:8080/nginx_status:
# Active connections: 42
# server accepts handled requests
# 15234 15234 78901
# Reading: 1 Writing: 8 Waiting: 33
Security: Always restrict stub_status to 127.0.0.1 or your monitoring subnet. Never expose it on a public IP — it leaks connection metadata.
Reading the output: accepts = total connections since start; handled = connections that were actually processed (should match accepts); the difference signals dropped connections (usually a worker_connections limit issue).
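You can watch for that accepted/handled gap without a full metrics stack; a one-liner against the endpoint configured above (port 8080, field positions matching the sample output) does the diff:
# Dropped connections = accepts - handled, from line 3 of stub_status
curl -s http://127.0.0.1:8080/nginx_status | awk 'NR==3 { print "dropped:", $1 - $2 }'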
Step 2 — Prometheus Exporter Setup
The official nginx-prometheus-exporter (by F5/Nginx Inc.) converts stub_status output to Prometheus metrics and exposes them on port 9113.
# Docker Compose example
version: '3'
services:
nginx-exporter:
image: nginx/nginx-prometheus-exporter:1.1
command:
- --nginx.scrape-uri=http://nginx:8080/nginx_status
ports:
- "9113:9113"
depends_on:
- nginx
# Prometheus scrape config
scrape_configs:
- job_name: 'nginx'
static_configs:
- targets: ['localhost:9113']
scrape_interval: 15s
# Key metrics exposed:
# nginx_connections_active
# nginx_connections_accepted_total
# nginx_connections_handled_total
# nginx_http_requests_total
# nginx_up (1 = healthy, 0 = can't reach nginx)
Essential Prometheus Alert Rules
groups:
- name: nginx
rules:
# Nginx is down
- alert: NginxDown
expr: nginx_up == 0
for: 1m
labels:
severity: critical
annotations:
summary: "Nginx is not responding"
# Connection capacity warning (>80% of worker limit)
# PromQL can't read nginx.conf values; hardcode your own
# worker_processes × worker_connections product (8192 shown here)
- alert: NginxHighConnectionCount
expr: nginx_connections_active > (8192 * 0.8)
for: 5m
labels:
severity: warning
annotations:
summary: "Nginx connections approaching limit"
# 5xx spike: stub_status carries no status codes, so this expr
# assumes a log-derived metric or Nginx Plus
- alert: NginxHigh5xxRate
expr: |
rate(nginx_http_requests_total{status=~"5.."}[5m]) /
rate(nginx_http_requests_total[5m]) > 0.01
for: 5m
labels:
severity: critical
annotations:
summary: "Nginx 5xx error rate > 1%"Step 3 — Access Log Parsing for Latency
stub_status doesn't expose per-request latency. The best open-source approach is parsing access logs with the $request_time variable using a log shipper like Promtail or Vector.
# nginx.conf — add request_time to log format
log_format monitoring '$remote_addr - $remote_user [$time_local] '
'"$request" $status $body_bytes_sent '
'"$http_referer" "$http_user_agent" '
'rt=$request_time uct=$upstream_connect_time '
'uht=$upstream_header_time urt=$upstream_response_time';
access_log /var/log/nginx/access.log monitoring;
# Key variables:
# $request_time — total time from first byte to last byte sent to client
# $upstream_response_time — time your backend took (excludes Nginx overhead)
# $upstream_connect_time — TCP connect time to upstream
With upstream_response_time in logs, a log aggregation tool (Loki, Better Stack Logs, Elastic) can compute p95 latency per endpoint. Alert when p95 exceeds your SLO threshold (e.g., 2 seconds for web pages, 500ms for APIs).
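Before a full aggregation stack is in place, you can sanity-check p95 with a one-liner (assumes GNU grep for the -P flag and the rt= field defined in the log format above):
# Rough p95 of $request_time from the monitoring log format
grep -oP 'rt=\K[0-9.]+' /var/log/nginx/access.log \
  | sort -n \
  | awk '{ a[NR]=$1 } END { i=int(NR*0.95); if (i<1) i=1; print "p95:", a[i] "s" }'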
Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time one of your Nginx-fronted services goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your Nginx-fronted services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Upstream Health Monitoring
When Nginx acts as a reverse proxy, your upstream servers are your real targets. Use passive health checks (built-in) or active health checks (Nginx Plus or the open-source nginx_upstream_check_module).
# Passive health check (open source)
# Marks upstream unhealthy after N consecutive failures
upstream app_servers {
server 10.0.0.1:8000 max_fails=3 fail_timeout=30s;
server 10.0.0.2:8000 max_fails=3 fail_timeout=30s;
server 10.0.0.3:8000 backup;
}
# Active health check (Nginx Plus)
upstream app_servers {
zone app_zone 64k;
server 10.0.0.1:8000;
server 10.0.0.2:8000;
}
server {
location / {
proxy_pass http://app_servers;
health_check interval=5s fails=3 passes=2 uri=/health;
}
}
For open-source Nginx, external uptime monitoring (like API Status Check or Better Stack) is the most reliable way to detect when specific upstream endpoints are failing. These tools probe your endpoints from outside and alert immediately when they return 5xx or go unresponsive.
Nginx Monitoring Tool Comparison
| Tool | Best For | Nginx Support | Pricing |
|---|---|---|---|
| Better Stack | Uptime + log monitoring combined | Uptime checks + log ingestion | Free tier + $20/mo |
| Prometheus + Grafana | Full observability stack | nginx-prometheus-exporter | Open source (self-hosted) |
| Datadog | Enterprise full-stack APM | Native Nginx integration | $15+/host/month |
| New Relic | APM + infrastructure in one | Nginx integration via agent | Free 100GB/mo + usage billing |
| Grafana Cloud | Managed Prometheus + Loki | Exporter + log parsing | Free tier + usage-based |
Common Nginx Performance Problems & Fixes
Too many open files error
Nginx hits the OS file descriptor limit under high load.
Fix: Set worker_rlimit_nofile 65535; in nginx.conf and raise the system limit in the systemd unit file with LimitNOFILE=65535 (a ulimit command in a shell only affects that shell, not the service).
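A minimal sketch of both changes (paths and unit name assume a standard systemd install):
# /etc/nginx/nginx.conf
worker_rlimit_nofile 65535;

# systemd override: sudo systemctl edit nginx
[Service]
LimitNOFILE=65535

# apply: sudo systemctl daemon-reload && sudo systemctl restart nginx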
Worker connections exhausted
All worker slots full — new connections are rejected with 503.
Fix: Increase worker_connections (up to worker_rlimit_nofile) or scale horizontally. Set worker_processes auto; to use all CPU cores.
upstream timed out (110: Connection timed out)
Backend is slow. Default proxy_read_timeout is 60 seconds.
Fix: First diagnose why the backend is slow (slow query, memory pressure, CPU spike). Increasing proxy_read_timeout masks the problem — fix the root cause.
High memory usage
Each worker process holds memory; large proxy buffers multiply this.
Fix: Tune proxy_buffer_size and proxy_buffers down to minimum needed. Monitor worker RSS per process, not just total Nginx memory.
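As an illustration, a conservative buffer setup might look like the sketch below; the sizes are starting points to measure against, not one-size-fits-all recommendations:
# Conservative proxy buffering: measure worker RSS before and after
proxy_buffer_size 4k;         # holds the upstream response headers
proxy_buffers 8 4k;           # per-connection buffers for the body
proxy_busy_buffers_size 8k;   # cap on buffers busy sending to the client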
FAQ
What are the most important Nginx metrics to monitor?
The six most critical are: active connections, requests per second, HTTP 5xx error rate, request latency (p95), upstream response time, and worker process health. Track them together — a spike in 5xx with no change in RPS usually points to a backend problem, not Nginx itself.
How do I enable the Nginx status page?
Add a location block with "stub_status on;" inside a server block restricted to 127.0.0.1. After nginx -t && nginx -s reload, curl http://localhost:8080/nginx_status (matching the port in the config above) returns connection counts. Never expose this endpoint publicly.
How do I set up the Nginx Prometheus exporter?
Run nginx/nginx-prometheus-exporter with --nginx.scrape-uri pointing at your stub_status URL. It exposes metrics on port 9113. Add a Prometheus scrape job targeting localhost:9113, then build Grafana dashboards from nginx_connections_active, nginx_http_requests_total, and nginx_up metrics.
What Nginx alert thresholds should I set?
Starting thresholds: 5xx rate > 1% for 5 minutes (critical), active connections > 80% of capacity (warning), Nginx process down (critical), upstream p95 latency > 2 seconds (warning). Tune to your traffic baseline — high-traffic sites may tolerate higher error rates during flash sales.
What is the difference between Nginx open source and Nginx Plus monitoring?
Open source provides basic connection counts via stub_status. Nginx Plus adds a /api endpoint with per-upstream metrics, health check results, per-zone traffic stats, and a live dashboard. For most teams, the Prometheus exporter + log parsing fills the open-source monitoring gap at no cost.
🛠 Tools We Use & Recommend
Tested across our own infrastructure monitoring 200+ APIs daily
Uptime Monitoring & Incident Management
Used by 100,000+ websites
Monitors your APIs every 30 seconds. Instant alerts via Slack, email, SMS, and phone calls when something goes down.
“We use Better Stack to monitor every API on this site. It caught 23 outages last month before users reported them.”
Secrets Management & Developer Security
Trusted by 150,000+ businesses
Manage API keys, database passwords, and service tokens with CLI integration and automatic rotation.
“After covering dozens of outages caused by leaked credentials, we recommend every team use a secrets manager.”
Automated Personal Data Removal
Removes data from 350+ brokers
Removes your personal data from 350+ data broker sites. Protects against phishing and social engineering attacks.
“Service outages sometimes involve data breaches. Optery keeps your personal info off the sites attackers use first.”
AI Voice & Audio Generation
Used by 1M+ developers
Text-to-speech, voice cloning, and audio AI for developers. Build voice features into your apps with a simple API.
“The best AI voice API we've tested — natural-sounding speech with low latency. Essential for any app adding voice features.”
SEO & Site Performance Monitoring
Used by 10M+ marketers
Track your site health, uptime, search rankings, and competitor movements from one dashboard.
“We use SEMrush to track how our API status pages rank and catch site health issues early.”
Related Guides
Infrastructure Monitoring Guide
Complete server and infrastructure monitoring coverage.
Network Monitoring Guide
Monitor DNS, latency, packet loss, and connectivity.
Database Monitoring Guide
PostgreSQL, MySQL, Redis, and MongoDB monitoring.
Best APM Tools 2026
Compare the top application performance monitoring tools.