The Four Golden Signals of Monitoring: Latency, Traffic, Errors & Saturation (2026)
Google's Site Reliability Engineering book introduced the Four Golden Signals as the minimum viable monitoring framework for any user-facing service. If you can only measure four things, measure these. This guide explains each signal, shows how to measure it with Prometheus, suggests alert thresholds, and covers when the USE and RED methods are a better fit.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
The Four Golden Signals — Quick Reference
⏱ Latency
How long requests take. Track p99, not average.
Alert: p99 > SLO threshold
📈 Traffic
How much demand hits your service. Requests/second.
Alert: Sudden drop or spike
❌ Errors
Rate of failed requests. 5xx vs 4xx matters.
Alert: Error rate > 1%
🔥 Saturation
How full your system is. A leading indicator of failure.
Alert: Utilization > 80%
Signal 1: Latency
Latency is the time it takes to serve a request. The most important rule: never alert on average latency. Average latency hides tail latency — the slow requests that actually hurt users.
Why averages lie
100 requests: 99 complete in 10ms, 1 takes 10,000ms. The average is roughly 110ms and looks fine. At the scale of a real service, where that outlier represents 1% of all traffic, the p99 sits near 10,000ms, which is the real problem. Always track p50 (typical), p95 (most users), and p99 (worst case). For payment flows and other critical paths, also track p99.9.
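To make the arithmetic concrete, here is a small standalone Python sketch (nothing Prometheus-specific). It uses a larger synthetic workload than the example above, with 2% of requests stalled at 10 seconds, so the nearest-rank p99 lands unambiguously in the slow bucket:
# Standalone illustration: the average hides the tail latency that p99 exposes.
# Synthetic workload: 9,800 requests at 10 ms plus 200 requests stalled at 10,000 ms.
latencies_ms = [10] * 9_800 + [10_000] * 200

average_ms = sum(latencies_ms) / len(latencies_ms)

def percentile(values, p):
    # Nearest-rank percentile: the smallest value that at least p% of samples fall at or below.
    ordered = sorted(values)
    rank = max(int(round(p / 100 * len(ordered))), 1)
    return ordered[rank - 1]

print(f"average: {average_ms:.0f} ms")                 # ~210 ms, looks tolerable
print(f"p50:     {percentile(latencies_ms, 50)} ms")   # 10 ms, the typical user
print(f"p95:     {percentile(latencies_ms, 95)} ms")   # 10 ms, most users
print(f"p99:     {percentile(latencies_ms, 99)} ms")   # 10,000 ms, the real problem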
# Prometheus: latency histogram (best practice)
# Use a histogram, not a gauge — histograms allow percentile calculation
from prometheus_client import Histogram

http_request_duration_seconds = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    labelnames=["method", "path", "status"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
)
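# How you'd typically record observations into that histogram (illustrative sketch;
# the handler and timing wiring below are hypothetical, not part of this guide):
import time

def handle_request(method: str, path: str) -> int:
    start = time.perf_counter()
    status = 200  # run your real handler here and capture the status code it returns
    http_request_duration_seconds.labels(
        method=method, path=path, status=str(status)
    ).observe(time.perf_counter() - start)
    return status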
# PromQL: p99 latency by path
histogram_quantile(0.99,
  sum by (path, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)
# Alert: p99 latency exceeds SLO
- alert: HighLatencyP99
  expr: |
    histogram_quantile(0.99,
      sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
    ) > 1.0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "p99 latency {{ $value | humanizeDuration }} exceeds 1s SLO"
# Separate "fast" and "slow" latency:
# Track successful requests (2xx) separately from errors (5xx)
# A slow error is a different problem than a slow success
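# For example, with the status label from the histogram above, p99 of successful
# requests only:
#   histogram_quantile(0.99,
#     sum by (le) (rate(http_request_duration_seconds_bucket{status=~"2.."}[5m]))
#   )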
# Latency by percentile: dashboard PromQL
# p50: histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))
# p95: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# p99: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# p999: histogram_quantile(0.999, rate(http_request_duration_seconds_bucket[5m]))
Signal 2: Traffic
Traffic measures how much demand is being placed on your service. It's context for the other signals — an error rate of 1% during 100 rps (1 error/s) is different from 1% during 10,000 rps (100 errors/s).
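Every traffic and error query in this guide reads from a single request counter. If you instrument a Python service with prometheus_client, the counter behind http_requests_total can be as simple as the following sketch (the label set mirrors the latency histogram above; how you wire it into your framework is up to you):
# Request counter backing the http_requests_total queries used in this guide.
from prometheus_client import Counter

http_requests_total = Counter(
    "http_requests",  # prometheus_client appends _total, exposing http_requests_total
    "Total HTTP requests",
    labelnames=["method", "path", "status"],
)

# Increment once per request, after the handler has produced a status code, e.g.:
# http_requests_total.labels(method="GET", path="/api/orders", status="200").inc()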
# Traffic = demand on the system
# For APIs: requests per second
rate(http_requests_total[1m])
# For databases: queries per second
rate(mysql_global_status_queries[1m])
# For queues: messages per second
rate(rabbitmq_queue_messages_published_total[1m])
# Traffic anomaly alerts (sudden drop is often a problem):
- alert: TrafficDropDetected
  expr: |
    rate(http_requests_total[5m])
      < (rate(http_requests_total[1h] offset 1h) * 0.5)
  for: 10m
  annotations:
    summary: "Traffic dropped >50% vs same time last hour — possible outage"
# Traffic spike alert (may need to scale):
- alert: TrafficSpike
  expr: |
    rate(http_requests_total[5m])
      > (rate(http_requests_total[1h] offset 1h) * 3)
  for: 5m
  annotations:
    summary: "Traffic 3x higher than last hour — check saturation signals"
# Breakdown by endpoint for capacity planning:
# topk(10, sum by (path) (rate(http_requests_total[5m])))
Monitor all four golden signals with Better Stack
Better Stack tracks latency, error rates, and traffic from 30+ global locations — alerting your on-call team the moment any signal degrades.
Try Better Stack Free →
Signal 3: Errors
Errors measure the rate of failed requests. Key distinction: server errors (5xx) represent bugs or overload, while client errors (4xx) are usually not your problem — but a sudden spike in 4xx can indicate an upstream change breaking your API contract.
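One practical detail: unhandled exceptions need to be recorded as 5xx, or the error-rate queries below undercount exactly the failures you care about. A hedged sketch, reusing the request counter shown in the traffic section (the wrapper shape is illustrative, not tied to any particular framework):
# Count unhandled exceptions as 500s so they appear in the 5xx error-rate queries.
def instrumented(handler, method: str, path: str) -> int:
    status = 500  # assume the worst; overwritten if the handler returns normally
    try:
        status = handler()
        return status
    finally:
        http_requests_total.labels(method=method, path=path, status=str(status)).inc()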
# Error rate: server errors vs total
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
# Alert: server error rate > 1% for 5 minutes
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Error rate {{ $value | humanizePercentage }} — exceeds 1% threshold"
# Track 4xx separately (client errors, not your fault but worth watching):
- alert: ClientErrorSpike
  expr: |
    rate(http_requests_total{status=~"4.."}[5m])
      > rate(http_requests_total{status=~"4.."}[5m] offset 1h) * 5
  for: 5m
  annotations:
    summary: "4xx errors spiked 5x vs 1h ago — possible API contract change"
# Break down errors by endpoint to find the culprit:
# topk(5, sum by (path) (rate(http_requests_total{status=~"5.."}[5m])))
# "Slow errors" — errors that also take a long time
# Don't count error latency in your p99 SLO
# Separate SLO for error latency vs success latency
Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time one of your production services goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your production services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Signal 4: Saturation
Saturation measures how full your system is relative to its capacity. It's the leading indicator among the four signals: resources saturate before errors appear, so fixing saturation early prevents the errors that would otherwise follow.
| Resource | Saturation Metric | Alert Threshold |
|---|---|---|
| CPU | avg(cpu_utilization) | >80% for 5m |
| Memory | used_bytes / total_bytes | >85% |
| Disk IOPS | disk_io_time / 1000 | >80% |
| Network | bytes_total / interface_speed | >70% |
| Thread pool | active_threads / max_threads | >90% |
| DB connections | checked_out / pool_size | >80% |
| Queue depth | queue_size / processing_rate | >60s drain time |
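The queue-depth row expresses saturation as time-to-drain rather than a raw count, which keeps the threshold meaningful regardless of queue size. A minimal prometheus_client sketch of that idea; the gauge name and the depth/rate inputs are placeholders for whatever your broker client reports:
# Export queue saturation as estimated seconds to drain (depth / processing rate).
from prometheus_client import Gauge

queue_drain_seconds = Gauge(
    "queue_estimated_drain_seconds",
    "Estimated time to drain the work queue at the current processing rate",
)

def update_queue_saturation(queue_depth: int, processed_per_second: float) -> None:
    # A stalled consumer means the queue never drains; report +Inf rather than divide by zero.
    if processed_per_second <= 0:
        queue_drain_seconds.set(float("inf"))
        return
    queue_drain_seconds.set(queue_depth / processed_per_second)

# Example: 1,200 queued jobs drained at 15 jobs/s gives 80 s, above the 60 s threshold above.
update_queue_saturation(1_200, 15.0)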
# CPU saturation alert (host-level, via node_exporter)
# Note: rate(process_cpu_seconds_total) measures a single process in cores and can
# exceed 1.0 on multi-core hosts, so host saturation uses node_cpu_seconds_total instead.
- alert: HighCpuSaturation
  expr: |
    100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
  for: 5m
  annotations:
    summary: "CPU on {{ $labels.instance }} at {{ $value | humanize }}% — approaching saturation"
# Memory saturation (host-level, via node_exporter)
# (process_resident_memory_bytes comes from the app and won't label-match node metrics,
# so compare available memory to total on the node instead)
- alert: HighMemorySaturation
  expr: |
    1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.85
  for: 5m
  annotations:
    summary: "Memory at {{ $value | humanizePercentage }} of total"
# Database connection pool saturation
# (db_pool_* names are illustrative — use whatever gauge names your pool library exports)
- alert: DbPoolSaturation
  expr: |
    db_pool_checked_out_connections
      / db_pool_max_connections > 0.80
  for: 2m
  annotations:
    summary: "DB pool {{ $value | humanizePercentage }} utilized — new connections will block"
# Thread pool saturation (Java/Tomcat example)
- alert: ThreadPoolSaturation
  expr: |
    tomcat_threads_busy_threads
      / tomcat_threads_config_max_threads > 0.90
  for: 2m
  annotations:
    summary: "Thread pool {{ $value | humanizePercentage }} — request queuing imminent"
USE vs RED vs Four Golden Signals
| Framework | Signals | Best For | Origin |
|---|---|---|---|
| Four Golden Signals | Latency, Traffic, Errors, Saturation | User-facing services — what users experience | Google SRE Book (2016) |
| USE Method | Utilization, Saturation, Errors | Resources (CPU, disk, network) — infrastructure layer | Brendan Gregg |
| RED Method | Rate, Errors, Duration | Microservices — per-service request metrics | Tom Wilkie |
Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer. A high latency signal at the application layer + high CPU saturation at the infrastructure layer tells you exactly what to fix.
FAQ
What are the Four Golden Signals?
Google's SRE framework for monitoring user-facing services: (1) Latency — how long requests take (track p99, not average), (2) Traffic — demand on your system (requests/second), (3) Errors — rate of failed requests (5xx rate), (4) Saturation — how full your system is (CPU/memory/connection pool utilization). If you can only measure four things, measure these.
Why monitor p99 latency instead of average latency?
Average latency hides tail latency. Example: 99 requests at 10ms plus 1 request at 10 seconds averages to roughly 110ms (looks fine), while the worst request took 10 full seconds (the real problem). In distributed systems, slow requests cascade: if 1% of calls to service A are slow, a service B that fans out to around 70 parallel calls to A sees a slow response on about half its requests (1 - 0.99^70 ≈ 0.5). Track p50 (typical), p95 (most users), p99 (worst case), and p99.9 for critical payment flows.
What is the difference between the Four Golden Signals, USE method, and RED method?
They target different layers: Four Golden Signals — user-facing services, what the user experiences. USE Method (Brendan Gregg) — resources (CPUs, disks, network interfaces): Utilization + Saturation + Errors. RED Method (Tom Wilkie) — microservices: Rate + Errors + Duration. Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer.
How do I measure saturation for my service?
Saturation is resource utilization relative to capacity: CPU % utilization, memory used/total, thread pool active/max, DB connections checked-out/pool-size, queue depth (jobs/drain-rate = seconds-to-drain). Alert at 80% for most resources. Saturation is a leading indicator — resources exhaust before errors appear, giving you time to act.
Should I alert on all four golden signals?
Alert on symptoms (latency and errors), not causes. Always alert on error rate (>1% 5xx for 5min) and latency (p99 > SLO for 5min). Traffic is context — alert on anomalies (sudden 50% drop), not normal levels. Saturation is a leading indicator — alert when approaching limits (>80%) to prevent future errors. Never page on average latency or traffic counts alone.
🛠 Tools We Use & Recommend
Tested across our own infrastructure monitoring 200+ APIs daily
Uptime Monitoring & Incident Management
Used by 100,000+ websites
Monitors your APIs every 30 seconds. Instant alerts via Slack, email, SMS, and phone calls when something goes down.
“We use Better Stack to monitor every API on this site. It caught 23 outages last month before users reported them.”
Secrets Management & Developer Security
Trusted by 150,000+ businesses
Manage API keys, database passwords, and service tokens with CLI integration and automatic rotation.
“After covering dozens of outages caused by leaked credentials, we recommend every team use a secrets manager.”
Automated Personal Data Removal
Removes data from 350+ brokers
Removes your personal data from 350+ data broker sites. Protects against phishing and social engineering attacks.
“Service outages sometimes involve data breaches. Optery keeps your personal info off the sites attackers use first.”
AI Voice & Audio Generation
Used by 1M+ developers
Text-to-speech, voice cloning, and audio AI for developers. Build voice features into your apps with a simple API.
“The best AI voice API we've tested — natural-sounding speech with low latency. Essential for any app adding voice features.”
SEO & Site Performance Monitoring
Used by 10M+ marketers
Track your site health, uptime, search rankings, and competitor movements from one dashboard.
“We use SEMrush to track how our API status pages rank and catch site health issues early.”
Related Guides
SLA vs SLO vs SLI Explained
Service level objectives, agreements, and indicators for SRE teams.
Error Budget Guide 2026
How to calculate and manage SRE error budgets with Prometheus.
Alert Fatigue Guide
How to reduce noisy alerts and improve on-call quality of life.
SRE Toolchain 2026
The complete observability stack for site reliability engineers.