
The Four Golden Signals of Monitoring: Latency, Traffic, Errors & Saturation (2026)

Google's Site Reliability Engineering book introduced the Four Golden Signals as the minimum viable monitoring framework for any user-facing service. If you can only measure four things, measure these. This guide explains each signal, how to measure it with Prometheus, which alert thresholds to set, and when to use the USE and RED methods instead.

Updated April 2026 · 11 min read · SRE / Monitoring / Prometheus


The Four Golden Signals — Quick Reference

⏱ Latency

How long requests take. Track p99, not average.

Alert: p99 > SLO threshold

📈 Traffic

How much demand hits your service. Requests/second.

Alert: Sudden drop or spike

❌ Errors

Rate of failed requests. 5xx vs 4xx matters.

Alert: Error rate > 1%

🔥 Saturation

How full your system is. A leading failure indicator.

Alert: Utilization > 80%

Signal 1: Latency

Latency is the time it takes to serve a request. The most important rule: never alert on average latency. Average latency hides tail latency — the slow requests that actually hurt users.

Why averages lie

100 requests: 99 complete in 10ms, 1 takes 10,000ms. Average ≈ 110ms (looks fine). p99 ≈ 10,000ms (the real problem). Always track p50 (typical), p95 (most users), and p99 (worst case). For payment flows and other critical paths, also track p99.9.
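
A quick way to check this arithmetic in plain Python (the numbers are the illustrative ones above):

from statistics import mean, quantiles

# 99 requests at 10 ms plus one pathological request at 10 s
latencies_ms = [10] * 99 + [10_000]

p = quantiles(latencies_ms, n=100)  # 99 cut points; p[49] is p50, p[98] is p99

print(f"average: {mean(latencies_ms):.1f} ms")  # ~109.9 ms, looks healthy
print(f"p50:     {p[49]:.1f} ms")               # 10.0 ms
print(f"p99:     {p[98]:.1f} ms")               # ~9900 ms, the real story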

# Prometheus: latency histogram (best practice)
# Use a histogram, not a gauge — histograms allow percentile calculation

from prometheus_client import Histogram

http_request_duration_seconds = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    labelnames=["method", "path", "status"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10]
)
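
Populating the histogram looks roughly like this; handle_request is a hypothetical stand-in for whatever middleware hook your framework provides:

import time

def handle_request(method: str, path: str) -> int:
    start = time.perf_counter()
    status = 200  # ... actual request handling goes here ...
    # Observe once per request, with the outcome as a label
    http_request_duration_seconds.labels(
        method=method, path=path, status=str(status)
    ).observe(time.perf_counter() - start)
    return status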

# PromQL: p99 latency by path
histogram_quantile(0.99,
  sum by (path, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# Alert: p99 latency exceeds SLO
- alert: HighLatencyP99
  expr: |
    histogram_quantile(0.99,
      sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
    ) > 1.0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "p99 latency {{ $value | humanizeDuration }} exceeds 1s SLO"

# Separate "fast" and "slow" latency:
# Track successful requests (2xx) separately from errors (5xx)
# A slow error is a different problem than a slow success

# Latency by percentile: dashboard PromQL
# p50:  histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))
# p95:  histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# p99:  histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# p999: histogram_quantile(0.999, rate(http_request_duration_seconds_bucket[5m]))

Signal 2: Traffic

Traffic measures how much demand is being placed on your service. It's context for the other signals — an error rate of 1% during 100 rps (1 error/s) is different from 1% during 10,000 rps (100 errors/s).
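
The queries below assume a request counter named http_requests_total. A minimal sketch of that instrumentation with prometheus_client (label names are illustrative):

from prometheus_client import Counter

# The Python client appends the _total suffix when the metric is exposed
http_requests = Counter(
    "http_requests",
    "Total HTTP requests",
    labelnames=["method", "path", "status"],
)

http_requests.labels(method="GET", path="/api/users", status="200").inc()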

# Traffic = demand on the system
# For APIs: requests per second
rate(http_requests_total[1m])

# For databases: queries per second
rate(mysql_global_status_queries[1m])

# For queues: messages per second
rate(rabbitmq_queue_messages_published_total[1m])

# Traffic anomaly alerts (sudden drop is often a problem):
- alert: TrafficDropDetected
  expr: |
    rate(http_requests_total[5m])
    < (rate(http_requests_total[1h] offset 1h) * 0.5)
  for: 10m
  annotations:
    summary: "Traffic dropped >50% vs same time last hour — possible outage"

# Traffic spike alert (may need to scale):
- alert: TrafficSpike
  expr: |
    rate(http_requests_total[5m])
    > (rate(http_requests_total[1h] offset 1h) * 3)
  for: 5m
  annotations:
    summary: "Traffic 3x higher than last hour — check saturation signals"

# Breakdown by endpoint for capacity planning:
# topk(10, sum by (path) (rate(http_requests_total[5m])))

Signal 3: Errors

Errors measure the rate of failed requests. Key distinction: server errors (5xx) represent bugs or overload, while client errors (4xx) are usually not your problem — but a sudden spike in 4xx can indicate an upstream change breaking your API contract.

# Error rate: server errors vs total
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))

# Alert: server error rate > 1% for 5 minutes
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
    / sum(rate(http_requests_total[5m])) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Error rate {{ $value | humanizePercentage }} — exceeds 1% threshold"

# Track 4xx separately (client errors, not your fault but worth watching):
- alert: ClientErrorSpike
  expr: |
    rate(http_requests_total{status=~"4.."}[5m])
    > rate(http_requests_total{status=~"4.."}[5m] offset 1h) * 5
  for: 5m
  annotations:
    summary: "4xx errors spiked 5x vs 1h ago — possible API contract change"

# Break down errors by endpoint to find the culprit:
# topk(5,
#   sum by (path) (rate(http_requests_total{status=~"5.."}[5m]))
# )

# "Slow errors" — errors that also take a long time
# Don't count error latency in your p99 SLO
# Separate SLO for error latency vs success latency
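
One way to keep error latency out of the success SLO is two separate histograms; a sketch (these metric names are ours, not standard):

from prometheus_client import Histogram

request_success_seconds = Histogram(
    "http_request_success_duration_seconds",
    "Latency of successful (2xx) responses only",
)
request_error_seconds = Histogram(
    "http_request_error_duration_seconds",
    "Latency of failed (5xx) responses only",
)

def record_latency(duration_seconds: float, status: int) -> None:
    # Route each observation so slow errors never skew the success p99
    if 200 <= status < 300:
        request_success_seconds.observe(duration_seconds)
    elif status >= 500:
        request_error_seconds.observe(duration_seconds)

Alternatively, keep the single histogram from the latency section and filter on its status label at query time.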


Signal 4: Saturation

Saturation is the most powerful signal — it's a leading indicator. Resources become saturated before errors increase. Fix saturation to prevent errors.

| Resource | Saturation Metric | Alert Threshold |
|---|---|---|
| CPU | avg(cpu_utilization) | >80% for 5m |
| Memory | used_bytes / total_bytes | >85% |
| Disk IOPS | disk_io_time / 1000 | >80% |
| Network | bytes_total / interface_speed | >70% |
| Thread pool | active_threads / max_threads | >90% |
| DB connections | checked_out / pool_size | >80% |
| Queue depth | queue_size / processing_rate | >60s drain time |
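
The queue-depth row deserves a worked example, since its saturation is expressed as time-to-drain rather than a percentage. A sketch with hypothetical numbers:

# Hypothetical queue stats; in practice these come from your broker's metrics
queue_depth = 4_500      # messages waiting
processing_rate = 50.0   # messages consumed per second

drain_seconds = queue_depth / processing_rate  # 90 s of backlog
if drain_seconds > 60:
    print(f"saturated: {drain_seconds:.0f}s to drain exceeds the 60s threshold")
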
# CPU saturation alert
- alert: HighCpuSaturation
  expr: avg(rate(process_cpu_seconds_total[5m])) * 100 > 80
  for: 5m
  annotations:
    summary: "CPU at {{ $value }}% — approaching saturation"

# Memory saturation
- alert: HighMemorySaturation
  expr: |
    process_resident_memory_bytes
    / node_memory_MemTotal_bytes > 0.85
  for: 5m
  annotations:
    summary: "Memory at {{ $value | humanizePercentage }} of total"

# Database connection pool saturation
- alert: DbPoolSaturation
  expr: |
    db_pool_checked_out_connections
    / db_pool_max_connections > 0.80
  for: 2m
  annotations:
    summary: "DB pool {{ $value | humanizePercentage }} utilized — new connections will block"

# Thread pool saturation (Java/Tomcat example)
- alert: ThreadPoolSaturation
  expr: |
    tomcat_threads_busy_threads
    / tomcat_threads_config_max_threads > 0.90
  for: 2m
  annotations:
    summary: "Thread pool {{ $value | humanizePercentage }} — request queuing imminent"

USE vs RED vs Four Golden Signals

| Framework | Components | Best For | Origin |
|---|---|---|---|
| Four Golden Signals | Latency, Traffic, Errors, Saturation | User-facing services — what users experience | Google SRE Book (2016) |
| USE Method | Utilization, Saturation, Errors | Resources (CPU, disk, network) — infrastructure layer | Brendan Gregg, Netflix |
| RED Method | Rate, Errors, Duration | Microservices — per-service request metrics | Tom Wilkie, Grafana Labs |

Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer. A high latency signal at the application layer + high CPU saturation at the infrastructure layer tells you exactly what to fix.

FAQ

What are the Four Golden Signals?

Google's SRE framework for monitoring user-facing services: (1) Latency — how long requests take (track p99, not average), (2) Traffic — demand on your system (requests/second), (3) Errors — rate of failed requests (5xx rate), (4) Saturation — how full your system is (CPU/memory/connection pool utilization). If you can only measure four things, measure these.

Why monitor p99 latency instead of average latency?

Average latency hides tail latency. Example: 99 requests at 10ms + 1 request at 10 seconds gives a ~110ms average (looks fine) but a ~10s p99 (the real problem). For distributed systems, slow requests cascade — if service B makes ~70 parallel calls to service A, a 1% slow rate in A makes roughly half of B's requests slow (1 − 0.99^70 ≈ 0.5). Track p50 (typical), p95 (most users), p99 (worst case), and p99.9 for critical payment flows.

What is the difference between the Four Golden Signals, USE method, and RED method?

They target different layers: Four Golden Signals — user-facing services, what the user experiences. USE Method (Brendan Gregg) — resources (CPUs, disks, network interfaces): Utilization + Saturation + Errors. RED Method (Tom Wilkie) — microservices: Rate + Errors + Duration. Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer.

How do I measure saturation for my service?

Saturation is resource utilization relative to capacity: CPU % utilization, memory used/total, thread pool active/max, DB connections checked-out/pool-size, queue depth (jobs/drain-rate = seconds-to-drain). Alert at 80% for most resources. Saturation is a leading indicator — resources exhaust before errors appear, giving you time to act.

Should I alert on all four golden signals?

Alert on symptoms (latency and errors), not causes. Always alert on error rate (>1% 5xx for 5min) and latency (p99 > SLO for 5min). Traffic is context — alert on anomalies (sudden 50% drop), not normal levels. Saturation is a leading indicator — alert when approaching limits (>80%) to prevent future errors. Never page on average latency or traffic counts alone.
