The Four Golden Signals of Monitoring: Latency, Traffic, Errors & Saturation (2026)
Google's Site Reliability Engineering book introduced the Four Golden Signals as the minimum viable monitoring framework for any user-facing service. If you can only measure four things, measure these. This guide explains each signal, shows how to measure it with Prometheus, suggests alert thresholds, and covers when the USE and RED methods are a better fit.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
The Four Golden Signals — Quick Reference
⏱ Latency
How long requests take. Track p99, not average.
Alert: p99 > SLO threshold
📈 Traffic
How much demand hits your service. Requests/second.
Alert: Sudden drop or spike
❌ Errors
Rate of failed requests. 5xx vs 4xx matters.
Alert: Error rate > 1%
🔥 Saturation
How full your system is. A leading indicator of failure.
Alert: Utilization > 80%
Signal 1: Latency
Latency is the time it takes to serve a request. The most important rule: never alert on average latency. Average latency hides tail latency — the slow requests that actually hurt users.
Why averages lie
100 requests: 99 complete in 10ms, 1 takes 10,000ms. The average is roughly 110ms and looks fine. At the scale of a real service, where that outlier represents 1% of all traffic, the p99 sits near 10,000ms, which is the real problem. Always track p50 (typical), p95 (most users), and p99 (worst case). For payment flows and other critical paths, also track p99.9.
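To make the arithmetic concrete, here is a small standalone Python sketch (nothing Prometheus-specific). It uses a larger synthetic workload than the example above, with 2% of requests stalled at 10 seconds, so the nearest-rank p99 lands unambiguously in the slow bucket:
# Standalone illustration: the average hides the tail latency that p99 exposes.
# Synthetic workload: 9,800 requests at 10 ms plus 200 requests stalled at 10,000 ms.
latencies_ms = [10] * 9_800 + [10_000] * 200

average_ms = sum(latencies_ms) / len(latencies_ms)

def percentile(values, p):
    # Nearest-rank percentile: the smallest value that at least p% of samples fall at or below.
    ordered = sorted(values)
    rank = max(int(round(p / 100 * len(ordered))), 1)
    return ordered[rank - 1]

print(f"average: {average_ms:.0f} ms")                 # ~210 ms, looks tolerable
print(f"p50:     {percentile(latencies_ms, 50)} ms")   # 10 ms, the typical user
print(f"p95:     {percentile(latencies_ms, 95)} ms")   # 10 ms, most users
print(f"p99:     {percentile(latencies_ms, 99)} ms")   # 10,000 ms, the real problem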
# Prometheus: latency histogram (best practice)
# Use a histogram, not a gauge — histograms allow percentile calculation
from prometheus_client import Histogram

http_request_duration_seconds = Histogram(
    "http_request_duration_seconds",
    "HTTP request latency",
    labelnames=["method", "path", "status"],
    buckets=[0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5, 10],
)
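# How you'd typically record observations into that histogram (illustrative sketch;
# the handler and timing wiring below are hypothetical, not part of this guide):
import time

def handle_request(method: str, path: str) -> int:
    start = time.perf_counter()
    status = 200  # run your real handler here and capture the status code it returns
    http_request_duration_seconds.labels(
        method=method, path=path, status=str(status)
    ).observe(time.perf_counter() - start)
    return status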
# PromQL: p99 latency by path
histogram_quantile(0.99,
  sum by (path, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)
# Alert: p99 latency exceeds SLO
- alert: HighLatencyP99
  expr: |
    histogram_quantile(0.99,
      sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
    ) > 1.0
  for: 5m
  labels:
    severity: warning
  annotations:
    summary: "p99 latency {{ $value | humanizeDuration }} exceeds 1s SLO"
# Separate "fast" and "slow" latency:
# Track successful requests (2xx) separately from errors (5xx)
# A slow error is a different problem than a slow success
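# For example, with the status label from the histogram above, p99 of successful
# requests only:
#   histogram_quantile(0.99,
#     sum by (le) (rate(http_request_duration_seconds_bucket{status=~"2.."}[5m]))
#   )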
# Latency by percentile: dashboard PromQL
# p50: histogram_quantile(0.50, rate(http_request_duration_seconds_bucket[5m]))
# p95: histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m]))
# p99: histogram_quantile(0.99, rate(http_request_duration_seconds_bucket[5m]))
# p999: histogram_quantile(0.999, rate(http_request_duration_seconds_bucket[5m]))
Signal 2: Traffic
Traffic measures how much demand is being placed on your service. It's context for the other signals — an error rate of 1% during 100 rps (1 error/s) is different from 1% during 10,000 rps (100 errors/s).
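Every traffic and error query in this guide reads from a single request counter. If you instrument a Python service with prometheus_client, the counter behind http_requests_total can be as simple as the following sketch (the label set mirrors the latency histogram above; how you wire it into your framework is up to you):
# Request counter backing the http_requests_total queries used in this guide.
from prometheus_client import Counter

http_requests_total = Counter(
    "http_requests",  # prometheus_client appends _total, exposing http_requests_total
    "Total HTTP requests",
    labelnames=["method", "path", "status"],
)

# Increment once per request, after the handler has produced a status code, e.g.:
# http_requests_total.labels(method="GET", path="/api/orders", status="200").inc()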
# Traffic = demand on the system
# For APIs: requests per second
rate(http_requests_total[1m])
# For databases: queries per second
rate(mysql_global_status_queries[1m])
# For queues: messages per second
rate(rabbitmq_queue_messages_published_total[1m])
# Traffic anomaly alerts (sudden drop is often a problem):
- alert: TrafficDropDetected
  expr: |
    rate(http_requests_total[5m])
      < (rate(http_requests_total[1h] offset 1h) * 0.5)
  for: 10m
  annotations:
    summary: "Traffic dropped >50% vs same time last hour — possible outage"
# Traffic spike alert (may need to scale):
- alert: TrafficSpike
  expr: |
    rate(http_requests_total[5m])
      > (rate(http_requests_total[1h] offset 1h) * 3)
  for: 5m
  annotations:
    summary: "Traffic 3x higher than last hour — check saturation signals"
# Breakdown by endpoint for capacity planning:
# topk(10, sum by (path) (rate(http_requests_total[5m])))
Monitor all four golden signals with Better Stack
Better Stack tracks latency, error rates, and traffic from 30+ global locations — alerting your on-call team the moment any signal degrades.
Try Better Stack Free →
Signal 3: Errors
Errors measure the rate of failed requests. Key distinction: server errors (5xx) represent bugs or overload, while client errors (4xx) are usually not your problem — but a sudden spike in 4xx can indicate an upstream change breaking your API contract.
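One practical detail: unhandled exceptions need to be recorded as 5xx, or the error-rate queries below undercount exactly the failures you care about. A hedged sketch, reusing the request counter shown in the traffic section (the wrapper shape is illustrative, not tied to any particular framework):
# Count unhandled exceptions as 500s so they appear in the 5xx error-rate queries.
def instrumented(handler, method: str, path: str) -> int:
    status = 500  # assume the worst; overwritten if the handler returns normally
    try:
        status = handler()
        return status
    finally:
        http_requests_total.labels(method=method, path=path, status=str(status)).inc()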
# Error rate: server errors vs total
sum(rate(http_requests_total{status=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m]))
# Alert: server error rate > 1% for 5 minutes
- alert: HighErrorRate
  expr: |
    sum(rate(http_requests_total{status=~"5.."}[5m]))
      / sum(rate(http_requests_total[5m])) > 0.01
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Error rate {{ $value | humanizePercentage }} — exceeds 1% threshold"
# Track 4xx separately (client errors, not your fault but worth watching):
- alert: ClientErrorSpike
  expr: |
    rate(http_requests_total{status=~"4.."}[5m])
      > rate(http_requests_total{status=~"4.."}[5m] offset 1h) * 5
  for: 5m
  annotations:
    summary: "4xx errors spiked 5x vs 1h ago — possible API contract change"
# Break down errors by endpoint to find the culprit:
# topk(5, sum by (path) (rate(http_requests_total{status=~"5.."}[5m])))
# "Slow errors" — errors that also take a long time
# Don't count error latency in your p99 SLO
# Separate SLO for error latency vs success latency
Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time one of your production services goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your production services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Signal 4: Saturation
Saturation measures how full your system is relative to its capacity. It's the leading indicator among the four signals: resources saturate before errors appear, so fixing saturation early prevents the errors that would otherwise follow.
| Resource | Saturation Metric | Alert Threshold |
|---|---|---|
| CPU | avg(cpu_utilization) | >80% for 5m |
| Memory | used_bytes / total_bytes | >85% |
| Disk IOPS | disk_io_time / 1000 | >80% |
| Network | bytes_total / interface_speed | >70% |
| Thread pool | active_threads / max_threads | >90% |
| DB connections | checked_out / pool_size | >80% |
| Queue depth | queue_size / processing_rate | >60s drain time |
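The queue-depth row expresses saturation as time-to-drain rather than a raw count, which keeps the threshold meaningful regardless of queue size. A minimal prometheus_client sketch of that idea; the gauge name and the depth/rate inputs are placeholders for whatever your broker client reports:
# Export queue saturation as estimated seconds to drain (depth / processing rate).
from prometheus_client import Gauge

queue_drain_seconds = Gauge(
    "queue_estimated_drain_seconds",
    "Estimated time to drain the work queue at the current processing rate",
)

def update_queue_saturation(queue_depth: int, processed_per_second: float) -> None:
    # A stalled consumer means the queue never drains; report +Inf rather than divide by zero.
    if processed_per_second <= 0:
        queue_drain_seconds.set(float("inf"))
        return
    queue_drain_seconds.set(queue_depth / processed_per_second)

# Example: 1,200 queued jobs drained at 15 jobs/s gives 80 s, above the 60 s threshold above.
update_queue_saturation(1_200, 15.0)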
# CPU saturation alert (host-level, via node_exporter)
# Note: rate(process_cpu_seconds_total) measures a single process in cores and can
# exceed 1.0 on multi-core hosts, so host saturation uses node_cpu_seconds_total instead.
- alert: HighCpuSaturation
  expr: |
    100 * (1 - avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m]))) > 80
  for: 5m
  annotations:
    summary: "CPU on {{ $labels.instance }} at {{ $value | humanize }}% — approaching saturation"
# Memory saturation (host-level, via node_exporter)
# (process_resident_memory_bytes comes from the app and won't label-match node metrics,
# so compare available memory to total on the node instead)
- alert: HighMemorySaturation
  expr: |
    1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) > 0.85
  for: 5m
  annotations:
    summary: "Memory at {{ $value | humanizePercentage }} of total"
# Database connection pool saturation
# (db_pool_* names are illustrative — use whatever gauge names your pool library exports)
- alert: DbPoolSaturation
  expr: |
    db_pool_checked_out_connections
      / db_pool_max_connections > 0.80
  for: 2m
  annotations:
    summary: "DB pool {{ $value | humanizePercentage }} utilized — new connections will block"
# Thread pool saturation (Java/Tomcat example)
- alert: ThreadPoolSaturation
  expr: |
    tomcat_threads_busy_threads
      / tomcat_threads_config_max_threads > 0.90
  for: 2m
  annotations:
    summary: "Thread pool {{ $value | humanizePercentage }} — request queuing imminent"
USE vs RED vs Four Golden Signals
| Framework | Signals | Best For | Origin |
|---|---|---|---|
| Four Golden Signals | Latency, Traffic, Errors, Saturation | User-facing services — what users experience | Google SRE Book (2016) |
| USE Method | Utilization, Saturation, Errors | Resources (CPU, disk, network) — infrastructure layer | Brendan Gregg |
| RED Method | Rate, Errors, Duration | Microservices — per-service request metrics | Tom Wilkie |
Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer. A high latency signal at the application layer + high CPU saturation at the infrastructure layer tells you exactly what to fix.
FAQ
What are the Four Golden Signals?
Google's SRE framework for monitoring user-facing services: (1) Latency — how long requests take (track p99, not average), (2) Traffic — demand on your system (requests/second), (3) Errors — rate of failed requests (5xx rate), (4) Saturation — how full your system is (CPU/memory/connection pool utilization). If you can only measure four things, measure these.
Why monitor p99 latency instead of average latency?
Average latency hides tail latency. Example: 99 requests at 10ms plus 1 request at 10 seconds averages to roughly 110ms (looks fine), while the worst request took 10 full seconds (the real problem). In distributed systems, slow requests cascade: if 1% of calls to service A are slow, a service B that fans out to around 70 parallel calls to A sees a slow response on about half its requests (1 - 0.99^70 ≈ 0.5). Track p50 (typical), p95 (most users), p99 (worst case), and p99.9 for critical payment flows.
What is the difference between the Four Golden Signals, USE method, and RED method?
They target different layers: Four Golden Signals — user-facing services, what the user experiences. USE Method (Brendan Gregg) — resources (CPUs, disks, network interfaces): Utilization + Saturation + Errors. RED Method (Tom Wilkie) — microservices: Rate + Errors + Duration. Use them together: RED/Golden Signals at the API layer, USE at the infrastructure layer.
How do I measure saturation for my service?
Saturation is resource utilization relative to capacity: CPU % utilization, memory used/total, thread pool active/max, DB connections checked-out/pool-size, queue depth (jobs/drain-rate = seconds-to-drain). Alert at 80% for most resources. Saturation is a leading indicator — resources exhaust before errors appear, giving you time to act.
Should I alert on all four golden signals?
Alert on symptoms (latency and errors), not causes. Always alert on error rate (>1% 5xx for 5min) and latency (p99 > SLO for 5min). Traffic is context — alert on anomalies (sudden 50% drop), not normal levels. Saturation is a leading indicator — alert when approaching limits (>80%) to prevent future errors. Never page on average latency or traffic counts alone.
🛠 Tools We Use & Recommend
Tested across our own infrastructure monitoring 200+ APIs daily
Uptime Monitoring & Incident Management
Used by 100,000+ websites
Monitors your APIs every 30 seconds. Instant alerts via Slack, email, SMS, and phone calls when something goes down.
“We use Better Stack to monitor every API on this site. It caught 23 outages last month before users reported them.”
Secrets Management & Developer Security
Trusted by 150,000+ businesses
Manage API keys, database passwords, and service tokens with CLI integration and automatic rotation.
“After covering dozens of outages caused by leaked credentials, we recommend every team use a secrets manager.”
Automated Personal Data Removal
Removes data from 350+ brokers
Removes your personal data from 350+ data broker sites. Protects against phishing and social engineering attacks.
“Service outages sometimes involve data breaches. Optery keeps your personal info off the sites attackers use first.”
AI Voice & Audio Generation
Used by 1M+ developers
Text-to-speech, voice cloning, and audio AI for developers. Build voice features into your apps with a simple API.
“The best AI voice API we've tested — natural-sounding speech with low latency. Essential for any app adding voice features.”
SEO & Site Performance Monitoring
Used by 10M+ marketers
Track your site health, uptime, search rankings, and competitor movements from one dashboard.
“We use SEMrush to track how our API status pages rank and catch site health issues early.”
Related Guides
SLA vs SLO vs SLI Explained
Service level objectives, agreements, and indicators for SRE teams.
Error Budget Guide 2026
How to calculate and manage SRE error budgets with Prometheus.
Alert Fatigue Guide
How to reduce noisy alerts and improve on-call quality of life.
SRE Toolchain 2026
The complete observability stack for site reliability engineers.