
gRPC Monitoring Guide: Metrics, Tracing & Observability (2026)

gRPC's binary protocol, HTTP/2 multiplexing, and streaming RPC patterns require different monitoring approaches than REST APIs. This guide covers how to instrument gRPC services with OpenTelemetry, expose Prometheus metrics via interceptors, trace distributed calls across polyglot services, and alert on gRPC status codes.

Updated April 2026 · 11 min read · gRPC / Microservices / Protobuf


TL;DR — gRPC Monitoring Checklist

  • ✅ Track grpc_server_handled_total by method and status code — your error rate signal
  • ✅ Alert on UNAVAILABLE (14) and INTERNAL (13) status codes
  • ✅ Add OpenTelemetry gRPC interceptors for distributed trace propagation
  • ✅ Monitor grpc_server_handling_seconds for latency histograms
  • ✅ Use grpc-health-probe for Kubernetes health checks
  • ✅ Track active stream count for server-streaming and bidirectional RPCs

gRPC vs REST: Monitoring Differences

gRPC's architecture creates monitoring challenges that don't exist with REST:

Binary protocol — no HTTP proxy visibility

gRPC uses Protobuf binary encoding over HTTP/2. Reverse proxies that speak HTTP/1.1 to the upstream cannot forward gRPC frames, so you need an HTTP/2-capable load balancer (Envoy, Traefik, NGINX with its gRPC module, or a cloud load balancer). Even then, connection-level metrics are misleading: a single HTTP/2 connection multiplexes many RPCs, so connection counts alone won't show per-RPC health.
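If you front gRPC with NGINX, the grpc_pass directive (NGINX 1.13.10+) proxies the HTTP/2 frames and the upstream trailer variables can surface the per-RPC gRPC status in access logs. A minimal sketch; the port, backend address, and log format name are placeholders for your deployment:

```nginx
# Log the per-RPC gRPC status taken from the upstream trailer
# ($upstream_trailer_* variables require NGINX 1.13.10+).
log_format grpc_log '$remote_addr "$request" '
                    'grpc_status=$upstream_trailer_grpc_status';

server {
    listen 50051 http2;          # plaintext HTTP/2 listener
    access_log /var/log/nginx/grpc.log grpc_log;

    location / {
        grpc_pass grpc://127.0.0.1:50052;   # backend address is a placeholder
    }
}
```

Logging the trailer value matters because, as discussed below, the HTTP-level status is 200 even when the RPC failed.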

Status codes vs HTTP codes

gRPC uses its own status codes (0-16), not HTTP status codes. An HTTP/2 response carries 200 OK even for gRPC errors; the actual result is in the grpc-status trailer. Your monitoring must decode gRPC trailers, not just HTTP status codes.

Streaming RPCs change latency semantics

For unary RPCs, latency = time from request to response. For server-streaming RPCs, latency = time to first message + stream duration. These are fundamentally different SLOs. A slow server-streaming RPC that sends messages for 5 minutes isn't "slow" in the same way a 5-minute unary RPC is. Define separate SLOs for stream duration and time-to-first-message.
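The split can be made concrete in PromQL. The first query below uses the grpc_type label that go-grpc-prometheus attaches; the time-to-first-message histogram is a hypothetical custom metric you would have to record yourself in a stream interceptor:

```promql
# p99 total handling time, server-streaming RPCs only
histogram_quantile(0.99,
  rate(grpc_server_handling_seconds_bucket{grpc_type="server_stream"}[5m]))

# p99 time-to-first-message; "myapp_stream_time_to_first_msg_seconds"
# is an assumed custom histogram, not a standard metric
histogram_quantile(0.99,
  rate(myapp_stream_time_to_first_msg_seconds_bucket[5m]))
```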

gRPC Status Codes Reference

| Code | Name | Meaning | Alert? |
|------|------|---------|--------|
| 0 | OK | Success | No |
| 1 | CANCELLED | Client cancelled the call | If spike |
| 2 | UNKNOWN | Unexpected server error | ⚠️ Yes |
| 3 | INVALID_ARGUMENT | Bad client request | Client bug |
| 4 | DEADLINE_EXCEEDED | Client timeout fired | ⚠️ Yes |
| 5 | NOT_FOUND | Resource missing | Usually OK |
| 8 | RESOURCE_EXHAUSTED | Rate limit / overload | 🚨 Critical |
| 13 | INTERNAL | Server-side panic or bug | 🚨 Critical |
| 14 | UNAVAILABLE | Server unreachable | 🚨 Critical |
| 16 | UNAUTHENTICATED | Auth failure | If spike |
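The table can be folded into a small lookup helper for classifying raw grpc-status values, for example when parsing them out of access logs. This is an illustrative stdlib-only sketch, not an API from any gRPC library; the severity buckets are our own labels:

```go
package main

import "fmt"

// Severity buckets matching the "Alert?" column above.
type Severity int

const (
	SeverityNone     Severity = iota // success or normal app behavior
	SeverityWatch                    // alert only on spikes
	SeverityWarn                     // investigate
	SeverityCritical                 // page immediately
)

// codeInfo maps numeric grpc-status values from the table to a name
// and an alerting severity.
var codeInfo = map[int]struct {
	Name     string
	Severity Severity
}{
	0:  {"OK", SeverityNone},
	1:  {"CANCELLED", SeverityWatch},
	2:  {"UNKNOWN", SeverityWarn},
	3:  {"INVALID_ARGUMENT", SeverityNone},
	4:  {"DEADLINE_EXCEEDED", SeverityWarn},
	5:  {"NOT_FOUND", SeverityNone},
	8:  {"RESOURCE_EXHAUSTED", SeverityCritical},
	13: {"INTERNAL", SeverityCritical},
	14: {"UNAVAILABLE", SeverityCritical},
	16: {"UNAUTHENTICATED", SeverityWatch},
}

// classify returns the name and severity for a code; codes not listed
// in the table above default to watch-only.
func classify(code int) (string, Severity) {
	if info, ok := codeInfo[code]; ok {
		return info.Name, info.Severity
	}
	return "UNLISTED", SeverityWatch
}

func main() {
	name, sev := classify(14)
	fmt.Println(name, sev == SeverityCritical) // UNAVAILABLE true
}
```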

Prometheus Metrics for gRPC (Go)

The go-grpc-prometheus library adds server and client interceptors that emit standard gRPC metrics:

// go get github.com/grpc-ecosystem/go-grpc-prometheus

package main

import (
    "net"
    "net/http"

    grpc_prometheus "github.com/grpc-ecosystem/go-grpc-prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
    "google.golang.org/grpc"
)

func main() {
    // Server setup with Prometheus interceptors
    srv := grpc.NewServer(
        grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
        grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
    )

    // Register your gRPC services (pb is your generated protobuf package)
    pb.RegisterMyServiceServer(srv, &MyService{})

    // Enable default metrics with histograms for latency
    grpc_prometheus.EnableHandlingTimeHistogram()

    // Initialize server metrics after registering services
    grpc_prometheus.Register(srv)

    // Expose Prometheus metrics on /metrics
    go func() {
        http.Handle("/metrics", promhttp.Handler())
        http.ListenAndServe(":9090", nil)
    }()

    // Serve gRPC on :50051
    lis, _ := net.Listen("tcp", ":50051")
    srv.Serve(lis)
}

// Client setup
conn, _ := grpc.Dial(
    "server:50051",
    grpc.WithUnaryInterceptor(grpc_prometheus.UnaryClientInterceptor),
    grpc.WithStreamInterceptor(grpc_prometheus.StreamClientInterceptor),
)

// This emits:
// grpc_server_handled_total{grpc_code, grpc_method, grpc_service, grpc_type}
// grpc_server_handling_seconds{grpc_method, grpc_service, grpc_type} histogram
// grpc_server_msg_received_total  — for streaming
// grpc_server_msg_sent_total

// Key alert rules:
// High error rate by status code:
// rate(grpc_server_handled_total{grpc_code!="OK"}[5m])
//   / rate(grpc_server_handled_total[5m]) > 0.01

// UNAVAILABLE spike (server down). Note: go-grpc-prometheus renders
// code labels in CamelCase, so match grpc_code="Unavailable":
// increase(grpc_server_handled_total{grpc_code="Unavailable"}[1m]) > 10

// Latency regression:
// histogram_quantile(0.99, rate(grpc_server_handling_seconds_bucket[5m])) > 1.0
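The alert expressions above can be packaged as Prometheus alerting rules. This is a sketch only: group names, alert names, `for` durations, and thresholds are illustrative, and go-grpc-prometheus renders code labels in CamelCase (grpc_code="Unavailable"):

```yaml
groups:
  - name: grpc-server.rules
    rules:
      - alert: GrpcHighErrorRate
        expr: |
          sum(rate(grpc_server_handled_total{grpc_code!="OK"}[5m]))
            / sum(rate(grpc_server_handled_total[5m])) > 0.01
        for: 5m
        labels: {severity: warning}

      - alert: GrpcUnavailableSpike
        expr: increase(grpc_server_handled_total{grpc_code="Unavailable"}[1m]) > 10
        labels: {severity: critical}

      - alert: GrpcP99LatencyHigh
        expr: |
          histogram_quantile(0.99,
            sum by (le, grpc_method) (rate(grpc_server_handling_seconds_bucket[5m]))) > 1.0
        for: 10m
        labels: {severity: warning}
```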

OpenTelemetry Distributed Tracing for gRPC

OpenTelemetry propagates trace context across gRPC calls automatically via metadata headers — even across language boundaries:

// Go: otelgrpc interceptors
// go get go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc

import "go.opentelemetry.io/contrib/instrumentation/google.golang.org/grpc/otelgrpc"

// Server
srv := grpc.NewServer(
    grpc.StatsHandler(otelgrpc.NewServerHandler()),
)

// Client
conn, _ := grpc.Dial(
    "downstream:50051",
    grpc.WithStatsHandler(otelgrpc.NewClientHandler()),
)

// The interceptors automatically:
// 1. Extract W3C Trace Context from incoming metadata (traceparent header)
// 2. Create a span for each RPC with rpc.method, rpc.service, rpc.grpc.status_code
// 3. Inject trace context into outgoing client calls
// 4. Record errors and status codes on spans

# Python: auto-instrumentation
# pip install opentelemetry-instrumentation-grpc
from opentelemetry.instrumentation.grpc import GrpcInstrumentorServer, GrpcInstrumentorClient

GrpcInstrumentorServer().instrument()   # instrument all grpc.server instances
GrpcInstrumentorClient().instrument()   # instrument all grpc.channel instances

# Java: OpenTelemetry agent (zero code change)
# java -javaagent:opentelemetry-javaagent.jar \
#   -Dotel.service.name=my-grpc-service \
#   -Dotel.exporter.otlp.endpoint=http://collector:4317 \
#   -jar my-service.jar

# What you see in traces:
# - Full RPC call tree across microservices
# - Exact method that failed (rpc.method="OrderService/PlaceOrder")
# - gRPC status code as span attribute
# - Time breakdown: client send, server process, server send, client receive


gRPC Health Protocol for Kubernetes

gRPC defines a standard health checking protocol (grpc.health.v1). Use grpc-health-probe for Kubernetes probes:

# Test health check manually
grpc-health-probe -addr=:50051

# Kubernetes deployment with gRPC health probes
spec:
  containers:
  - name: grpc-service
    livenessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:50051"]
      initialDelaySeconds: 10
      periodSeconds: 30
    readinessProbe:
      exec:
        command: ["/bin/grpc_health_probe", "-addr=:50051", "-service=MyService"]
      initialDelaySeconds: 5
      periodSeconds: 10

# Kubernetes 1.24+ supports native gRPC probes:
livenessProbe:
  grpc:
    port: 50051
    service: ""        # empty = overall server health

# Go: implement the health server
import "google.golang.org/grpc/health"
import healthpb "google.golang.org/grpc/health/grpc_health_v1"

healthServer := health.NewServer()
healthpb.RegisterHealthServer(srv, healthServer)

// Flip status based on dependency checks: SERVING when healthy...
healthServer.SetServingStatus("MyService", healthpb.HealthCheckResponse_SERVING)
// ...NOT_SERVING when a required dependency fails
healthServer.SetServingStatus("MyService", healthpb.HealthCheckResponse_NOT_SERVING)

FAQ

What metrics should I monitor for gRPC services?

Core gRPC metrics: grpc_server_handled_total (by method and status code), grpc_server_handling_seconds histogram (latency), grpc_server_msg_received_total and grpc_server_msg_sent_total (streaming throughput), and grpc_client_handled_total on clients. Alert on UNAVAILABLE (14), INTERNAL (13), and DEADLINE_EXCEEDED (4) error rates. Monitor active stream count for server-streaming and bidirectional RPCs.

How do I add OpenTelemetry tracing to gRPC?

Use the OTel gRPC instrumentation library for your language. In Go: otelgrpc.NewServerHandler() and otelgrpc.NewClientHandler() as gRPC StatsHandlers. In Python: GrpcInstrumentorServer().instrument() auto-instruments all grpc.server instances. In Java: use the OpenTelemetry agent (zero code change). OTel propagates W3C Trace Context via gRPC metadata across service boundaries automatically.

How do gRPC status codes map to monitoring alerts?

Alert on: INTERNAL (13) and UNAVAILABLE (14) — server errors, alert at >1% rate. RESOURCE_EXHAUSTED (8) — rate limiting or overload, alert immediately. DEADLINE_EXCEEDED (4) — client timeout, may indicate slow server or tight client timeout config. UNKNOWN (2) — unexpected server error. CANCELLED (1) and NOT_FOUND (5) are usually normal application behavior, not infrastructure alerts.

How do I monitor gRPC streaming RPCs?

Track grpc_server_msg_received_total and grpc_server_msg_sent_total per method for streaming throughput. Monitor the active stream count; a count that climbs without ever draining indicates stream leaks. Define separate SLOs for time-to-first-message vs total stream duration. RST_STREAM frames indicate abnormal stream termination; monitor them via Envoy access logs if you run a service mesh.

How do I debug gRPC UNAVAILABLE errors?

UNAVAILABLE (14) means the server is not accepting connections. Check: (1) server readiness probe is passing, (2) TLS cert CN/SAN matches client dial address, (3) max_concurrent_streams limit — raise it or scale horizontally, (4) load balancer supports HTTP/2, (5) use grpc-health-probe to test manually: grpc-health-probe -addr=:50051. Implement exponential backoff retry on UNAVAILABLE on the client side.

