Go (Golang) Monitoring Guide: Metrics, Tracing & Alerts (2026)
Go's performance advantages disappear fast when goroutines leak, the garbage collector pauses at the wrong moment, or your heap grows unbounded. This guide covers how to instrument Go applications with Prometheus, detect problems with pprof, add tracing with OpenTelemetry, and alert before users notice.
TL;DR — Go Monitoring Checklist
- ✅ Expose /metrics with prometheus/client_golang — get runtime metrics for free
- ✅ Track goroutine count (runtime.NumGoroutine()) — alert on sustained growth
- ✅ Enable /debug/pprof for CPU/memory profiling (behind auth in prod)
- ✅ Instrument HTTP handlers with a request duration histogram (p50/p95/p99)
- ✅ Add OpenTelemetry tracing for distributed systems
- ✅ Monitor GC pause duration — alert when GC takes > 10ms
- ✅ Use goleak in tests to catch goroutine leaks before prod
Why Go Monitoring Has Unique Challenges
Go's concurrency model is one of its strengths — goroutines are cheap, so you spawn thousands of them. But that same model creates monitoring challenges you won't encounter in single-threaded languages:
Goroutine Leaks
Goroutines blocked forever on channels or context. Heap grows slowly until OOM. No crash, just gradual memory creep.
GC Pauses
Go's concurrent GC is fast but not free. Write pressure from many allocations can cause multi-millisecond pauses that show up as p99 latency spikes.
cgo Complications
cgo calls escape Go's scheduler. A blocking cgo call holds an OS thread, not just a goroutine — visible only as thread count growth.
Interface Allocation Overhead
Boxing values into interfaces causes heap escapes. In hot paths, this drives GC pressure. Visible in pprof allocs profile, not in basic metrics.
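You can surface this class of allocation before deployment with the compiler's escape analysis. Below is a minimal, hypothetical sketch (logValue is an illustrative name, not a real API) of an int being boxed into an interface{} in a hot loop:
package main

import "fmt"

// logValue takes an interface{}, so callers box their arguments into it.
func logValue(v interface{}) {
    fmt.Println(v)
}

func main() {
    for i := 0; i < 1_000_000; i++ {
        logValue(i) // boxing i may allocate on each call; visible in the pprof allocs profile
    }
}
Run go build -gcflags='-m' ./... and look for messages such as "escapes to heap" at the call site; the same allocations then show up in the pprof allocs profile described later in this guide.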
Prometheus Metrics with client_golang
prometheus/client_golang is the standard Prometheus client for Go. Importing it and calling promhttp.Handler() automatically registers 30+ Go runtime metrics (GC, goroutines, heap, threads).
// go.mod
// require github.com/prometheus/client_golang v1.19.0
package main

import (
    "net/http"
    "runtime"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total HTTP requests",
        },
        []string{"method", "path", "status"},
    )
    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path"},
    )
    goroutineCount = promauto.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name: "go_goroutine_count",
            Help: "Current number of goroutines",
        },
        func() float64 { return float64(runtime.NumGoroutine()) },
    )
)

// statusRecorder captures the response status code so the middleware can label metrics with it.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

func metricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        rw := &statusRecorder{ResponseWriter: w, status: 200}
        next.ServeHTTP(rw, r)
        duration := time.Since(start).Seconds()
        // NOTE: labeling by raw r.URL.Path can explode label cardinality; prefer route templates.
        httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rw.status)).Inc()
        httpRequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
    })
}

// yourHandler stands in for your application's real handler.
var yourHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("ok"))
})

func main() {
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    mux.Handle("/", metricsMiddleware(yourHandler))
    http.ListenAndServe(":8080", mux)
}
Auto-registered runtime metrics: Just importing client_golang gives you go_goroutines, go_gc_duration_seconds, go_memstats_heap_inuse_bytes, go_memstats_alloc_bytes_total, and 25+ more runtime metrics for free.
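With the counter and histogram above exposed, dashboards and alerts are a single PromQL expression away. Two illustrative queries against the metric and label names defined above (adjust ranges and labels to your setup):
# p95 request latency per path over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path))

# Share of requests returning 5xx
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))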
Monitor your Go services with Better Stack
Better Stack runs HTTP and TCP checks against your Go endpoints from 30+ global locations. Get paged in seconds when your Go service crashes or returns errors.
Try Better Stack Free →
pprof — Built-in Go Profiler
Go ships with a built-in profiler. Import the net/http/pprof package for its side effects (a blank-identifier import) to expose profiling endpoints over HTTP:
import _ "net/http/pprof"
// This registers /debug/pprof/* routes on the default ServeMux
// Available profiles:
// /debug/pprof/ — index
// /debug/pprof/goroutine — all goroutine stacks
// /debug/pprof/heap — heap memory allocations
// /debug/pprof/profile — 30s CPU profile
// /debug/pprof/allocs — memory allocation trace
// /debug/pprof/threadcreate — OS thread creation
// /debug/pprof/block — goroutine blocking
// Capture and analyze a heap profile:
go tool pprof http://localhost:6060/debug/pprof/heap
# Inside pprof interactive mode:
(pprof) top10 # top 10 allocating functions
(pprof) web              # open call graph (SVG) in browser; use go tool pprof -http=:8081 for a flame graph UI
(pprof) list main.Foo # show source-level allocation detail
# Capture a 30-second CPU profile:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
# Capture all goroutine stacks (great for leak diagnosis):
curl http://localhost:6060/debug/pprof/goroutine?debug=2 > goroutines.txt
Security: Never expose /debug/pprof on a public port. Mount it on a separate internal port (e.g., 6060) bound to localhost or behind authentication. The profiles expose function names, memory contents, and call graphs.
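One way to follow that advice, sketched below assuming your app already uses its own mux: serve http.DefaultServeMux (where net/http/pprof registers its routes) on a loopback-only listener, separate from the public one. The ports and handlers here are illustrative:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Internal-only pprof listener, reachable from localhost (or via an SSH tunnel / kubectl port-forward).
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil)) // nil = http.DefaultServeMux
    }()

    // The public listener uses its own mux, so pprof is never exposed on it.
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    log.Fatal(http.ListenAndServe(":8080", mux))
}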
Go Runtime Metrics to Monitor
| Metric | client_golang Name | What to Watch | Alert Threshold |
|---|---|---|---|
| Goroutine count | go_goroutines | Sustained monotonic growth = goroutine leak | 10k+ or 50% growth over 1h |
| Heap in-use | go_memstats_heap_inuse_bytes | Memory actively holding objects | Near container memory limit |
| GC pause duration | go_gc_duration_seconds (summary, quantile label) | STW pause latency per GC cycle | quantile="1" (max) > 10ms regularly |
| Allocation rate | go_memstats_alloc_bytes_total (rate) | Bytes allocated per second → GC pressure | Sudden spike vs baseline |
| OS threads | go_threads | Too many = cgo blocking or scheduler pressure | > 10× GOMAXPROCS |
| GC cycles | go_gc_duration_seconds_count (rate) | How often GC is running | > 10/min is high allocation pressure |
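As a rough starting point, the metrics in the table translate into PromQL like the following; treat these as illustrative dashboard queries, and adjust windows and limits to your environment:
# Allocation rate (bytes/sec), a proxy for GC pressure
rate(go_memstats_alloc_bytes_total[5m])

# GC cycles per minute
rate(go_gc_duration_seconds_count[5m]) * 60

# Heap in use as a fraction of a 512 MiB container limit (adjust the limit to yours)
go_memstats_heap_inuse_bytes / (512 * 1024 * 1024)

# OS threads vs. GOMAXPROCS (go_sched_gomaxprocs_threads requires a recent client_golang Go collector)
go_threads / go_sched_gomaxprocs_threads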
Detecting Goroutine Leaks
A goroutine leak is one of the most insidious Go bugs. Goroutines pile up silently, each holding stack memory, until the process OOMs or slows to a crawl.
In Production: Monitor NumGoroutine
# Prometheus alert rule for goroutine leak
groups:
  - name: golang
    rules:
      - alert: GoroutineLeak
        expr: |
          delta(go_goroutines[30m]) > 500
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Goroutine count growing — possible goroutine leak"
          description: "Goroutine count grew by {{ $value }} over the last 30 minutes"
      - alert: GoroutinesHigh
        expr: go_goroutines > 10000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Goroutine count above 10,000"

# When this fires, capture a goroutine dump:
#   curl http://internal-host:6060/debug/pprof/goroutine?debug=2 > dump.txt
#   grep -c "goroutine" dump.txt           # count goroutines
#   grep -A 20 "goroutine 12345" dump.txt  # inspect a specific goroutine
In Tests: goleak
// go get go.uber.org/goleak
package mypackage_test
import (
"testing"
"go.uber.org/goleak"
)
func TestMain(m *testing.M) {
// Assert no goroutines leak after tests complete
goleak.VerifyTestMain(m)
}
func TestMyHandler(t *testing.T) {
defer goleak.VerifyNone(t) // Check at end of this specific test
// ... your test code
}
Common Goroutine Leak Patterns
❌ Unbuffered channel with no reader
go func() { ch <- result }() // leaks if nothing reads ch
Fix: use a buffered channel or ensure a reader always exists
❌ context.Background() never cancelled
go func() { <-ctx.Done() }() // leaks if ctx never cancels
Fix: always use a cancellable context; defer cancel() immediately after WithCancel
❌ HTTP response body not closed
resp, _ := http.Get(url) // goroutine leaks if body not read/closed
Fix: always defer resp.Body.Close() after checking err
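Putting the first two fixes together, here is a minimal, self-contained sketch (fetchResult and slowQuery are illustrative names): a buffered result channel plus a cancellable context means the worker goroutine can always send and exit, even if the caller gives up first.
package main

import (
    "context"
    "fmt"
    "time"
)

// slowQuery stands in for any blocking call (DB query, RPC, etc.).
func slowQuery() string {
    time.Sleep(2 * time.Second)
    return "result"
}

// fetchResult uses a buffered channel so the worker goroutine never blocks on send,
// and respects ctx so the caller can time out without leaking anything.
func fetchResult(ctx context.Context) (string, error) {
    ch := make(chan string, 1) // buffered: the send below always succeeds

    go func() {
        ch <- slowQuery()
    }()

    select {
    case res := <-ch:
        return res, nil
    case <-ctx.Done():
        return "", ctx.Err() // caller cancelled or timed out; the worker still exits on its own
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()
    fmt.Println(fetchResult(ctx))
}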
Distributed Tracing with OpenTelemetry
For microservices, Prometheus metrics tell you what is slow. OpenTelemetry traces tell you where in the call chain it's slow.
// go get go.opentelemetry.io/otel
// go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
// go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
package main

import (
    "context"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer(ctx context.Context) func() {
    exporter, err := otlptracehttp.New(ctx,
        otlptracehttp.WithEndpoint("otel-collector:4318"),
        otlptracehttp.WithInsecure(),
    )
    if err != nil {
        panic(err) // fail fast at startup if the exporter can't be created
    }
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithSampler(sdktrace.AlwaysSample()), // 100% in dev; use ParentBased sampling in prod
    )
    otel.SetTracerProvider(tp)
    return func() { _ = tp.Shutdown(ctx) }
}

// Auto-instrument an HTTP server:
mux.Handle("/api/", otelhttp.NewHandler(yourHandler, "api"))

// Manual span for a critical operation (attribute and codes come from the otel packages imported above):
tracer := otel.Tracer("myapp")
ctx, span := tracer.Start(ctx, "database.Query")
defer span.End()

span.SetAttributes(attribute.String("db.query", query))
rows, err := db.QueryContext(ctx, query)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
}
Go Application Alert Rules
groups:
  - name: go-app
    rules:
      # Service is down
      - alert: GoServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels: { severity: critical }
        annotations:
          summary: "Go service {{ $labels.instance }} is down"

      # High error rate
      - alert: GoHighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "HTTP error rate > 1%"

      # Slow p95 latency
      - alert: GoSlowP95Latency
        expr: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "p95 request latency > 2 seconds"

      # GC taking too long (go_gc_duration_seconds is a summary, so use its quantile label, not histogram_quantile)
      - alert: GoGCPauseHigh
        expr: go_gc_duration_seconds{quantile="1"} > 0.01
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "GC max pause > 10ms — check allocation rate"

      # Goroutine leak
      - alert: GoGoroutineLeak
        expr: go_goroutines > 10000
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Goroutine count > 10,000 — investigate for leaks"

      # Memory pressure
      - alert: GoHighMemory
        expr: process_resident_memory_bytes > 1e9  # 1GB
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Go process using > 1GB RSS"
Go Monitoring & APM Tools (2026)
| Tool | Type | Go Support | Cost |
|---|---|---|---|
| Prometheus + Grafana | Self-hosted metrics | Excellent — prometheus/client_golang is first-class | Free (OSS) |
| Better Stack | SaaS uptime + logs | HTTP/TCP checks + structured log ingestion | Free tier, $25/mo+ |
| Grafana Cloud | SaaS observability | Full OTEL support, Pyroscope for Go pprof | Free tier, $8/mo+ |
| Datadog | Enterprise APM | dd-trace-go with auto-instrumentation, profiler | $15-23/host/mo |
| Sentry | Error tracking | sentry-go SDK, panic recovery, breadcrumbs | Free (5K errors/mo), $26/mo+ |
| Pyroscope | Continuous profiling | Always-on pprof sampling, flame graphs in prod | Free (OSS), Grafana Cloud hosted |
Frequently Asked Questions
How do I add Prometheus metrics to a Go application?
Import prometheus/client_golang. Use promauto to register Counters, Gauges, and Histograms. Expose them via promhttp.Handler() on /metrics. Runtime metrics (goroutines, GC, heap) are registered automatically. Add HTTP middleware to track request count and duration per route.
How do I detect goroutine leaks in Go?
In production: expose go_goroutines as a Prometheus metric and alert when it grows monotonically over 30 minutes. When it fires, capture a goroutine dump via /debug/pprof/goroutine?debug=2 and look for blocked goroutines. In tests: use goleak (go.uber.org/goleak) to assert no goroutines leak after each test.
What is pprof and how do I use it for Go profiling?
pprof is Go's built-in profiler. Import _ "net/http/pprof" to expose /debug/pprof/* endpoints. Use go tool pprof http://host:6060/debug/pprof/heap to analyze memory allocations. Use /profile for CPU profiling and /goroutine for goroutine stack dumps. Keep pprof on an internal port behind authentication.
How do I add distributed tracing to Go?
Use OpenTelemetry Go SDK. Initialize a TracerProvider with an OTLP exporter pointing to your collector (Jaeger, Tempo). Use otelhttp.NewHandler() for HTTP middleware auto-instrumentation. Create manual spans with tracer.Start(ctx, "name") for critical code paths. Pass context through your call chain.
What Go runtime metrics should I monitor?
Key metrics from client_golang: go_goroutines (leak indicator), go_memstats_heap_inuse_bytes (memory pressure), go_gc_duration_seconds (GC pause latency), go_memstats_alloc_bytes_total rate (allocation rate / GC pressure), go_threads (OS thread count). All are auto-registered when you use promhttp.Handler().