Go (Golang) Monitoring Guide: Metrics, Tracing & Alerts (2026)
Go's performance advantages disappear fast when goroutines leak, the garbage collector pauses at the wrong moment, or your heap grows unbounded. This guide covers how to instrument Go applications with Prometheus, detect problems with pprof, add tracing with OpenTelemetry, and alert before users notice.
TL;DR — Go Monitoring Checklist
- ✅ Expose /metrics with prometheus/client_golang — get runtime metrics for free
- ✅ Track goroutine count (runtime.NumGoroutine()) — alert on sustained growth
- ✅ Enable /debug/pprof for CPU/memory profiling (behind auth in prod)
- ✅ Instrument HTTP handlers with a request duration histogram (p50/p95/p99)
- ✅ Add OpenTelemetry tracing for distributed systems
- ✅ Monitor GC pause duration — alert when GC takes > 10ms
- ✅ Use goleak in tests to catch goroutine leaks before prod
Why Go Monitoring Has Unique Challenges
Go's concurrency model is one of its strengths — goroutines are cheap, so you spawn thousands of them. But that same model creates monitoring challenges you won't encounter in single-threaded languages:
Goroutine Leaks
Goroutines blocked forever on channels or context. Heap grows slowly until OOM. No crash, just gradual memory creep.
GC Pauses
Go's concurrent GC is fast but not free. Write pressure from many allocations can cause multi-millisecond pauses that show up as p99 latency spikes.
cgo Complications
cgo calls escape Go's scheduler. A blocking cgo call holds an OS thread, not just a goroutine — visible only as thread count growth.
Interface Allocation Overhead
Boxing values into interfaces causes heap escapes. In hot paths, this drives GC pressure. Visible in pprof allocs profile, not in basic metrics.
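You can surface this class of allocation before deployment with the compiler's escape analysis. Below is a minimal, hypothetical sketch (logValue is an illustrative name, not a real API) of an int being boxed into an interface{} in a hot loop:
package main

import "fmt"

// logValue takes an interface{}, so callers box their arguments into it.
func logValue(v interface{}) {
    fmt.Println(v)
}

func main() {
    for i := 0; i < 1_000_000; i++ {
        logValue(i) // boxing i may allocate on each call; visible in the pprof allocs profile
    }
}
Run go build -gcflags='-m' ./... and look for messages such as "escapes to heap" at the call site; the same allocations then show up in the pprof allocs profile described later in this guide.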
Prometheus Metrics with client_golang
prometheus/client_golang is the standard Prometheus client for Go. Importing it and calling promhttp.Handler() automatically registers 30+ Go runtime metrics (GC, goroutines, heap, threads).
// go.mod
// require github.com/prometheus/client_golang v1.19.0
package main

import (
    "net/http"
    "runtime"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total HTTP requests",
        },
        []string{"method", "path", "status"},
    )
    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path"},
    )
    goroutineCount = promauto.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name: "go_goroutine_count",
            Help: "Current number of goroutines",
        },
        func() float64 { return float64(runtime.NumGoroutine()) },
    )
)

// statusRecorder captures the response status code so the middleware can label metrics with it.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

func metricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        rw := &statusRecorder{ResponseWriter: w, status: 200}
        next.ServeHTTP(rw, r)
        duration := time.Since(start).Seconds()
        // NOTE: labeling by raw r.URL.Path can explode label cardinality; prefer route templates.
        httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rw.status)).Inc()
        httpRequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
    })
}

// yourHandler stands in for your application's real handler.
var yourHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("ok"))
})

func main() {
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    mux.Handle("/", metricsMiddleware(yourHandler))
    http.ListenAndServe(":8080", mux)
}
Auto-registered runtime metrics: Just importing client_golang gives you go_goroutines, go_gc_duration_seconds, go_memstats_heap_inuse_bytes, go_memstats_alloc_bytes_total, and 25+ more runtime metrics for free.
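With the counter and histogram above exposed, dashboards and alerts are a single PromQL expression away. Two illustrative queries against the metric and label names defined above (adjust ranges and labels to your setup):
# p95 request latency per path over the last 5 minutes
histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le, path))

# Share of requests returning 5xx
sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m]))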
Monitor your Go services with Better Stack
Better Stack runs HTTP and TCP checks against your Go endpoints from 30+ global locations. Get paged in seconds when your Go service crashes or returns errors.
Try Better Stack Free →
pprof — Built-in Go Profiler
Go ships with a built-in profiler. Import the net/http/pprof package for its side effects (a blank-identifier import) to expose profiling endpoints over HTTP:
import _ "net/http/pprof"
// This registers /debug/pprof/* routes on the default ServeMux
// Available profiles:
// /debug/pprof/ — index
// /debug/pprof/goroutine — all goroutine stacks
// /debug/pprof/heap — heap memory allocations
// /debug/pprof/profile — 30s CPU profile
// /debug/pprof/allocs — memory allocation trace
// /debug/pprof/threadcreate — OS thread creation
// /debug/pprof/block — goroutine blocking
// Capture and analyze a heap profile:
go tool pprof http://localhost:6060/debug/pprof/heap
# Inside pprof interactive mode:
(pprof) top10 # top 10 allocating functions
(pprof) web              # open call graph (SVG) in browser; use go tool pprof -http=:8081 for a flame graph UI
(pprof) list main.Foo # show source-level allocation detail
# Capture a 30-second CPU profile:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30
# Capture all goroutine stacks (great for leak diagnosis):
curl http://localhost:6060/debug/pprof/goroutine?debug=2 > goroutines.txt
Security: Never expose /debug/pprof on a public port. Mount it on a separate internal port (e.g., 6060) bound to localhost or behind authentication. The profiles expose function names, memory contents, and call graphs.
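One way to follow that advice, sketched below assuming your app already uses its own mux: serve http.DefaultServeMux (where net/http/pprof registers its routes) on a loopback-only listener, separate from the public one. The ports and handlers here are illustrative:
package main

import (
    "log"
    "net/http"
    _ "net/http/pprof" // registers /debug/pprof/* on http.DefaultServeMux
)

func main() {
    // Internal-only pprof listener, reachable from localhost (or via an SSH tunnel / kubectl port-forward).
    go func() {
        log.Println(http.ListenAndServe("localhost:6060", nil)) // nil = http.DefaultServeMux
    }()

    // The public listener uses its own mux, so pprof is never exposed on it.
    mux := http.NewServeMux()
    mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
        w.Write([]byte("hello"))
    })
    log.Fatal(http.ListenAndServe(":8080", mux))
}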
Go Runtime Metrics to Monitor
| Metric | client_golang Name | What to Watch | Alert Threshold |
|---|---|---|---|
| Goroutine count | go_goroutines | Sustained monotonic growth = goroutine leak | 10k+ or 50% growth over 1h |
| Heap in-use | go_memstats_heap_inuse_bytes | Memory actively holding objects | Near container memory limit |
| GC pause duration | go_gc_duration_seconds (summary, quantile label) | STW pause latency per GC cycle | quantile="1" (max) > 10ms regularly |
| Allocation rate | go_memstats_alloc_bytes_total (rate) | Bytes allocated per second → GC pressure | Sudden spike vs baseline |
| OS threads | go_threads | Too many = cgo blocking or scheduler pressure | > 10× GOMAXPROCS |
| GC cycles | go_gc_duration_seconds_count (rate) | How often GC is running | > 10/min is high allocation pressure |
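As a rough starting point, the metrics in the table translate into PromQL like the following; treat these as illustrative dashboard queries, and adjust windows and limits to your environment:
# Allocation rate (bytes/sec), a proxy for GC pressure
rate(go_memstats_alloc_bytes_total[5m])

# GC cycles per minute
rate(go_gc_duration_seconds_count[5m]) * 60

# Heap in use as a fraction of a 512 MiB container limit (adjust the limit to yours)
go_memstats_heap_inuse_bytes / (512 * 1024 * 1024)

# OS threads vs. GOMAXPROCS (go_sched_gomaxprocs_threads requires a recent client_golang Go collector)
go_threads / go_sched_gomaxprocs_threads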
Detecting Goroutine Leaks
A goroutine leak is one of the most insidious Go bugs. Goroutines pile up silently, each holding stack memory, until the process OOMs or slows to a crawl.
In Production: Monitor NumGoroutine
# Prometheus alert rule for goroutine leak
groups:
  - name: golang
    rules:
      - alert: GoroutineLeak
        expr: |
          delta(go_goroutines[30m]) > 500
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Goroutine count growing — possible goroutine leak"
          description: "Goroutine count grew by {{ $value }} over the last 30 minutes"
      - alert: GoroutinesHigh
        expr: go_goroutines > 10000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Goroutine count above 10,000"

# When this fires, capture a goroutine dump:
#   curl http://internal-host:6060/debug/pprof/goroutine?debug=2 > dump.txt
#   grep -c "goroutine" dump.txt           # count goroutines
#   grep -A 20 "goroutine 12345" dump.txt  # inspect a specific goroutine
In Tests: goleak
// go get go.uber.org/goleak
package mypackage_test
import (
"testing"
"go.uber.org/goleak"
)
func TestMain(m *testing.M) {
// Assert no goroutines leak after tests complete
goleak.VerifyTestMain(m)
}
func TestMyHandler(t *testing.T) {
defer goleak.VerifyNone(t) // Check at end of this specific test
// ... your test code
}
Common Goroutine Leak Patterns
❌ Unbuffered channel with no reader
go func() { ch <- result }() // leaks if nothing reads ch
Fix: use a buffered channel or ensure a reader always exists
❌ context.Background() never cancelled
go func() { <-ctx.Done() }() // leaks if ctx never cancels
Fix: always use a cancellable context; defer cancel() immediately after WithCancel
❌ HTTP response body not closed
resp, _ := http.Get(url) // goroutine leaks if body not read/closed
Fix: always defer resp.Body.Close() after checking err
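Putting the first two fixes together, here is a minimal, self-contained sketch (fetchResult and slowQuery are illustrative names): a buffered result channel plus a cancellable context means the worker goroutine can always send and exit, even if the caller gives up first.
package main

import (
    "context"
    "fmt"
    "time"
)

// slowQuery stands in for any blocking call (DB query, RPC, etc.).
func slowQuery() string {
    time.Sleep(2 * time.Second)
    return "result"
}

// fetchResult uses a buffered channel so the worker goroutine never blocks on send,
// and respects ctx so the caller can time out without leaking anything.
func fetchResult(ctx context.Context) (string, error) {
    ch := make(chan string, 1) // buffered: the send below always succeeds

    go func() {
        ch <- slowQuery()
    }()

    select {
    case res := <-ch:
        return res, nil
    case <-ctx.Done():
        return "", ctx.Err() // caller cancelled or timed out; the worker still exits on its own
    }
}

func main() {
    ctx, cancel := context.WithTimeout(context.Background(), 500*time.Millisecond)
    defer cancel()
    fmt.Println(fetchResult(ctx))
}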
Distributed Tracing with OpenTelemetry
For microservices, Prometheus metrics tell you what is slow. OpenTelemetry traces tell you where in the call chain it's slow.
// go get go.opentelemetry.io/otel
// go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
// go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp
package main

import (
    "context"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer(ctx context.Context) func() {
    exporter, err := otlptracehttp.New(ctx,
        otlptracehttp.WithEndpoint("otel-collector:4318"),
        otlptracehttp.WithInsecure(),
    )
    if err != nil {
        panic(err) // fail fast at startup if the exporter can't be created
    }
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithSampler(sdktrace.AlwaysSample()), // 100% in dev; use ParentBased sampling in prod
    )
    otel.SetTracerProvider(tp)
    return func() { _ = tp.Shutdown(ctx) }
}

// Auto-instrument an HTTP server:
mux.Handle("/api/", otelhttp.NewHandler(yourHandler, "api"))

// Manual span for a critical operation (attribute and codes come from the otel packages imported above):
tracer := otel.Tracer("myapp")
ctx, span := tracer.Start(ctx, "database.Query")
defer span.End()

span.SetAttributes(attribute.String("db.query", query))
rows, err := db.QueryContext(ctx, query)
if err != nil {
    span.RecordError(err)
    span.SetStatus(codes.Error, err.Error())
}
Go Application Alert Rules
groups:
  - name: go-app
    rules:
      # Service is down
      - alert: GoServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels: { severity: critical }
        annotations:
          summary: "Go service {{ $labels.instance }} is down"

      # High error rate
      - alert: GoHighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "HTTP error rate > 1%"

      # Slow p95 latency
      - alert: GoSlowP95Latency
        expr: |
          histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket[5m])) by (le)) > 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "p95 request latency > 2 seconds"

      # GC taking too long (go_gc_duration_seconds is a summary, so use its quantile label, not histogram_quantile)
      - alert: GoGCPauseHigh
        expr: go_gc_duration_seconds{quantile="1"} > 0.01
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "GC max pause > 10ms — check allocation rate"

      # Goroutine leak
      - alert: GoGoroutineLeak
        expr: go_goroutines > 10000
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Goroutine count > 10,000 — investigate for leaks"

      # Memory pressure
      - alert: GoHighMemory
        expr: process_resident_memory_bytes > 1e9  # 1GB
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Go process using > 1GB RSS"
Go Monitoring & APM Tools (2026)
| Tool | Type | Go Support | Cost |
|---|---|---|---|
| Prometheus + Grafana | Self-hosted metrics | Excellent — prometheus/client_golang is first-class | Free (OSS) |
| Better Stack | SaaS uptime + logs | HTTP/TCP checks + structured log ingestion | Free tier, $25/mo+ |
| Grafana Cloud | SaaS observability | Full OTEL support, Pyroscope for Go pprof | Free tier, $8/mo+ |
| Datadog | Enterprise APM | dd-trace-go with auto-instrumentation, profiler | $15-23/host/mo |
| Sentry | Error tracking | sentry-go SDK, panic recovery, breadcrumbs | Free (5K errors/mo), $26/mo+ |
| Pyroscope | Continuous profiling | Always-on pprof sampling, flame graphs in prod | Free (OSS), Grafana Cloud hosted |
Frequently Asked Questions
How do I add Prometheus metrics to a Go application?
Import prometheus/client_golang. Use promauto to register Counters, Gauges, and Histograms. Expose them via promhttp.Handler() on /metrics. Runtime metrics (goroutines, GC, heap) are registered automatically. Add HTTP middleware to track request count and duration per route.
How do I detect goroutine leaks in Go?
In production: expose go_goroutines as a Prometheus metric and alert when it grows monotonically over 30 minutes. When it fires, capture a goroutine dump via /debug/pprof/goroutine?debug=2 and look for blocked goroutines. In tests: use goleak (go.uber.org/goleak) to assert no goroutines leak after each test.
What is pprof and how do I use it for Go profiling?
pprof is Go's built-in profiler. Import _ "net/http/pprof" to expose /debug/pprof/* endpoints. Use go tool pprof http://host:6060/debug/pprof/heap to analyze memory allocations. Use /profile for CPU profiling and /goroutine for goroutine stack dumps. Keep pprof on an internal port behind authentication.
How do I add distributed tracing to Go?
Use OpenTelemetry Go SDK. Initialize a TracerProvider with an OTLP exporter pointing to your collector (Jaeger, Tempo). Use otelhttp.NewHandler() for HTTP middleware auto-instrumentation. Create manual spans with tracer.Start(ctx, "name") for critical code paths. Pass context through your call chain.
What Go runtime metrics should I monitor?
Key metrics from client_golang: go_goroutines (leak indicator), go_memstats_heap_inuse_bytes (memory pressure), go_gc_duration_seconds (GC pause latency), go_memstats_alloc_bytes_total rate (allocation rate / GC pressure), go_threads (OS thread count). All are auto-registered when you use promhttp.Handler().