
Go (Golang) Monitoring Guide: Metrics, Tracing & Alerts (2026)

Go's performance advantages disappear fast when goroutines leak, the garbage collector pauses at the wrong moment, or your heap grows unbounded. This guide covers how to instrument Go applications with Prometheus, detect problems with pprof, add tracing with OpenTelemetry, and alert before users notice.

Updated April 2026 · 13 min read · Backend Engineering / SRE


TL;DR — Go Monitoring Checklist

  • ✅ Expose /metrics with prometheus/client_golang — get runtime metrics for free
  • ✅ Track goroutine count (runtime.NumGoroutine()) — alert on sustained growth
  • ✅ Enable /debug/pprof for CPU/memory profiling (behind auth in prod)
  • ✅ Instrument HTTP handlers with request duration histogram (p50/p95/p99)
  • ✅ Add OpenTelemetry tracing for distributed systems
  • ✅ Monitor GC pause duration — alert when GC takes > 10ms
  • ✅ Use goleak (go.uber.org/goleak) in tests to catch goroutine leaks before prod

Why Go Monitoring Has Unique Challenges

Go's concurrency model is one of its strengths — goroutines are cheap, so you spawn thousands of them. But that same model creates monitoring challenges you won't encounter in single-threaded languages:

Goroutine Leaks

Goroutines blocked forever on channel operations or context waits pile up, each pinning its stack and anything it references. Memory grows slowly until OOM. No crash, just gradual creep.

GC Pauses

Go's concurrent GC is fast but not free. Allocation pressure from many short-lived objects can cause multi-millisecond pauses that show up as p99 latency spikes.
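If you want to see GC pauses from inside the process itself, runtime.ReadMemStats exposes pause counts and durations directly. A minimal sketch (how you surface these numbers, as a log line or a custom gauge, is up to you):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    // PauseNs is a circular buffer of the last 256 GC pause durations;
    // the most recent pause sits at index (NumGC+255)%256.
    var m runtime.MemStats
    runtime.ReadMemStats(&m)
    fmt.Printf("GC cycles: %d\n", m.NumGC)
    fmt.Printf("total GC pause: %v\n", time.Duration(m.PauseTotalNs))
    if m.NumGC > 0 {
        fmt.Printf("last GC pause: %v\n", time.Duration(m.PauseNs[(m.NumGC+255)%256]))
    }
}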

cgo Complications

cgo calls step outside Go's scheduler. A blocking cgo call ties up an OS thread, not just a goroutine, so the problem is visible only as thread-count growth.
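If you do not have the go_threads metric wired up yet, the runtime's built-in "threadcreate" profile gives a rough in-process signal: it counts thread-creation events (not live threads), which is usually enough to spot a cgo-driven thread explosion. A small sketch:

package main

import (
    "log"
    "runtime"
    "runtime/pprof"
)

func main() {
    // Count OS thread creation events recorded by the runtime. A number that keeps
    // climbing far past GOMAXPROCS usually points at blocking cgo or syscalls.
    threads := pprof.Lookup("threadcreate").Count()
    log.Printf("OS threads created: %d (GOMAXPROCS=%d)", threads, runtime.GOMAXPROCS(0))
}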

Interface Allocation Overhead

Boxing values into interfaces causes heap escapes. In hot paths, this drives GC pressure. Visible in pprof allocs profile, not in basic metrics.
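A quick way to confirm boxing is costing you allocations is a micro-benchmark run with go test -bench . -benchmem: the boxed version reports allocs/op that the concrete version avoids. A sketch to drop in a _test.go file (names are illustrative, and exact numbers depend on the compiler's escape analysis):

package hotpath

import "testing"

// Package-level sinks force the stored values to escape, mimicking a hot path
// that hands values to an interface-typed field or function parameter.
var (
    boxedSink interface{}
    intSink   int
)

func BenchmarkBoxedInterface(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        boxedSink = i + 256 // boxing: values above 255 need a fresh heap allocation
    }
}

func BenchmarkConcreteInt(b *testing.B) {
    b.ReportAllocs()
    for i := 0; i < b.N; i++ {
        intSink = i + 256 // concrete assignment: no boxing, no allocation
    }
}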

Prometheus Metrics with client_golang

prometheus/client_golang is the standard Prometheus client for Go. Importing it and calling promhttp.Handler() automatically registers 30+ Go runtime metrics (GC, goroutines, heap, threads).

// go.mod
// require github.com/prometheus/client_golang v1.19.0

package main

import (
    "net/http"
    "runtime"
    "strconv"
    "time"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promauto"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = promauto.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total HTTP requests",
        },
        []string{"method", "path", "status"},
    )

    httpRequestDuration = promauto.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request latency",
            Buckets: []float64{.005, .01, .025, .05, .1, .25, .5, 1, 2.5, 5},
        },
        []string{"method", "path"},
    )

    goroutineCount = promauto.NewGaugeFunc(
        prometheus.GaugeOpts{
            Name: "go_goroutine_count",
            Help: "Current number of goroutines",
        },
        func() float64 { return float64(runtime.NumGoroutine()) },
    )
)

// statusRecorder wraps ResponseWriter to capture the status code for labels.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (rw *statusRecorder) WriteHeader(code int) {
    rw.status = code
    rw.ResponseWriter.WriteHeader(code)
}

func metricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        start := time.Now()
        rw := &statusRecorder{ResponseWriter: w, status: 200}
        next.ServeHTTP(rw, r)
        duration := time.Since(start).Seconds()
        // Labelling by raw r.URL.Path can explode cardinality on dynamic routes;
        // prefer the route pattern if your router exposes it.
        httpRequestsTotal.WithLabelValues(r.Method, r.URL.Path, strconv.Itoa(rw.status)).Inc()
        httpRequestDuration.WithLabelValues(r.Method, r.URL.Path).Observe(duration)
    })
}

// yourHandler is a stand-in for your real application handler.
var yourHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("ok"))
})

func main() {
    mux := http.NewServeMux()
    mux.Handle("/metrics", promhttp.Handler())
    mux.Handle("/", metricsMiddleware(yourHandler))
    http.ListenAndServe(":8080", mux)
}

Auto-registered runtime metrics: Just importing client_golang gives you go_goroutines, go_gc_duration_seconds, go_memstats_heap_inuse_bytes, go_memstats_alloc_bytes_total, and 25+ more runtime metrics for free.


pprof — Built-in Go Profiler

Go ships with a built-in profiler. Import the net/http/pprof package for its side effects (a blank import) to expose profiling endpoints over HTTP:

import _ "net/http/pprof"
// This registers /debug/pprof/* routes on the default ServeMux

// Available profiles:
// /debug/pprof/           — index
// /debug/pprof/goroutine  — all goroutine stacks
// /debug/pprof/heap       — heap memory allocations
// /debug/pprof/profile    — 30s CPU profile
// /debug/pprof/allocs     — memory allocation trace
// /debug/pprof/threadcreate — OS thread creation
// /debug/pprof/block      — goroutine blocking

// Capture and analyze a heap profile:
go tool pprof http://localhost:6060/debug/pprof/heap

# Inside pprof interactive mode:
(pprof) top10          # top 10 allocating functions
(pprof) web            # open flame graph in browser
(pprof) list main.Foo  # show source-level allocation detail

# Capture a 30-second CPU profile:
go tool pprof http://localhost:6060/debug/pprof/profile?seconds=30

# Capture all goroutine stacks (great for leak diagnosis):
curl http://localhost:6060/debug/pprof/goroutine?debug=2 > goroutines.txt

Security: Never expose /debug/pprof on a public port. Mount it on a separate internal port (e.g., 6060) bound to localhost or behind authentication. The profiles expose function names, memory contents, and call graphs.
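One common way to follow that advice is to register the pprof handlers on their own mux and bind it to loopback only, so nothing under /debug/pprof is reachable from the mux that serves public traffic. A sketch of that pattern; in a real service this listener runs alongside your main one:

package main

import (
    "log"
    "net/http"
    "net/http/pprof"
)

func main() {
    // Internal-only mux: pprof handlers are registered explicitly, so nothing
    // leaks onto the mux that serves public traffic.
    internal := http.NewServeMux()
    internal.HandleFunc("/debug/pprof/", pprof.Index) // also serves heap, goroutine, allocs, ...
    internal.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
    internal.HandleFunc("/debug/pprof/profile", pprof.Profile)
    internal.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
    internal.HandleFunc("/debug/pprof/trace", pprof.Trace)

    // Loopback bind: the profiler is never reachable from outside the host/pod.
    log.Fatal(http.ListenAndServe("127.0.0.1:6060", internal))
}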

Go Runtime Metrics to Monitor

Metric | client_golang Name | What to Watch | Alert Threshold
Goroutine count | go_goroutines | Sustained monotonic growth = goroutine leak | 10k+ or 50% growth over 1h
Heap in-use | go_memstats_heap_inuse_bytes | Memory actively holding objects | Near container memory limit
GC pause duration | go_gc_duration_seconds | STW pause latency (summary quantiles) | Max pause > 10ms regularly
Allocation rate | go_memstats_alloc_bytes_total (rate) | Rate of heap allocations → GC pressure | Sudden spike vs baseline
OS threads | go_threads | Too many = cgo blocking or scheduler pressure | > 10× GOMAXPROCS
GC frequency | go_gc_duration_seconds_count (rate) | How often GC is running | > 10/min is high allocation pressure

Detecting Goroutine Leaks

A goroutine leak is one of the most insidious Go bugs. Goroutines pile up silently, each holding stack memory, until the process OOMs or slows to a crawl.

In Production: Monitor NumGoroutine

# Prometheus alert rules for goroutine leaks
groups:
  - name: golang
    rules:
      - alert: GoroutineLeak
        expr: |
          deriv(go_goroutines[30m]) > 10
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Goroutine count growing — possible goroutine leak"
          description: "Goroutines growing at {{ $value }}/sec over the last 30m"

      - alert: GoroutinesHigh
        expr: go_goroutines > 10000
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Goroutine count above 10,000"

# When this fires, capture a goroutine dump:
#   curl "http://internal-host:6060/debug/pprof/goroutine?debug=2" > dump.txt
#   grep -c "^goroutine " dump.txt          # count goroutines
#   grep -A 20 "goroutine 12345" dump.txt   # inspect a specific goroutine
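If the internal port is not reachable when the alert fires, you can also dump goroutine stacks from inside the process, for example on a signal. A minimal sketch; using SIGUSR1 as the trigger is an assumption, not something your service necessarily handles today:

package main

import (
    "log"
    "net/http"
    "os"
    "os/signal"
    "runtime/pprof"
    "syscall"
)

// dumpGoroutinesOnSignal writes all goroutine stacks to stderr on SIGUSR1,
// in the same debug=2 format as /debug/pprof/goroutine?debug=2.
func dumpGoroutinesOnSignal() {
    ch := make(chan os.Signal, 1)
    signal.Notify(ch, syscall.SIGUSR1)
    go func() {
        for range ch {
            pprof.Lookup("goroutine").WriteTo(os.Stderr, 2)
        }
    }()
}

func main() {
    dumpGoroutinesOnSignal()
    log.Fatal(http.ListenAndServe(":8080", nil)) // stand-in for the real service
}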

In Tests: goleak

// go get go.uber.org/goleak
package mypackage_test

import (
    "testing"
    "go.uber.org/goleak"
)

func TestMain(m *testing.M) {
    // Assert no goroutines leak after tests complete
    goleak.VerifyTestMain(m)
}

func TestMyHandler(t *testing.T) {
    defer goleak.VerifyNone(t)  // Check at end of this specific test
    // ... your test code
}
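If a dependency starts long-lived background goroutines you cannot control, goleak lets you allowlist them instead of dropping the check. A sketch that would replace the TestMain above; the function name passed to IgnoreTopFunction is a placeholder:

package mypackage_test

import (
    "testing"

    "go.uber.org/goleak"
)

func TestMain(m *testing.M) {
    goleak.VerifyTestMain(m,
        // Allowlist a known long-lived goroutine by the fully-qualified name of
        // the function at the top of its stack (placeholder name here).
        goleak.IgnoreTopFunction("example.com/somepkg.(*Client).poller"),
    )
}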

Common Goroutine Leak Patterns

❌ Unbuffered channel with no reader

go func() { ch <- result }() // leaks if nothing reads ch

Fix: use a buffered channel or ensure a reader always exists

❌ context.Background() never cancelled

go func() { <-ctx.Done() }() // leaks if ctx never cancels

Fix: always use a cancellable context; defer cancel() immediately after WithCancel

❌ HTTP response body not closed

resp, _ := http.Get(url) // goroutine leaks if body not read/closed

Fix: always defer resp.Body.Close() after checking err
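Putting those fixes together: a worker launched with a cancellable context and a buffered result channel can always exit, even when the caller has stopped waiting. A minimal sketch; the names and timeout are illustrative:

package main

import (
    "context"
    "fmt"
    "time"
)

// fetchWithTimeout launches a worker that can always exit: the result channel is
// buffered so a late send never blocks, and cancel is deferred so the context is
// always released when the caller returns.
func fetchWithTimeout(parent context.Context) (string, error) {
    ctx, cancel := context.WithTimeout(parent, 2*time.Second)
    defer cancel() // always cancel; this unblocks anything waiting on ctx.Done()

    results := make(chan string, 1) // buffered: the worker never blocks on send
    go func() {
        results <- doSlowWork()
    }()

    select {
    case r := <-results:
        return r, nil
    case <-ctx.Done():
        return "", ctx.Err() // caller moves on; the worker still exits after its send
    }
}

func doSlowWork() string { return "result" }

func main() {
    out, err := fetchWithTimeout(context.Background())
    fmt.Println(out, err)
}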


Distributed Tracing with OpenTelemetry

For microservices, Prometheus metrics tell you what is slow. OpenTelemetry traces tell you where in the call chain it's slow.

// go get go.opentelemetry.io/otel
// go get go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp
// go get go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp

package main

import (
    "context"
    "database/sql"
    "log"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/attribute"
    "go.opentelemetry.io/otel/codes"
    "go.opentelemetry.io/otel/exporters/otlp/otlptrace/otlptracehttp"
    sdktrace "go.opentelemetry.io/otel/sdk/trace"
)

func initTracer(ctx context.Context) func() {
    exporter, err := otlptracehttp.New(ctx,
        otlptracehttp.WithEndpoint("otel-collector:4318"),
        otlptracehttp.WithInsecure(),
    )
    if err != nil {
        log.Fatalf("otlp exporter: %v", err)
    }
    tp := sdktrace.NewTracerProvider(
        sdktrace.WithBatcher(exporter),
        sdktrace.WithSampler(sdktrace.AlwaysSample()), // 100% in dev; use a ParentBased ratio sampler in prod
    )
    otel.SetTracerProvider(tp)
    return func() { _ = tp.Shutdown(ctx) }
}

// Manual span for a critical operation:
func queryWithSpan(ctx context.Context, db *sql.DB, query string) (*sql.Rows, error) {
    tracer := otel.Tracer("myapp")
    ctx, span := tracer.Start(ctx, "database.Query")
    defer span.End()

    span.SetAttributes(attribute.String("db.query", query))
    rows, err := db.QueryContext(ctx, query)
    if err != nil {
        span.RecordError(err)
        span.SetStatus(codes.Error, err.Error())
    }
    return rows, err
}

// yourHandler is a stand-in for your real application handler.
var yourHandler = http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("ok"))
})

func main() {
    ctx := context.Background()
    shutdown := initTracer(ctx)
    defer shutdown()

    // Auto-instrument the HTTP server: every request through this handler gets a server span.
    mux := http.NewServeMux()
    mux.Handle("/api/", otelhttp.NewHandler(yourHandler, "api"))
    log.Fatal(http.ListenAndServe(":8080", mux))
}
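Traces only connect across services if outgoing HTTP calls carry the trace context. With otelhttp that means wrapping the client transport and registering a propagator; a sketch assuming the W3C Trace Context propagator and an illustrative downstream URL:

package main

import (
    "context"
    "net/http"

    "go.opentelemetry.io/contrib/instrumentation/net/http/otelhttp"
    "go.opentelemetry.io/otel"
    "go.opentelemetry.io/otel/propagation"
)

func newTracedClient() *http.Client {
    // Register the W3C traceparent/tracestate propagator once at startup.
    otel.SetTextMapPropagator(propagation.TraceContext{})

    // Wrap the default transport: every outgoing request gets a client span and
    // the traceparent header injected from the request's context.
    return &http.Client{Transport: otelhttp.NewTransport(http.DefaultTransport)}
}

func callDownstream(ctx context.Context, client *http.Client) error {
    // Build the request with the incoming ctx so the downstream span joins this trace.
    req, err := http.NewRequestWithContext(ctx, http.MethodGet, "http://orders:8080/api/orders", nil)
    if err != nil {
        return err
    }
    resp, err := client.Do(req)
    if err != nil {
        return err
    }
    defer resp.Body.Close()
    return nil
}

func main() {
    client := newTracedClient()
    _ = callDownstream(context.Background(), client)
}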

Go Application Alert Rules

groups:
  - name: go-app
    rules:
      # Service is down
      - alert: GoServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels: { severity: critical }
        annotations:
          summary: "Go service {{ $labels.instance }} is down"

      # High error rate
      - alert: GoHighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) /
          sum(rate(http_requests_total[5m])) > 0.01
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "HTTP error rate > 1%"

      # Slow p95 latency
      - alert: GoSlowP95Latency
        expr: |
          histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m]))) > 2
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "p95 request latency > 2 seconds"

      # GC taking too long (go_gc_duration_seconds is a summary, so read its quantile series)
      - alert: GoGCPauseHigh
        expr: |
          go_gc_duration_seconds{quantile="1"} > 0.01
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Max GC pause > 10ms — check allocation rate"

      # Goroutine leak
      - alert: GoGoroutineLeak
        expr: go_goroutines > 10000
        for: 5m
        labels: { severity: critical }
        annotations:
          summary: "Goroutine count > 10,000 — investigate for leaks"

      # Memory pressure
      - alert: GoHighMemory
        expr: process_resident_memory_bytes > 1e9  # 1GB
        for: 5m
        labels: { severity: warning }
        annotations:
          summary: "Go process using > 1GB RSS"

Go Monitoring & APM Tools (2026)

Tool | Type | Go Support | Cost
Prometheus + Grafana | Self-hosted metrics | Excellent — prometheus/client_golang is first-class | Free (OSS)
Better Stack | SaaS uptime + logs | HTTP/TCP checks + structured log ingestion | Free tier, $25/mo+
Grafana Cloud | SaaS observability | Full OTel support, Pyroscope for Go pprof | Free tier, $8/mo+
Datadog | Enterprise APM | dd-trace-go with auto-instrumentation, profiler | $15-23/host/mo
Sentry | Error tracking | sentry-go SDK, panic recovery, breadcrumbs | Free (5K errors/mo), $26/mo+
Pyroscope | Continuous profiling | Always-on pprof sampling, flame graphs in prod | Free (OSS), Grafana Cloud hosted

Frequently Asked Questions

How do I add Prometheus metrics to a Go application?

Import prometheus/client_golang. Use promauto to register Counters, Gauges, and Histograms, and expose them via promhttp.Handler() on /metrics. Runtime metrics (goroutines, GC, heap) are registered for you by the default registry. Add HTTP middleware to track request count and duration per route.

How do I detect goroutine leaks in Go?

In production: expose go_goroutines as a Prometheus metric and alert when it grows monotonically over 30 minutes. When it fires, capture a goroutine dump via /debug/pprof/goroutine?debug=2 and look for blocked goroutines. In tests: use goleak (go.uber.org/goleak) to assert no goroutines leak after each test.

What is pprof and how do I use it for Go profiling?

pprof is Go's built-in profiler. Import _ "net/http/pprof" to expose /debug/pprof/* endpoints. Use go tool pprof http://host:6060/debug/pprof/heap to analyze memory allocations. Use /profile for CPU profiling and /goroutine for goroutine stack dumps. Keep pprof on an internal port behind authentication.

How do I add distributed tracing to Go?

Use OpenTelemetry Go SDK. Initialize a TracerProvider with an OTLP exporter pointing to your collector (Jaeger, Tempo). Use otelhttp.NewHandler() for HTTP middleware auto-instrumentation. Create manual spans with tracer.Start(ctx, "name") for critical code paths. Pass context through your call chain.

What Go runtime metrics should I monitor?

Key metrics from client_golang: go_goroutines (leak indicator), go_memstats_heap_inuse_bytes (memory pressure), go_gc_duration_seconds (GC pause latency), go_memstats_alloc_bytes_total rate (allocation rate / GC pressure), go_threads (OS thread count). All are auto-registered when you use promhttp.Handler().

