
Node.js Monitoring Guide: Performance Metrics, APM & Alerts (2026)

Node.js has unique failure modes — the event loop, heap memory leaks, and garbage collection pauses — that generic APM tools often miss. This guide covers what to monitor, how to instrument with OpenTelemetry, and which tools give the deepest Node.js visibility.

Updated April 2026 · 13 min read · Node.js / APM


TL;DR — Node.js Monitoring Checklist

  • ✅ Track event loop lag — alert if > 100ms sustained
  • ✅ Watch heap used over time — monotonic growth = memory leak
  • ✅ Monitor GC pause duration — frequent or long pauses signal heap pressure
  • ✅ Use prom-client or OpenTelemetry to expose metrics
  • ✅ Add an external uptime check — catch crashes Node.js won't log
  • ✅ Set alerts on p95 response time, error rate, and process restart count
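The first checklist item can be sampled directly from Node's built-in perf_hooks; a minimal sketch (the 100ms threshold and 5s interval are our own choices, not Node defaults):

```typescript
// Sketch: sample event loop delay with the built-in perf_hooks histogram.
// Values are reported in nanoseconds; we convert to milliseconds.
import { monitorEventLoopDelay } from 'perf_hooks';

const histogram = monitorEventLoopDelay({ resolution: 20 });
histogram.enable();

function eventLoopLagMs(): { mean: number; p99: number } {
  const stats = {
    mean: histogram.mean / 1e6, // ns -> ms
    p99: histogram.percentile(99) / 1e6,
  };
  histogram.reset(); // start a fresh window for the next sample
  return stats;
}

// e.g. check every 5s and warn past the checklist threshold
setInterval(() => {
  const { p99 } = eventLoopLagMs();
  if (p99 > 100) console.warn(`event loop p99 lag ${p99.toFixed(1)}ms`);
}, 5000).unref();
```

Feed these numbers into whatever metrics pipeline you use; prom-client (below) collects the same signal automatically.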

Why Node.js Is Different to Monitor

Most server runtimes handle requests on multiple threads — a slow request blocks one thread while the others keep serving. Node.js runs JavaScript on a single-threaded event loop: one blocked callback stalls every request. This creates failure modes you won't see in Java or Go apps:

Node.js-specific problems

  • Event loop blocking (synchronous CPU work)
  • V8 heap memory leaks (closures, caches)
  • GC pauses causing latency spikes
  • Unhandled promise rejections / uncaught exceptions
  • Max heap limit crashes (OOM)

Standard metrics (still needed)

  • HTTP request rate and latency
  • Error rate and 5xx breakdown
  • CPU usage (user vs system)
  • Database query latency
  • External API call success rate

Core Node.js Metrics Reference

| Metric | API | Alert Threshold |
|---|---|---|
| Event loop lag | perf_hooks / clinic.js | > 100ms sustained (warn), > 500ms (critical) |
| heapUsed | process.memoryUsage() | Growing trend over 30m; > 80% of --max-old-space-size |
| heapTotal | process.memoryUsage() | Tracks V8 allocated heap (watch heapUsed/heapTotal ratio) |
| external | process.memoryUsage() | C++ objects + Buffers; high value = Buffer leak |
| GC duration | perf_hooks PerformanceObserver | Major GC > 100ms; frequent GC = heap pressure |
| Active handles | process._getActiveHandles() | Growing handle count = resource leak (open sockets, timers) |
| CPU usage | process.cpuUsage() | > 80% user CPU sustained = event loop blocking risk |
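The three memory rows above all come from one call; a minimal sketch reading them:

```typescript
// Sketch: read the heap fields referenced in the table above.
const { heapUsed, heapTotal, external, rss } = process.memoryUsage();
const mb = (bytes: number) => (bytes / 1024 / 1024).toFixed(1);

console.log(
  `heapUsed=${mb(heapUsed)}MB heapTotal=${mb(heapTotal)}MB ` +
    `external=${mb(external)}MB rss=${mb(rss)}MB`,
);
// heapUsed/heapTotal is the ratio worth graphing; rss is what the OS sees
```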

Instrumenting Node.js with prom-client

prom-client is the most popular Node.js Prometheus client. It automatically collects default Node.js metrics (heap, GC, event loop) and lets you define custom business metrics.

# Install
npm install prom-client

# src/metrics.ts
import { Registry, collectDefaultMetrics, Histogram, Counter, Gauge } from 'prom-client';

export const registry = new Registry();

// Automatically collects heap, GC, event loop lag, active handles, and CPU
// metrics under their standard names (nodejs_eventloop_lag_seconds, etc.).
// Note: passing a `prefix` option would rename all of them — e.g. prefix
// 'nodejs_' yields nodejs_nodejs_eventloop_lag_seconds — so leave it off.
collectDefaultMetrics({ register: registry });

// Custom metrics
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});

// Expose /metrics (assumes an Express `app` created elsewhere, e.g.
// `const app = express();` in your server entry point)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});

The collectDefaultMetrics call gives you nodejs_eventloop_lag_seconds, nodejs_heap_size_used_bytes, and nodejs_gc_duration_seconds automatically — these are exactly the Node.js-specific metrics you need.
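Wiring the custom histogram and counter into the request path looks roughly like this. A sketch only: the metric objects are passed in rather than imported so any prom-client-compatible objects work, and the Express `req`/`res` shapes are assumed:

```typescript
// Sketch: Express-style middleware that times each request into
// httpRequestDuration and bumps httpRequestTotal (both defined above).
type Labels = { method: string; route: string; status_code: string };

interface HistogramLike { startTimer(): (labels: Labels) => void; }
interface CounterLike { inc(labels: Labels): void; }

function makeMetricsMiddleware(duration: HistogramLike, total: CounterLike) {
  return (req: any, res: any, next: () => void) => {
    const end = duration.startTimer(); // prom-client: returns a stop function
    res.on('finish', () => {
      const labels: Labels = {
        method: req.method,
        // use the route pattern, not the raw URL, to keep label cardinality low
        route: req.route?.path ?? req.path,
        status_code: String(res.statusCode),
      };
      end(labels);
      total.inc(labels);
    });
    next();
  };
}
```

Register it early — `app.use(makeMetricsMiddleware(httpRequestDuration, httpRequestTotal))` — so error responses are recorded too.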

OpenTelemetry Auto-Instrumentation

For distributed tracing (spans across microservices), OpenTelemetry auto-instrumentation is the standard. It traces HTTP requests, database calls, Redis, gRPC, and more without modifying your business logic.

# Install
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
npm install @opentelemetry/exporter-trace-otlp-http

# src/instrumentation.ts (load BEFORE any other imports)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
      '@opentelemetry/instrumentation-http': { enabled: true },
    }),
  ],
});

sdk.start();

# package.json — load instrumentation before app
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/server.js"
  }
}


Detecting Memory Leaks in Production

Node.js memory leaks typically come from three sources: global variables accumulating references, event listeners not being removed, and cached objects with no expiry. Here's how to find them without restarting:

# Step 1: Take heap snapshots on demand (no downtime)
# Add to your app:
import { writeHeapSnapshot } from 'v8';

process.on('SIGUSR2', () => {
  const filename = writeHeapSnapshot();
  console.log('Heap snapshot written to', filename);
});

# Trigger from shell:
kill -USR2 <node-pid>

# Step 2: Open snapshot in Chrome DevTools
# chrome://inspect → Open dedicated DevTools for Node
# Memory tab → Load .heapsnapshot file
# Sort by "Retained Size" to find growing object types

# Step 3: Common leak patterns to search for:
# - EventEmitter listeners (MaxListenersExceededWarning)
# - Interval/timeout never cleared (setInterval without clearInterval)
# - Large arrays appended to module-level variables
# - Closure holding references to large objects

Early warning signal: Monitor nodejs_heap_size_used_bytes over a 24-hour window. Healthy apps plateau after warmup. A leak shows as slow monotonic growth that never dips back to baseline even after GC cycles.
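That trend check can also run in-process as a crude early warning; a sketch in which the window size and sampling interval are arbitrary choices:

```typescript
// Sketch: keep a rolling window of heapUsed samples and flag monotonic
// growth — the leak signature described above.
const samples: number[] = [];
const WINDOW = 30;

function sampleHeap(): boolean {
  samples.push(process.memoryUsage().heapUsed);
  if (samples.length > WINDOW) samples.shift();
  // only a full window of never-decreasing values counts as suspicious
  const suspicious =
    samples.length === WINDOW &&
    samples.every((v, i) => i === 0 || v >= samples[i - 1]);
  if (suspicious) console.warn('heapUsed grew monotonically — possible leak');
  return suspicious;
}

// one sample per minute; unref so the timer never keeps the process alive
setInterval(sampleHeap, 60_000).unref();
```

A real setup should prefer the Prometheus-side alert below — it survives process restarts, which this in-memory window does not.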

Alert Rules for Node.js

# Prometheus alert rules for Node.js
groups:
  - name: nodejs
    rules:
      # Event loop blocked
      - alert: NodeJSEventLoopLagHigh
        expr: nodejs_eventloop_lag_seconds > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Node.js event loop lag > 100ms"
          description: "CPU-bound work may be blocking the event loop"

      # Memory leak signal
      - alert: NodeJSHeapGrowth
        expr: |
          (nodejs_heap_size_used_bytes - nodejs_heap_size_used_bytes offset 30m)
          / nodejs_heap_size_used_bytes offset 30m > 0.2
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Node.js heap grew >20% in 30 minutes"

      # Near OOM
      - alert: NodeJSHeapCritical
        expr: |
          nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes > 0.85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js heap >85% full — OOM crash imminent"

      # Process restart (tracks process uptime drops)
      - alert: NodeJSProcessRestarted
        expr: changes(process_start_time_seconds[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Node.js process restarted"

      # High error rate
      - alert: NodeJSHighErrorRate
        expr: |
          rate(http_requests_total{status_code=~"5.."}[5m]) /
          rate(http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js HTTP 5xx rate > 1%"
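The checklist also calls for a p95 response-time alert; one way to express it against the http_request_duration_seconds histogram defined earlier (the 500ms threshold is an assumption — tune it to your SLO):

```yaml
      # p95 latency (assumes the http_request_duration_seconds histogram above)
      - alert: NodeJSHighP95Latency
        expr: |
          histogram_quantile(0.95,
            sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
          > 0.5
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Node.js p95 request latency > 500ms"
```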

APM Tools for Node.js

| Tool | Node.js Depth | Standout Feature | Pricing |
|---|---|---|---|
| New Relic | Excellent | Node.js flamegraphs, thread profiling, free 100GB/mo | Free + $0.35/GB |
| Datadog APM | Excellent | Continuous profiler, runtime metrics dashboard, heap viz | $31/host/month |
| Better Stack | Good | Uptime + log monitoring together, simple setup | Free + $20/mo |
| Sentry Performance | Good | Frontend + backend trace correlation, error context | Free + $26/mo |
| Grafana Cloud | Good | Managed Prometheus + Loki; prom-client metrics straight in | Free tier + usage |
| Clinic.js | Excellent | Event loop, flame graph, bubbleprof — Node.js specific, open source | Free (local profiling) |

FAQ

What metrics should I monitor for a Node.js application?

The seven critical metrics: event loop lag, heap used, external memory (Buffers), GC pause duration, active handles, CPU usage, and HTTP request p95 latency. Event loop lag and heap growth are the most uniquely Node.js — they indicate the failure modes that general APM tools often miss.

How do I detect a Node.js memory leak in production?

Watch nodejs_heap_size_used_bytes over time. A leak shows as monotonic growth that never returns to baseline after GC. To identify the source: trigger heap snapshots with SIGUSR2 (using v8.writeHeapSnapshot()), open in Chrome DevTools Memory tab, sort by Retained Size to find growing object types.

What is event loop lag and why does it matter?

Event loop lag measures delay between scheduling a callback and when it actually runs. Normal is under 10ms. Over 100ms, users notice slow responses. Over 500ms, synchronous CPU work is blocking the loop — JSON.parse of a huge payload, a tight loop, or a synchronous file read. All HTTP handlers wait while this runs.
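The effect is easy to reproduce: block the loop synchronously and watch a short timer fire late (the 200ms busy-wait stands in for any CPU-bound work):

```typescript
// Sketch: a 10ms timer fires ~200ms late because synchronous work blocks
// the event loop — exactly what event loop lag measures.
const scheduled = process.hrtime.bigint();
setTimeout(() => {
  const elapsedMs = Number(process.hrtime.bigint() - scheduled) / 1e6;
  console.log(`10ms timer fired after ${elapsedMs.toFixed(0)}ms`);
}, 10);

// ~200ms of synchronous CPU work; every pending callback waits
const until = Date.now() + 200;
while (Date.now() < until) { /* busy */ }
```

While that loop spins, every HTTP handler, timer, and I/O callback in the process is queued behind it.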

How do I add OpenTelemetry to a Node.js application?

Install @opentelemetry/sdk-node and @opentelemetry/auto-instrumentations-node. Create an instrumentation.ts file with NodeSDK initialized with your exporter. Load it before everything else via --require ./instrumentation.js. Auto-instrumentation traces Express, HTTP, PostgreSQL, Redis, and MongoDB without business logic changes.

What is the best APM tool for Node.js?

For self-hosted: prom-client + Grafana Cloud is the most flexible and free-tier friendly. For managed: New Relic Node.js agent is mature with a free 100GB/month tier. Datadog APM has the deepest Node.js integration including continuous profiling. Clinic.js is excellent for local profiling and event loop diagnosis.

