Node.js Monitoring Guide: Performance Metrics, APM & Alerts (2026)
Node.js has unique failure modes — the event loop, heap memory leaks, and garbage collection pauses — that generic APM tools often miss. This guide covers what to monitor, how to instrument with OpenTelemetry, and which tools give the deepest Node.js visibility.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
TL;DR — Node.js Monitoring Checklist
- ✅ Track event loop lag — alert if > 100ms sustained
- ✅ Watch heap used over time — monotonic growth = memory leak
- ✅ Monitor GC pause duration — high GC pressure signals heap pressure
- ✅ Use prom-client or OpenTelemetry to expose metrics
- ✅ Add an external uptime check — catch crashes Node.js won't log
- ✅ Set alerts on p95 response time, error rate, and process restart count
Why Node.js Is Different to Monitor
Most languages run on multiple threads — a slow request blocks one thread, others keep serving. Node.js uses a single-threaded event loop. One blocked callback blocks every request. This creates failure modes you won't see in Java or Go apps:
Node.js-specific problems
- Event loop blocking (synchronous CPU work)
- V8 heap memory leaks (closures, caches)
- GC pauses causing latency spikes
- Unhandled promise rejections
- Max heap limit crashes (OOM)
Standard metrics (still needed)
- HTTP request rate and latency
- Error rate and 5xx breakdown
- CPU usage (user vs system)
- Database query latency
- External API call success rate
Core Node.js Metrics Reference
| Metric | API | Alert Threshold |
|---|---|---|
| Event loop lag | perf_hooks / clinic.js | > 100ms sustained (warn), > 500ms (critical) |
| heapUsed | process.memoryUsage() | Growing trend over 30m; > 80% of --max-old-space-size |
| heapTotal | process.memoryUsage() | Tracks V8 allocated heap (watch heapUsed/heapTotal ratio) |
| external | process.memoryUsage() | C++ objects + Buffers; high value = Buffer leak |
| GC duration | perf_hooks PerformanceObserver | Major GC > 100ms; frequent GC = heap pressure |
| Active handles | process.getActiveResourcesInfo() (Node 17.3+; the older process._getActiveHandles() is undocumented) | Growing handle count = resource leak (open sockets, timers) |
| CPU usage | process.cpuUsage() | > 80% user CPU sustained = event loop blocking risk |
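To see event loop lag directly, here is a minimal, self-contained sketch using Node's built-in perf_hooks.monitorEventLoopDelay (the same signal behind the nodejs_eventloop_lag_seconds metric). The 60ms busy-wait is only there to simulate blocking CPU work:

```typescript
// Sample event loop lag with the built-in perf_hooks histogram (Node 12+).
import { monitorEventLoopDelay } from 'node:perf_hooks';

const h = monitorEventLoopDelay({ resolution: 10 }); // sample every 10ms
h.enable();

// Simulate CPU-bound work that blocks the loop for ~60ms
setTimeout(() => {
  const start = Date.now();
  while (Date.now() - start < 60) { /* busy-wait: nothing else can run */ }
}, 0);

setTimeout(() => {
  h.disable();
  // Histogram values are reported in nanoseconds
  console.log(`max event loop delay: ${(h.max / 1e6).toFixed(1)} ms`);
}, 200);
```

In production you would leave the histogram enabled and export h.percentile(99) periodically rather than logging it.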
Monitor your Node.js API endpoints with Better Stack
Better Stack runs uptime checks on your Node.js APIs from 30+ global locations. Catch crashes and slowdowns before your users do.
Try Better Stack Free →
Instrumenting Node.js with prom-client
prom-client is the most popular Node.js Prometheus client. It automatically collects default Node.js metrics (heap, GC, event loop) and lets you define custom business metrics.
```bash
# Install
npm install prom-client
```

```typescript
// src/metrics.ts
import { Registry, collectDefaultMetrics, Histogram, Counter } from 'prom-client';

export const registry = new Registry();

// Automatically collects: heap, GC, event loop lag, active handles, CPU
collectDefaultMetrics({ register: registry, prefix: 'nodejs_' });

// Custom metrics
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});

// Expose a /metrics endpoint (app is your Express instance)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});
```

The collectDefaultMetrics call gives you nodejs_eventloop_lag_seconds, nodejs_heap_size_used_bytes, and nodejs_gc_duration_seconds automatically — these are exactly the Node.js-specific metrics you need.
OpenTelemetry Auto-Instrumentation
For distributed tracing (spans across microservices), OpenTelemetry auto-instrumentation is the standard. It traces HTTP requests, database calls, Redis, gRPC, and more without modifying your business logic.
```bash
# Install
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
npm install @opentelemetry/exporter-trace-otlp-http
```

```typescript
// src/instrumentation.ts (load BEFORE any other imports)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
      '@opentelemetry/instrumentation-http': { enabled: true },
    }),
  ],
});

sdk.start();
```

In package.json, load the instrumentation before the app:

```json
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/server.js"
  }
}
```

Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time your Node.js service goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your Node.js services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Detecting Memory Leaks in Production
Node.js memory leaks typically come from three sources: global variables accumulating references, event listeners not being removed, and cached objects with no expiry. Here's how to find them without restarting:
Step 1: Take heap snapshots on demand (no downtime) by adding a signal handler to your app:

```typescript
import { writeHeapSnapshot } from 'v8';

process.on('SIGUSR2', () => {
  const filename = writeHeapSnapshot();
  console.log('Heap snapshot written to', filename);
});
```

```bash
# Trigger from shell:
kill -USR2 <node-pid>
```

Step 2: Open the snapshot in Chrome DevTools: chrome://inspect → Open dedicated DevTools for Node → Memory tab → Load the .heapsnapshot file. Sort by "Retained Size" to find growing object types.

Step 3: Common leak patterns to search for:
- EventEmitter listeners never removed (watch for MaxListenersExceededWarning)
- Intervals/timeouts never cleared (setInterval without clearInterval)
- Large arrays appended to module-level variables
- Closures holding references to large objects

Early warning signal: Monitor nodejs_heap_size_used_bytes over a 24-hour window. Healthy apps plateau after warmup. A leak shows as slow monotonic growth that never dips back to baseline even after GC cycles.
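The "cached objects with no expiry" source is worth spelling out. Here is a minimal sketch (names are illustrative) of an unbounded module-level cache, plus one simple way to bound it:

```typescript
// An unbounded module-level cache: a classic Node.js leak. Every distinct
// key adds an entry that is never evicted, so heapUsed grows monotonically.
const cache = new Map<string, unknown>();

function cacheUnbounded(key: string, value: unknown): unknown {
  cache.set(key, value); // never evicted: leaks under steady traffic
  return cache.get(key);
}

// One fix: cap the cache and evict the oldest entry first.
// (Maps iterate in insertion order, so the first key is the oldest.)
function cacheBounded(key: string, value: unknown, maxEntries = 10_000): unknown {
  if (!cache.has(key) && cache.size >= maxEntries) {
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, value);
  return cache.get(key);
}
```

A size cap is the bluntest tool; in practice a TTL or an LRU library gives better hit rates, but the point is the same: any cache needs an eviction policy.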
Alert Rules for Node.js
```yaml
# Prometheus alert rules for Node.js
groups:
  - name: nodejs
    rules:
      # Event loop blocked
      - alert: NodeJSEventLoopLagHigh
        expr: nodejs_eventloop_lag_seconds > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Node.js event loop lag > 100ms"
          description: "CPU-bound work may be blocking the event loop"
      # Memory leak signal
      - alert: NodeJSHeapGrowth
        expr: |
          (nodejs_heap_size_used_bytes - nodejs_heap_size_used_bytes offset 30m)
          / nodejs_heap_size_used_bytes offset 30m > 0.2
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Node.js heap grew >20% in 30 minutes"
      # Near OOM
      - alert: NodeJSHeapCritical
        expr: |
          nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes > 0.85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js heap >85% full — OOM crash imminent"
      # Process restart (tracks process uptime drops)
      - alert: NodeJSProcessRestarted
        expr: changes(nodejs_process_start_time_seconds[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Node.js process restarted"
      # High error rate
      - alert: NodeJSHighErrorRate
        expr: |
          rate(http_requests_total{status_code=~"5.."}[5m]) /
          rate(http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js HTTP 5xx rate > 1%"
```

APM Tools for Node.js
| Tool | Node.js Depth | Standout Feature | Pricing |
|---|---|---|---|
| New Relic | Excellent | Node.js flamegraphs, thread profiling, free 100GB/mo | Free + $0.35/GB |
| Datadog APM | Excellent | Continuous profiler, runtime metrics dashboard, heap viz | $31/host/month |
| Better Stack | Good | Uptime + log monitoring together, simple setup | Free + $20/mo |
| Sentry Performance | Good | Frontend + backend trace correlation, error context | Free + $26/mo |
| Grafana Cloud | Good | Managed Prometheus + Loki; prom-client metrics straight in | Free tier + usage |
| Clinic.js | Excellent | Event loop, flame graph, bubbleprof — Node.js specific, open source | Free (local profiling) |
FAQ
What metrics should I monitor for a Node.js application?
The seven critical metrics: event loop lag, heap used, external memory (Buffers), GC pause duration, active handles, CPU usage, and HTTP request p95 latency. Event loop lag and heap growth are the most uniquely Node.js — they indicate the failure modes that general APM tools often miss.
How do I detect a Node.js memory leak in production?
Watch nodejs_heap_size_used_bytes over time. A leak shows as monotonic growth that never returns to baseline after GC. To identify the source: trigger heap snapshots with SIGUSR2 (using v8.writeHeapSnapshot()), open in Chrome DevTools Memory tab, sort by Retained Size to find growing object types.
What is event loop lag and why does it matter?
Event loop lag measures delay between scheduling a callback and when it actually runs. Normal is under 10ms. Over 100ms, users notice slow responses. Over 500ms, synchronous CPU work is blocking the loop — JSON.parse of a huge payload, a tight loop, or a synchronous file read. All HTTP handlers wait while this runs.
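When you find synchronous work blocking the loop, one common mitigation is chunking: split the long job into pieces and yield to the event loop between them so pending callbacks can run. A sketch (processLargeArray is an illustrative helper, not a library API):

```typescript
// Process a large array in chunks, yielding between chunks with setImmediate
// so timers, sockets, and HTTP handlers get a turn on the event loop.
async function processLargeArray<T>(
  items: T[],
  fn: (item: T) => void,
  chunkSize = 1_000,
): Promise<void> {
  for (let i = 0; i < items.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, items.length);
    for (let j = i; j < end; j++) fn(items[j]);
    // Yield: lets other queued callbacks run before the next chunk
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
}
```

For truly heavy CPU work (image processing, large parses), worker_threads is the better fix, since chunking only spreads the cost rather than removing it from the main thread.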
How do I add OpenTelemetry to a Node.js application?
Install @opentelemetry/sdk-node and @opentelemetry/auto-instrumentations-node. Create an instrumentation.ts file with NodeSDK initialized with your exporter. Load it before everything else via --require ./instrumentation.js. Auto-instrumentation traces Express, HTTP, PostgreSQL, Redis, and MongoDB without business logic changes.
What is the best APM tool for Node.js?
For self-hosted: prom-client + Grafana Cloud is the most flexible and free-tier friendly. For managed: New Relic Node.js agent is mature with a free 100GB/month tier. Datadog APM has the deepest Node.js integration including continuous profiling. Clinic.js is excellent for local profiling and event loop diagnosis.
Related Guides
Best APM Tools 2026
Compare top application performance monitoring platforms.
Distributed Tracing Guide
Traces across microservices — OpenTelemetry, Jaeger, Zipkin.
Error Tracking Guide 2026
Sentry, Rollbar, Bugsnag — exception monitoring comparison.
LLM API Monitoring Guide
Monitor OpenAI, Anthropic, and LLM integrations in Node.js apps.