Node.js Monitoring Guide: Performance Metrics, APM & Alerts (2026)
Node.js has unique failure modes — the event loop, heap memory leaks, and garbage collection pauses — that generic APM tools often miss. This guide covers what to monitor, how to instrument with OpenTelemetry, and which tools give the deepest Node.js visibility.
📡 Monitor your APIs — know when they go down before your users do
Better Stack checks uptime every 30 seconds with instant Slack, email & SMS alerts. Free tier available.
Affiliate link — we may earn a commission at no extra cost to you
TL;DR — Node.js Monitoring Checklist
- ✅ Track event loop lag — alert if > 100ms sustained
- ✅ Watch heap used over time — monotonic growth = memory leak
- ✅ Monitor GC pause duration — high GC pressure signals heap pressure
- ✅ Use prom-client or OpenTelemetry to expose metrics
- ✅ Add an external uptime check — catch crashes Node.js won't log
- ✅ Set alerts on p95 response time, error rate, and process restart count
Why Node.js Is Different to Monitor
Most languages run on multiple threads — a slow request blocks one thread, others keep serving. Node.js uses a single-threaded event loop. One blocked callback blocks every request. This creates failure modes you won't see in Java or Go apps:
Node.js-specific problems
- Event loop blocking (synchronous CPU work)
- V8 heap memory leaks (closures, caches)
- GC pauses causing latency spikes
- Unhandled promise rejections
- Max heap limit crashes (OOM)
Standard metrics (still needed)
- HTTP request rate and latency
- Error rate and 5xx breakdown
- CPU usage (user vs system)
- Database query latency
- External API call success rate
Core Node.js Metrics Reference
| Metric | API | Alert Threshold |
|---|---|---|
| Event loop lag | perf_hooks / clinic.js | > 100ms sustained (warn), > 500ms (critical) |
| heapUsed | process.memoryUsage() | Growing trend over 30m; > 80% of --max-old-space-size |
| heapTotal | process.memoryUsage() | Tracks V8 allocated heap (watch heapUsed/heapTotal ratio) |
| external | process.memoryUsage() | C++ objects + Buffers; high value = Buffer leak |
| GC duration | perf_hooks PerformanceObserver | Major GC > 100ms; frequent GC = heap pressure |
| Active handles | process.getActiveResourcesInfo() (Node 17.3+; the older process._getActiveHandles() is undocumented) | Growing handle count = resource leak (open sockets, timers) |
| CPU usage | process.cpuUsage() | > 80% user CPU sustained = event loop blocking risk |
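To see event loop lag directly, here is a minimal, self-contained sketch using Node's built-in perf_hooks.monitorEventLoopDelay (the same signal behind the nodejs_eventloop_lag_seconds metric). The 60ms busy-wait is only there to simulate blocking CPU work:

```typescript
// Sample event loop lag with the built-in perf_hooks histogram (Node 12+).
import { monitorEventLoopDelay } from 'node:perf_hooks';

const h = monitorEventLoopDelay({ resolution: 10 }); // sample every 10ms
h.enable();

// Simulate CPU-bound work that blocks the loop for ~60ms
setTimeout(() => {
  const start = Date.now();
  while (Date.now() - start < 60) { /* busy-wait: nothing else can run */ }
}, 0);

setTimeout(() => {
  h.disable();
  // Histogram values are reported in nanoseconds
  console.log(`max event loop delay: ${(h.max / 1e6).toFixed(1)} ms`);
}, 200);
```

In production you would leave the histogram enabled and export h.percentile(99) periodically rather than logging it.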
Monitor your Node.js API endpoints with Better Stack
Better Stack runs uptime checks on your Node.js APIs from 30+ global locations. Catch crashes and slowdowns before your users do.
Try Better Stack Free →
Instrumenting Node.js with prom-client
prom-client is the most popular Node.js Prometheus client. It automatically collects default Node.js metrics (heap, GC, event loop) and lets you define custom business metrics.
```bash
# Install
npm install prom-client
```

```typescript
// src/metrics.ts
import { Registry, collectDefaultMetrics, Histogram, Counter } from 'prom-client';

export const registry = new Registry();

// Automatically collects: heap, GC, event loop lag, active handles, CPU
collectDefaultMetrics({ register: registry, prefix: 'nodejs_' });

// Custom metrics
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.25, 0.5, 1, 2, 5],
  registers: [registry],
});

export const httpRequestTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});

// Expose a /metrics endpoint (app is your Express instance)
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});
```

The collectDefaultMetrics call gives you nodejs_eventloop_lag_seconds, nodejs_heap_size_used_bytes, and nodejs_gc_duration_seconds automatically — these are exactly the Node.js-specific metrics you need.
OpenTelemetry Auto-Instrumentation
For distributed tracing (spans across microservices), OpenTelemetry auto-instrumentation is the standard. It traces HTTP requests, database calls, Redis, gRPC, and more without modifying your business logic.
```bash
# Install
npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node
npm install @opentelemetry/exporter-trace-otlp-http
```

```typescript
// src/instrumentation.ts (load BEFORE any other imports)
import { NodeSDK } from '@opentelemetry/sdk-node';
import { getNodeAutoInstrumentations } from '@opentelemetry/auto-instrumentations-node';
import { OTLPTraceExporter } from '@opentelemetry/exporter-trace-otlp-http';

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    url: process.env.OTEL_EXPORTER_OTLP_ENDPOINT || 'http://localhost:4318/v1/traces',
  }),
  instrumentations: [
    getNodeAutoInstrumentations({
      '@opentelemetry/instrumentation-express': { enabled: true },
      '@opentelemetry/instrumentation-pg': { enabled: true },
      '@opentelemetry/instrumentation-redis': { enabled: true },
      '@opentelemetry/instrumentation-http': { enabled: true },
    }),
  ],
});

sdk.start();
```

In package.json, load the instrumentation before the app:

```json
{
  "scripts": {
    "start": "node --require ./dist/instrumentation.js dist/server.js"
  }
}
```

Alert Pro
14-day free trial
Stop checking — get alerted instantly
Next time your Node.js service goes down, you'll know in under 60 seconds — not when your users start complaining.
- Email alerts for your Node.js services + 9 more APIs
- $0 due today for trial
- Cancel anytime — $9/mo after trial
Detecting Memory Leaks in Production
Node.js memory leaks typically come from three sources: global variables accumulating references, event listeners not being removed, and cached objects with no expiry. Here's how to find them without restarting:
Step 1: Take heap snapshots on demand (no downtime) by adding a signal handler to your app:

```typescript
import { writeHeapSnapshot } from 'v8';

process.on('SIGUSR2', () => {
  const filename = writeHeapSnapshot();
  console.log('Heap snapshot written to', filename);
});
```

```bash
# Trigger from shell:
kill -USR2 <node-pid>
```

Step 2: Open the snapshot in Chrome DevTools: chrome://inspect → Open dedicated DevTools for Node → Memory tab → Load the .heapsnapshot file. Sort by "Retained Size" to find growing object types.

Step 3: Common leak patterns to search for:
- EventEmitter listeners never removed (watch for MaxListenersExceededWarning)
- Intervals/timeouts never cleared (setInterval without clearInterval)
- Large arrays appended to module-level variables
- Closures holding references to large objects

Early warning signal: Monitor nodejs_heap_size_used_bytes over a 24-hour window. Healthy apps plateau after warmup. A leak shows as slow monotonic growth that never dips back to baseline even after GC cycles.
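The "cached objects with no expiry" source is worth spelling out. Here is a minimal sketch (names are illustrative) of an unbounded module-level cache, plus one simple way to bound it:

```typescript
// An unbounded module-level cache: a classic Node.js leak. Every distinct
// key adds an entry that is never evicted, so heapUsed grows monotonically.
const cache = new Map<string, unknown>();

function cacheUnbounded(key: string, value: unknown): unknown {
  cache.set(key, value); // never evicted: leaks under steady traffic
  return cache.get(key);
}

// One fix: cap the cache and evict the oldest entry first.
// (Maps iterate in insertion order, so the first key is the oldest.)
function cacheBounded(key: string, value: unknown, maxEntries = 10_000): unknown {
  if (!cache.has(key) && cache.size >= maxEntries) {
    const oldest = cache.keys().next().value;
    if (oldest !== undefined) cache.delete(oldest);
  }
  cache.set(key, value);
  return cache.get(key);
}
```

A size cap is the bluntest tool; in practice a TTL or an LRU library gives better hit rates, but the point is the same: any cache needs an eviction policy.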
Alert Rules for Node.js
```yaml
# Prometheus alert rules for Node.js
groups:
  - name: nodejs
    rules:
      # Event loop blocked
      - alert: NodeJSEventLoopLagHigh
        expr: nodejs_eventloop_lag_seconds > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "Node.js event loop lag > 100ms"
          description: "CPU-bound work may be blocking the event loop"
      # Memory leak signal
      - alert: NodeJSHeapGrowth
        expr: |
          (nodejs_heap_size_used_bytes - nodejs_heap_size_used_bytes offset 30m)
          / nodejs_heap_size_used_bytes offset 30m > 0.2
        for: 30m
        labels:
          severity: warning
        annotations:
          summary: "Node.js heap grew >20% in 30 minutes"
      # Near OOM
      - alert: NodeJSHeapCritical
        expr: |
          nodejs_heap_size_used_bytes / nodejs_heap_size_total_bytes > 0.85
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js heap >85% full — OOM crash imminent"
      # Process restart (tracks process uptime drops)
      - alert: NodeJSProcessRestarted
        expr: changes(nodejs_process_start_time_seconds[5m]) > 0
        labels:
          severity: warning
        annotations:
          summary: "Node.js process restarted"
      # High error rate
      - alert: NodeJSHighErrorRate
        expr: |
          rate(http_requests_total{status_code=~"5.."}[5m]) /
          rate(http_requests_total[5m]) > 0.01
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "Node.js HTTP 5xx rate > 1%"
```

APM Tools for Node.js
| Tool | Node.js Depth | Standout Feature | Pricing |
|---|---|---|---|
| New Relic | Excellent | Node.js flamegraphs, thread profiling, free 100GB/mo | Free + $0.35/GB |
| Datadog APM | Excellent | Continuous profiler, runtime metrics dashboard, heap viz | $31/host/month |
| Better Stack | Good | Uptime + log monitoring together, simple setup | Free + $20/mo |
| Sentry Performance | Good | Frontend + backend trace correlation, error context | Free + $26/mo |
| Grafana Cloud | Good | Managed Prometheus + Loki; prom-client metrics straight in | Free tier + usage |
| Clinic.js | Excellent | Event loop, flame graph, bubbleprof — Node.js specific, open source | Free (local profiling) |
FAQ
What metrics should I monitor for a Node.js application?
The seven critical metrics: event loop lag, heap used, external memory (Buffers), GC pause duration, active handles, CPU usage, and HTTP request p95 latency. Event loop lag and heap growth are the most uniquely Node.js — they indicate the failure modes that general APM tools often miss.
How do I detect a Node.js memory leak in production?
Watch nodejs_heap_size_used_bytes over time. A leak shows as monotonic growth that never returns to baseline after GC. To identify the source: trigger heap snapshots with SIGUSR2 (using v8.writeHeapSnapshot()), open in Chrome DevTools Memory tab, sort by Retained Size to find growing object types.
What is event loop lag and why does it matter?
Event loop lag measures delay between scheduling a callback and when it actually runs. Normal is under 10ms. Over 100ms, users notice slow responses. Over 500ms, synchronous CPU work is blocking the loop — JSON.parse of a huge payload, a tight loop, or a synchronous file read. All HTTP handlers wait while this runs.
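When you find synchronous work blocking the loop, one common mitigation is chunking: split the long job into pieces and yield to the event loop between them so pending callbacks can run. A sketch (processLargeArray is an illustrative helper, not a library API):

```typescript
// Process a large array in chunks, yielding between chunks with setImmediate
// so timers, sockets, and HTTP handlers get a turn on the event loop.
async function processLargeArray<T>(
  items: T[],
  fn: (item: T) => void,
  chunkSize = 1_000,
): Promise<void> {
  for (let i = 0; i < items.length; i += chunkSize) {
    const end = Math.min(i + chunkSize, items.length);
    for (let j = i; j < end; j++) fn(items[j]);
    // Yield: lets other queued callbacks run before the next chunk
    await new Promise<void>((resolve) => setImmediate(resolve));
  }
}
```

For truly heavy CPU work (image processing, large parses), worker_threads is the better fix, since chunking only spreads the cost rather than removing it from the main thread.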
How do I add OpenTelemetry to a Node.js application?
Install @opentelemetry/sdk-node and @opentelemetry/auto-instrumentations-node. Create an instrumentation.ts file with NodeSDK initialized with your exporter. Load it before everything else via --require ./instrumentation.js. Auto-instrumentation traces Express, HTTP, PostgreSQL, Redis, and MongoDB without business logic changes.
What is the best APM tool for Node.js?
For self-hosted: prom-client + Grafana Cloud is the most flexible and free-tier friendly. For managed: New Relic Node.js agent is mature with a free 100GB/month tier. Datadog APM has the deepest Node.js integration including continuous profiling. Clinic.js is excellent for local profiling and event loop diagnosis.
Related Guides
Best APM Tools 2026
Compare top application performance monitoring platforms.
Distributed Tracing Guide
Traces across microservices — OpenTelemetry, Jaeger, Zipkin.
Error Tracking Guide 2026
Sentry, Rollbar, Bugsnag — exception monitoring comparison.
LLM API Monitoring Guide
Monitor OpenAI, Anthropic, and LLM integrations in Node.js apps.