How WebSockets Fail (And What to Monitor)
WebSocket failures fall into five categories, each requiring a different monitoring approach:
| Failure Type | Symptom | What to Monitor |
|---|---|---|
| Upgrade failure | HTTP 426/400 instead of 101 Switching Protocols | WS connection success rate |
| Dropped connection | Connection closes unexpectedly (close code 1006) | Abnormal close rate |
| Message delivery failure | Messages sent but not received | Message round-trip latency |
| Connection limit hit | New connections refused | Active connection count |
| Silent drop | TCP connection alive, but no messages delivered | Keepalive/heartbeat tracking |
Layer 1: WebSocket Connectivity Check
The most basic check: can a client establish a WebSocket connection? This verifies the upgrade handshake, TLS (for wss://), and that the server is accepting connections.
```javascript
// Node.js WebSocket connectivity check
const WebSocket = require('ws');

function checkWebSocket(url, timeout = 5000) {
  return new Promise((resolve, reject) => {
    // Record the start time before opening the socket so latency covers the full handshake
    const startTime = Date.now();
    const ws = new WebSocket(url);

    // Give up if the handshake does not complete in time
    const timer = setTimeout(() => {
      ws.terminate();
      reject(new Error('Connection timeout'));
    }, timeout);

    ws.on('open', () => {
      clearTimeout(timer);
      ws.close(1000, 'health check');
      resolve({ status: 'healthy', latency: Date.now() - startTime });
    });

    ws.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}

// Usage
checkWebSocket('wss://api.example.com/ws')
  .then(result => console.log('Healthy:', result))
  .catch(err => console.error('ALERT:', err.message));
```
Verifying the 101 Upgrade Response
A successful WebSocket connection returns HTTP 101 Switching Protocols. If you see HTTP 200, 400, or 426 instead, the server isn't accepting WebSocket upgrades. Common causes: reverse proxy not configured for WebSocket, missing Upgrade: websocket header passthrough in nginx/Caddy, or TLS termination stripping upgrade headers.
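One way to see the handshake status directly is to send the upgrade request with Node's built-in `http`/`https` modules and inspect what comes back. The sketch below is a minimal illustration, not a full WebSocket client: `probeUpgrade` and `expectedAcceptKey` are hypothetical helper names, and the 5-second timeout is a placeholder.

```javascript
const crypto = require('crypto');
const http = require('http');
const https = require('https');

// GUID fixed by RFC 6455 for computing Sec-WebSocket-Accept
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Hypothetical helper: the accept value a compliant server returns for a given key
function expectedAcceptKey(key) {
  return crypto.createHash('sha1').update(key + WS_GUID).digest('base64');
}

// Hypothetical helper: sends a raw upgrade request and reports the HTTP status
function probeUpgrade(url, timeout = 5000) {
  return new Promise((resolve, reject) => {
    const target = new URL(url);
    const secure = target.protocol === 'wss:';
    const key = crypto.randomBytes(16).toString('base64');

    const req = (secure ? https : http).request({
      host: target.hostname,
      port: target.port || (secure ? 443 : 80),
      path: target.pathname + target.search,
      headers: {
        Connection: 'Upgrade',
        Upgrade: 'websocket',
        'Sec-WebSocket-Key': key,
        'Sec-WebSocket-Version': '13',
      },
      timeout,
    });

    // 'upgrade' fires only when the server agrees to switch protocols (101)
    req.on('upgrade', (res, socket) => {
      socket.destroy();
      const acceptValid = res.headers['sec-websocket-accept'] === expectedAcceptKey(key);
      resolve({ upgraded: true, statusCode: res.statusCode, acceptValid });
    });

    // An ordinary response (200, 400, 426, ...) means no upgrade happened
    req.on('response', (res) => {
      res.resume();
      resolve({ upgraded: false, statusCode: res.statusCode });
    });

    req.on('timeout', () => req.destroy(new Error('Handshake timeout')));
    req.on('error', reject);
    req.end();
  });
}
```

Calling `probeUpgrade('wss://api.example.com/ws')` resolves with the status code either way, so a monitor can record whether the proxy answered 200, 400, or 426 instead of only seeing a generic connection error.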
Layer 2: Message Round-Trip Latency
Connectivity checks verify the connection was established. Message latency checks verify that messages are actually flowing through the application layer — not just that the TCP connection is alive.
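```javascript
// WebSocket application-level ping/pong latency check
// Assumes the server replies to { type: 'ping', id } with { type: 'pong', id }
const WebSocket = require('ws');

function checkWebSocketLatency(url, timeout = 10000) {
  return new Promise((resolve, reject) => {
    const ws = new WebSocket(url);

    // Fail the check if no matching pong arrives in time
    const timer = setTimeout(() => {
      ws.terminate();
      reject(new Error('Ping timeout: no pong received'));
    }, timeout);

    ws.on('open', () => {
      const sentAt = Date.now();
      ws.send(JSON.stringify({ type: 'ping', id: sentAt }));

      ws.on('message', (data) => {
        let msg;
        try {
          msg = JSON.parse(data);
        } catch {
          return; // ignore frames that are not JSON
        }
        if (msg.type === 'pong' && msg.id === sentAt) {
          clearTimeout(timer);
          const latency = Date.now() - sentAt;
          ws.close(1000);
          resolve({ latency, timestamp: new Date() });
        }
      });
    });

    ws.on('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```
Alert if P95 message latency exceeds your SLA threshold. Typical targets are under 100 ms for real-time applications, under 500 ms for chat applications, and often under 50 ms for financial data feeds.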
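To turn individual latency samples into a P95 alert, the samples need to be aggregated first. The following is a minimal sketch using the nearest-rank method; `percentile` and `checkLatencySla` are hypothetical helper names, and the 100 ms threshold and sample values are placeholders.

```javascript
// Hypothetical helper: nearest-rank percentile over latency samples (milliseconds)
function percentile(samples, p) {
  if (samples.length === 0) return null;
  const sorted = [...samples].sort((a, b) => a - b);
  // Multiply before dividing to avoid floating-point drift in the rank
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// Hypothetical helper: compares the P95 of recent samples against an SLA threshold
function checkLatencySla(samples, thresholdMs) {
  const p95 = percentile(samples, 95);
  return { p95, breached: p95 !== null && p95 > thresholdMs };
}

// Placeholder samples, e.g. the last ten round-trip results
const recentLatencies = [42, 38, 51, 47, 40, 44, 39, 180, 43, 45];
console.log(checkLatencySla(recentLatencies, 100));
```

With only ten samples, a single slow round trip decides the P95, so in practice the window should be large enough that one outlier does not page anyone.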
Layer 3: Connection Lifecycle Metrics
Key Metrics to Track Server-Side
- Active connections over time: Track the count of currently open WebSocket connections. Sudden drops indicate mass disconnections. Unexpected growth indicates connection leaks (connections not being properly closed).
- Connection duration distribution: How long do connections stay open? Very short connections (<10 seconds) in large volumes may indicate clients connecting and immediately failing.
- Reconnection rate: Track how often clients reconnect. A client that connects, drops, and reconnects every few seconds is experiencing instability — this may not be visible in your raw connection count.
- Close code distribution: Track WebSocket close codes across all disconnections. Alert on increasing rates of error codes (1006, 1011).
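The four metrics above can be collected from the server's open and close events. Here is a rough in-memory sketch, assuming each connection can be tied to a stable client identifier; `ConnectionMetrics`, its method names, and the 60-second reconnect window are illustrative choices, not part of any library.

```javascript
// Hypothetical in-memory tracker for the lifecycle metrics listed above
class ConnectionMetrics {
  constructor(reconnectWindowMs = 60000) {
    this.reconnectWindowMs = reconnectWindowMs;
    this.active = new Map();       // connectionId -> { clientId, openedAt }
    this.lastClosedAt = new Map(); // clientId -> time of last disconnect
    this.durations = [];           // lifetimes of closed connections, in ms
    this.closeCodes = new Map();   // close code -> count
    this.reconnects = 0;
  }

  // Call from the server's 'connection' handler
  onOpen(connectionId, clientId, now = Date.now()) {
    const closedAt = this.lastClosedAt.get(clientId);
    // A client returning shortly after a disconnect counts as a reconnect
    if (closedAt !== undefined && now - closedAt <= this.reconnectWindowMs) {
      this.reconnects += 1;
    }
    this.active.set(connectionId, { clientId, openedAt: now });
  }

  // Call from the socket's 'close' handler with the close code
  onClose(connectionId, code, now = Date.now()) {
    const conn = this.active.get(connectionId);
    if (!conn) return;
    this.active.delete(connectionId);
    this.durations.push(now - conn.openedAt);
    this.lastClosedAt.set(conn.clientId, now);
    this.closeCodes.set(code, (this.closeCodes.get(code) || 0) + 1);
  }

  snapshot() {
    return {
      activeConnections: this.active.size,
      reconnects: this.reconnects,
      shortLived: this.durations.filter((d) => d < 10000).length, // under 10 seconds
      closeCodes: Object.fromEntries(this.closeCodes),
    };
  }
}
```

A production setup would more likely export these values as gauges, counters, and histograms to a metrics system than keep them in process memory.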
WebSocket Close Codes Reference
| Code | Meaning | Action |
|---|---|---|
| 1000 | Normal closure | No action — expected |
| 1001 | Going away (server shutting down or browser navigating away) | Expected on deployments and page navigation |
| 1006 | Abnormal closure (no close frame) | Alert — network failure or crash |
| 1008 | Policy violation | Check auth/rate-limit logic |
| 1011 | Server error (unexpected condition) | Alert — application error |
| 4xxx | Application-defined | Depends on your application |
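The table above can be encoded as a small lookup so that close events are classified consistently before they reach alerting. This is a sketch only: `classifyCloseCode`, `abnormalCloseRate`, and the severity labels are hypothetical names chosen for illustration.

```javascript
// Hypothetical mapping of close codes to the actions in the table above
function classifyCloseCode(code) {
  if (code === 1000) return { severity: 'ok', action: 'No action, expected' };
  if (code === 1001) return { severity: 'ok', action: 'Expected on deployments' };
  if (code === 1006) return { severity: 'alert', action: 'Network failure or crash' };
  if (code === 1008) return { severity: 'warn', action: 'Check auth/rate-limit logic' };
  if (code === 1011) return { severity: 'alert', action: 'Application error' };
  if (code >= 4000 && code <= 4999) {
    return { severity: 'app', action: 'Depends on your application' };
  }
  return { severity: 'unknown', action: 'Not covered by the table' };
}

// Fraction of closes in a window that warrant an alert (1006 or 1011)
function abnormalCloseRate(codes) {
  if (codes.length === 0) return 0;
  const abnormal = codes.filter((c) => classifyCloseCode(c).severity === 'alert').length;
  return abnormal / codes.length;
}
```

For example, feeding the close codes seen in the last five minutes into `abnormalCloseRate` gives a ratio that can be compared against a baseline instead of alerting on every single 1006.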
Layer 4: Keepalive and Silent Drop Detection
WebSocket connections can appear alive at the TCP level while silently failing to deliver messages. This happens when:
- A network device drops the connection silently (NAT timeout, firewall idle-connection limit)
- A load balancer closes idle connections without sending a close frame
- A server-side bug consumes the connection but stops processing messages
The solution: implement application-level heartbeats. Every 30-60 seconds, the server sends a ping (or custom "heartbeat" message). If the client doesn't respond within a timeout, close the connection and log it as a silent drop. This makes invisible failures visible.
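```javascript
// Server-side heartbeat implementation (Node.js/ws)
const { WebSocketServer } = require('ws');
const client = require('prom-client'); // assumption: metrics exported via prom-client

const HEARTBEAT_INTERVAL = 30000; // send a ping every 30 seconds
const HEARTBEAT_TIMEOUT = 10000;  // 10 seconds to respond

const wss = new WebSocketServer({ port: 8080 }); // port is a placeholder
const silentDropCounter = new client.Counter({
  name: 'websocket_silent_drops_total', // illustrative metric name
  help: 'Connections terminated after a missed heartbeat',
});

wss.on('connection', (ws) => {
  let pongTimer = null;

  ws.on('pong', () => {
    clearTimeout(pongTimer); // pong arrived in time, connection is alive
  });

  const heartbeatTimer = setInterval(() => {
    ws.ping(); // WebSocket protocol ping
    // If no pong arrives within the timeout, treat it as a silent drop
    pongTimer = setTimeout(() => {
      console.log('Silent drop detected, terminating');
      silentDropCounter.inc(); // Prometheus counter
      ws.terminate();
    }, HEARTBEAT_TIMEOUT);
  }, HEARTBEAT_INTERVAL);

  ws.on('close', () => {
    clearInterval(heartbeatTimer);
    clearTimeout(pongTimer);
  });
});
```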
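Server-side heartbeats detect clients that went away, but a client can also notice a silent drop when the server's heartbeats stop arriving. Below is a minimal client-side sketch under that assumption; `HeartbeatWatchdog` is a hypothetical class, and the interval and grace values mirror the 30-second and 10-second figures used above.

```javascript
// Hypothetical watchdog: a connection counts as silently dropped when no
// heartbeat has been seen for longer than intervalMs + graceMs.
class HeartbeatWatchdog {
  constructor(intervalMs = 30000, graceMs = 10000, now = Date.now()) {
    this.deadlineMs = intervalMs + graceMs;
    this.lastSeen = now;
  }

  // Call whenever a ping or heartbeat message arrives from the server
  beat(now = Date.now()) {
    this.lastSeen = now;
  }

  // True when the next heartbeat is overdue
  isStale(now = Date.now()) {
    return now - this.lastSeen > this.deadlineMs;
  }
}

// Sketch of wiring it to a ws client (not executed here):
//   const watchdog = new HeartbeatWatchdog();
//   ws.on('ping', () => watchdog.beat());
//   setInterval(() => { if (watchdog.isStale()) ws.terminate(); }, 5000);
```

Passing the clock in as a parameter keeps the staleness logic easy to test without waiting for real timers.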
Nginx Configuration for WebSocket Proxying
A common source of WebSocket monitoring failures is a reverse proxy stripping the upgrade headers. Proper nginx configuration:
```nginx
# nginx.conf: required for WebSocket proxying
location /ws {
    proxy_pass http://websocket_backend;
    proxy_http_version 1.1;

    # Required for WebSocket upgrade
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";

    # Increase timeout: WebSockets are long-lived
    proxy_read_timeout 3600s;  # 1 hour
    proxy_send_timeout 3600s;

    # Pass real client IP
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
```
WebSocket Monitoring Checklist
- ☐ External connectivity check: WebSocket upgrade (101) successful every 60 seconds
- ☐ Message round-trip latency check: ping/pong within SLA threshold
- ☐ Active connection count monitoring: alert on unexpected drops
- ☐ Abnormal close code (1006) rate monitoring: alert on spikes
- ☐ Server error close code (1011) monitoring
- ☐ Reconnection rate tracking: clients reconnecting more than once per minute is a warning
- ☐ Server-side heartbeat with silent drop detection
- ☐ Nginx/load balancer: proxy_set_header Upgrade configured
- ☐ Load balancer idle timeout set longer than your heartbeat interval, so heartbeats keep the connection open
- ☐ Alert runbook for WebSocket failures
Frequently Asked Questions
How do you monitor WebSocket connections?
Monitor WebSockets with: connectivity checks (attempt WS handshake, verify 101 response), message latency checks (ping/pong round-trip), active connection count, abnormal close code rates, and reconnection frequency. Use a WebSocket-capable monitoring tool — standard HTTP monitors cannot verify WS endpoint health.
What is a WebSocket health check?
A WebSocket health check establishes a WS connection, sends a test message, and verifies a response within the expected latency window. Unlike HTTP health checks, it must use a WebSocket client and verify the 101 Switching Protocols handshake.
What are WebSocket close codes and why do they matter for monitoring?
WebSocket close codes indicate why a connection closed. Code 1000 = normal. Code 1006 = abnormal closure (network failure, server crash). Code 1011 = server error. Tracking close code distribution helps distinguish expected closures from failures requiring investigation.
How is WebSocket monitoring different from REST API monitoring?
REST monitoring checks discrete request-response pairs. WebSocket monitoring tracks persistent, long-lived connections — requiring connectivity checks, heartbeat/keepalive monitoring, and tracking connection lifecycle events that have no equivalent in request-response APIs.