API Health Checks: Complete Implementation Guide
API health checks are critical endpoints that report the operational status of your services. They enable monitoring systems, load balancers, and orchestrators to detect failures and route traffic appropriately.
What Are API Health Checks?
A health check endpoint is a simple API route that returns the current health status of your service and its dependencies. Unlike monitoring that tracks metrics over time, health checks provide instant yes/no answers about service availability.
Why Health Checks Matter
Automated recovery: Container orchestrators like Kubernetes automatically restart unhealthy containers based on health check failures.
Load balancer routing: Health checks prevent traffic from reaching degraded instances, improving overall reliability.
Dependency visibility: Health checks expose the status of critical dependencies like databases, caches, and external APIs.
Faster incident detection: Health check monitoring can detect issues seconds after they occur, rather than waiting for user reports.
Health Check Types
Different systems need different health check approaches:
Liveness Checks
Liveness checks answer "Is this service process running and responsive?"
// Simple liveness check
app.get('/health/live', (req, res) => {
res.status(200).json({ status: 'alive' });
});
Liveness checks should be:
- Extremely lightweight (no external dependencies)
- Fast to respond (< 100ms)
- Only fail if the process is truly broken
Kubernetes uses liveness checks to decide when to restart containers. A failing liveness check means "this container is dead and needs to be replaced."
Readiness Checks
Readiness checks answer "Is this service ready to handle traffic?"
app.get('/health/ready', async (req, res) => {
try {
// Check database connection
await db.ping();
// Check cache availability
await redis.ping();
res.status(200).json({
status: 'ready',
checks: {
database: 'healthy',
cache: 'healthy'
}
});
} catch (error) {
res.status(503).json({
status: 'not_ready',
error: error.message
});
}
});
Readiness checks should verify:
- Critical dependencies are available
- Required data is loaded
- The service can handle requests
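Kubernetes uses readiness checks to decide whether a pod receives traffic: a pod whose readiness probe fails is removed from the endpoints of its Services, but it is not restarted. A failing readiness check means "this instance needs time to recover, but don't kill it yet."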
Startup Checks
Startup checks answer "Has this service finished initializing?"
let isInitialized = false;
async function initialize() {
// Load configuration
await loadConfig();
// Warm up caches
await warmupCache();
// Run database migrations
await runMigrations();
isInitialized = true;
}
app.get('/health/startup', (req, res) => {
if (isInitialized) {
res.status(200).json({ status: 'started' });
} else {
res.status(503).json({ status: 'starting' });
}
});
Startup checks are especially valuable for services with long initialization times. Kubernetes holds off liveness and readiness probes until the startup probe succeeds, preventing premature restarts during initialization.
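With a single boolean flag, a failed initialization looks the same as one that is still running. The sketch below is one way to tell the two apart; `createStartupTracker` and the `failed` state are hypothetical additions for illustration, not part of the example above.

```javascript
// Hypothetical helper: tracks startup progress, including failure.
function createStartupTracker() {
  let state = 'starting'; // starting | started | failed
  let error = null;

  return {
    // Runs the given async steps in order and records the outcome.
    async run(steps) {
      try {
        for (const step of steps) {
          await step();
        }
        state = 'started';
      } catch (err) {
        state = 'failed';
        error = err.message;
      }
    },
    // Returns the HTTP status code and body a startup endpoint could send.
    response() {
      if (state === 'started') {
        return { code: 200, body: { status: 'started' } };
      }
      const body = { status: state };
      if (error) body.error = error;
      return { code: 503, body };
    }
  };
}

// Possible wiring in an Express app (assumes `app` and the step functions exist):
// const tracker = createStartupTracker();
// tracker.run([loadConfig, warmupCache, runMigrations]);
// app.get('/health/startup', (req, res) => {
//   const { code, body } = tracker.response();
//   res.status(code).json(body);
// });
```

Whether to expose the error message is a judgment call; on a publicly reachable endpoint it may be safer to log it instead.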
Implementing Comprehensive Health Checks
A production-grade health check system needs more than simple ping endpoints:
Structured Response Format
{
"status": "healthy", // healthy, degraded, unhealthy
"timestamp": "2026-03-18T12:00:00Z",
"version": "2.4.1",
"checks": {
"database": {
"status": "healthy",
"responseTime": "12ms"
},
"redis": {
"status": "healthy",
"responseTime": "3ms"
},
"externalAPI": {
"status": "degraded",
"responseTime": "856ms",
"message": "Response time above threshold"
}
},
"metrics": {
"uptime": 86400,
"requests": 1503421,
"errors": 23
}
}
This format provides:
- Overall service status
- Individual dependency health
- Performance metrics
- Version information for debugging
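The overall status has to be derived from the individual checks somehow. A minimal sketch, assuming a "worst status wins" policy, is shown below. That policy is stricter than the sample response above, which reports healthy overall despite a degraded dependency, so treat it as one option rather than the rule. `aggregateStatus` and `buildHealthResponse` are hypothetical names.

```javascript
// Severity order is an assumption of this sketch: unhealthy > degraded > healthy.
const SEVERITY = { healthy: 0, degraded: 1, unhealthy: 2 };

// Returns the worst status found among the individual checks.
// Missing or unknown status strings are treated as unhealthy to fail safe.
function aggregateStatus(checks) {
  let worst = 'healthy';
  for (const check of Object.values(checks)) {
    const known = Object.prototype.hasOwnProperty.call(SEVERITY, check.status);
    const status = known ? check.status : 'unhealthy';
    if (SEVERITY[status] > SEVERITY[worst]) {
      worst = status;
    }
  }
  return worst;
}

// Builds a response body with the top-level fields of the format above.
function buildHealthResponse(checks, version) {
  return {
    status: aggregateStatus(checks),
    timestamp: new Date().toISOString(),
    version,
    checks
  };
}
```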
Dependency Health Checks
Check each critical dependency with appropriate timeouts:
async function checkDatabase() {
const start = Date.now();
try {
await db.query('SELECT 1', { timeout: 2000 });
return {
status: 'healthy',
responseTime: `${Date.now() - start}ms`
};
} catch (error) {
return {
status: 'unhealthy',
error: error.message
};
}
}
async function checkExternalAPI() {
const start = Date.now();
try {
// The standard fetch API has no `timeout` option; abort via a signal instead
const response = await fetch('https://api.example.com/health', {
signal: AbortSignal.timeout(3000)
});
const responseTime = Date.now() - start;
if (!response.ok) {
return {
status: 'unhealthy',
statusCode: response.status
};
}
return {
status: responseTime < 500 ? 'healthy' : 'degraded',
responseTime: `${responseTime}ms`
};
} catch (error) {
return {
status: 'unhealthy',
error: error.message
};
}
}
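Not every client library accepts a timeout option, and running checks one after another adds their latencies together. The following sketch shows one way to enforce a deadline from the outside and run checks concurrently. `withTimeout` and `runChecks` are hypothetical helpers, and the approach assumes each check is an async function that resolves to an object with a `status` field.

```javascript
// Wraps a check so it always settles within `ms` milliseconds.
// A check that throws or times out is reported as unhealthy instead of rejecting.
// Note: Promise.race only stops waiting; it does not cancel the underlying work.
function withTimeout(checkFn, ms) {
  return async () => {
    const start = Date.now();
    let timer;
    const deadline = new Promise((_, reject) => {
      timer = setTimeout(() => reject(new Error(`Timed out after ${ms}ms`)), ms);
    });
    try {
      const result = await Promise.race([checkFn(), deadline]);
      return { ...result, responseTime: `${Date.now() - start}ms` };
    } catch (error) {
      return { status: 'unhealthy', error: error.message };
    } finally {
      clearTimeout(timer);
    }
  };
}

// Runs all checks concurrently and returns an object of { name: result }.
async function runChecks(checks, ms = 2000) {
  const names = Object.keys(checks);
  const results = await Promise.all(
    names.map((name) => withTimeout(checks[name], ms)())
  );
  return Object.fromEntries(names.map((name, i) => [name, results[i]]));
}
```

With this shape, the slowest check bounds the total response time rather than the sum of all checks.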
Health Check Caching
Avoid overwhelming dependencies by caching health check results:
class HealthCheckCache {
constructor(ttl = 5000) {
this.ttl = ttl;
this.cache = new Map();
}
async get(key, checkFunction) {
const cached = this.cache.get(key);
if (cached && Date.now() - cached.timestamp < this.ttl) {
return cached.result;
}
const result = await checkFunction();
this.cache.set(key, {
result,
timestamp: Date.now()
});
return result;
}
}
const healthCache = new HealthCheckCache(5000);
app.get('/health/ready', async (req, res) => {
const dbHealth = await healthCache.get('database', checkDatabase);
const apiHealth = await healthCache.get('externalAPI', checkExternalAPI);
const overallHealthy =
dbHealth.status === 'healthy' &&
apiHealth.status !== 'unhealthy';
res.status(overallHealthy ? 200 : 503).json({
status: overallHealthy ? 'healthy' : 'unhealthy',
checks: {
database: dbHealth,
externalAPI: apiHealth
}
});
});
Caching prevents health check storms where hundreds of monitoring requests per second overwhelm your dependencies.
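One caveat: a TTL cache alone still lets several requests that arrive during the same cache miss each run the check. A possible refinement, sketched here under the assumption that check functions are safe to share between callers, is to hand concurrent callers the same in-flight promise. `SingleFlightHealthCache` is a hypothetical name.

```javascript
class SingleFlightHealthCache {
  constructor(ttl = 5000) {
    this.ttl = ttl;
    this.cache = new Map();    // key -> { result, timestamp }
    this.inFlight = new Map(); // key -> promise of a check that is still running
  }

  get(key, checkFunction) {
    const cached = this.cache.get(key);
    if (cached && Date.now() - cached.timestamp < this.ttl) {
      return Promise.resolve(cached.result);
    }

    // Reuse a check that is already running instead of starting another one.
    if (this.inFlight.has(key)) {
      return this.inFlight.get(key);
    }

    // Start the check in a microtask so the in-flight entry is registered
    // before the check (and its cleanup) can run.
    const pending = Promise.resolve()
      .then(() => checkFunction())
      .then((result) => {
        this.cache.set(key, { result, timestamp: Date.now() });
        return result;
      })
      .finally(() => this.inFlight.delete(key));

    this.inFlight.set(key, pending);
    return pending;
  }
}
```

If the check rejects, every waiting caller sees the same rejection and nothing is cached, so the next request retries.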
Health Check Best Practices
1. Use Appropriate HTTP Status Codes
- 200 OK: Service is healthy
- 503 Service Unavailable: Service is unhealthy
- 429 Too Many Requests: Health checks are being rate limited
Avoid using 500 Internal Server Error for health checks—reserve that for actual application errors.
2. Keep Liveness Checks Simple
Your liveness check should only verify that the process is alive and the HTTP server is responding. Don't check dependencies.
// Good liveness check
app.get('/health/live', (req, res) => {
res.status(200).send('OK');
});
// Bad liveness check (checks dependencies)
app.get('/health/live', async (req, res) => {
await db.ping(); // Don't do this in liveness!
res.status(200).send('OK');
});
If your database is down but your application is running, you don't want Kubernetes killing the container—you want it marked as "not ready" but alive, so it can recover when the database comes back.
3. Set Appropriate Timeouts
Health check endpoints should respond quickly:
- Liveness: < 100ms
- Readiness: < 1000ms
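- Startup: < 5000ms per probe request. Slow initialization is better handled by allowing more probe attempts (for example, a higher failureThreshold in Kubernetes) than by a slow endpoint.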
If your health checks are slow, monitoring systems may mark healthy instances as unhealthy due to timeouts.
4. Consider Circuit Breakers
For external dependencies, implement circuit breaker patterns:
class CircuitBreaker {
constructor(threshold = 5, timeout = 60000) {
this.failureCount = 0;
this.threshold = threshold;
this.timeout = timeout;
this.state = 'closed'; // closed, open, half-open
this.nextAttempt = null;
}
async execute(fn) {
if (this.state === 'open') {
if (Date.now() < this.nextAttempt) {
throw new Error('Circuit breaker is open');
}
this.state = 'half-open';
}
try {
const result = await fn();
this.onSuccess();
return result;
} catch (error) {
this.onFailure();
throw error;
}
}
onSuccess() {
this.failureCount = 0;
this.state = 'closed';
}
onFailure() {
this.failureCount++;
if (this.failureCount >= this.threshold) {
this.state = 'open';
this.nextAttempt = Date.now() + this.timeout;
}
}
}
This prevents cascading failures and excessive health check traffic to failing services.
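To connect the breaker to a health check, the check can report the breaker's state instead of throwing. The sketch below is one way to do that. `guardedCheck` is a hypothetical helper, and the compact `CircuitBreaker` is only a stand-in with the same `execute()` behavior as the class above, repeated so the example runs on its own.

```javascript
// Compact stand-in with the same execute() behavior as the class above.
class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureCount = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'closed'; // closed, open, half-open
    this.nextAttempt = null;
  }

  async execute(fn) {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is open');
      }
      this.state = 'half-open';
    }
    try {
      const result = await fn();
      this.failureCount = 0;
      this.state = 'closed';
      return result;
    } catch (error) {
      this.failureCount++;
      if (this.failureCount >= this.threshold) {
        this.state = 'open';
        this.nextAttempt = Date.now() + this.timeout;
      }
      throw error;
    }
  }
}

// Hypothetical helper: turns a probe that throws on failure into a health
// check that reports status and breaker state instead of throwing.
function guardedCheck(breaker, probeFn) {
  return async () => {
    try {
      await breaker.execute(probeFn);
      return { status: 'healthy', circuit: breaker.state };
    } catch (error) {
      return { status: 'unhealthy', circuit: breaker.state, error: error.message };
    }
  };
}
```

While the breaker is open, the probe function is not called at all, which is what keeps health check traffic away from the failing dependency.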
5. Version Your Health Checks
Include version information to help with debugging:
app.get('/health', (req, res) => {
res.json({
status: 'healthy',
version: process.env.APP_VERSION || 'unknown',
commit: process.env.GIT_COMMIT || 'unknown',
buildDate: process.env.BUILD_DATE || 'unknown'
});
});
When investigating incidents, knowing which version is running on each instance is invaluable.
Monitoring Health Check Endpoints
Health checks are only useful if something is monitoring them:
Kubernetes Configuration
apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
    - name: myapp
      image: myapp:latest
      livenessProbe:
        httpGet:
          path: /health/live
          port: 8080
        initialDelaySeconds: 5
        periodSeconds: 10
        timeoutSeconds: 1
        failureThreshold: 3
      readinessProbe:
        httpGet:
          path: /health/ready
          port: 8080
        initialDelaySeconds: 10
        periodSeconds: 5
        timeoutSeconds: 2
        failureThreshold: 2
      startupProbe:
        httpGet:
          path: /health/startup
          port: 8080
        initialDelaySeconds: 0
        periodSeconds: 10
        timeoutSeconds: 3
        failureThreshold: 30
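The probe settings above translate into time budgets. As a rough rule of thumb, Kubernetes acts on a probe after failureThreshold consecutive failures spaced periodSeconds apart. The small calculation below applies that rule to the values in this configuration; it ignores timeoutSeconds, initialDelaySeconds, and scheduling jitter, so read the results as approximations. `failureWindowSeconds` is a hypothetical helper.

```javascript
// Approximate seconds of continuous failure before Kubernetes acts on a probe.
// Defaults mirror the Kubernetes defaults (periodSeconds 10, failureThreshold 3).
function failureWindowSeconds({ periodSeconds = 10, failureThreshold = 3 }) {
  return periodSeconds * failureThreshold;
}

// Values copied from the Pod spec above.
const probes = {
  liveness: { periodSeconds: 10, failureThreshold: 3 },
  readiness: { periodSeconds: 5, failureThreshold: 2 },
  startup: { periodSeconds: 10, failureThreshold: 30 }
};

for (const [name, settings] of Object.entries(probes)) {
  console.log(`${name}: ~${failureWindowSeconds(settings)}s`);
}
// liveness: ~30s   (restart after about 30 s of failed probes)
// readiness: ~10s  (removed from traffic after about 10 s)
// startup: ~300s   (initialization budget of about 5 minutes)
```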
External Monitoring
Use external monitoring services to check health endpoints from multiple locations:
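// Self-reporting sketch: polls the local readiness endpoint and forwards the result.
// MonitoringClient is a placeholder for your monitoring SDK. Probes that run from
// other locations should request the service's public URL instead of localhost.
const client = new MonitoringClient();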
setInterval(async () => {
try {
const response = await fetch('http://localhost:8080/health/ready');
const data = await response.json();
client.recordHealthCheck({
status: data.status,
checks: data.checks,
timestamp: Date.now()
});
} catch (error) {
client.recordHealthCheckError(error);
}
}, 10000);
External monitoring catches issues that internal health checks might miss, like DNS failures or network partitions.
Common Health Check Pitfalls
Pitfall 1: Checking Non-Critical Dependencies
Don't fail health checks for optional features:
// Bad: Fails health check if analytics service is down
const analyticsHealth = await checkAnalytics();
if (analyticsHealth.status !== 'healthy') {
return res.status(503).json({ status: 'unhealthy' });
}
// Good: Reports analytics status but doesn't fail overall check
const analyticsHealth = await checkAnalytics();
res.status(200).json({
status: 'healthy',
checks: {
analytics: analyticsHealth // reported but not critical
}
});
Only fail health checks for dependencies that would prevent your service from functioning.
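One way to encode this rule is to tag each check as critical or optional and let only critical failures change the HTTP status. In the sketch below, the `critical` flag and `summarizeChecks` are conventions invented for illustration. Reporting an impaired optional dependency as `degraded` with a 200 response is one possible choice; returning `healthy`, as in the example above, is equally valid.

```javascript
// Each entry pairs a check result with a `critical` flag:
// { name: { result: { status, ... }, critical: boolean } }
function summarizeChecks(entries) {
  const names = Object.keys(entries);

  // Only an unhealthy critical dependency fails the endpoint.
  const criticalDown = names.filter(
    (name) => entries[name].critical && entries[name].result.status === 'unhealthy'
  );
  const anyImpaired = names.some(
    (name) => entries[name].result.status !== 'healthy'
  );

  let status = 'healthy';
  if (criticalDown.length > 0) {
    status = 'unhealthy';
  } else if (anyImpaired) {
    status = 'degraded';
  }

  return {
    httpStatus: criticalDown.length > 0 ? 503 : 200,
    body: {
      status,
      checks: Object.fromEntries(names.map((name) => [name, entries[name].result]))
    }
  };
}
```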
Pitfall 2: Cascading Health Check Failures
If Service A checks Service B, and Service B checks Service C, a failure in C can cascade and mark everything unhealthy:
// Bad: Synchronously checking downstream services
app.get('/health', async (req, res) => {
const downstream = await fetch('http://service-b/health');
// Now Service A's health depends on Service B's health
});
// Good: Check only direct dependencies
app.get('/health', async (req, res) => {
// Only check things this service directly uses
const dbHealth = await checkDatabase();
const cacheHealth = await checkCache();
// Don't check Service B unless this service calls it
});
Pitfall 3: No Health Check Authentication
Exposing detailed health information publicly can leak infrastructure details:
// Option 1: Different endpoints for internal/external
app.get('/health', (req, res) => {
res.json({ status: 'healthy' }); // Public, minimal info
});
app.get('/health/detailed', authenticateInternal, (req, res) => {
res.json({
status: 'healthy',
checks: detailedChecks, // Internal only
metrics: sensitiveMetrics
});
});
// Option 2: Authentication on all health endpoints
app.get('/health', authenticate, (req, res) => {
// Requires API key or internal network
});
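The `authenticateInternal` middleware is left undefined above. A minimal sketch of what it could look like follows, assuming a shared secret sent in a request header. The header name `x-health-token` and the factory `createHealthAuth` are hypothetical, and a network-level restriction may suit some deployments better.

```javascript
const crypto = require('node:crypto');

// Hypothetical factory for an Express-style middleware that lets a request
// through only when the x-health-token header matches the shared secret.
function createHealthAuth(expectedToken) {
  if (!expectedToken) {
    // Refuse to build a middleware that an empty header would satisfy.
    throw new Error('A non-empty health check token is required');
  }
  const expected = Buffer.from(expectedToken);

  return function authenticateInternal(req, res, next) {
    const provided = Buffer.from(String(req.headers['x-health-token'] || ''));
    // timingSafeEqual throws on length mismatch, so compare lengths first.
    const matches =
      provided.length === expected.length &&
      crypto.timingSafeEqual(provided, expected);

    if (!matches) {
      return res.status(401).json({ error: 'unauthorized' });
    }
    next();
  };
}

// Possible wiring (assumes the token is supplied via an environment variable):
// const authenticateInternal = createHealthAuth(process.env.HEALTH_TOKEN);
```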
Health Checks for Different Architectures
Microservices
Each service should have its own health check endpoint:
// User Service
app.get('/health', async (req, res) => {
const dbHealth = await checkUserDatabase();
res.status(dbHealth.status === 'healthy' ? 200 : 503).json({
service: 'user-service',
status: dbHealth.status,
checks: { database: dbHealth }
});
});
// Order Service
app.get('/health', async (req, res) => {
const dbHealth = await checkOrderDatabase();
const paymentHealth = await checkPaymentGateway();
res.status(dbHealth.status === 'healthy' ? 200 : 503).json({
service: 'order-service',
status: dbHealth.status === 'healthy' ? 'healthy' : 'unhealthy',
checks: {
database: dbHealth,
payment: paymentHealth
}
});
});
Don't create a "health check aggregator" that checks all services—that creates tight coupling and cascading failures.
Serverless Functions
Serverless platforms often don't support traditional health checks:
// Lambda health verification via CloudWatch
exports.handler = async (event) => {
// Startup checks
if (!isInitialized) {
await initialize();
}
// Dependency checks
try {
await verifyDependencies();
} catch (error) {
// Log to CloudWatch for alarming
console.error('Dependency check failed:', error);
throw error;
}
// Process request
return handleRequest(event);
};
Use CloudWatch alarms on error rates and duration metrics instead of traditional health check endpoints.
Background Workers
Workers processing queues need health checks too:
class WorkerHealthCheck {
constructor() {
this.lastProcessedTime = Date.now();
this.isProcessing = false;
}
markProcessing() {
this.isProcessing = true;
this.lastProcessedTime = Date.now();
}
markComplete() {
this.isProcessing = false;
this.lastProcessedTime = Date.now();
}
getHealth() {
const timeSinceLastProcess = Date.now() - this.lastProcessedTime;
// If stuck processing for >5 minutes, something is wrong
if (this.isProcessing && timeSinceLastProcess > 300000) {
return { status: 'unhealthy', reason: 'Stuck processing job' };
}
// If no jobs processed in >10 minutes, queue might be stuck
if (timeSinceLastProcess > 600000) {
return { status: 'degraded', reason: 'No recent job processing' };
}
return { status: 'healthy' };
}
}
const workerHealth = new WorkerHealthCheck();
app.get('/health', (req, res) => {
const health = workerHealth.getHealth();
res.status(health.status === 'unhealthy' ? 503 : 200).json(health);
});
Testing Health Check Endpoints
Include health checks in your test suite:
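// Assumes the cached /health/ready route shown earlier, and that the application
// module exports `app`, `db`, and `healthCache` (the module path is hypothetical).
const request = require('supertest');
const { app, db, healthCache } = require('./app');

const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

describe('Health Check Endpoints', () => {
  beforeEach(() => {
    // Start each test with an empty cache so results do not leak between tests
    healthCache.cache.clear();
  });

  afterEach(() => {
    jest.restoreAllMocks();
  });

  it('returns 200 when all dependencies are healthy', async () => {
    const response = await request(app).get('/health/ready');
    expect(response.status).toBe(200);
    expect(response.body.status).toBe('healthy');
  });

  it('returns 503 when database is down', async () => {
    // Mock database failure (checkDatabase calls db.query, not db.ping)
    jest.spyOn(db, 'query').mockRejectedValue(new Error('Connection refused'));
    const response = await request(app).get('/health/ready');
    expect(response.status).toBe(503);
    expect(response.body.status).toBe('unhealthy');
  });

  it('caches health check results', async () => {
    const dbQuerySpy = jest.spyOn(db, 'query');
    // First request
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(1);
    // Second request (should use cache)
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(1);
    // Wait for cache expiry (TTL is 5000ms)
    await sleep(6000);
    // Third request (cache expired)
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(2);
  }, 10000); // Raise Jest's default 5-second timeout to cover the sleep
});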
Conclusion
Well-implemented health checks are essential infrastructure for reliable services. They enable automated recovery, intelligent traffic routing, and faster incident detection.
Key takeaways:
- Use separate liveness, readiness, and startup checks
- Keep liveness checks simple and fast
- Cache health check results to prevent dependency overload
- Only check critical dependencies
- Include version information for debugging
- Test health check endpoints in your test suite
With robust health checks in place, your monitoring systems can automatically detect and respond to failures, often before users notice any issues.