API Health Checks: Complete Implementation Guide

by API Status Check Team

API health checks are critical endpoints that report the operational status of your services. They enable monitoring systems, load balancers, and orchestrators to detect failures and route traffic appropriately.

What Are API Health Checks?

A health check endpoint is a simple API route that returns the current health status of your service and its dependencies. Unlike metrics-based monitoring, which tracks trends over time, a health check gives a point-in-time answer about whether the service is available right now.

Why Health Checks Matter

Automated recovery: Container orchestrators like Kubernetes automatically restart unhealthy containers based on health check failures.

Load balancer routing: Health checks prevent traffic from reaching degraded instances, improving overall reliability.

Dependency visibility: Health checks expose the status of critical dependencies like databases, caches, and external APIs.

Faster incident detection: Health check monitoring can detect issues seconds after they occur, rather than waiting for user reports.

Health Check Types

Different systems need different health check approaches:

Liveness Checks

Liveness checks answer "Is this service process running and responsive?"

// Simple liveness check
app.get('/health/live', (req, res) => {
  res.status(200).json({ status: 'alive' });
});

Liveness checks should be:

  • Extremely lightweight (no external dependencies)
  • Fast to respond (< 100ms)
  • Only fail if the process is truly broken

Kubernetes uses liveness checks to decide when to restart containers. A failing liveness check means "this container is dead and needs to be replaced."

Readiness Checks

Readiness checks answer "Is this service ready to handle traffic?"

app.get('/health/ready', async (req, res) => {
  try {
    // Check database connection
    await db.ping();
    
    // Check cache availability
    await redis.ping();
    
    res.status(200).json({ 
      status: 'ready',
      checks: {
        database: 'healthy',
        cache: 'healthy'
      }
    });
  } catch (error) {
    res.status(503).json({ 
      status: 'not_ready',
      error: error.message 
    });
  }
});

Readiness checks should verify:

  • Critical dependencies are available
  • Required data is loaded
  • The service can handle requests

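Kubernetes uses readiness checks to decide whether a pod receives traffic: a pod that fails its readiness check is removed from Service endpoints until the check passes again. A failing readiness check means "this instance needs time to recover, but don't kill it yet."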

Startup Checks

Startup checks answer "Has this service finished initializing?"

let isInitialized = false;

async function initialize() {
  // Load configuration
  await loadConfig();
  
  // Warm up caches
  await warmupCache();
  
  // Run database migrations
  await runMigrations();
  
  isInitialized = true;
}

app.get('/health/startup', (req, res) => {
  if (isInitialized) {
    res.status(200).json({ status: 'started' });
  } else {
    res.status(503).json({ status: 'starting' });
  }
});

Startup checks are especially valuable for services with long initialization times. Kubernetes holds off on liveness and readiness checks until the startup check passes, preventing premature restarts during initialization.
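The snippet above never shows `initialize()` being called, or what happens if it throws. One way to wire that up, as a sketch: track three states rather than a boolean, so a failed initialization is reported instead of looking like a slow start. `createStartupTracker` and the `failed` state are illustrative names, not part of Express or Kubernetes.

```javascript
// Hypothetical helper: runs an async init function and records its outcome.
function createStartupTracker(initFn) {
  const tracker = { state: 'starting', error: null, done: null };

  tracker.done = Promise.resolve()
    .then(initFn)
    .then(() => {
      tracker.state = 'started';
    })
    .catch((err) => {
      // Keep the failure visible instead of reporting "starting" forever
      tracker.state = 'failed';
      tracker.error = err instanceof Error ? err.message : String(err);
    });

  // Express-style handler: 200 only once initialization has succeeded
  tracker.handler = (req, res) => {
    const code = tracker.state === 'started' ? 200 : 503;
    const body = { status: tracker.state };
    if (tracker.error) body.error = tracker.error;
    res.status(code).json(body);
  };

  return tracker;
}

// Usage (assuming the `app` and `initialize` from the example above):
//   const startup = createStartupTracker(initialize);
//   app.get('/health/startup', startup.handler);
```

Because the endpoint keeps returning 503 after a failed initialization, the startup probe eventually exhausts its failure threshold and the container is restarted.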

Implementing Comprehensive Health Checks

A production-grade health check system needs more than simple ping endpoints:

Structured Response Format

The top-level status is one of healthy, degraded, or unhealthy:

{
  "status": "healthy",
  "timestamp": "2026-03-18T12:00:00Z",
  "version": "2.4.1",
  "checks": {
    "database": {
      "status": "healthy",
      "responseTime": "12ms"
    },
    "redis": {
      "status": "healthy",
      "responseTime": "3ms"
    },
    "externalAPI": {
      "status": "degraded",
      "responseTime": "856ms",
      "message": "Response time above threshold"
    }
  },
  "metrics": {
    "uptime": 86400,
    "requests": 1503421,
    "errors": 23
  }
}

This format provides:

  • Overall service status
  • Individual dependency health
  • Performance metrics
  • Version information for debugging
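How the top-level status is derived from the individual checks isn't spelled out here. A common rule is that the worst status wins; the sketch below assumes that rule. The function name `buildHealthResponse` and the severity ordering are illustrative choices.

```javascript
// Assumed severity ordering: higher number = worse
const SEVERITY = { healthy: 0, degraded: 1, unhealthy: 2 };

// Builds the structured response from a map of per-dependency results.
// Unrecognized status strings are treated as unhealthy to fail safe.
function buildHealthResponse(checks, { version = 'unknown', now = new Date() } = {}) {
  let worst = 'healthy';
  for (const result of Object.values(checks)) {
    const known = Object.prototype.hasOwnProperty.call(SEVERITY, result.status);
    const status = known ? result.status : 'unhealthy';
    if (SEVERITY[status] > SEVERITY[worst]) worst = status;
  }
  return {
    status: worst,
    timestamp: now.toISOString(),
    version,
    checks
  };
}

const example = buildHealthResponse({
  database: { status: 'healthy', responseTime: '12ms' },
  externalAPI: { status: 'degraded', responseTime: '856ms' }
});
console.log(example.status); // degraded
```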

Dependency Health Checks

Check each critical dependency with appropriate timeouts:

async function checkDatabase() {
  const start = Date.now();
  try {
    await db.query('SELECT 1', { timeout: 2000 });
    return {
      status: 'healthy',
      responseTime: `${Date.now() - start}ms`
    };
  } catch (error) {
    return {
      status: 'unhealthy',
      error: error.message
    };
  }
}

async function checkExternalAPI() {
  const start = Date.now();
  try {
    // Standard fetch has no `timeout` option; abort via a signal instead
    const response = await fetch('https://api.example.com/health', {
      signal: AbortSignal.timeout(3000)
    });
    
    const responseTime = Date.now() - start;
    
    if (!response.ok) {
      return {
        status: 'unhealthy',
        statusCode: response.status
      };
    }
    
    return {
      status: responseTime < 500 ? 'healthy' : 'degraded',
      responseTime: `${responseTime}ms`
    };
  } catch (error) {
    return {
      status: 'unhealthy',
      error: error.message
    };
  }
}
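Both check functions share a shape: time the call, catch errors, return a status object. One way to factor that out and run the checks concurrently rather than one after another is sketched below. `runChecks` is a hypothetical helper, not a library function.

```javascript
// Runs named async checks in parallel. A check that throws is reported as
// unhealthy instead of rejecting the whole batch.
async function runChecks(checks) {
  const entries = await Promise.all(
    Object.entries(checks).map(async ([name, check]) => {
      const start = Date.now();
      try {
        const result = await check();
        // A responseTime supplied by the check itself takes precedence
        return [name, { responseTime: `${Date.now() - start}ms`, ...result }];
      } catch (error) {
        return [name, { status: 'unhealthy', error: error.message }];
      }
    })
  );
  return Object.fromEntries(entries);
}

// Usage, with stand-in checks in place of real dependencies:
runChecks({
  database: async () => ({ status: 'healthy' }),
  cache: async () => { throw new Error('connection refused'); }
}).then((results) => console.log(results));
```

Running checks in parallel means the endpoint's latency is roughly that of the slowest check rather than the sum of all of them.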

Health Check Caching

Avoid overwhelming dependencies by caching health check results:

class HealthCheckCache {
  constructor(ttl = 5000) {
    this.ttl = ttl;
    this.cache = new Map();
  }
  
  async get(key, checkFunction) {
    const cached = this.cache.get(key);
    
    if (cached && Date.now() - cached.timestamp < this.ttl) {
      return cached.result;
    }
    
    const result = await checkFunction();
    this.cache.set(key, {
      result,
      timestamp: Date.now()
    });
    
    return result;
  }
}

const healthCache = new HealthCheckCache(5000);

app.get('/health/ready', async (req, res) => {
  const dbHealth = await healthCache.get('database', checkDatabase);
  const apiHealth = await healthCache.get('externalAPI', checkExternalAPI);
  
  const overallHealthy = 
    dbHealth.status === 'healthy' && 
    apiHealth.status !== 'unhealthy';
  
  res.status(overallHealthy ? 200 : 503).json({
    status: overallHealthy ? 'healthy' : 'unhealthy',
    checks: {
      database: dbHealth,
      externalAPI: apiHealth
    }
  });
});

Caching prevents health check storms where hundreds of monitoring requests per second overwhelm your dependencies.
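One gap in the cache above: if many requests arrive just after an entry expires, each one sees the stale entry and runs the check itself. A common remedy is to cache the in-flight promise so concurrent callers share a single check. The sketch below is a variation on that idea; the class name and the injectable `now` clock are additions for illustration.

```javascript
class SingleFlightHealthCache {
  constructor(ttl = 5000, now = Date.now) {
    this.ttl = ttl;
    this.now = now;
    this.cache = new Map(); // key -> { promise, timestamp }
  }

  get(key, checkFunction) {
    const cached = this.cache.get(key);
    if (cached && this.now() - cached.timestamp < this.ttl) {
      return cached.promise; // shared by every caller inside the TTL window
    }

    // Store the promise before it settles so concurrent callers reuse it
    const promise = Promise.resolve().then(checkFunction);
    const entry = { promise, timestamp: this.now() };
    this.cache.set(key, entry);

    // Don't keep a rejected check cached; let the next caller retry
    promise.catch(() => {
      if (this.cache.get(key) === entry) this.cache.delete(key);
    });

    return promise;
  }
}
```

It is used the same way as the original: `await cache.get('database', checkDatabase)`.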

Health Check Best Practices

1. Use Appropriate HTTP Status Codes

  • 200 OK: Service is healthy
  • 503 Service Unavailable: Service is unhealthy
  • 429 Too Many Requests: Health checks are being rate limited

Avoid using 500 Internal Server Error for health checks—reserve that for actual application errors.
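The mapping from health status to HTTP code can live in one small function so that every endpoint agrees. In this sketch `degraded` maps to 200 so that load balancers keep routing to an instance that is slow but working; that policy is an assumption, and some teams prefer 503 there.

```javascript
// Maps a health status string to the HTTP status code to return.
function httpStatusFor(status) {
  switch (status) {
    case 'healthy':
    case 'degraded': // still serving traffic, so stay in rotation (assumed policy)
      return 200;
    default: // 'unhealthy' and anything unrecognized
      return 503;
  }
}

console.log(httpStatusFor('healthy'));   // 200
console.log(httpStatusFor('degraded'));  // 200
console.log(httpStatusFor('unhealthy')); // 503
```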

2. Keep Liveness Checks Simple

Your liveness check should only verify that the process is alive and the HTTP server is responding. Don't check dependencies.

// Good liveness check
app.get('/health/live', (req, res) => {
  res.status(200).send('OK');
});

// Bad liveness check (checks dependencies)
app.get('/health/live', async (req, res) => {
  await db.ping(); // Don't do this in liveness!
  res.status(200).send('OK');
});

If your database is down but your application is running, you don't want Kubernetes killing the container—you want it marked as "not ready" but alive, so it can recover when the database comes back.

3. Set Appropriate Timeouts

Health check endpoints should respond quickly:

  • Liveness: < 100ms
  • Readiness: < 1000ms
  • Startup: < 5000ms (can be longer for complex initialization)

If your health checks are slow, monitoring systems may mark healthy instances as unhealthy due to timeouts.
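To keep the endpoint inside these budgets even when a dependency hangs, each check can be raced against a timer. A sketch, with `withTimeout` as a hypothetical helper:

```javascript
// Rejects if `promise` has not settled within `ms` milliseconds.
// The underlying operation is not cancelled; only the wait is abandoned.
function withTimeout(promise, ms, label = 'check') {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`${label} timed out after ${ms}ms`)),
      ms
    );
  });
  // Clear the timer either way so it cannot keep the process alive
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}

// Usage: a check that never resolves is cut off at 50ms
const hung = new Promise(() => {});
withTimeout(hung, 50, 'database')
  .catch((err) => console.log(err.message)); // database timed out after 50ms
```

Whatever limit you choose should be shorter than the timeout configured in the system probing the endpoint, so the endpoint answers with a 503 rather than being cut off mid-response.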

4. Consider Circuit Breakers

For external dependencies, implement circuit breaker patterns:

class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failureCount = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = 'closed'; // closed, open, half-open
    this.nextAttempt = null;
  }
  
  async execute(fn) {
    if (this.state === 'open') {
      if (Date.now() < this.nextAttempt) {
        throw new Error('Circuit breaker is open');
      }
      this.state = 'half-open';
    }
    
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  
  onSuccess() {
    this.failureCount = 0;
    this.state = 'closed';
  }
  
  onFailure() {
    this.failureCount++;
    if (this.failureCount >= this.threshold) {
      this.state = 'open';
      this.nextAttempt = Date.now() + this.timeout;
    }
  }
}

This prevents cascading failures and excessive health check traffic to failing services.
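One detail matters when combining a breaker with health checks: a check function that catches its own errors and returns `{ status: 'unhealthy' }` never throws, so the breaker never sees a failure. The sketch below converts an unhealthy result into a thrown error inside the breaker and back into a status object outside it. `checkWithBreaker` and the `circuit` field are illustrative names.

```javascript
// Wraps a dependency check in a breaker exposing `execute(fn)` and `state`,
// such as the CircuitBreaker class above. Always resolves to a status object.
async function checkWithBreaker(breaker, checkFn) {
  try {
    const result = await breaker.execute(async () => {
      const r = await checkFn();
      // Count an unhealthy result as a failure so the breaker can trip
      if (r.status === 'unhealthy') {
        throw new Error(r.error || 'dependency unhealthy');
      }
      return r;
    });
    return { ...result, circuit: breaker.state };
  } catch (error) {
    return { status: 'unhealthy', error: error.message, circuit: breaker.state };
  }
}

// Usage (assuming CircuitBreaker and checkExternalAPI from earlier examples):
//   const apiBreaker = new CircuitBreaker(5, 60000);
//   const apiHealth = await checkWithBreaker(apiBreaker, checkExternalAPI);
```

While the circuit is open, the health endpoint answers immediately with the breaker's error instead of waiting on a dependency that is known to be failing.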

5. Version Your Health Checks

Include version information to help with debugging:

app.get('/health', (req, res) => {
  res.json({
    status: 'healthy',
    version: process.env.APP_VERSION || 'unknown',
    commit: process.env.GIT_COMMIT || 'unknown',
    buildDate: process.env.BUILD_DATE || 'unknown'
  });
});

When investigating incidents, knowing which version is running on each instance is invaluable.

Monitoring Health Check Endpoints

Health checks are only useful if something is monitoring them:

Kubernetes Configuration

apiVersion: v1
kind: Pod
metadata:
  name: myapp
spec:
  containers:
  - name: myapp
    image: myapp:latest
    livenessProbe:
      httpGet:
        path: /health/live
        port: 8080
      initialDelaySeconds: 5
      periodSeconds: 10
      timeoutSeconds: 1
      failureThreshold: 3
    readinessProbe:
      httpGet:
        path: /health/ready
        port: 8080
      initialDelaySeconds: 10
      periodSeconds: 5
      timeoutSeconds: 2
      failureThreshold: 2
    startupProbe:
      httpGet:
        path: /health/startup
        port: 8080
      initialDelaySeconds: 0
      periodSeconds: 10
      timeoutSeconds: 3
      failureThreshold: 30
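These probe settings translate into concrete time budgets. As a rough rule, a probe tolerates about failureThreshold × periodSeconds of consecutive failures before Kubernetes acts. The sketch below applies that approximation to the values above; it ignores initial delays, per-probe timeouts, and scheduling jitter, so treat the numbers as estimates.

```javascript
// Approximate seconds of continuous failure before a probe's threshold is hit.
function failureWindowSeconds({ periodSeconds, failureThreshold }) {
  return periodSeconds * failureThreshold;
}

const probes = {
  liveness: { periodSeconds: 10, failureThreshold: 3 },
  readiness: { periodSeconds: 5, failureThreshold: 2 },
  startup: { periodSeconds: 10, failureThreshold: 30 }
};

for (const [name, probe] of Object.entries(probes)) {
  console.log(`${name}: ~${failureWindowSeconds(probe)}s`);
}
// Prints:
//   liveness: ~30s
//   readiness: ~10s
//   startup: ~300s
```

Read this way, a hung container is restarted after roughly 30 seconds, an unready pod leaves rotation after roughly 10 seconds, and the application has about 5 minutes to finish initializing.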

External Monitoring

Use external monitoring services to check health endpoints from multiple locations:

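// Runs outside your infrastructure and reports results to your monitoring system.
// MonitoringClient and the URL are placeholders for your own client and public endpoint.
const client = new MonitoringClient();

setInterval(async () => {
  try {
    const response = await fetch('https://api.example.com/health/ready');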
    const data = await response.json();
    
    client.recordHealthCheck({
      status: data.status,
      checks: data.checks,
      timestamp: Date.now()
    });
  } catch (error) {
    client.recordHealthCheckError(error);
  }
}, 10000);

External monitoring catches issues that internal health checks might miss, like DNS failures or network partitions.

Common Health Check Pitfalls

Pitfall 1: Checking Non-Critical Dependencies

Don't fail health checks for optional features:

// Bad: Fails health check if analytics service is down
const analyticsHealth = await checkAnalytics();
if (analyticsHealth.status !== 'healthy') {
  return res.status(503).json({ status: 'unhealthy' });
}

// Good: Reports analytics status but doesn't fail overall check
const analyticsHealth = await checkAnalytics();
res.status(200).json({
  status: 'healthy',
  checks: {
    analytics: analyticsHealth // reported but not critical
  }
});

Only fail health checks for dependencies that would prevent your service from functioning.
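The distinction between critical and optional dependencies can be encoded once instead of being repeated in every handler. In this sketch an optional failure lowers the reported status to `degraded` but never to `unhealthy`. That is one possible policy rather than the only one, and `overallStatus` is a hypothetical helper.

```javascript
// Derives overall status from checks, given the names of critical ones.
// Critical failure -> unhealthy; any other problem -> degraded at worst.
function overallStatus(checks, criticalNames) {
  const critical = new Set(criticalNames);
  let status = 'healthy';
  for (const [name, result] of Object.entries(checks)) {
    if (result.status === 'healthy') continue;
    if (critical.has(name) && result.status === 'unhealthy') return 'unhealthy';
    status = 'degraded';
  }
  return status;
}

const status = overallStatus(
  {
    database: { status: 'healthy' },
    analytics: { status: 'unhealthy' }
  },
  ['database']
);
console.log(status); // degraded
```

An endpoint using this would return 503 only when the result is `unhealthy`, so an analytics outage shows up in the response body without taking the instance out of rotation.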

Pitfall 2: Cascading Health Check Failures

If Service A checks Service B, and Service B checks Service C, a failure in C can cascade and mark everything unhealthy:

// Bad: This service's health now depends on a downstream service's health endpoint
app.get('/health', async (req, res) => {
  const downstream = await fetch('http://service-b/health');
  // Now Service A's health depends on Service B's health
});

// Good: Check only direct dependencies
app.get('/health', async (req, res) => {
  // Only check things this service directly uses
  const dbHealth = await checkDatabase();
  const cacheHealth = await checkCache();
  // Don't check Service B unless this service calls it
});

Pitfall 3: No Health Check Authentication

Exposing detailed health information publicly can leak infrastructure details:

// Option 1: Different endpoints for internal/external
app.get('/health', (req, res) => {
  res.json({ status: 'healthy' }); // Public, minimal info
});

app.get('/health/detailed', authenticateInternal, (req, res) => {
  res.json({
    status: 'healthy',
    checks: detailedChecks, // Internal only
    metrics: sensitiveMetrics
  });
});

// Option 2: Authentication on all health endpoints
app.get('/health', authenticate, (req, res) => {
  // Requires API key or internal network
});
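The `authenticateInternal` middleware is left undefined above. One possible shape, as a sketch: require a shared secret in a request header and compare it in constant time. The header name `X-Health-Token`, the factory name `requireHealthToken`, and the 401 response body are assumptions made for illustration.

```javascript
const nodeCrypto = require('node:crypto');

// Hypothetical middleware factory: allows the request only if the
// X-Health-Token header matches the expected secret.
function requireHealthToken(expectedToken) {
  if (!expectedToken) {
    throw new Error('health token is not configured');
  }
  const digest = (value) =>
    nodeCrypto.createHash('sha256').update(String(value)).digest();
  const expected = digest(expectedToken);

  return function authenticateInternal(req, res, next) {
    const provided = req.headers['x-health-token'];
    // Hash both sides so the buffers have equal length, then compare in
    // constant time to avoid leaking the token through response timing
    if (provided !== undefined &&
        nodeCrypto.timingSafeEqual(digest(provided), expected)) {
      return next();
    }
    res.status(401).json({ status: 'unauthorized' });
  };
}

// Usage (assuming an Express `app` and a HEALTH_TOKEN environment variable):
//   const authenticateInternal = requireHealthToken(process.env.HEALTH_TOKEN);
//   app.get('/health/detailed', authenticateInternal, detailedHandler);
```

If Kubernetes probes need to reach a protected endpoint, the `httpGet` probe definition accepts an `httpHeaders` list that can carry the token.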

Health Checks for Different Architectures

Microservices

Each service should have its own health check endpoint:

// User Service
app.get('/health', async (req, res) => {
  const dbHealth = await checkUserDatabase();
  res.json({
    service: 'user-service',
    status: dbHealth.status,
    checks: { database: dbHealth }
  });
});

// Order Service
app.get('/health', async (req, res) => {
  const dbHealth = await checkOrderDatabase();
  const paymentHealth = await checkPaymentGateway();
  
  res.json({
    service: 'order-service',
    status: dbHealth.status === 'healthy' ? 'healthy' : 'unhealthy',
    checks: {
      database: dbHealth,
      payment: paymentHealth
    }
  });
});

Don't create a "health check aggregator" that checks all services—that creates tight coupling and cascading failures.

Serverless Functions

Serverless platforms often don't support traditional health checks:

// Lambda health verification via CloudWatch
exports.handler = async (event) => {
  // Startup checks
  if (!isInitialized) {
    await initialize();
  }
  
  // Dependency checks
  try {
    await verifyDependencies();
  } catch (error) {
    // Log to CloudWatch for alarming
    console.error('Dependency check failed:', error);
    throw error;
  }
  
  // Process request
  return handleRequest(event);
};

Use CloudWatch alarms on error rates and duration metrics instead of traditional health check endpoints.

Background Workers

Workers processing queues need health checks too:

class WorkerHealthCheck {
  constructor() {
    this.lastProcessedTime = Date.now();
    this.isProcessing = false;
  }
  
  markProcessing() {
    this.isProcessing = true;
    this.lastProcessedTime = Date.now();
  }
  
  markComplete() {
    this.isProcessing = false;
    this.lastProcessedTime = Date.now();
  }
  
  getHealth() {
    const timeSinceLastProcess = Date.now() - this.lastProcessedTime;
    
    // If stuck processing for >5 minutes, something is wrong
    if (this.isProcessing && timeSinceLastProcess > 300000) {
      return { status: 'unhealthy', reason: 'Stuck processing job' };
    }
    
    // If no jobs processed in >10 minutes, queue might be stuck
    if (timeSinceLastProcess > 600000) {
      return { status: 'degraded', reason: 'No recent job processing' };
    }
    
    return { status: 'healthy' };
  }
}

const workerHealth = new WorkerHealthCheck();

app.get('/health', (req, res) => {
  const health = workerHealth.getHealth();
  res.status(health.status === 'unhealthy' ? 503 : 200).json(health);
});
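The class only helps if the worker loop calls `markProcessing()` and `markComplete()` reliably, including when a job fails. A sketch of a wrapper that guarantees this; `runJob`, the `queue` iterable, and `processJob` are hypothetical names.

```javascript
// Runs one job while keeping the health tracker up to date. The finally block
// guarantees markComplete() runs even when the handler throws, so a failed job
// is not mistaken for a stuck one.
async function runJob(job, handler, health) {
  health.markProcessing();
  try {
    return await handler(job);
  } finally {
    health.markComplete();
  }
}

// Usage (assuming `workerHealth` from above and a hypothetical async-iterable queue):
//   for await (const job of queue) {
//     await runJob(job, processJob, workerHealth).catch(console.error);
//   }
```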

Testing Health Check Endpoints

Include health checks in your test suite:

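const request = require('supertest');
// Assumed exports; adjust the path and names to match your project
const { app, db, healthCache } = require('../src/app');

describe('Health Check Endpoints', () => {
  beforeEach(() => {
    // Start each test with an empty cache so results don't leak between tests
    healthCache.cache.clear();
    // Stub the external API call made by checkExternalAPI
    jest.spyOn(global, 'fetch').mockResolvedValue({ ok: true, status: 200 });
  });

  afterEach(() => {
    jest.restoreAllMocks();
  });

  it('returns 200 when all dependencies are healthy', async () => {
    jest.spyOn(db, 'query').mockResolvedValue([{ ok: 1 }]);

    const response = await request(app).get('/health/ready');
    expect(response.status).toBe(200);
    expect(response.body.status).toBe('healthy');
  });

  it('returns 503 when database is down', async () => {
    // checkDatabase calls db.query, so that is the call to mock
    jest.spyOn(db, 'query').mockRejectedValue(new Error('Connection refused'));

    const response = await request(app).get('/health/ready');
    expect(response.status).toBe(503);
    expect(response.body.status).toBe('unhealthy');
  });

  it('caches health check results', async () => {
    const dbQuerySpy = jest.spyOn(db, 'query').mockResolvedValue([{ ok: 1 }]);
    // Control the clock instead of sleeping past Jest's default 5s test timeout
    let now = 1000000;
    jest.spyOn(Date, 'now').mockImplementation(() => now);

    // First request populates the cache
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(1);

    // Second request within the 5s TTL should use the cache
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(1);

    // Move past the TTL; the next request should hit the database again
    now += 6000;
    await request(app).get('/health/ready');
    expect(dbQuerySpy).toHaveBeenCalledTimes(2);
  });
});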

Conclusion

Well-implemented health checks are essential infrastructure for reliable services. They enable automated recovery, intelligent traffic routing, and faster incident detection.

Key takeaways:

  • Use separate liveness, readiness, and startup checks
  • Keep liveness checks simple and fast
  • Cache health check results to prevent dependency overload
  • Only check critical dependencies
  • Include version information for debugging
  • Test health check endpoints in your test suite

With robust health checks in place, your monitoring systems can automatically detect and respond to failures, often before users notice any issues.

