API Health Check Endpoints: Complete Implementation Guide
A health check endpoint is one of the most valuable additions to any production API. It gives load balancers, orchestrators, and monitoring tools a single URL to determine whether your service is ready to handle traffic — and exactly what's wrong when it isn't. This guide covers what to build, how to build it, and how to monitor it effectively.
What Is a Health Check Endpoint?
A health check endpoint is a dedicated URL — typically /health, /healthz, or /status — that reports the current operational state of your service. When called, it:
- Verifies the service process is running
- Checks connectivity to critical dependencies (database, cache, message queue)
- Reports overall service status as a simple pass/fail or detailed breakdown
- Returns an appropriate HTTP status code (200 for healthy, 503 for unhealthy)
Every component in your infrastructure uses this endpoint. Load balancers use it to decide whether to route traffic. Kubernetes uses it to decide whether to restart containers. Monitoring tools use it to trigger alerts. Your on-call engineer uses it to quickly assess an incident.
Liveness vs Readiness vs Startup Probes
Kubernetes distinguishes three types of health probes — and even outside Kubernetes, this mental model is useful for structuring your health checks:
Liveness Probe: /healthz/live
Answers: "Is this service alive, or is it stuck in a broken state that requires a restart?"
Liveness checks should be fast and simple — they just verify the process is running and not deadlocked. Don't include database checks here. If a database goes down, you don't want Kubernetes to restart your service — the service itself is fine, just a dependency is unavailable.
If the liveness probe fails, Kubernetes restarts the container.
Readiness Probe: /healthz/ready
Answers: "Is this service ready to accept incoming traffic right now?"
Readiness checks can be more comprehensive. Check database connectivity, cache availability, and any other dependencies required to serve requests. If the readiness probe fails, Kubernetes removes the pod from the service's endpoints — traffic stops routing to it — but doesn't restart it.
Use readiness probes to:
- Signal when a service has finished warming up (loading caches, establishing connection pools)
- Temporarily remove a pod from rotation when a dependency is unavailable
- Implement graceful shutdown (fail readiness before terminating)
Startup Probe: /healthz/startup
Answers: "Has the application finished initializing?"
Used for slow-starting containers. While the startup probe is running, liveness and readiness probes are disabled; once it succeeds, Kubernetes switches over to them. This prevents a slow startup from triggering liveness restarts.
What to Include in Your Health Check
A production-grade health check should validate your service's ability to do its job. For most services, that means:
Required Checks
- Database connectivity: Can you connect and run a lightweight query (e.g., SELECT 1)?
- Service process: Is the process running? Is memory usage within bounds?
Recommended Checks
- Cache connectivity: Is Redis/Memcached reachable?
- Message queue: Is Kafka/RabbitMQ/SQS reachable?
- Critical external API: Can you reach a payment processor or auth service?
- Disk space: Is there enough free disk space for logs and temporary files?
Optional Metadata
- Version: Current service version or git SHA (useful during deploys)
- Uptime: How long the process has been running
- Environment: Production, staging, etc.
- Check durations: How long each dependency check took
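The database check above can be sketched in a few lines of Python. This example uses sqlite3 purely as a stand-in for your real driver; swap in your actual connection pool:

```python
import sqlite3

def check_database(conn: sqlite3.Connection) -> bool:
    """Run a lightweight SELECT 1 to confirm the connection is usable."""
    try:
        row = conn.execute("SELECT 1").fetchone()
        return row == (1,)
    except sqlite3.Error:
        return False

conn = sqlite3.connect(":memory:")
print(check_database(conn))  # → True
```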
Health Check Response Format
The response body should be JSON with a consistent structure that monitoring tools and humans can parse:
Return HTTP 200 for healthy, HTTP 503 for unhealthy or degraded. This lets load balancers and proxies act on the status without parsing the body.
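There is no single standard shape, but a response along these lines (field names are illustrative) covers the status, metadata, and per-check details discussed above:

```json
{
  "status": "healthy",
  "version": "1.4.2",
  "uptime_seconds": 86400,
  "checks": {
    "database": { "status": "pass", "duration_ms": 12 },
    "cache": { "status": "pass", "duration_ms": 3 }
  }
}
```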
Implementation Examples
Python (FastAPI)
Health Check Timeouts
Health check endpoints must have strict timeouts. A health check that takes 30 seconds is worse than no health check — it blocks the load balancer from making routing decisions.
- Total health check timeout: 5 seconds maximum
- Per-check timeout: 2 seconds per dependency check
- Database check timeout: 1 second; if the DB takes more than 1 second to respond to SELECT 1, it's effectively unavailable
Run dependency checks in parallel when possible to keep total health check time low:
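With asyncio, per-check timeouts and parallel execution look roughly like this. The sleep-based checks are placeholders for real SELECT 1 / PING calls:

```python
import asyncio

async def check_db() -> bool:
    await asyncio.sleep(0.01)   # placeholder for SELECT 1
    return True

async def check_cache() -> bool:
    await asyncio.sleep(0.01)   # placeholder for a Redis PING
    return True

async def run_health_checks(per_check_timeout: float = 2.0) -> dict:
    async def guarded(name, coro):
        try:
            return name, await asyncio.wait_for(coro, timeout=per_check_timeout)
        except Exception:  # a timeout or connection error counts as a failure
            return name, False
    # Run all checks concurrently: total time ≈ the slowest check, not the sum.
    results = await asyncio.gather(
        guarded("database", check_db()),
        guarded("cache", check_cache()),
    )
    return dict(results)

results = asyncio.run(run_health_checks())
print(results)  # → {'database': True, 'cache': True}
```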
Security Considerations
What to Expose Publicly vs Privately
Not all health check details should be publicly accessible. Consider two tiers:
- Public /health: Returns only { "status": "ok" } or HTTP 200/503. No dependency details. Safe for load balancers and CDN health probes.
- Internal /health/detailed: Returns full component status, versions, and dependency states. Requires authentication or IP allowlist (internal network only).
Don't Leak Internal Details
Error messages in health check responses can reveal your infrastructure topology. "connection refused: host postgres-primary.internal:5432" tells attackers what database you're using and your internal hostname. In public health endpoints, return generic error messages like "database unavailable" rather than the raw exception.
Rate Limit Health Endpoints
Health endpoints that run database queries on every call can be exploited for denial-of-service attacks. Add rate limiting (100 req/min is generous for legitimate monitors) and cache health check results for 5-10 seconds to avoid hammering your database with health check queries.
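A simple time-based cache implements the 5-10 second suggestion above. Names here are illustrative; run_checks stands in for whatever function executes your dependency checks:

```python
import time

CACHE_TTL_SECONDS = 5.0
_cache = {"result": None, "expires_at": 0.0}

def cached_health(run_checks) -> dict:
    """Return a cached health result, re-running checks at most once per TTL."""
    now = time.monotonic()
    if _cache["result"] is None or now >= _cache["expires_at"]:
        _cache["result"] = run_checks()
        _cache["expires_at"] = now + CACHE_TTL_SECONDS
    return _cache["result"]

calls = {"n": 0}
def fake_checks() -> dict:
    calls["n"] += 1
    return {"status": "ok"}

cached_health(fake_checks)
cached_health(fake_checks)
print(calls["n"])  # → 1 (second call served from cache)
```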
Kubernetes Configuration
Here's a complete Kubernetes probe configuration using separate liveness and readiness endpoints:
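A typical container spec fragment might look like the following. Paths, port, and thresholds are illustrative defaults; tune them for your app's startup and response characteristics:

```yaml
livenessProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  initialDelaySeconds: 10
  periodSeconds: 10
  timeoutSeconds: 2
  failureThreshold: 3
readinessProbe:
  httpGet:
    path: /healthz/ready
    port: 8080
  periodSeconds: 5
  timeoutSeconds: 3
  failureThreshold: 2
startupProbe:
  httpGet:
    path: /healthz/live
    port: 8080
  periodSeconds: 5
  failureThreshold: 30   # allows up to 150s of startup time
```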
Key settings:
- initialDelaySeconds: Wait before first probe (allows app to start)
- periodSeconds: How often to probe
- failureThreshold: How many consecutive failures before action
- timeoutSeconds: How long to wait for a probe response
Monitoring Health Endpoints from Outside
Health endpoints are most valuable when monitored continuously by an external system — not just your infrastructure. External monitoring catches issues your internal probes miss (network routing problems, CDN failures, geographic outages).
What to Monitor
- Availability: Is the health endpoint returning 200?
- Response time: Is the endpoint responding quickly? Slow health checks often precede full failures.
- Component status: Parse the JSON to alert on individual dependency failures
- Version changes: Track version field to verify successful deployments
Polling Frequency
Poll your health endpoint every 30-60 seconds from external monitors. This gives you a median time-to-detect of 15-30 seconds for outages, which is sufficient for most SLOs. Polling more frequently (every 10 seconds) is appropriate for high-availability systems, but generates more load.
For critical paths, set up monitoring from multiple geographic regions to distinguish regional outages from global ones.
Health Checks During Deployment
Health checks are critical during rolling deployments. The pattern:
- Start new pod: Startup probe runs until it passes
- Readiness probe passes: Pod is added to load balancer rotation
- Traffic shifts gradually: Old pod gets less traffic as new pod handles more
- Graceful shutdown begins on old pod: Readiness probe is deliberately failed to drain traffic
- Old pod terminates: After all in-flight requests complete
To implement graceful shutdown, listen for SIGTERM and immediately fail your readiness probe while continuing to handle in-flight requests:
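In Python, the flag flip on SIGTERM can be sketched like this; your readiness handler would return 503 whenever ready is False (the handler name and flag are illustrative):

```python
import signal

ready = True  # the readiness endpoint returns 200 while this is True

def handle_sigterm(signum, frame):
    """Flip readiness so the load balancer drains traffic before exit."""
    global ready
    ready = False
    # Keep serving in-flight requests; exit only after the drain period.

signal.signal(signal.SIGTERM, handle_sigterm)

# Simulate receiving SIGTERM during shutdown:
handle_sigterm(signal.SIGTERM, None)
print(ready)  # → False
```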
Common Health Check Mistakes
Checking Too Much
A health check that calls external payment APIs or runs complex queries makes your service dependent on those external systems for basic availability. If Stripe is down, should Kubernetes restart your containers? Probably not. Keep health checks focused on what's required to serve requests.
No Timeouts
Health checks without timeouts can hang indefinitely when dependencies are slow, causing the health check to appear failed and triggering unnecessary restarts or routing changes. Always set explicit timeouts.
Exposing Too Much Information
Detailed health responses on public endpoints leak infrastructure details. Use tiered endpoints: simple public check, detailed internal check.
Same Endpoint for Liveness and Readiness
Using the same endpoint for both means a database outage (which should only fail readiness) will also fail liveness — causing your pods to restart in a loop even though the application itself is fine. Always separate them.
Health Check Implementation Checklist
- ☐ /health returns 200/503 with JSON status body
- ☐ Separate /healthz/live and /healthz/ready endpoints
- ☐ Database connectivity check with 1-2 second timeout
- ☐ Cache connectivity check (if used)
- ☐ Checks run in parallel, not sequentially
- ☐ Total health check timeout < 5 seconds
- ☐ Results cached for 5-10 seconds to prevent DB hammering
- ☐ Sensitive details require authentication or internal network only
- ☐ Version/build info included for deployment verification
- ☐ Graceful shutdown fails readiness probe before terminating
- ☐ External monitoring polling the endpoint every 30-60 seconds
- ☐ Alerts configured for 503 responses and slow response times
Key Takeaways
- Health check endpoints are the foundation of reliable deployments and infrastructure automation
- Separate liveness (is the process alive?) from readiness (can it serve traffic?)
- Check only what's required to serve requests, not every external dependency
- Run checks in parallel with strict timeouts (2 seconds per check, 5 seconds total)
- Return 200 for healthy, 503 for unhealthy — not 500
- Cache health check results to avoid hammering your database
- Use tiered endpoints: simple public check, detailed internal check
- Monitor from external systems — internal probes alone don't catch network routing failures
A well-implemented health check endpoint is invisible when everything is working — and invaluable when something breaks. It's the first thing your monitoring tool checks, the first thing your load balancer consults, and the first thing an on-call engineer looks at during an incident. Get it right once and it pays dividends forever.