What to Do When an API is Down
A developer's guide to diagnosing API outages, implementing fallbacks, and minimizing the impact on your users.
TL;DR
Immediate steps: 1) Check the API's status page and API Status Check to confirm it's them, not you. 2) Implement retries with exponential backoff. 3) Serve cached data if available. 4) Show users a clear error message. 5) Set up monitoring to know when it recovers.
Your application suddenly stops working. Users are complaining. Error logs are flooded with 500 errors and timeouts. The culprit? A third-party API you depend on has gone down.
Whether it's a payment processor, authentication service, or data provider, API outages happen to every service eventually. The difference between a minor inconvenience and a major crisis comes down to how quickly you diagnose the issue and how well you've prepared for this scenario.
This guide walks you through exactly what to do when an API fails—from confirming the outage to implementing robust fallback strategies.
Step 1: Confirm It's Actually Down
Before panicking, verify that the API is really experiencing an outage. Many "API down" incidents turn out to be:
- Rate limiting — You've exceeded your API quota
- Authentication issues — Expired tokens or revoked API keys
- Network issues — Your server can't reach the API
- Code bugs — A recent deployment broke something
Check Official Status Pages
Most APIs maintain status pages that report current incidents. Check the provider's status page first.
Important: Status pages are often slow to acknowledge issues. Use API Status Check for real-time monitoring that detects outages faster than official pages.
Test the API Directly
Isolate your code by testing the API with a simple curl command:
```bash
# Test API endpoint directly
curl -v https://api.example.com/v1/health
# With authentication
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.example.com/v1/status
```
If the curl request succeeds but your app fails, the issue is in your implementation. If curl also fails, the API is likely down.
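The same check can be scripted so it runs from the server that is actually failing. The following is a minimal sketch, assuming Node 18 or later (global `fetch` and `AbortSignal.timeout`). The `probe` function, its options, and the example URL are hypothetical names chosen for illustration.

```javascript
// Hypothetical helper: probe an endpoint and report what kind of failure occurred.
// Assumes Node 18+ (global fetch, AbortSignal.timeout).
// fetchImpl is injectable so the function can be tested without network access.
async function probe(url, { timeoutMs = 5000, fetchImpl = fetch } = {}) {
  const started = Date.now();
  try {
    const response = await fetchImpl(url, { signal: AbortSignal.timeout(timeoutMs) });
    return {
      reachable: true, // an HTTP response came back, even if it is an error
      ok: response.ok,
      status: response.status,
      elapsedMs: Date.now() - started,
    };
  } catch (error) {
    // DNS failures, refused connections, TLS errors, and timeouts all land here
    return {
      reachable: false,
      ok: false,
      status: null,
      error: error.name === 'TimeoutError' ? 'timeout' : 'network',
      elapsedMs: Date.now() - started,
    };
  }
}

// Usage (inside an async function):
//   const result = await probe('https://api.example.com/v1/health');
```

A result with `reachable: false` points at the network path or a hard outage, while `reachable: true` with a 5xx status suggests the provider is up but failing.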
Step 2: Check Your Implementation
Before blaming the API, rule out common self-inflicted issues:
Quick Checklist:
- ✅ API keys still valid? — Keys can be rotated or revoked
- ✅ Rate limits? — Check if you're getting 429 errors
- ✅ Recent code changes? — Did someone deploy recently?
- ✅ SSL certificates? — Expired certs cause connection failures
- ✅ Request format correct? — API schemas can change
- ✅ Network/firewall rules? — Server networking changes
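For the rate-limit item in the checklist, many APIs send a `Retry-After` header with 429 responses, holding either a number of seconds or an HTTP date. The sketch below shows one way to turn that value into a wait time. `retryAfterMs` is a hypothetical helper, and whether a given provider sends this header at all is an assumption to verify against its documentation.

```javascript
// Hypothetical helper: turn a Retry-After header value into a wait time in milliseconds.
// The header holds either a whole number of seconds or an HTTP date.
// `now` is injectable so the date branch can be tested deterministically.
function retryAfterMs(headerValue, now = Date.now()) {
  if (headerValue == null) return null;
  const value = String(headerValue).trim();
  if (value === '') return null;
  // Delay in seconds, e.g. "120"
  if (/^\d+$/.test(value)) return Number(value) * 1000;
  // HTTP date, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
  const date = Date.parse(value);
  if (Number.isNaN(date)) return null;
  return Math.max(0, date - now);
}

// Usage with a fetch Response:
//   const waitMs = retryAfterMs(response.headers.get('Retry-After'));
```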
Understanding Error Codes
The HTTP status code tells you a lot about what's wrong:
| Code | Meaning | Action |
|---|---|---|
| 401 | Unauthorized | Check API key/token |
| 403 | Forbidden | Check permissions/scopes |
| 429 | Rate Limited | Back off, check quota |
| 500 | Internal Server Error | Server-side failure, retry with backoff |
| 502 | Bad Gateway | Upstream issue, retry |
| 503 | Service Unavailable | API overloaded, retry |
| 504 | Gateway Timeout | Upstream slow, retry |
Key insight: 4xx errors are usually your problem. 5xx errors are usually theirs. Don't retry 4xx errors other than 429; the same request will keep failing until you fix it.
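The table can be encoded as a small lookup so the rest of the code makes the same decision everywhere. This is a sketch only: `classifyStatus` and the shape of its return value are hypothetical, and it treats 429 as retryable after backing off, as the table's "Back off" action suggests.

```javascript
// Hypothetical helper mirroring the table above:
// who owns the problem, and is a retry worthwhile?
function classifyStatus(status) {
  if (status === 401) return { owner: 'you', retry: false, action: 'Check API key/token' };
  if (status === 403) return { owner: 'you', retry: false, action: 'Check permissions/scopes' };
  if (status === 429) return { owner: 'you', retry: true, action: 'Back off, check quota' };
  if (status >= 400 && status < 500) return { owner: 'you', retry: false, action: 'Fix the request' };
  if (status >= 500 && status < 600) return { owner: 'provider', retry: true, action: 'Retry with backoff' };
  return { owner: 'none', retry: false, action: 'No action needed' };
}
```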
Step 3: Implement Retries with Exponential Backoff
Transient failures are common. A proper retry strategy handles them gracefully:
```javascript
// JavaScript: Exponential backoff with jitter
async function fetchWithRetry(url, options, maxRetries = 3) {
  let lastError;
  for (let attempt = 0; attempt < maxRetries; attempt++) {
    try {
      const response = await fetch(url, options);
      // Don't retry client errors (4xx), except 429 (rate limited)
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        const error = new Error(`Client error: ${response.status}`);
        error.retryable = false;
        throw error;
      }
      if (!response.ok) {
        throw new Error(`Retryable error: ${response.status}`);
      }
      return response;
    } catch (error) {
      // Rethrow non-retryable errors immediately instead of looping
      if (error.retryable === false) throw error;
      lastError = error;
      // Skip the wait after the final attempt
      if (attempt === maxRetries - 1) break;
      // Exponential backoff: 1s, 2s, 4s... with jitter
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 1000;
      const delay = Math.min(baseDelay + jitter, 30000);
      await new Promise(r => setTimeout(r, delay));
    }
  }
  throw lastError;
}
```
Why add jitter?
Without jitter, every client that failed at the same moment retries on the same schedule, creating a "thundering herd" that overwhelms the API as soon as it starts recovering. Adding a random delay spreads the retry attempts out over time.
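To see the difference, compare a fixed schedule with a randomized one. The sketch below uses the "full jitter" variant, which picks a delay anywhere between zero and the exponential ceiling. This is a different technique from the additive jitter in the function above, and the names `fixedBackoff` and `fullJitterBackoff` are hypothetical.

```javascript
// Two ways to compute a retry delay in milliseconds.
// BASE_MS and CAP_MS are illustrative values, not recommendations.
const BASE_MS = 1000;
const CAP_MS = 30000;

// No jitter: every client waits exactly 1s, 2s, 4s, ... up to the cap
function fixedBackoff(attempt) {
  return Math.min(CAP_MS, BASE_MS * 2 ** attempt);
}

// "Full jitter": a random delay between 0 and the exponential ceiling.
// `random` is injectable so the result can be tested deterministically.
function fullJitterBackoff(attempt, random = Math.random) {
  return Math.floor(random() * fixedBackoff(attempt));
}
```

With the fixed schedule, a thousand clients that failed together all retry at the same instants. With full jitter, their second retry is spread across a two-second window and their third across four seconds.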
Step 4: Implement Fallback Strategies
When retries fail, you need fallback behavior. Here are the most effective strategies:
Strategy 1: Serve Cached Data
If you cache API responses, serve stale data with a warning rather than showing an error.
```javascript
// Try the API first; fall back to cached data if the call fails
async function getData(userId) {
  try {
    const data = await fetchFromAPI(userId);
    await cache.set(`user:${userId}`, data, TTL_1_HOUR);
    return { data, stale: false };
  } catch (error) {
    const cached = await cache.get(`user:${userId}`);
    if (cached) {
      return { data: cached, stale: true };
    }
    throw error;
  }
}
```
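The example above leaves `cache` and `TTL_1_HOUR` undefined. One possible shape for them is sketched below as an in-memory cache with the same `get`/`set` calls. `MemoryCache` is a hypothetical class, and the assumption that the TTL is expressed in milliseconds is mine; a shared store would be needed if several servers must see the same cached data.

```javascript
// Hypothetical in-memory cache with the get/set shape used by getData.
// `now` is injectable so expiry can be tested without waiting.
class MemoryCache {
  constructor(now = Date.now) {
    this.entries = new Map();
    this.now = now;
  }

  async set(key, value, ttlMs) {
    this.entries.set(key, { value, expiresAt: this.now() + ttlMs });
  }

  async get(key) {
    const entry = this.entries.get(key);
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      this.entries.delete(key); // expired entries cannot be served as stale data
      return undefined;
    }
    return entry.value;
  }
}

const TTL_1_HOUR = 60 * 60 * 1000; // assumed to be milliseconds
```

Note the trade-off: once the TTL passes, the stale copy is gone too, so the TTL should be at least as long as the outage you want to ride out.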
Strategy 2: Graceful Degradation
Disable non-critical features while keeping core functionality working:
- Recommendations API down? Show "Popular Items" instead
- Analytics API down? Queue events for later, don't block
- Payment API down? Allow cart building, block checkout with a clear message
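The first example in the list can be expressed as a small wrapper that swaps in a cheaper source when the primary one fails. This is a sketch under assumptions: `withFallback` is a hypothetical helper, and `fetchRecommendations` and `getPopularItems` in the usage comment stand in for your own functions.

```javascript
// Hypothetical helper: run the primary source, fall back to a cheaper one on failure.
// The `degraded` flag lets the UI label the result (e.g. "Popular Items").
async function withFallback(primary, fallback) {
  try {
    return { value: await primary(), degraded: false };
  } catch (error) {
    // The fallback receives the error in case it wants to log it
    return { value: await fallback(error), degraded: true };
  }
}

// Usage (inside an async function; both callbacks are placeholders):
//   const { value: items, degraded } = await withFallback(
//     () => fetchRecommendations(userId),
//     () => getPopularItems()
//   );
```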
Strategy 3: Backup Providers
For critical APIs, have a backup provider configured:
Example backup providers:
- Email: SendGrid → Mailgun → Amazon SES
- SMS: Twilio → Vonage → AWS SNS
- Payments: Stripe → PayPal → Adyen
- Maps: Google Maps → Mapbox → OpenStreetMap
- AI/LLM: OpenAI → Anthropic → Google AI
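A failover chain like the ones listed can be sketched as a loop over providers in priority order. `sendWithFailover` and the `{ name, send }` provider shape are hypothetical; each `send` function would wrap a real provider's SDK call, which is not shown here.

```javascript
// Hypothetical failover: try each provider in order and return the first success.
// Each provider is { name, send }, where send is an async function you supply.
async function sendWithFailover(providers, message) {
  const errors = [];
  for (const provider of providers) {
    try {
      const result = await provider.send(message);
      return { provider: provider.name, result };
    } catch (error) {
      // Remember why this provider failed, then move on to the next one
      errors.push(new Error(`${provider.name}: ${error.message}`));
    }
  }
  throw new AggregateError(errors, 'All providers failed');
}
```

One caution on this design: a timeout does not prove the first provider did nothing. For operations with side effects, such as payments, failing over blindly can process the same request twice, so it is worth checking whether each provider supports idempotency keys before automating the switch.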
Strategy 4: Circuit Breaker Pattern
Stop making requests to a failing API to prevent cascading failures:
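```javascript
// Simple circuit breaker
class CircuitBreaker {
  constructor(threshold = 5, resetTimeout = 30000) {
    this.failures = 0;
    this.threshold = threshold;
    this.resetTimeout = resetTimeout;
    this.state = 'CLOSED'; // CLOSED, OPEN, HALF_OPEN
    this.openedAt = 0;
  }
  async call(fn) {
    if (this.state === 'OPEN') {
      // Fail fast until resetTimeout has passed, then allow trial requests
      if (Date.now() - this.openedAt < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN';
    }
    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }
  onSuccess() {
    // Any success closes the circuit and clears the failure count
    this.failures = 0;
    this.state = 'CLOSED';
  }
  onFailure() {
    this.failures++;
    // A failed trial request, or too many failures, opens the circuit
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}
```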
Step 5: Communicate with Users
Clear communication reduces frustration and support tickets:
❌ Bad Error Message
"Error: Connection refused"
"500 Internal Server Error"
"Something went wrong"
✅ Good Error Message
"We're experiencing issues with our payment provider. Please try again in a few minutes or contact support."
Best practices:
- Explain what's not working in plain language
- Give an estimated time to resolution if known
- Provide alternative actions (contact support, try later)
- Don't expose technical details or stack traces
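One way to apply these practices consistently is to build user-facing text in a single place and keep the raw error for the logs only. The sketch below is illustrative: `toUserMessage` and its `retryInMinutes` option are hypothetical names.

```javascript
// Hypothetical helper: build a plain-language message for the user.
// Raw errors and stack traces are logged elsewhere and never passed in here.
function toUserMessage(feature, { retryInMinutes } = {}) {
  const when = retryInMinutes
    ? `in about ${retryInMinutes} minutes`
    : 'in a few minutes';
  return `We're experiencing issues with ${feature}. Please try again ${when} or contact support.`;
}

// Usage:
//   toUserMessage('our payment provider');
//   toUserMessage('our payment provider', { retryInMinutes: 15 });
```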
Step 6: Monitor for Recovery
Don't keep manually checking. Set up automated monitoring:
Get Instant Recovery Alerts
API Status Check monitors 100+ popular APIs and sends instant alerts when they go down—and when they recover. Know the moment you can resume normal operations.
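If you also want a check that runs inside your own infrastructure, a simple poller can watch for recovery. The sketch below is one possible approach, not a description of any monitoring product: `waitForRecovery` and its options are hypothetical, and `check` is any async function you supply that returns true when the API looks healthy.

```javascript
// Hypothetical recovery watcher: poll until the check passes several times in a row.
// Requiring a streak avoids declaring recovery on a single lucky response.
async function waitForRecovery(check, { intervalMs = 30000, requiredSuccesses = 3, maxChecks = Infinity } = {}) {
  let streak = 0;
  for (let i = 0; i < maxChecks; i++) {
    let healthy = false;
    try {
      healthy = await check();
    } catch {
      healthy = false; // a thrown error counts as a failed check
    }
    streak = healthy ? streak + 1 : 0;
    if (streak >= requiredSuccesses) return true;
    // Wait before the next check, but not after the last one
    if (i < maxChecks - 1) {
      await new Promise(resolve => setTimeout(resolve, intervalMs));
    }
  }
  return false; // gave up after maxChecks without a full streak
}
```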
Preparing for the Next Outage
The best time to prepare for API downtime is before it happens:
Pre-Outage Checklist:
1. Implement proper error handling: never assume API calls will succeed
2. Add retries with backoff: handle transient failures gracefully
3. Cache what you can: stale data is better than no data
4. Use circuit breakers: fail fast when APIs are down
5. Identify backup providers: know your alternatives
6. Set up monitoring: get alerts before users complain
7. Document runbooks: know exactly what to do
8. Practice chaos engineering: test your fallbacks regularly
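For the last item, a lightweight way to exercise fallbacks is to wrap an API call so that some fraction of requests fail on purpose. This is a minimal sketch for tests and staging environments. `withFaults` and its options are hypothetical, and real chaos tooling offers far more control.

```javascript
// Hypothetical fault injector: wrap an async function so a fraction of calls fail.
// Intended for tests and staging only.
// `random` is injectable to make runs repeatable.
function withFaults(fn, { failureRate = 0.2, random = Math.random } = {}) {
  return async (...args) => {
    if (random() < failureRate) {
      throw new Error('Injected failure');
    }
    return fn(...args);
  };
}

// Usage (fetchFromAPI is a placeholder for your own API call):
//   const flakyFetch = withFaults(fetchFromAPI, { failureRate: 0.5 });
```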
Frequently Asked Questions
How do I know if an API is down or if it's my code?
First, check the API's official status page. Then test the API with a simple curl command to isolate your code. If the raw API request fails, the issue is upstream. If it works but your app fails, the problem is in your implementation.
Should I retry failed API requests?
Yes, but implement exponential backoff with jitter. Start with a 1-second delay, then double it each retry (1s, 2s, 4s, 8s...) with some randomness. Set a maximum of 3-5 retries. Retry only 5xx responses, network errors, and 429 responses once the rate limit window has passed; other 4xx errors will fail again until the request is fixed.
How can I prepare for API downtime before it happens?
Implement proper error handling, cache responses when possible, use circuit breakers, identify backup providers, set up monitoring, and practice chaos engineering by simulating API failures in development.
What's a circuit breaker pattern?
A circuit breaker stops making requests to a failing API after a threshold of failures. This prevents cascading failures and reduces load on the struggling service. After a timeout, it allows test requests through. If they succeed, normal operation resumes.