What to Do When an API is Down | Troubleshooting Guide 2025

Your application suddenly stops working. Users are complaining. Error logs are flooded with 500 errors and timeouts. The culprit? A third-party API you depend on has gone down.

Whether it's a payment processor, authentication service, or data provider, API outages happen to every service eventually. The difference between a minor inconvenience and a major crisis comes down to how quickly you diagnose the issue and how well you've prepared for this scenario.

This guide walks you through exactly what to do when an API fails—from confirming the outage to implementing robust fallback strategies.

Step 1: Confirm It's Actually Down

Before panicking, verify that the API really is experiencing an outage. Many apparent "API down" issues turn out to be:

Rate limiting — You've exceeded your API quota
Authentication issues — Expired tokens or revoked API keys
Network issues — Your server can't reach the API
Code bugs — A recent deployment broke something

Check Official Status Pages

Most APIs maintain status pages that report current incidents. Check these first:

Stripe — status.stripe.com
Twilio — status.twilio.com
AWS — health.aws.amazon.com
OpenAI — status.openai.com
GitHub — githubstatus.com
Google Cloud — status.cloud.google.com

Important: Status pages are often slow to acknowledge issues. A third-party monitor such as API Status Check can flag problems before the official page is updated.

Test the API Directly

Isolate your code by testing the API with a simple curl command:

# Test API endpoint directly
curl -v https://api.example.com/v1/health

# With authentication
curl -H "Authorization: Bearer YOUR_API_KEY" \
  https://api.example.com/v1/status

If the curl request succeeds but your app fails, the issue is in your implementation. If curl also fails (ideally confirmed from a second network, to rule out your own connectivity), the API is likely down.
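The same direct check can also be scripted, for example in a health script or runbook. The sketch below assumes Node.js 18 or later (global `fetch` and `AbortSignal.timeout`). `probeApi` and `classifyProbe` are hypothetical helper names, and the 5-second timeout is an arbitrary default. Instead of reporting only "up" or "down", it sorts the outcome into the causes listed at the start of this step.

```javascript
// Hypothetical helper: turn a raw probe outcome into a coarse verdict.
// `outcome` is either { status: number } or { error: Error }.
function classifyProbe(outcome) {
  if (outcome.error) {
    // AbortSignal.timeout() aborts with a DOMException named "TimeoutError"
    if (outcome.error.name === 'TimeoutError') return 'timeout';
    return 'unreachable'; // DNS failure, refused connection, TLS error, etc.
  }
  const { status } = outcome;
  if (status >= 200 && status < 400) return 'up';
  if (status === 401 || status === 403) return 'auth-problem';
  if (status === 429) return 'rate-limited';
  if (status >= 500) return 'server-error';
  return 'client-error'; // other 4xx: likely a problem with the request itself
}

// Probe an endpoint directly, bypassing your application code.
// Assumes Node 18+ for global fetch and AbortSignal.timeout().
async function probeApi(url, { headers = {}, timeoutMs = 5000 } = {}) {
  try {
    const res = await fetch(url, { headers, signal: AbortSignal.timeout(timeoutMs) });
    return classifyProbe({ status: res.status });
  } catch (error) {
    return classifyProbe({ error });
  }
}
```

Example call: `probeApi('https://api.example.com/v1/health').then(console.log);`. The verdict tells you which row of the checklist in Step 2 to look at first.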

Step 2: Check Your Implementation

Before blaming the API, rule out common self-inflicted issues:

Quick Checklist:

✅ API keys still valid? — Keys can be rotated or revoked
✅ Rate limits? — Check if you're getting 429 errors
✅ Recent code changes? — Did someone deploy recently?
✅ SSL certificates? — Expired certs cause connection failures
✅ Request format correct? — API schemas can change
✅ Network/firewall rules? — Server networking changes

Understanding Error Codes

The HTTP status code tells you a lot about what's wrong:

Code	Meaning	Action
401	Unauthorized	Check API key/token
403	Forbidden	Check permissions/scopes
429	Rate Limited	Back off, check quota
500	Internal Server Error	Error on the API side, retry with backoff
502	Bad Gateway	Upstream issue, retry
503	Service Unavailable	API overloaded, retry
504	Gateway Timeout	Upstream slow, retry

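Key insight: 4xx errors are usually your problem; 5xx errors are usually theirs. Don't retry 4xx errors, because resending the same request won't change the outcome. The exception is 429: wait (for as long as the Retry-After header says, if one is present) and then retry.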
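The table and the 4xx/5xx rule of thumb can be encoded as a lookup that your retry logic consults. This is a sketch. `classifyStatus` is a hypothetical name, codes outside the table fall back to the general 4xx/5xx rule, and 429 is marked retryable because the table's action for it is to back off rather than give up.

```javascript
// Lookup derived from the status-code table: meaning, suggested action,
// and whether retrying (after a backoff) is worthwhile.
const STATUS_ACTIONS = {
  401: { meaning: 'Unauthorized', action: 'Check API key/token', retry: false },
  403: { meaning: 'Forbidden', action: 'Check permissions/scopes', retry: false },
  429: { meaning: 'Rate Limited', action: 'Back off, check quota', retry: true },
  500: { meaning: 'Internal Server Error', action: 'Retry with backoff', retry: true },
  502: { meaning: 'Bad Gateway', action: 'Retry with backoff', retry: true },
  503: { meaning: 'Service Unavailable', action: 'Retry with backoff', retry: true },
  504: { meaning: 'Gateway Timeout', action: 'Retry with backoff', retry: true },
};

function classifyStatus(status) {
  if (STATUS_ACTIONS[status]) return { status, ...STATUS_ACTIONS[status] };
  // General rule: other 4xx codes mean the request itself needs fixing
  if (status >= 400 && status < 500) {
    return { status, meaning: 'Client error', action: 'Fix the request', retry: false };
  }
  // Other 5xx codes are the provider's problem and may be transient
  if (status >= 500) {
    return { status, meaning: 'Server error', action: 'Retry with backoff', retry: true };
  }
  return { status, meaning: 'Not an error', action: 'None', retry: false };
}
```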

Step 3: Implement Retries with Exponential Backoff

Transient failures are common. A proper retry strategy handles them gracefully:

// JavaScript: Exponential backoff with jitter
async function fetchWithRetry(url, options = {}, maxRetries = 3) {
  let lastError;

  // One initial attempt plus up to maxRetries retries
  for (let attempt = 0; attempt <= maxRetries; attempt++) {
    let response;
    try {
      response = await fetch(url, options);
    } catch (error) {
      lastError = error; // Network failure (DNS, refused, reset): retryable
    }

    if (response) {
      if (response.ok) return response;

      // Don't retry client errors (4xx), except 429 (rate limited)
      if (response.status >= 400 && response.status < 500 && response.status !== 429) {
        throw new Error(`Client error: ${response.status}`);
      }
      lastError = new Error(`HTTP error: ${response.status}`);
    }

    if (attempt < maxRetries) {
      // Exponential backoff: 1s, 2s, 4s... plus up to 1s of jitter, capped at 30s
      const baseDelay = Math.pow(2, attempt) * 1000;
      const jitter = Math.random() * 1000;
      const delay = Math.min(baseDelay + jitter, 30000);
      await new Promise(r => setTimeout(r, delay));
    }
  }

  throw lastError;
}

Why add jitter?

Without jitter, every client that failed at the same moment retries at the same moment, creating a "thundering herd" that can overwhelm the API just as it starts recovering. Adding a random delay spreads the retry attempts out.
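Two refinements are worth considering. First, when a 429 or 503 response includes a `Retry-After` header (either a number of seconds or an HTTP date), waiting that long is usually better than guessing. Second, instead of adding up to one second of jitter to a fixed delay as the retry code does, a common variant called "full jitter" picks a random delay between zero and the exponential ceiling. The helper below is a hypothetical sketch that combines both. Its 1-second base and 30-second cap mirror the retry code, and the server's `Retry-After` value is honored as-is rather than capped.

```javascript
// Compute how long to wait before retry number `attempt` (0-based).
// If the server sent Retry-After (common on 429 and 503), honor it;
// otherwise use "full jitter": random in [0, min(capMs, baseMs * 2^attempt)).
function retryDelayMs(attempt, retryAfter, { baseMs = 1000, capMs = 30000, now = Date.now() } = {}) {
  if (retryAfter != null) {
    const value = String(retryAfter).trim();
    if (/^\d+$/.test(value)) {
      return Number(value) * 1000; // delay-seconds form, e.g. "120"
    }
    const date = Date.parse(value); // HTTP-date form, e.g. "Wed, 21 Oct 2015 07:28:00 GMT"
    if (!Number.isNaN(date)) {
      return Math.max(date - now, 0); // a date in the past means "retry now"
    }
    // Unparseable header: fall through to backoff
  }
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  return Math.random() * ceiling;
}
```

In a fetch-based retry loop, the header would come from `response.headers.get('Retry-After')`, which returns `null` when the header is absent.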

Step 4: Implement Fallback Strategies

When retries fail, you need fallback behavior. Here are four common strategies:

Strategy 1: Serve Cached Data

If you cache API responses, serve stale data with a warning rather than showing an error.

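// Try the API first; fall back to cached data if the call fails.
// fetchFromAPI, cache, and TTL_1_HOUR stand in for your own API client,
// cache store, and expiry setting. Once the TTL expires the entry is gone,
// so this fallback only covers outages shorter than the TTL.
async function getData(userId) {
  try {
    const data = await fetchFromAPI(userId);
    await cache.set(`user:${userId}`, data, TTL_1_HOUR);
    return { data, stale: false };
  } catch (error) {
    const cached = await cache.get(`user:${userId}`);
    if (cached) {
      return { data: cached, stale: true };
    }
    throw error;
  }
}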
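The `getData` function calls the API on every request and uses the cache only after a failure. With a one-hour TTL, the cached copy is also gone once an outage outlasts it. The sketch below separates "fresh enough to skip the API" from "still usable as a fallback". `StaleCache` is a hypothetical in-memory class with no eviction and no sharing across processes. With a shared cache you would store the timestamp alongside the value in the same way.

```javascript
// Minimal in-memory cache with two windows:
//  - younger than freshMs: served without calling the API
//  - younger than maxStaleMs: kept as a fallback when the API call fails
class StaleCache {
  constructor({ freshMs = 60 * 60 * 1000, maxStaleMs = 24 * 60 * 60 * 1000, now = () => Date.now() } = {}) {
    this.freshMs = freshMs;
    this.maxStaleMs = maxStaleMs;
    this.now = now;              // injectable clock, handy for tests
    this.entries = new Map();    // key -> { value, storedAt }
  }

  async get(key, fetcher) {
    const entry = this.entries.get(key);
    const age = entry ? this.now() - entry.storedAt : Infinity;
    if (age < this.freshMs) return { data: entry.value, stale: false };

    try {
      const value = await fetcher();
      this.entries.set(key, { value, storedAt: this.now() });
      return { data: value, stale: false };
    } catch (error) {
      if (age < this.maxStaleMs) return { data: entry.value, stale: true };
      throw error; // nothing usable cached
    }
  }
}
```

Usage would look like `await staleCache.get('user:42', () => fetchFromAPI(42))`, where `fetchFromAPI` is the same placeholder as in `getData`.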

Strategy 2: Graceful Degradation

Disable non-critical features while keeping core functionality working:

Recommendations API down? Show "Popular Items" instead
Analytics API down? Queue events for later, don't block
Payment API down? Allow cart building, block checkout with a clear message
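Each of these degradations follows the same pattern: try the primary source, and if it fails (or is too slow), return something cheaper. Below is a minimal sketch of that wrapper. `withFallback` is a hypothetical helper, and the 2-second default timeout is an assumption you would tune per feature. For the recommendations case it might be called as `withFallback(() => fetchRecommendations(userId), () => getPopularItems())`, where both functions are placeholders.

```javascript
// Run `primary`; if it rejects or takes longer than timeoutMs, return
// the result of `fallback` instead. Both are functions returning promises.
async function withFallback(primary, fallback, { timeoutMs = 2000, onError = () => {} } = {}) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(() => reject(new Error(`Timed out after ${timeoutMs} ms`)), timeoutMs);
  });
  try {
    // Whichever settles first wins; a late rejection from primary is
    // already handled by Promise.race, so it won't surface as unhandled.
    return await Promise.race([primary(), timeout]);
  } catch (error) {
    onError(error); // hook for logging/metrics
    return fallback();
  } finally {
    clearTimeout(timer);
  }
}
```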

Strategy 3: Backup Providers

For critical APIs, have a backup provider configured:

Example backup providers:

Email: SendGrid → Mailgun → Amazon SES
SMS: Twilio → Vonage → AWS SNS
Payments: Stripe → PayPal → Adyen
Maps: Google Maps → Mapbox → OpenStreetMap
AI/LLM: OpenAI → Anthropic → Google AI
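Switching providers at runtime usually means putting each vendor's SDK behind a common interface and trying them in order. The sketch below assumes hypothetical provider wrappers of the form `{ name, send(message) }` and does not use any vendor's real API. Be careful with operations that are not idempotent, such as charging a card. A timeout from the first provider does not prove the request failed, so failing over could duplicate it.

```javascript
// Try each provider in order until one succeeds.
// `providers` is an array of hypothetical wrappers: { name, send(message) },
// where send() returns a promise that rejects on failure.
async function sendWithFailover(providers, message) {
  const errors = [];
  for (const provider of providers) {
    try {
      const result = await provider.send(message);
      return { provider: provider.name, result };
    } catch (error) {
      errors.push(`${provider.name}: ${error.message}`); // remember why, then try the next one
    }
  }
  throw new Error(`All providers failed (${errors.join('; ')})`);
}
```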

Strategy 4: Circuit Breaker Pattern

Stop making requests to a failing API to prevent cascading failures:

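// Simple circuit breaker
class CircuitBreaker {
  constructor(threshold = 5, resetTimeout = 30000) {
    this.failures = 0;
    this.threshold = threshold;       // consecutive failures before opening
    this.resetTimeout = resetTimeout; // ms to stay OPEN before trial requests
    this.state = 'CLOSED';            // CLOSED, OPEN, HALF_OPEN
    this.openedAt = 0;
  }

  async call(fn) {
    if (this.state === 'OPEN') {
      if (Date.now() - this.openedAt < this.resetTimeout) {
        throw new Error('Circuit breaker is OPEN');
      }
      this.state = 'HALF_OPEN'; // timeout elapsed: let trial requests through
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    // A success (including a HALF_OPEN trial) closes the circuit
    this.failures = 0;
    this.state = 'CLOSED';
  }

  onFailure() {
    this.failures++;
    // A failed trial, or too many consecutive failures, opens the circuit
    if (this.state === 'HALF_OPEN' || this.failures >= this.threshold) {
      this.state = 'OPEN';
      this.openedAt = Date.now();
    }
  }
}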

Step 5: Communicate with Users

Clear communication reduces frustration and support tickets:

❌ Bad Error Message

"Error: Connection refused"

"500 Internal Server Error"

"Something went wrong"

✅ Good Error Message

"We're experiencing issues with our payment provider. Please try again in a few minutes or contact support."

Best practices:

Explain what's not working in plain language
Give an estimated time to resolution if known
Provide alternative actions (contact support, try later)
Don't expose technical details or stack traces
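In code, following these practices usually means logging the technical details for your team and returning a separate plain-language message to the user. Below is a minimal sketch. `userFacingMessage`, `handleFailure`, and the `feature`/`etaMinutes` fields are hypothetical names.

```javascript
// Build a plain-language message; never includes error text or stack traces.
function userFacingMessage({ feature, etaMinutes } = {}) {
  const what = feature ? `with our ${feature} provider` : 'with one of our services';
  const when = Number.isFinite(etaMinutes) && etaMinutes > 0
    ? `We expect this to be resolved in about ${etaMinutes} minutes.`
    : 'Please try again in a few minutes.';
  return `We're experiencing issues ${what}. ${when} If the problem continues, contact support.`;
}

// Full details go to the log for engineers; the user gets the safe message.
function handleFailure(error, context = {}, log = console.error) {
  log('[upstream failure]', context.feature, error);
  return userFacingMessage(context);
}
```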

Step 6: Monitor for Recovery

Don't keep manually checking. Set up automated monitoring:

Get Instant Recovery Alerts

API Status Check monitors 100+ popular APIs and sends instant alerts when they go down—and when they recover. Know the moment you can resume normal operations.
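If you also want your own application to detect recovery (for example, to turn a degraded feature back on), a small poller can require several consecutive healthy checks before declaring the API recovered. That way a single lucky response doesn't trigger a premature all-clear. `waitForRecovery` is a hypothetical helper, and the 30-second interval and three-success threshold are arbitrary defaults.

```javascript
// Poll `check` (an async function resolving to true when healthy) until it
// succeeds `requiredSuccesses` times in a row, or give up after maxChecks.
async function waitForRecovery(check, { intervalMs = 30000, requiredSuccesses = 3, maxChecks = Infinity } = {}) {
  let streak = 0;
  for (let i = 0; i < maxChecks; i++) {
    let healthy;
    try {
      healthy = Boolean(await check());
    } catch {
      healthy = false; // a thrown error counts as unhealthy
    }
    streak = healthy ? streak + 1 : 0; // any failure resets the streak
    if (streak >= requiredSuccesses) return true;
    if (i < maxChecks - 1) {
      await new Promise(r => setTimeout(r, intervalMs));
    }
  }
  return false; // no stable recovery observed
}
```

A check could be as simple as `async () => (await fetch(healthUrl)).ok`, with `healthUrl` standing in for the provider's health endpoint.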


Preparing for the Next Outage

The best time to prepare for API downtime is before it happens:

Pre-Outage Checklist:

1. Implement proper error handling — Never assume API calls will succeed
2. Add retries with backoff — Handle transient failures gracefully
3. Cache what you can — Stale data is better than no data
4. Use circuit breakers — Fail fast when APIs are down
5. Identify backup providers — Know your alternatives
6. Set up monitoring — Get alerts before users complain
7. Document runbooks — Know exactly what to do
8. Practice chaos engineering — Test your fallbacks regularly
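For item 8, one low-effort way to test fallbacks is to wrap an API client function so that it fails some fraction of the time in development or staging. The sketch below is hypothetical (`withFaultInjection` is not a library function). The random source is injectable so that tests can be deterministic.

```javascript
// Wrap an async function so it fails a configurable fraction of the time,
// optionally after extra latency, to exercise retries, fallbacks, and
// circuit breakers outside production.
function withFaultInjection(fn, { failureRate = 0.2, extraLatencyMs = 0, random = Math.random } = {}) {
  return async (...args) => {
    if (extraLatencyMs > 0) {
      await new Promise(r => setTimeout(r, extraLatencyMs)); // simulate a slow API
    }
    if (random() < failureRate) {
      throw new Error('Injected fault: simulated API outage');
    }
    return fn(...args);
  };
}
```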


Popular APIs to Monitor

Check the real-time status of these commonly used APIs:

Stripe, Twilio, SendGrid, OpenAI, AWS, Firebase, Auth0, Cloudflare

Frequently Asked Questions

How do I know if an API is down or if it's my code?

First, check the API's official status page. Then test the API with a simple curl command to isolate your code. If the raw API request fails, the issue is upstream. If it works but your app fails, the problem is in your implementation.

Should I retry failed API requests?

Yes, but implement exponential backoff with jitter. Start with a 1-second delay, then double it with each retry (1s, 2s, 4s, 8s...) and add some randomness. Set a maximum of 3-5 retries. Retry 5xx errors, network errors, and 429 responses (after waiting); don't retry other 4xx errors.

How can I prepare for API downtime before it happens?

Implement proper error handling, cache responses when possible, use circuit breakers, identify backup providers, set up monitoring, and practice chaos engineering by simulating API failures in development.

What's a circuit breaker pattern?

A circuit breaker stops making requests to a failing API after a threshold of failures. This prevents cascading failures and reduces load on the struggling service. After a timeout, it allows test requests through. If they succeed, normal operation resumes; if they fail, the circuit opens again for another timeout period.
