What to Do When an API Goes Down: Complete Response Guide
Quick Answer: When an API goes down: 1) Verify it's actually down (check apistatuscheck.com and test from different networks), 2) Check the official status page, 3) Implement your fallback strategy (cached responses, queue requests, or switch to backup provider), 4) Communicate with your team and users, 5) Document the incident for post-mortem review.
It's 2 AM. Your phone buzzes with alerts. Users are reporting errors. Your heart sinks as you realize: a critical API you depend on is down.
Whether it's Stripe processing payments, OpenAI powering your AI features, or AWS hosting your infrastructure, API outages can bring your service to its knees. But they don't have to.
This guide walks you through exactly what to do when an API goes down—from the first 5 minutes of panic to building long-term resilience.
First: Confirm It's Actually Down (Not Your Code)
Before you sound the alarm, verify the API is truly down and it's not an issue on your end.
1. Check API Status Check First
Visit apistatuscheck.com to see real-time status for 100+ popular APIs. We monitor services like Stripe, OpenAI, AWS, and Twilio.
If we're showing an outage, it's confirmed. If not, keep investigating.
2. Test from Different Networks
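Your network might be the problem. Start with a quick check from the command line:
```bash
# Test from your server. Any HTTP status (even 401 or 404) means the API answered;
# a timeout or connection error is what points to an outage or a network problem.
curl -I --max-time 10 https://api.stripe.com/v1/health

# Test from a different location through a proxy
curl -x http://proxy-server:port -I --max-time 10 https://api.stripe.com/v1/health
```
For a wider view, use an external monitoring service that checks from multiple geographic regions.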
Try accessing the API from:
- Your production server
- Your local machine
- A mobile hotspot (different ISP)
- A cloud shell (Google Cloud Shell, AWS CloudShell)
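The comparison above boils down to one question: does the failure follow the API or follow your network? Below is a minimal Node.js sketch, assuming Node 18+ for the global `fetch` and `AbortSignal.timeout`. `probe` and `interpretProbes` are hypothetical helpers and the 5-second timeout is an arbitrary choice. Because the vantage points are different machines, you would run `probe` on each one and collect the results into a single object.

```javascript
// Hypothetical helper: send one HEAD request and record what happened (Node 18+).
async function probe(url, timeoutMs = 5000) {
  const started = Date.now();
  try {
    const res = await fetch(url, { method: 'HEAD', signal: AbortSignal.timeout(timeoutMs) });
    // Any HTTP status, even 401 or 404, means the server answered.
    return { reachable: true, status: res.status, ms: Date.now() - started };
  } catch (err) {
    // DNS failures, refused connections, TLS errors, and timeouts land here.
    return { reachable: false, error: (err.cause && err.cause.code) || err.name, ms: Date.now() - started };
  }
}

// Hypothetical helper: compare probe results keyed by location (server, laptop, hotspot...).
function interpretProbes(resultsByLocation) {
  const entries = Object.entries(resultsByLocation);
  const failing = entries
    .filter(([, r]) => !r.reachable || r.status >= 500) // no answer, or a server error
    .map(([location]) => location);
  if (failing.length === 0) return 'reachable everywhere: look at your own code, keys, or config';
  if (failing.length === entries.length) return 'failing everywhere: likely a provider outage';
  return `failing only from ${failing.join(', ')}: likely a network issue on that side`;
}
```

If only your production server fails while a hotspot and a cloud shell succeed, the firewall, DNS, or egress policy on that server is the first suspect.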
3. Check Official Status Pages
Most major APIs have status pages:
- Stripe: status.stripe.com
- OpenAI: status.openai.com
- Twilio: status.twilio.com
- AWS: health.aws.amazon.com
- Anthropic: status.anthropic.com
Note: Status pages sometimes lag behind actual outages. Don't rely on them exclusively.
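Many status pages are hosted on Atlassian Statuspage, which publishes a machine-readable summary at `/api/v2/status.json` whose `status.indicator` is `none`, `minor`, `major`, or `critical`. Assuming a provider's page runs on Statuspage (verify this per provider; not every page listed above necessarily does), a small helper can poll it alongside your own checks. `fetchStatuspage` and `summarizeStatuspage` are hypothetical names, and Node 18+ is assumed for the global `fetch`.

```javascript
// Severity order for Atlassian Statuspage's overall indicator values.
const SEVERITY = { none: 0, minor: 1, major: 2, critical: 3 };

// Turn a /api/v2/status.json payload into a small, comparable summary.
function summarizeStatuspage(payload) {
  const status = (payload && payload.status) || {};
  const known = Object.prototype.hasOwnProperty.call(SEVERITY, status.indicator);
  const severity = known ? SEVERITY[status.indicator] : -1; // -1 = missing or unrecognized
  return {
    indicator: status.indicator ?? 'unknown',
    description: status.description ?? 'No description provided',
    severity,
    degraded: severity > 0,
  };
}

// Fetch and summarize a Statuspage-hosted page (assumption: the page uses Statuspage).
async function fetchStatuspage(baseUrl) {
  const res = await fetch(new URL('/api/v2/status.json', baseUrl), { signal: AbortSignal.timeout(5000) });
  if (!res.ok) throw new Error(`status page returned HTTP ${res.status}`);
  return summarizeStatuspage(await res.json());
}
```

Usage might look like `await fetchStatuspage('https://status.example.com')`. Treat a `none` indicator during an incident you can reproduce as "not acknowledged yet", not as proof that everything is fine.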
4. Review Your Recent Changes
Did you deploy anything recently?
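```bash
# Check recent deployments
git log --since="2 hours ago" --oneline

# Review recent cluster events (failed rollouts, crash loops, config reloads)
kubectl get events --sort-by='.lastTimestamp'

# Confirm the expected API key variables are set, without printing the secret values
env | grep API_KEY | cut -d= -f1
```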
Common self-inflicted issues:
- ❌ Expired API keys
- ❌ Rate limit exceeded
- ❌ Changed base URL or endpoints
- ❌ Network policy blocking outbound requests
- ❌ Certificate validation errors
5. Inspect Error Responses
Read the error messages carefully:
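```javascript
// Assumes the official stripe-node package and a secret key in STRIPE_SECRET_KEY
const stripe = require('stripe')(process.env.STRIPE_SECRET_KEY);

async function createTestCustomer() {
  try {
    return await stripe.customers.create({
      email: 'customer@example.com'
    });
  } catch (error) {
    console.error('Status:', error.statusCode);
    console.error('Type:', error.type);
    console.error('Message:', error.message);
    // 429 = You're rate limited (usually your traffic)
    // 401 = Auth issue (check your key)
    // 500, 502, 503 = Server error (their side)
    // StripeConnectionError, ECONNREFUSED, ETIMEDOUT = Network/DNS (investigate)
    throw error;
  }
}
```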
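To apply those status-code rules consistently (for example, when deciding whether a failed call is worth retrying), you could encode them in one function. This is a hypothetical helper, not part of the Stripe SDK. It reads `statusCode` (as set on stripe-node errors) or a generic `status`, treats stripe-node's `StripeConnectionError` type and Node's network error codes as network problems, and leaves everything else as unknown.

```javascript
// Hypothetical helper: who likely owns this error, and is a retry sensible?
function classifyApiError(err) {
  const status = err.statusCode ?? err.status;
  if (status === 429) return { owner: 'you', reason: 'rate limited', retry: true };
  if (status === 401 || status === 403) return { owner: 'you', reason: 'auth or permissions', retry: false };
  if (status >= 400 && status < 500) return { owner: 'you', reason: 'bad request', retry: false };
  if (status >= 500) return { owner: 'provider', reason: 'server error', retry: true };
  // No HTTP status: the request may never have reached the provider.
  const networkCodes = ['ECONNREFUSED', 'ETIMEDOUT', 'ENOTFOUND', 'ECONNRESET'];
  if (err.type === 'StripeConnectionError' || networkCodes.includes(err.code)) {
    return { owner: 'unknown', reason: 'network or DNS', retry: true };
  }
  return { owner: 'unknown', reason: 'unrecognized error', retry: false };
}
```

A rate-limit retry should still back off rather than fire immediately, or it will extend the very limit it hit.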
If it's confirmed down, proceed to immediate response.
Immediate Response Checklist (First 5 Minutes)
Time is critical. Here's your battle plan:
⏱️ Minute 1: Triage
- Confirm outage severity (partial or total?)
- Identify affected services in your application
- Check if users are impacted (review error rates, support tickets)
- Alert your on-call team
⏱️ Minutes 2-3: Activate Fallbacks
- Enable cached responses if available
- Switch to backup provider (if configured)
- Queue non-critical operations for retry
- Display user-friendly error messages
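Flipping to a fallback by hand works once; a circuit breaker does it automatically. After a run of consecutive failures it stops calling the API for a cooldown period and serves the fallback (a cached response, a queued job, or a backup provider), then lets a trial request through to test for recovery. A minimal sketch, assuming a threshold of 5 failures and a 30-second cooldown (both arbitrary). `CircuitBreaker` is a hypothetical class; production libraries also cap how many trial requests run concurrently in the half-open state, which this sketch does not.

```javascript
// Minimal circuit breaker. `now` is injectable so the state machine can be tested without waiting.
class CircuitBreaker {
  constructor({ failureThreshold = 5, cooldownMs = 30_000, now = Date.now } = {}) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
    this.failures = 0;
    this.openedAt = null; // null means the circuit is closed
  }

  get state() {
    if (this.openedAt === null) return 'closed';
    return this.now() - this.openedAt >= this.cooldownMs ? 'half-open' : 'open';
  }

  // Run `primary`; use `fallback` while open or when `primary` throws.
  async call(primary, fallback) {
    if (this.state === 'open') return fallback(); // skip the failing API entirely
    try {
      const result = await primary();
      this.failures = 0;
      this.openedAt = null; // any success, including a half-open trial, closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      // A failed half-open trial re-opens immediately; otherwise open once the threshold is hit.
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.openedAt = this.now();
      }
      return fallback(err);
    }
  }
}
```

Usage might look like `breaker.call(() => stripe.customers.create(params), () => cache.get(key))`, where `cache` stands for whatever cache your application already has.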
⏱️ Minute 4: Communication
- Notify internal team (Slack, PagerDuty)
- Update your status page
- Prepare customer communication (if user-facing)
⏱️ Minute 5: Monitor
- Set up monitoring for API recovery
- Watch error rates and user complaints
- Document timeline and actions taken
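Monitoring for recovery deserves one refinement: APIs often flap on the way back up, so a single successful request is weak evidence. A minimal sketch, assuming recovery is declared after three consecutive healthy checks polled every 10 seconds (both values are assumptions to tune). `RecoveryDetector`, `waitForRecovery`, and the `checkHealth` callback are hypothetical names; `checkHealth` can be any async function that resolves to `true` when the API looks healthy.

```javascript
// Recovery = N consecutive healthy checks; any unhealthy check resets the streak.
class RecoveryDetector {
  constructor(requiredSuccesses = 3) {
    this.requiredSuccesses = requiredSuccesses;
    this.streak = 0;
  }

  // Record one check result and report whether the API now counts as recovered.
  record(healthy) {
    this.streak = healthy ? this.streak + 1 : 0;
    return this.streak >= this.requiredSuccesses;
  }
}

// Poll `checkHealth` until recovery is declared. A thrown error counts as unhealthy.
async function waitForRecovery(checkHealth, { intervalMs = 10_000, requiredSuccesses = 3 } = {}) {
  const detector = new RecoveryDetector(requiredSuccesses);
  for (;;) {
    let healthy;
    try {
      healthy = Boolean(await checkHealth());
    } catch {
      healthy = false;
    }
    if (detector.record(healthy)) return;
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Once `waitForRecovery` resolves, that is a reasonable point to start draining queued work and post a recovery update.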
Quick Command Center Setup
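```bash
# Terminal 1: Poll the API every 10s (any HTTP code means it answered; 000 means no response)
watch -n 10 "curl -s -o /dev/null -w '%{http_code}\n' https://api.stripe.com/v1/health"

# Terminal 2: Follow your error log, filtered to the affected provider
tail -f /var/log/app/errors.log | grep -i "stripe"

# Terminal 3: Monitor queue depth
watch -n 5 'redis-cli llen payment_queue'
```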
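The queue-depth pane assumes failed work is parked in a queue such as the `payment_queue` Redis list. When the API comes back, draining that queue at full speed can trigger rate limits or knock a fragile service over again, so retries should back off. A minimal sketch, assuming an in-memory array stands in for the Redis list and `send(payload, { idempotencyKey })` is a hypothetical function that makes the real API call. Each job carries an idempotency key so a request that actually succeeded before the failure is not applied twice; stripe-node, for instance, accepts an `idempotencyKey` in a request's options. Backoff uses exponential growth with full jitter, capped at 30 seconds (an assumption).

```javascript
// Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt)).
function backoffDelay(attempt, baseMs = 500, capMs = 30_000, random = Math.random) {
  return Math.floor(random() * Math.min(capMs, baseMs * 2 ** attempt));
}

// Drain queued jobs; retry failures with backoff, and dead-letter jobs that keep failing.
async function drainQueue(queue, send, {
  maxAttempts = 5,
  sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms)),
} = {}) {
  const deadLetter = [];
  while (queue.length > 0) {
    const job = queue.shift();
    try {
      await send(job.payload, { idempotencyKey: job.idempotencyKey });
    } catch {
      job.attempts = (job.attempts || 0) + 1;
      if (job.attempts >= maxAttempts) {
        deadLetter.push(job); // give up on this job; it needs a human
      } else {
        await sleep(backoffDelay(job.attempts));
        queue.push(job); // retry later, behind the rest of the queue
      }
    }
  }
  return deadLetter;
}
```

Re-queuing at the back keeps one stuck job from blocking the others; if your jobs must run in order, put it back at the front instead.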
Communication Templates
Clear communication prevents panic. Use these templates:
Internal Team Notification (Slack/Teams)
🚨 API OUTAGE ALERT
Service: Stripe API
Status: Down (confirmed on apistatuscheck.com)
Impact: Payment processing unavailable
Started: 2:03 AM PST
Affected: ~1,200 users attempting checkout
Actions Taken:
✅ Enabled payment queuing
✅ Displayed maintenance message on checkout
✅ Monitoring Stripe status page
Next Steps:
• Watch for recovery
• Process queued payments when service returns
• Brief customer support team
Incident Commander: @sarah
War Room: #incident-stripe-outage
Customer Communication (Email/In-App)
For Transactional Impact:
Subject: Temporary Payment Processing Delay
Hi [Name],
We're currently experiencing issues with our payment processor that may delay your transaction. This is not an issue with your account or payment method.
What's happening:
Our payment partner is experiencing a service disruption. We're monitoring the situation and will process your payment as soon as service is restored.
What you need to do:
Nothing! We've safely queued your payment and will automatically complete it once service returns. You'll receive a confirmation email.
We apologize for the inconvenience.
- The [Your Company] Team
[Status Page Link]
For Feature Impact:
⚠️ Some features temporarily unavailable
We're experiencing issues with [Feature Name] due to a third-party service outage. Our team is working on it, and we expect normal service to resume shortly.
You can continue using [Other Features].
[View Status Page]
Status Page Update
🟡 Degraded Performance
Investigating Payment Processing Issues
Posted: 2:05 AM PST
We're investigating reports of payment processing failures. We've confirmed our payment provider (Stripe) is experiencing an outage.
Current Status:
• Payments are being queued for processing
• All other features operating normally
• No data loss expected
We'll update this page as we learn more.
## Ready for the Next Outage?
API outages are inevitable. Your response isn't.
With proper preparation, clear communication, and robust fallback systems, you can weather any API storm with minimal impact to your users.
**Take action now:**
1. ✅ **Set up monitoring** - Create a free account at [apistatuscheck.com](https://apistatuscheck.com)
2. ✅ **Implement fallbacks** - Add circuit breakers and caching
3. ✅ **Create runbooks** - Document your response procedures
4. ✅ **Test your systems** - Run chaos engineering drills
5. ✅ **Brief your team** - Make sure everyone knows the plan
Don't wait until 2 AM to figure this out.
### Monitor 100+ APIs in Real-Time
Get instant alerts when APIs go down. Make informed decisions during outages. Keep your users happy.
**[Start Monitoring Free →](https://apistatuscheck.com/signup)**
---
*Need to check if a specific API is down right now? Visit [apistatuscheck.com](https://apistatuscheck.com) for real-time status of Stripe, OpenAI, AWS, Twilio, and 100+ more services.*