How to Monitor AWS API Status and Outages (2026 Guide)
AWS holds roughly a third of the cloud infrastructure market. When AWS has issues, millions of applications are affected, from Netflix to your startup's API. The challenge? AWS has 200+ services across 30+ regions, and knowing which service is degraded in which region requires real monitoring, not just checking a status page.
This guide covers every method to monitor AWS service health, from free dashboards to automated alerting.
Method 1: API Status Check (Unified Dashboard)
API Status Check aggregates AWS service status alongside 120+ other APIs your application depends on.
What you get:
- Real-time AWS status monitoring
- Alerts when AWS services report degradation
- Single dashboard for AWS + Stripe + OpenAI + GitHub + everything else
- No setup: AWS is pre-configured
Why this matters: Most applications don't just depend on AWS. They depend on AWS and Stripe and Twilio and OpenAI. API Status Check gives you one view of all your external dependencies.
Pricing: Free (3 APIs) | $9/mo Alert Pro (10 APIs) | $29/mo Team (30 APIs)
Start monitoring AWS and 120+ APIs free →
Method 2: AWS Health Dashboard (Built-in)
AWS provides two health views, which it now presents together as the AWS Health Dashboard:
AWS Service Health Dashboard
Shows the current status of all AWS services across all regions. This is the public dashboard; no AWS account is needed.
Limitations:
- Shows service-wide status, not issues specific to your account
- Updates can be delayed (AWS must acknowledge the issue)
- Doesn't reflect regional performance variations
- No automated alerting (manual page checking only)
AWS Personal Health Dashboard (PHD)
Available in the AWS Console under Health Dashboard.
What it provides:
- Account-specific health events
- Scheduled maintenance notifications
- Proactive recommendations
- Events that affect your specific resources
How to set up alerts:
- Go to AWS Console → Health Dashboard
- Click Event log for historical issues
- Use Amazon EventBridge to route health events:
{
  "source": ["aws.health"],
  "detail-type": ["AWS Health Event"],
  "detail": {
    "service": ["EC2", "S3", "LAMBDA", "RDS"],
    "eventTypeCategory": ["issue", "scheduledChange"]
  }
}
Route the EventBridge rule to an SNS topic, then fan out from SNS to Slack, email, or PagerDuty for automated alerting.
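As a rough sketch of that wiring, the rule and its SNS target could be created from Python. This assumes a client that behaves like `boto3.client("events")`; the function name, rule name, and target ID are illustrative placeholders, not anything AWS prescribes.

```python
import json

# The event pattern shown above, expressed as a Python dict.
HEALTH_EVENT_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["EC2", "S3", "LAMBDA", "RDS"],
        "eventTypeCategory": ["issue", "scheduledChange"],
    },
}


def route_health_events(events_client, rule_name, sns_topic_arn):
    """Create (or update) the rule, then attach an SNS topic as its target.

    events_client is assumed to expose put_rule and put_targets the way
    boto3.client("events") does.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(HEALTH_EVENT_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "health-alerts-sns", "Arn": sns_topic_arn}],
    )


# Example call (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   route_health_events(
#       boto3.client("events", region_name="us-east-1"),
#       "aws-health-to-sns",
#       "arn:aws:sns:us-east-1:ACCOUNT:alerts",
#   )
```

The SNS topic's access policy also has to allow `events.amazonaws.com` to publish, otherwise the rule matches events but nothing is delivered.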
Best for: Teams running significant AWS infrastructure who want account-specific health events.
Method 3: CloudWatch Alarms
For monitoring your AWS resources' actual performance (not just AWS's reported status):
Key Metrics to Monitor
# Each alarm below evaluates its metric without dimensions. Add --dimensions
# (for example Name=InstanceId,Value=<instance-id>) to scope it to one resource.

# EC2 Instance Health: average CPU above 80% for two 5-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name "High-CPU-Production" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts

# RDS Connection Issues: more than 90 connections for three 1-minute periods
aws cloudwatch put-metric-alarm \
  --alarm-name "RDS-High-Connections" \
  --metric-name DatabaseConnections \
  --namespace AWS/RDS \
  --statistic Maximum \
  --period 60 \
  --threshold 90 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts

# Lambda Errors: more than 10 errors in one 5-minute period
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-Error-Rate" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts
Essential CloudWatch Dashboards
Create a CloudWatch dashboard covering:
| Service | Key Metrics |
|---|---|
| EC2 | CPUUtilization, NetworkIn/Out, StatusCheckFailed |
| RDS | CPUUtilization, FreeableMemory, DatabaseConnections, ReadLatency |
| Lambda | Invocations, Errors, Duration, Throttles, ConcurrentExecutions |
| S3 | 4xxErrors, 5xxErrors, FirstByteLatency |
| API Gateway | 4XXError, 5XXError, Latency, Count |
| ALB | TargetResponseTime, HTTPCode_Target_5XX_Count |
Best for: Deep infrastructure monitoring of your specific AWS resources.
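One way to turn the table above into an actual dashboard is to generate the dashboard body JSON in code. The sketch below is a starting point only: the layout, the function name, and the single `Average` statistic are assumptions, and most of these metrics are published per resource, so real widgets would usually need dimensions appended to each metric entry (for example `["AWS/EC2", "CPUUtilization", "InstanceId", "<instance-id>"]`).

```python
import json

# (widget title, CloudWatch namespace, metric names) taken from the table above.
DASHBOARD_METRICS = [
    ("EC2", "AWS/EC2",
     ["CPUUtilization", "NetworkIn", "NetworkOut", "StatusCheckFailed"]),
    ("RDS", "AWS/RDS",
     ["CPUUtilization", "FreeableMemory", "DatabaseConnections", "ReadLatency"]),
    ("Lambda", "AWS/Lambda",
     ["Invocations", "Errors", "Duration", "Throttles", "ConcurrentExecutions"]),
    ("S3", "AWS/S3", ["4xxErrors", "5xxErrors", "FirstByteLatency"]),
    ("API Gateway", "AWS/ApiGateway", ["4XXError", "5XXError", "Latency", "Count"]),
    ("ALB", "AWS/ApplicationELB",
     ["TargetResponseTime", "HTTPCode_Target_5XX_Count"]),
]


def build_dashboard_body(region="us-east-1", period=300):
    """Return a dashboard body (JSON string) with one metric widget per service."""
    widgets = []
    for index, (title, namespace, metric_names) in enumerate(DASHBOARD_METRICS):
        widgets.append({
            "type": "metric",
            "x": (index % 2) * 12,   # two 12-unit columns on the 24-unit grid
            "y": (index // 2) * 6,   # a new row every two widgets
            "width": 12,
            "height": 6,
            "properties": {
                "title": title,
                "region": region,
                "period": period,
                "stat": "Average",
                # No dimensions here: add them per resource in real use.
                "metrics": [[namespace, name] for name in metric_names],
            },
        })
    return json.dumps({"widgets": widgets})
```

The resulting string can be passed as `DashboardBody` to the CloudWatch `put_dashboard` call, together with a `DashboardName` of your choosing.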
Method 4: Third-Party Monitoring Tools
Datadog AWS Integration
Datadog (https://www.datadoghq.com/) provides the deepest third-party AWS monitoring:
- 100+ AWS service integrations
- CloudTrail log analysis
- Real-time infrastructure maps
- Correlation between AWS metrics and application performance
- Custom dashboards for multi-service visibility
Better Stack
Better Stack can monitor AWS endpoints:
- HTTP checks against your AWS-hosted services
- Alert when your load balancer or API returns errors
- Log management for AWS services (via Fluentd/CloudWatch Logs forwarding)
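To make the idea of an HTTP check concrete, here is a minimal standard-library sketch of a single probe. It is not how Better Stack implements its checks; the function name, the "up/slow/down" labels, and the thresholds are assumptions chosen for illustration.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0, slow_after=2.0, opener=urllib.request.urlopen):
    """Probe one URL and classify it as "up", "slow", or "down".

    opener is injectable so the function can be exercised without a network.
    """
    started = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as err:
        status = err.code  # urlopen raises for 4xx/5xx; keep the status code
    except (urllib.error.URLError, OSError):
        # DNS failure, refused connection, timeout, and similar
        return {"url": url, "state": "down", "status": None}
    elapsed = time.monotonic() - started

    if status >= 400:
        state = "down"
    elif elapsed > slow_after:
        state = "slow"
    else:
        state = "up"
    return {"url": url, "state": state, "status": status, "elapsed": round(elapsed, 3)}
```

Running this on a schedule from outside your AWS region, and alerting after two or three consecutive "down" results, gives a basic external view of a load balancer or API endpoint.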
New Relic AWS Integration
New Relic (https://newrelic.com/) offers:
- AWS CloudWatch Metric Streams integration
- Infrastructure agent for EC2 monitoring
- Lambda monitoring with distributed tracing
- 100GB/month free data ingest
Building AWS Resilience
1. Multi-Region Architecture
Don't put all your eggs in us-east-1:
Primary: us-east-1
├── Application servers (EC2/ECS)
├── Database (RDS Multi-AZ)
├── Cache (ElastiCache)
└── Storage (S3)
Failover: us-west-2
├── Read replicas (RDS)
├── Static assets (S3 cross-region replication)
└── DNS failover (Route 53 health checks)
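The DNS failover piece of that layout can be expressed as a pair of Route 53 failover records. The sketch below only builds the change batch. The function name, record names, and health check ID are hypothetical, and it assumes CNAME records on a subdomain (a zone apex would need alias records instead).

```python
def failover_change_batch(record_name, primary_dns, secondary_dns,
                          primary_health_check_id, ttl=60):
    """Build a Route 53 change batch with PRIMARY and SECONDARY failover records."""

    def record(role, target, health_check_id=None):
        record_set = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": f"{role.lower()}-endpoint",
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,        # keep low so clients pick up a failover quickly
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id:
            # Route 53 stops answering with this record when the check fails.
            record_set["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": record_set}

    return {
        "Comment": "Primary/secondary failover between two regions",
        "Changes": [
            record("PRIMARY", primary_dns, primary_health_check_id),
            record("SECONDARY", secondary_dns),
        ],
    }
```

The returned dict is shaped for the `ChangeBatch` argument of Route 53's `change_resource_record_sets` call, alongside the `HostedZoneId` of your zone.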
2. Circuit Breaker Pattern
When an AWS service degrades, stop hammering it:
import time
from enum import Enum

import boto3  # used only by the usage example at the bottom


class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Service down, reject requests
    HALF_OPEN = "half_open"  # Testing if service recovered


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            # After the timeout, let one trial request through
            if time.time() - self.last_failure_time > self.recovery_timeout:
                self.state = CircuitState.HALF_OPEN
            else:
                raise Exception("Circuit breaker is OPEN: AWS service degraded")
        try:
            result = func(*args, **kwargs)
            if self.state == CircuitState.HALF_OPEN:
                self.state = CircuitState.CLOSED
            # Reset on every success so only consecutive failures trip the breaker
            self.failures = 0
            return result
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            if self.failures >= self.threshold:
                self.state = CircuitState.OPEN
            raise


def get_from_local_cache(key):
    # Placeholder: replace with your own cache or alternative storage lookup
    return None


# Usage
s3_client = boto3.client("s3")
s3_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=120)
try:
    result = s3_breaker.call(s3_client.get_object, Bucket="my-bucket", Key="data.json")
except Exception:
    # Fall back to local cache or alternative storage
    result = get_from_local_cache("data.json")
3. Graceful Degradation
Map AWS service failures to user-facing responses:
| AWS Service Down | User Impact | Graceful Response |
|---|---|---|
| S3 | Images/files unavailable | Show placeholders, serve from CDN cache |
| RDS | Database queries fail | Serve cached data, show "limited functionality" |
| Lambda | Background jobs fail | Queue for retry, show "processing delayed" |
| SES | Emails don't send | Queue emails, show "confirmation coming soon" |
| API Gateway | API endpoints 502 | Route to backup endpoint or static response |
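In code, each row of this table usually amounts to "try the AWS call, and on failure return something degraded but usable". A minimal sketch of that pattern as a decorator follows. The names `degrade_gracefully`, `placeholder_image`, and `load_image` are hypothetical, and the S3 call is simulated with a raised exception.

```python
import functools
import logging

logger = logging.getLogger("degradation")


def degrade_gracefully(fallback, exceptions=(Exception,)):
    """Decorator: if the wrapped call raises, log it and return fallback(*args, **kwargs)."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            try:
                return func(*args, **kwargs)
            except exceptions as err:
                logger.warning("%s failed (%s); serving fallback", func.__name__, err)
                return fallback(*args, **kwargs)
        return wrapper
    return decorator


# Hypothetical example for the S3 row: show a placeholder when the fetch fails.
def placeholder_image(key):
    return {"key": key, "body": b"", "placeholder": True}


@degrade_gracefully(placeholder_image)
def load_image(key):
    # Stand-in for a real S3 read; here it always fails to show the fallback path.
    raise ConnectionError("S3 unavailable")
```

Passing a narrower `exceptions` tuple (for example only connection and timeout errors) avoids masking genuine bugs behind the fallback.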
Monitoring Checklist for AWS
- External status: API Status Check for AWS service alerts
- Account health: AWS Personal Health Dashboard + EventBridge alerts
- Resource metrics: CloudWatch alarms for CPU, memory, errors, latency
- Application monitoring: Datadog/New Relic for end-to-end visibility
- Multi-region: Health checks and failover configured in Route 53
- Cost monitoring: AWS Budgets alerts for unexpected spend spikes
- Incident response: Runbook for common AWS failure scenarios
Frequently Asked Questions
How often does AWS go down?
AWS has excellent overall availability, but individual service incidents occur regularly, typically 10-20 notable incidents per year across all services and regions. Major outages affecting multiple services are rare (1-2 per year) but impactful. Most incidents are region-specific and service-specific.
Is the AWS Status Page accurate?
The public status page (health.aws.amazon.com) is often delayed: AWS sometimes takes 15-30 minutes to acknowledge issues. The Personal Health Dashboard in your AWS Console is faster and account-specific. For the fastest signal, use API Status Check alongside AWS's own tools.
Should I monitor AWS if I use a PaaS like Vercel or Heroku?
Yes. Vercel runs on AWS (us-east-1 primarily). Heroku runs on AWS. When AWS has issues, your PaaS is affected. Monitoring AWS gives you early warning that your Vercel/Heroku deployment may be impacted, even if those platforms haven't acknowledged the issue yet.
What's the most common AWS failure mode?
Regional service degradation: a single service in a single region experiencing elevated error rates or increased latency. This is more common than full outages and harder to detect without monitoring. The most frequently affected region is us-east-1 (it carries the most traffic), and services like Lambda, S3, and DynamoDB are most often affected during peak load.
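Because this kind of degradation shows up as a rising error rate rather than a hard failure, a rolling-window check on your own calls is one way to spot it. The sketch below is a minimal illustration. The class name, window size, and thresholds are assumptions to tune for your traffic.

```python
from collections import deque


class ErrorRateMonitor:
    """Track the outcome of the most recent calls and flag elevated error rates."""

    def __init__(self, window=100, threshold=0.05, min_samples=20):
        self.outcomes = deque(maxlen=window)  # True = success, False = error
        self.threshold = threshold            # e.g. 0.05 = more than 5% errors
        self.min_samples = min_samples        # avoid alarming on tiny samples

    def record(self, success):
        self.outcomes.append(bool(success))

    def error_rate(self):
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)

    def is_degraded(self):
        enough_data = len(self.outcomes) >= self.min_samples
        return enough_data and self.error_rate() > self.threshold
```

Keeping one monitor per service and region (for example one for S3 in us-east-1) matches the single-service, single-region shape of most incidents.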
How do I get AWS outage alerts without CloudWatch?
API Status Check monitors AWS service status and sends email alerts, with no AWS account configuration needed. Subscribe to AWS's status page RSS feed for another source. For the fastest alerts, layer multiple sources.
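If you poll an RSS feed yourself, the core of the job is parsing items and remembering which ones you have already seen. The following is a minimal sketch that assumes a standard RSS 2.0 layout (`item` elements with `title`, `pubDate`, and `guid`). The function names are illustrative, and the feed URL is whichever RSS link the status page offers for the service and region you care about.

```python
import xml.etree.ElementTree as ET


def parse_status_feed(xml_text):
    """Extract title, publish date, and guid from each <item> of an RSS 2.0 feed."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "guid": (item.findtext("guid") or "").strip(),
        })
    return items


def new_items(items, seen_guids):
    """Return items not seen before and record their guids in seen_guids (a set)."""
    fresh = [item for item in items if item["guid"] not in seen_guids]
    seen_guids.update(item["guid"] for item in fresh)
    return fresh


# Fetching is left to the caller, for example:
#   import urllib.request
#   xml_text = urllib.request.urlopen(FEED_URL, timeout=10).read().decode("utf-8")
```

Calling `new_items` on each poll and alerting on whatever it returns gives a simple notification loop without any AWS account access.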
Summary: Recommended AWS Monitoring Stack
| Layer | Tool | Cost | What It Catches |
|---|---|---|---|
| External status | API Status Check | Free-$9/mo | AWS service-level issues |
| Account health | AWS Personal Health Dashboard | Free | Your-account-specific events |
| Resource monitoring | CloudWatch | Pay-per-use | Your infrastructure metrics |
| Application monitoring | Datadog (https://www.datadoghq.com/) or New Relic (https://newrelic.com/) | $15+/mo | End-to-end performance |
| Endpoint monitoring | Better Stack (betterstack.com) | Free-$29/mo | Your service availability |
Layer these for complete coverage. No single tool catches everything.
Start monitoring AWS today: API Status Check takes 30 seconds to set up and covers AWS plus 120+ other APIs. Free to start.
Some links on this page are affiliate links. We may earn a commission if you make a purchase through these links, at no additional cost to you.