How to Monitor AWS API Status and Outages (2026 Guide)
AWS holds roughly a third of the cloud infrastructure market. When AWS has issues, millions of applications are affected, from Netflix to your startup's API. The challenge? AWS has 200+ services across 30+ regions, and knowing which service is degraded in which region requires real monitoring, not just checking a status page.
This guide covers every method to monitor AWS service health, from free dashboards to automated alerting.
Method 1: API Status Check (Unified Dashboard)
API Status Check aggregates AWS service status alongside 120+ other APIs your application depends on.
What you get:
- Real-time AWS status monitoring
- Alerts when AWS services report degradation
- Single dashboard for AWS + Stripe + OpenAI + GitHub + everything else
- No setup — AWS is pre-configured
Why this matters: Most applications don't just depend on AWS. They depend on AWS and Stripe and Twilio and OpenAI. API Status Check gives you one view of all your external dependencies.
Pricing: Free (3 APIs) | $9/mo Alert Pro (10 APIs) | $29/mo Team (30 APIs)
Start monitoring AWS and 120+ APIs free →
Method 2: AWS Health Dashboard (Built-in)
AWS provides two health views, which since 2022 have been combined under the AWS Health Dashboard name:
AWS Service Health Dashboard
Shows the current status of all AWS services across all regions. This is the public dashboard — no AWS account needed.
Limitations:
- Shows global status, not issues specific to your account
- Updates can be delayed (AWS must acknowledge the issue)
- Doesn't reflect regional performance variations
- No automated alerting (manual page checking only)
AWS Personal Health Dashboard (PHD)
Available in the AWS Console under Health Dashboard.
What it provides:
- Account-specific health events
- Scheduled maintenance notifications
- Proactive recommendations
- Events that affect your specific resources
How to set up alerts:
- Go to AWS Console → Health Dashboard
- Click Event log for historical issues
- Use Amazon EventBridge to route health events:
{
  "source": ["aws.health"],
  "detail-type": ["AWS Health Event"],
  "detail": {
    "service": ["EC2", "S3", "LAMBDA", "RDS"],
    "eventTypeCategory": ["issue", "scheduledChange"]
  }
}
Route EventBridge to SNS → Slack/Email/PagerDuty for automated alerting.
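The routing step can also be set up programmatically. The sketch below is a minimal illustration, assuming you pass in a boto3 `events` client and that the SNS topic already exists. The rule name, target ID, and topic ARN are hypothetical placeholders chosen for this example.

```python
import json

# Same event pattern as the JSON shown above.
HEALTH_EVENT_PATTERN = {
    "source": ["aws.health"],
    "detail-type": ["AWS Health Event"],
    "detail": {
        "service": ["EC2", "S3", "LAMBDA", "RDS"],
        "eventTypeCategory": ["issue", "scheduledChange"],
    },
}


def create_health_rule(events_client, topic_arn, rule_name="aws-health-to-sns"):
    """Create (or update) an EventBridge rule and point it at an SNS topic.

    events_client is expected to behave like boto3.client("events").
    rule_name and the target Id are hypothetical names for this sketch.
    """
    events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(HEALTH_EVENT_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "health-alerts-sns", "Arn": topic_arn}],
    )
    return rule_name
```

With real credentials this would be called as `create_health_rule(boto3.client("events"), "arn:aws:sns:us-east-1:ACCOUNT:alerts")`. The SNS topic's access policy must also allow `events.amazonaws.com` to publish, or the rule will match events without delivering them.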
Best for: Teams running significant AWS infrastructure who want account-specific health events.
Method 3: CloudWatch Alarms
For monitoring your AWS resources' actual performance (not just AWS's reported status):
Key Metrics to Monitor
# EC2 Instance Health
# Replace the placeholder instance ID; without --dimensions the alarm
# does not target a specific instance
aws cloudwatch put-metric-alarm \
  --alarm-name "High-CPU-Production" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --dimensions Name=InstanceId,Value=i-0123456789abcdef0 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts

# RDS Connection Issues
# The threshold is a connection count, not a percentage; size it to
# your instance's connection limit. "my-database" is a placeholder.
aws cloudwatch put-metric-alarm \
  --alarm-name "RDS-High-Connections" \
  --metric-name DatabaseConnections \
  --namespace AWS/RDS \
  --dimensions Name=DBInstanceIdentifier,Value=my-database \
  --statistic Maximum \
  --period 60 \
  --threshold 90 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 3 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts

# Lambda Errors
# No dimensions: counts errors across all functions in the region
aws cloudwatch put-metric-alarm \
  --alarm-name "Lambda-Error-Rate" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --threshold 10 \
  --comparison-operator GreaterThanThreshold \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:ACCOUNT:alerts
Essential CloudWatch Dashboards
Create a CloudWatch dashboard covering:
| Service | Key Metrics |
|---|---|
| EC2 | CPUUtilization, NetworkIn/Out, StatusCheckFailed |
| RDS | CPUUtilization, FreeableMemory, DatabaseConnections, ReadLatency |
| Lambda | Invocations, Errors, Duration, Throttles, ConcurrentExecutions |
| S3 | 4xxErrors, 5xxErrors, FirstByteLatency |
| API Gateway | 4XXError, 5XXError, Latency, Count |
| ALB | TargetResponseTime, HTTPCode_Target_5XX_Count |
Best for: Deep infrastructure monitoring of your specific AWS resources.
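A dashboard like the one in the table can be defined as code. The following is a rough sketch that builds a CloudWatch dashboard body for two of the metrics listed. The instance ID, function name, widget layout, and helper name are assumptions made for illustration.

```python
import json

# Metric names come from the table above; the dimension values
# (instance ID, function name) are hypothetical placeholders.
WIDGET_SPECS = [
    ("EC2 CPU", "AWS/EC2", "CPUUtilization", "InstanceId", "i-0123456789abcdef0", "Average"),
    ("Lambda errors", "AWS/Lambda", "Errors", "FunctionName", "my-function", "Sum"),
]


def build_dashboard_body(specs, region="us-east-1", period=300):
    """Return a dashboard body (JSON string) with one metric widget per spec."""
    widgets = []
    for row, (title, namespace, metric, dim_name, dim_value, stat) in enumerate(specs):
        widgets.append({
            "type": "metric",
            "x": 0,
            "y": row * 6,  # stack widgets vertically, 6 grid units each
            "width": 12,
            "height": 6,
            "properties": {
                "title": title,
                "metrics": [[namespace, metric, dim_name, dim_value]],
                "stat": stat,
                "period": period,
                "region": region,
            },
        })
    return json.dumps({"widgets": widgets})


body = build_dashboard_body(WIDGET_SPECS)
```

The resulting string can be passed to `boto3.client("cloudwatch").put_dashboard(DashboardName="production-overview", DashboardBody=body)`, where the dashboard name is again just an example.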
Method 4: Third-Party Monitoring Tools
Datadog AWS Integration
Datadog provides the deepest third-party AWS monitoring:
- 100+ AWS service integrations
- CloudTrail log analysis
- Real-time infrastructure maps
- Correlation between AWS metrics and application performance
- Custom dashboards for multi-service visibility
Better Stack
Better Stack can monitor AWS endpoints:
- HTTP checks against your AWS-hosted services
- Alert when your load balancer or API returns errors
- Log management for AWS services (via Fluentd/CloudWatch Logs forwarding)
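To make the idea of an HTTP check concrete, here is a minimal standard-library sketch of what such a probe does. It is not Better Stack's implementation; the function name, the injectable `opener` parameter, and the "2xx or 3xx counts as healthy" rule are assumptions for this example.

```python
import time
import urllib.error
import urllib.request


def check_endpoint(url, timeout=5.0, opener=urllib.request.urlopen):
    """Return (healthy, status_code, latency_seconds) for one HTTP GET.

    status_code is None when no HTTP response was received at all.
    """
    start = time.monotonic()
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
    except urllib.error.HTTPError as exc:
        status = exc.code  # urllib raises HTTPError for 4xx/5xx responses
    except (urllib.error.URLError, OSError):
        status = None  # DNS failure, refused connection, or timeout
    latency = time.monotonic() - start
    healthy = status is not None and 200 <= status < 400
    return healthy, status, latency
```

Running this on a schedule against your load balancer URL and alerting after several consecutive unhealthy results approximates what a hosted uptime monitor does, although hosted tools also probe from multiple locations.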
New Relic AWS Integration
New Relic offers:
- AWS CloudWatch Metric Streams integration
- Infrastructure agent for EC2 monitoring
- Lambda monitoring with distributed tracing
- 100GB/month free data ingest
Building AWS Resilience
1. Multi-Region Architecture
Don't put all your eggs in us-east-1:
Primary: us-east-1
├── Application servers (EC2/ECS)
├── Database (RDS Multi-AZ)
├── Cache (ElastiCache)
└── Storage (S3)
Failover: us-west-2
├── Read replicas (RDS)
├── Static assets (S3 cross-region replication)
└── DNS failover (Route 53 health checks)
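The DNS failover piece of this layout can be expressed as a pair of Route 53 failover records. The sketch below only builds the change batch. The hostnames, set identifiers, and health check ID are hypothetical, and the use of CNAME records is an assumption (alias records are another option).

```python
def failover_change_batch(record_name, primary_target, secondary_target,
                          primary_health_check_id, ttl=60):
    """Build a Route 53 ChangeBatch with PRIMARY and SECONDARY failover records."""

    def record(set_id, role, target, health_check_id=None):
        rrset = {
            "Name": record_name,
            "Type": "CNAME",
            "SetIdentifier": set_id,
            "Failover": role,  # "PRIMARY" or "SECONDARY"
            "TTL": ttl,
            "ResourceRecords": [{"Value": target}],
        }
        if health_check_id:
            # Route 53 stops answering with this record when the check fails
            rrset["HealthCheckId"] = health_check_id
        return {"Action": "UPSERT", "ResourceRecordSet": rrset}

    return {
        "Comment": "Fail over from us-east-1 to us-west-2",
        "Changes": [
            record("primary-us-east-1", "PRIMARY", primary_target, primary_health_check_id),
            record("failover-us-west-2", "SECONDARY", secondary_target),
        ],
    }
```

The batch would be applied with `boto3.client("route53").change_resource_record_sets(HostedZoneId=..., ChangeBatch=batch)`. A short TTL keeps clients from caching the primary's address long after it has failed.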
2. Circuit Breaker Pattern
When an AWS service degrades, stop hammering it:
import time
from enum import Enum

import boto3  # assumed installed; only needed for the usage example


class CircuitState(Enum):
    CLOSED = "closed"        # Normal operation
    OPEN = "open"            # Service down, reject requests
    HALF_OPEN = "half_open"  # Testing if service recovered


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold=5, recovery_timeout=60):
        self.state = CircuitState.CLOSED
        self.failures = 0
        self.threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.last_failure_time = 0

    def call(self, func, *args, **kwargs):
        if self.state == CircuitState.OPEN:
            if time.time() - self.last_failure_time > self.recovery_timeout:
                # Timeout elapsed: let one trial request through
                self.state = CircuitState.HALF_OPEN
            else:
                raise CircuitOpenError("Circuit breaker is OPEN: AWS service degraded")

        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            self.last_failure_time = time.time()
            # A failed trial request in HALF_OPEN reopens the circuit immediately
            if self.state == CircuitState.HALF_OPEN or self.failures >= self.threshold:
                self.state = CircuitState.OPEN
            raise

        # Any success closes the circuit and resets the count, so only
        # consecutive failures trip the breaker
        self.state = CircuitState.CLOSED
        self.failures = 0
        return result


# Usage
s3_client = boto3.client("s3")
s3_breaker = CircuitBreaker(failure_threshold=3, recovery_timeout=120)
try:
    result = s3_breaker.call(s3_client.get_object, Bucket="my-bucket", Key="data.json")
except Exception:
    # Fall back to local cache or alternative storage
    # (get_from_local_cache is a placeholder for your own fallback)
    result = get_from_local_cache("data.json")
3. Graceful Degradation
Map AWS service failures to user-facing responses:
| AWS Service Down | User Impact | Graceful Response |
|---|---|---|
| S3 | Images/files unavailable | Show placeholders, serve from CDN cache |
| RDS | Database queries fail | Serve cached data, show "limited functionality" |
| Lambda | Background jobs fail | Queue for retry, show "processing delayed" |
| SES | Emails don't send | Queue emails, show "confirmation coming soon" |
| API Gateway | API endpoints return 502 errors | Route to backup endpoint or static response |
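The "serve cached data" responses in the table share one shape: try the live call, and on failure return the last known value with a flag the UI can use to show a "limited functionality" notice. A minimal sketch follows, with hypothetical names (`StaleCache`, `fetch_with_fallback`) and an in-memory cache standing in for whatever store you actually use.

```python
import time


class StaleCache:
    """In-memory store of the last good value per key, with its timestamp."""

    def __init__(self):
        self._store = {}

    def put(self, key, value):
        self._store[key] = (value, time.time())

    def get(self, key):
        return self._store.get(key)  # (value, stored_at) or None


def fetch_with_fallback(key, fetch, cache):
    """Call fetch(key); if it raises, serve the last cached value instead."""
    try:
        value = fetch(key)
    except Exception:
        cached = cache.get(key)
        if cached is None:
            # Nothing to fall back to: the caller should show an error state
            return {"data": None, "degraded": True, "age_seconds": None}
        value, stored_at = cached
        return {"data": value, "degraded": True, "age_seconds": time.time() - stored_at}
    cache.put(key, value)
    return {"data": value, "degraded": False, "age_seconds": 0.0}
```

Returning the age of the cached value lets the caller decide when stale data is too old to show at all.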
Monitoring Checklist for AWS
- External status — API Status Check for AWS service alerts
- Account health — AWS Personal Health Dashboard + EventBridge alerts
- Resource metrics — CloudWatch alarms for CPU, memory, errors, latency
- Application monitoring — Datadog/New Relic for end-to-end visibility
- Multi-region — Health checks and failover configured in Route 53
- Cost monitoring — AWS Budgets alerts for unexpected spend spikes
- Incident response — Runbook for common AWS failure scenarios
Frequently Asked Questions
How often does AWS go down?
AWS has excellent overall availability, but individual service incidents occur regularly — typically 10-20 notable incidents per year across all services and regions. Major outages affecting multiple services are rare (1-2 per year) but impactful. Most incidents are region-specific and service-specific.
Is the AWS Status Page accurate?
The public status page (health.aws.amazon.com) is often delayed — AWS sometimes takes 15-30 minutes to acknowledge issues. The Personal Health Dashboard in your AWS Console is faster and account-specific. For the fastest signal, use API Status Check alongside AWS's own tools.
Should I monitor AWS if I use a PaaS like Vercel or Heroku?
Yes. Vercel runs on AWS (us-east-1 primarily). Heroku runs on AWS. When AWS has issues, your PaaS is affected. Monitoring AWS gives you early warning that your Vercel/Heroku deployment may be impacted, even if those platforms haven't acknowledged the issue yet.
What's the most common AWS failure mode?
Regional service degradation: a single service in a single region experiencing elevated error rates or increased latency. This is more common than full outages and harder to detect without monitoring. The most frequently affected region is us-east-1 (it carries the most traffic), and services like Lambda, S3, and DynamoDB are the ones most often affected during peak load.
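Because this failure mode shows up as an elevated error rate rather than a hard outage, one way to catch it from your own application is to track recent call outcomes per service and region. The sketch below is illustrative only; the class name, window size, 5% threshold, and minimum sample count are assumptions, not recommended values.

```python
from collections import defaultdict, deque


class ErrorRateTracker:
    """Track recent request outcomes per (service, region) pair."""

    def __init__(self, window_size=100, threshold=0.05, min_samples=20):
        self.threshold = threshold
        self.min_samples = min_samples
        # Each window keeps only the most recent window_size outcomes
        self._windows = defaultdict(lambda: deque(maxlen=window_size))

    def record(self, service, region, success):
        self._windows[(service, region)].append(0 if success else 1)

    def error_rate(self, service, region):
        window = self._windows.get((service, region))
        if not window:
            return 0.0
        return sum(window) / len(window)

    def is_degraded(self, service, region):
        window = self._windows.get((service, region), ())
        # Require a minimum sample count so one early failure does not alert
        return (len(window) >= self.min_samples
                and self.error_rate(service, region) > self.threshold)
```

Calling `record("s3", "us-east-1", success=False)` from the exception handler around each AWS call, and checking `is_degraded` periodically, gives a per-region signal that a global status page cannot.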
How do I get AWS outage alerts without CloudWatch?
API Status Check monitors AWS service status and sends email alerts — no AWS account configuration needed. Subscribe to AWS's status page RSS feed for another source. For the fastest alerts, layer multiple sources.
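If you consume the RSS feed yourself, the core of a poller is parsing the items and remembering which ones you have already seen. The sketch below assumes a standard RSS 2.0 layout (`item`, `title`, `pubDate`, `guid`); the exact fields in AWS's feed are not confirmed here, and the sample XML is invented purely for illustration.

```python
import xml.etree.ElementTree as ET

# Invented sample in generic RSS 2.0 form; not real AWS status data.
SAMPLE_FEED = """<rss version="2.0"><channel><title>Status</title>
<item><title>Increased API error rates</title>
<pubDate>Mon, 01 Jan 2024 10:00:00 GMT</pubDate><guid>event-2</guid></item>
<item><title>Service is operating normally</title>
<pubDate>Mon, 01 Jan 2024 09:00:00 GMT</pubDate><guid>event-1</guid></item>
</channel></rss>"""


def parse_status_feed(xml_text):
    """Extract title, publication date, and guid from each RSS item."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": (item.findtext("title") or "").strip(),
            "published": (item.findtext("pubDate") or "").strip(),
            "guid": (item.findtext("guid") or "").strip(),
        })
    return items


def new_items(items, seen_guids):
    """Return items not seen before and add their guids to seen_guids."""
    fresh = [item for item in items if item["guid"] not in seen_guids]
    seen_guids.update(item["guid"] for item in fresh)
    return fresh


seen = set()
first_poll = new_items(parse_status_feed(SAMPLE_FEED), seen)
second_poll = new_items(parse_status_feed(SAMPLE_FEED), seen)
```

On the first poll every item is new; on the second, nothing is returned because the guids are already in `seen`. In a real poller the fetched feed text would replace `SAMPLE_FEED`, and each fresh item would trigger a notification.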
Summary: Recommended AWS Monitoring Stack
| Layer | Tool | Cost | What It Catches |
|---|---|---|---|
| External status | API Status Check | Free-$9/mo | AWS service-level issues |
| Account health | AWS Personal Health Dashboard | Free | Your-account-specific events |
| Resource monitoring | CloudWatch | Pay-per-use | Your infrastructure metrics |
| Application monitoring | Datadog or New Relic | $15+/mo | End-to-end performance |
| Endpoint monitoring | Better Stack | Free-$29/mo | Your service availability |
Layer these for complete coverage. No single tool catches everything.
Start monitoring AWS today — API Status Check takes 30 seconds to set up and covers AWS plus 120+ other APIs. Free to start.
Some links on this page are affiliate links. We may earn a commission if you make a purchase through these links, at no additional cost to you.