Is AWS Down? Complete Status Check Guide + Quick Fixes

EC2 instances not responding?
S3 buckets timing out?
Lambda functions failing?

Before panicking, verify whether AWS is actually down or whether it's a configuration issue on your end. Here's your complete guide to checking AWS status and responding to outages.

Quick Check: Is AWS Actually Down?

Don't assume it's AWS. Many "AWS down" reports are actually configuration errors, quota limits, or region-specific issues that can be resolved quickly.

1. Check Official Sources

AWS Service Health Dashboard:
🔗 health.aws.amazon.com/health/status

What to look for:

  • ✅ Green checkmarks = Service operational
  • ⚠️ Yellow indicators = Service degradation
  • 🔴 Red indicators = Service disruption
  • 📋 Recent events = Click for details

Shows status for:

  • EC2 (Compute)
  • S3 (Storage)
  • Lambda (Serverless)
  • RDS (Databases)
  • CloudFront (CDN)
  • Route 53 (DNS)
  • All AWS regions

API Status Check:
🔗 apistatuscheck.com/api/aws

Why use it:

  • Real-time monitoring (checks every 5 minutes)
  • Historical uptime data
  • Instant alerts (Slack, Discord, email)
  • Tracks individual services separately
  • Multi-region monitoring

Twitter/X Search:
🔗 Search "AWS down" on Twitter

Why it works:

  • DevOps teams report outages instantly
  • AWS support responds here
  • See which regions affected
  • Identify specific services down

Pro tip: Search specific services: "EC2 down", "S3 down us-east-1", etc.


2. Check Region-Specific Status

AWS operates in multiple regions worldwide:

Region Code    | Location      | Common Name
us-east-1      | N. Virginia   | US East (most common)
us-east-2      | Ohio          | US East 2
us-west-1      | N. California | US West
us-west-2      | Oregon        | US West 2
eu-west-1      | Ireland       | Europe
eu-central-1   | Frankfurt     | Europe Central
ap-southeast-1 | Singapore     | Asia Pacific
ap-northeast-1 | Tokyo         | Asia Pacific
sa-east-1      | São Paulo     | South America

Critical insight: AWS outages are almost always region-specific. us-east-1 can be down while us-west-2 is fine.

How to check your region:

  1. AWS Console → Top-right dropdown shows current region
  2. Check your resource configurations
  3. Look at health.aws.amazon.com region-by-region

Best practice: Deploy to multiple regions for redundancy.


3. Check Service-Specific Status

AWS has 200+ services. Focus on the major ones:

Service    | What It Does       | Most Common Issues
EC2        | Virtual servers    | Instance launch failures, connectivity
S3         | Object storage     | High error rates, slow responses
Lambda     | Serverless compute | Invocation failures, timeouts
RDS        | Managed databases  | Connection failures, slow queries
CloudFront | CDN                | Cache misses, edge location issues
Route 53   | DNS                | Resolution failures (rare)

Your service might be down while AWS globally is up.


Common AWS Error Messages (And What They Mean)

EC2: "InsufficientInstanceCapacity"

What it means: AWS doesn't have enough physical capacity in that availability zone.

Causes:

  • High demand in specific AZ
  • Instance type shortage
  • Spot instance availability

Quick fixes:

  1. Try different availability zone (us-east-1a → us-east-1b)
  2. Try different instance type (m5.large → m5a.large)
  3. Wait 30-60 minutes and retry
  4. Use different region temporarily

Long-term fix: Use Auto Scaling with multiple AZs.
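
Quick fixes 1 and 2 amount to retrying the launch across a list of alternatives. Here is a minimal sketch of that loop. It assumes a hypothetical `launch` callable that wraps your own EC2 launch call and raises `CapacityError` (also hypothetical) when AWS reports InsufficientInstanceCapacity. The zones and instance types are just the examples from the list above.

```python
from itertools import product


class CapacityError(Exception):
    """Hypothetical stand-in for an InsufficientInstanceCapacity failure."""


def launch_with_fallback(launch, zones, instance_types):
    """Try every zone for each instance type until one launch succeeds.

    `launch` is a caller-supplied function (an assumption of this sketch)
    that returns an instance ID or raises CapacityError.
    Returns (instance_id, list of combinations that failed first).
    """
    attempts = []
    for instance_type, zone in product(instance_types, zones):
        try:
            return launch(zone=zone, instance_type=instance_type), attempts
        except CapacityError:
            attempts.append((zone, instance_type))  # record it and move on
    raise CapacityError(f"No capacity in any of {len(attempts)} combinations")
```

The order matters: the sketch exhausts every zone for the preferred instance type before falling back to the next type, which keeps you on your first-choice hardware when possible.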


S3: "503 Service Unavailable" or "SlowDown"

What it means: S3 is throttling requests or overloaded.

Causes:

  • Too many requests to same prefix
  • S3 service degradation
  • Regional outage

Quick fixes:

  1. Implement exponential backoff (retry with increasing delays)
  2. Check S3 request rate limits
  3. Distribute requests across key prefixes
  4. Check AWS Status for S3 issues

Code example (exponential backoff):

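import time
import boto3
from botocore.exceptions import ClientError

# Error codes S3 returns when it throttles or is temporarily unavailable
RETRYABLE_CODES = {'SlowDown', 'ServiceUnavailable', '503'}

def s3_get_with_retry(bucket, key, max_retries=5):
    s3 = boto3.client('s3')
    for i in range(max_retries):
        try:
            return s3.get_object(Bucket=bucket, Key=key)
        except ClientError as e:
            code = e.response['Error']['Code']
            status = e.response['ResponseMetadata']['HTTPStatusCode']
            if code in RETRYABLE_CODES or status == 503:
                time.sleep(2 ** i)  # Exponential backoff: 1s, 2s, 4s, ...
            else:
                raise  # Not a throttling error, so don't retry
    raise Exception("Max retries exceeded")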
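
The retry loop above waits a fixed, doubling number of seconds, so clients that fail together also retry together. A common refinement is to cap the delay and add random jitter. The following is a sketch, not part of the original example: `base`, `cap`, and the "full jitter" strategy are assumptions you would tune for your workload.

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, rng=random):
    """Return a randomized delay in seconds for a zero-based retry attempt.

    Full jitter: pick uniformly between 0 and min(cap, base * 2**attempt).
    """
    ceiling = min(cap, base * (2 ** attempt))
    return rng.uniform(0, ceiling)


# Usage inside a retry loop: time.sleep(backoff_delay(i))
```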

Lambda: "Rate Exceeded" or "TooManyRequestsException"

What it means: Hit Lambda concurrency limits.

Causes:

  • Account-level concurrent execution limit (default: 1000)
  • Reserved concurrency limit
  • Burst limit exceeded

Quick fixes:

  1. Check Lambda console → Throttles metric
  2. Request concurrency limit increase (AWS Support)
  3. Implement queue (SQS) to smooth traffic
  4. Check if specific function has reserved concurrency set too low

Check current limits:

aws lambda get-account-settings --region us-east-1
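
To judge whether a limit increase is really needed, it helps to estimate the concurrency your traffic requires. A rough rule of thumb is requests per second multiplied by average duration in seconds. The sketch below uses made-up traffic numbers, and the 1000 default comes from the list above.

```python
import math


def required_concurrency(requests_per_second, avg_duration_seconds):
    """Estimate concurrent executions: arrival rate x time each one runs."""
    return math.ceil(requests_per_second * avg_duration_seconds)


def is_throttling_likely(requests_per_second, avg_duration_seconds, limit=1000):
    """True when the estimate exceeds the concurrency limit."""
    return required_concurrency(requests_per_second, avg_duration_seconds) > limit


# Hypothetical workload: 400 requests/s, each taking 3 s on average
print(required_concurrency(400, 3))   # 1200
print(is_throttling_likely(400, 3))   # True
```

The estimate also shows why shortening function duration is often as effective as raising the limit: halving the duration halves the concurrency needed.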

RDS: "Cannot Connect to Database"

What it means: Can't reach RDS instance.

Causes:

  • Security group blocking access
  • RDS instance stopped/terminated
  • Network connectivity issue
  • Regional outage

Quick fixes:

  1. Check RDS instance status (Console → RDS → Databases)
  2. Verify security group allows your IP (port 3306 for MySQL, 5432 for PostgreSQL)
  3. Check VPC routing/subnet configuration
  4. Test from EC2 instance in same VPC
  5. Check AWS Status for RDS issues

Test connection from EC2:

# MySQL
mysql -h your-rds-endpoint.rds.amazonaws.com -u admin -p

# PostgreSQL
psql -h your-rds-endpoint.rds.amazonaws.com -U admin -d mydb
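
If no database client is installed on the test machine, a plain TCP check can still separate network problems from database problems. This is a minimal standard-library sketch. The endpoint in the comment is the placeholder from the commands above, not a real host.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers timeouts, refused connections, and DNS failures
        return False


# Placeholder endpoint; 3306 = MySQL, 5432 = PostgreSQL
# print(can_connect("your-rds-endpoint.rds.amazonaws.com", 5432))
```

If this returns True but the `mysql` or `psql` login still fails, the network path is probably fine and the problem is more likely credentials or the database itself.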

CloudFront: "502 Bad Gateway" or "504 Gateway Timeout"

What it means: CloudFront can't reach your origin server.

Causes:

  • Origin server down (S3, EC2, ALB)
  • Origin timeout too short
  • SSL/TLS certificate issues
  • Origin security group blocking CloudFront IPs

Quick fixes:

  1. Check origin server health
  2. Verify origin domain/IP is correct (CloudFront console)
  3. Check origin response time (should be < 30 sec)
  4. Whitelist CloudFront IP ranges in security groups
  5. Check SSL certificate validity

Get CloudFront IP ranges:

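# Requires jq; prints one CloudFront IPv4 CIDR block per line
curl -s https://ip-ranges.amazonaws.com/ip-ranges.json | jq -r '.prefixes[] | select(.service == "CLOUDFRONT") | .ip_prefix'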
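
The same extraction can be scripted once the file is downloaded. The sketch below relies on the published layout of ip-ranges.json (a `prefixes` list whose entries carry `ip_prefix` and `service`). The sample data is hand-written with documentation-only addresses, not real CloudFront ranges.

```python
import json


def cloudfront_prefixes(ip_ranges):
    """Return sorted IPv4 CIDR blocks tagged CLOUDFRONT in parsed ip-ranges.json."""
    return sorted(
        entry["ip_prefix"]
        for entry in ip_ranges.get("prefixes", [])
        if entry.get("service") == "CLOUDFRONT"
    )


# Tiny sample in the same shape as the real file (values are illustrative)
sample = json.loads("""
{"prefixes": [
  {"ip_prefix": "203.0.113.0/24", "region": "GLOBAL", "service": "CLOUDFRONT"},
  {"ip_prefix": "198.51.100.0/24", "region": "us-east-1", "service": "EC2"}
]}
""")
print(cloudfront_prefixes(sample))  # ['203.0.113.0/24']
```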

Route 53: DNS Resolution Failures

What it means: DNS queries not resolving (very rare).

Causes:

  • Hosted zone misconfigured
  • Record set errors
  • Health check failures causing failover
  • Actual Route 53 outage (extremely rare)

Quick fixes:

  1. Test DNS resolution: dig yourdomain.com or nslookup yourdomain.com
  2. Check Route 53 hosted zone records (Console → Route 53)
  3. Verify nameservers match (domain registrar = Route 53 nameservers)
  4. Check health check status
  5. Check AWS Status for Route 53 issues

Test DNS from multiple locations:

# Using dig
dig @8.8.8.8 yourdomain.com

# Using nslookup
nslookup yourdomain.com 8.8.8.8
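
Querying 8.8.8.8 asks a public resolver, which can succeed while the machine running your application still fails to resolve the name. To see what the local resolver returns, here is a minimal standard-library sketch. The domain in the comment is the placeholder used above.

```python
import socket


def resolve(hostname):
    """Return the sorted IP addresses the local resolver gives for hostname."""
    try:
        infos = socket.getaddrinfo(hostname, None)
    except socket.gaierror:
        return []  # resolution failed on this machine
    return sorted({info[4][0] for info in infos})


# print(resolve("yourdomain.com"))
```

An empty list here alongside a successful `dig @8.8.8.8` suggests a local resolver or caching problem rather than a Route 53 issue.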

Quick Fixes: AWS Service Issues

Fix #1: Check AWS Personal Health Dashboard

First stop for AWS issues.

How to access:

  1. AWS Console → Search "Health"
  2. Or visit: console.aws.amazon.com/health

What you'll see:

  • Issues affecting YOUR resources
  • Scheduled maintenance events
  • Recent events history
  • Affected resources list

Action items:

  • Read event details
  • Check "Affected resources" tab
  • Follow AWS recommendations
  • Set up email/SNS notifications

Fix #2: Verify Region Selection

Wrong region = resources "disappear"

Check current region:

  • Top-right corner of AWS Console
  • Should match where you created resources

Common mistake:

  • Created EC2 in us-east-1
  • Console switched to us-west-2
  • "Where did my instances go?!"

Fix:

  • Switch to correct region in dropdown
  • Set up AWS CLI default region:
aws configure set region us-east-1

Fix #3: Check Service Quotas/Limits

AWS has limits on everything.

Common limits:

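  • EC2 On-Demand instances per region (the quota is counted in vCPUs per instance family; defaults vary by account)
  • S3 buckets per account (bucket names must also be globally unique)
  • Lambda concurrent executions (default: 1000)
  • EBS storage per region (the quota varies by volume type)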

Check quotas:

  1. AWS Console → Service Quotas
  2. Search for service (e.g., "EC2")
  3. See current limit vs. usage
  4. Request increase if needed

Via CLI:

aws service-quotas list-service-quotas --service-code ec2

Pro tip: Request limit increases BEFORE you need them (can take 24-48 hours).


Fix #4: Implement Retry Logic with Exponential Backoff

AWS recommends exponential backoff for all API calls.

Why:

  • Handles temporary failures
  • Respects throttling
  • Improves reliability

Implementation (Python boto3):

from botocore.config import Config
import boto3

# Configure automatic retries
config = Config(
   retries = {
      'max_attempts': 10,
      'mode': 'adaptive'  # or 'standard'
   }
)

# Use with any AWS client
s3 = boto3.client('s3', config=config)
ec2 = boto3.client('ec2', config=config)

JavaScript (AWS SDK v3):

import { S3Client } from "@aws-sdk/client-s3";

const client = new S3Client({
  maxAttempts: 10,
  retryMode: "adaptive"
});

Fix #5: Check CloudWatch Metrics

CloudWatch shows what's actually happening.

Key metrics to check:

EC2:

  • CPUUtilization
  • StatusCheckFailed
  • NetworkIn/NetworkOut

S3:

  • 4xxErrors, 5xxErrors
  • AllRequests
  • BytesDownloaded

Lambda:

  • Invocations
  • Errors
  • Throttles
  • Duration

RDS:

  • CPUUtilization
  • DatabaseConnections
  • ReadLatency, WriteLatency

How to access:

  1. AWS Console → CloudWatch → Metrics
  2. Select namespace (AWS/EC2, AWS/S3, etc.)
  3. Graph metrics for last 1-24 hours
  4. Look for spikes/drops

CLI example:

aws cloudwatch get-metric-statistics \
  --namespace AWS/EC2 \
  --metric-name CPUUtilization \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --start-time 2026-02-07T00:00:00Z \
  --end-time 2026-02-07T23:59:59Z \
  --period 3600 \
  --statistics Average
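
The command returns a `Datapoints` list that is not guaranteed to be in chronological order, so it is worth sorting before looking for spikes. Here is a small sketch of that post-processing. The threshold and the sample values are illustrative, not real measurements.

```python
from datetime import datetime, timezone


def find_spikes(datapoints, threshold, stat="Average"):
    """Return (timestamp, value) pairs above threshold, oldest first.

    `datapoints` is the "Datapoints" list from get-metric-statistics.
    Timestamps may be datetime objects (boto3) or ISO 8601 strings (CLI);
    both sort correctly as long as the list does not mix the two.
    """
    ordered = sorted(datapoints, key=lambda dp: dp["Timestamp"])
    return [(dp["Timestamp"], dp[stat]) for dp in ordered if dp[stat] > threshold]


# Illustrative datapoints, deliberately out of order
sample = [
    {"Timestamp": datetime(2026, 2, 7, 2, tzinfo=timezone.utc), "Average": 91.5},
    {"Timestamp": datetime(2026, 2, 7, 0, tzinfo=timezone.utc), "Average": 12.0},
    {"Timestamp": datetime(2026, 2, 7, 1, tzinfo=timezone.utc), "Average": 85.0},
]
for ts, value in find_spikes(sample, threshold=80):
    print(ts.isoformat(), value)
```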

Fix #6: Check Security Groups and NACLs

Most connectivity issues = security group misconfiguration.

Security Groups (instance-level firewall):

Check rules:

  1. EC2 Console → Security Groups
  2. Find relevant group
  3. Check Inbound rules (incoming traffic)
  4. Check Outbound rules (outgoing traffic)

Common issues:

  • SSH (port 22) not allowed from your IP
  • HTTP/HTTPS (80/443) not open to 0.0.0.0/0
  • RDS port not open to application security group
  • Forgot to allow outbound traffic (rare, but happens)

Quick fix for testing:

  • Temporarily allow all traffic: 0.0.0.0/0 on all ports
  • If it works, narrow down to specific ports/IPs
  • NEVER leave wide open in production

Network ACLs (subnet-level firewall):

  • Usually left at default (allow all)
  • Check if someone modified them
  • VPC → Network ACLs

Fix #7: Check IAM Permissions

"Access Denied" errors = IAM issue, not AWS down.

Troubleshoot:

1. Check who you are:

aws sts get-caller-identity

2. Test specific permission:

aws iam simulate-principal-policy \
  --policy-source-arn arn:aws:iam::123456789012:user/YourUser \
  --action-names s3:GetObject \
  --resource-arns arn:aws:s3:::your-bucket/*

3. Check CloudTrail for denied actions:

  • CloudTrail → Event history
  • Filter: "Error code = AccessDenied"
  • See exactly which permission is missing
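
If you export the events as JSON, the same filter is easy to script. The sketch below assumes each record carries CloudTrail's `errorCode`, `eventSource`, and `eventName` fields, and the sample records are hand-written. Note that an event name does not always match the IAM action name exactly, and some services report denials under a different error code, so treat the output as a starting point.

```python
from collections import Counter


def denied_actions(records, error_code="AccessDenied"):
    """Count denied API calls as 'service:EventName' from CloudTrail records."""
    counts = Counter()
    for record in records:
        if record.get("errorCode") == error_code:
            # "s3.amazonaws.com" -> "s3"
            service = record.get("eventSource", "").split(".")[0]
            counts[f"{service}:{record.get('eventName')}"] += 1
    return counts


# Hand-written sample records (illustrative only)
records = [
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject", "errorCode": "AccessDenied"},
    {"eventSource": "s3.amazonaws.com", "eventName": "GetObject", "errorCode": "AccessDenied"},
    {"eventSource": "ec2.amazonaws.com", "eventName": "DescribeInstances"},
]
print(denied_actions(records).most_common())  # [('s3:GetObject', 2)]
```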

Common fixes:

  • Attach policy with required permissions
  • Add resource to existing policy
  • Check if MFA required
  • Verify you're using correct AWS account

Fix #8: Use AWS Support (If You Have a Plan)

AWS Support tiers:

Plan       | Response Time         | Cost
Basic      | No tech support       | Free
Developer  | 12-24 hours           | $29/month
Business   | 1 hour (critical)     | $100+/month
Enterprise | 15 minutes (critical) | $15,000+/month

When to contact support:

  • Service limits need increasing
  • Billing issues
  • Technical issues you can't resolve
  • Account or security issues

How to open case:

  1. AWS Console → Support → Create case
  2. Choose category (Service limit, technical, billing)
  3. Describe issue with details
  4. Attach CloudWatch graphs, error messages

Pro tip: Include AWS request IDs from error messages (speeds up troubleshooting).


EC2 Not Working?

Issue: Can't Connect to EC2 Instance

Troubleshoot:

1. Check instance state:

  • EC2 Console → Instances
  • Should be "running" (green)
  • "stopped" = start it
  • "terminated" = it's gone, launch new one

2. Check security group:

  • Select instance → Security tab
  • Click security group name
  • Inbound rules should include:
    • SSH (port 22) from your IP for Linux
    • RDP (port 3389) from your IP for Windows

3. Test network connectivity:

# Ping (if ICMP allowed)
ping ec2-xx-xx-xx-xx.compute.amazonaws.com

# Test SSH port
telnet ec2-xx-xx-xx-xx.compute.amazonaws.com 22
# Or
nc -zv ec2-xx-xx-xx-xx.compute.amazonaws.com 22

4. Check if you have correct key:

  • SSH requires .pem key file
  • Key must match what you selected at launch
  • Key permissions must be 400: chmod 400 your-key.pem

5. Check System Status Checks:

  • EC2 Console → Instance → Status checks tab
  • "2/2 checks passed" = healthy
  • Failed checks = hardware/network issue → Reboot or contact AWS

Issue: EC2 Instance Slow or Unresponsive

Causes:

  • CPU throttling (T instance credits exhausted)
  • Memory exhausted
  • Disk I/O bottleneck
  • Network saturation

Troubleshoot:

1. Check CloudWatch metrics:

  • CPU, Network, Disk I/O graphs
  • Look for maxed out metrics

2. For T instances (T2, T3, T4g), check CPU credits:

  • CloudWatch → Metrics → EC2 → Per-Instance Metrics
  • CPUCreditBalance
  • If near zero, you're being throttled

Solutions:

  • Switch to unlimited mode (costs more but no throttling)
  • Upgrade to M, C, or R instance type
  • Optimize application
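
How quickly a burstable instance runs out of credits can be estimated from the credit definition: one CPU credit is one vCPU running at 100% for one minute. This is a rough sketch. The earn rate, balance, and utilization in the example are assumed values, so look up the real earn rate for your instance size.

```python
def hours_until_throttled(balance, earn_per_hour, vcpus, utilization):
    """Estimate hours until CPUCreditBalance hits zero at a steady load.

    utilization is average CPU as a fraction (0.0-1.0) across all vCPUs.
    Returns None when credits are earned at least as fast as they are spent.
    """
    spend_per_hour = vcpus * utilization * 60  # credits burned per hour
    net_burn = spend_per_hour - earn_per_hour
    if net_burn <= 0:
        return None
    return balance / net_burn


# Assumed example: 2 vCPUs at 50% load, earning 24 credits/hour, 144 banked
print(hours_until_throttled(144, 24, 2, 0.5))  # 4.0
```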

3. Connect via EC2 Instance Connect or Session Manager:

  • Browser-based console access (no SSH needed)
  • EC2 Console → Instance → Connect button

S3 Not Working?

Issue: S3 Bucket Access Denied

Causes:

  • Bucket policy blocking access
  • IAM permissions missing
  • Bucket in different region
  • Bucket doesn't exist

Troubleshoot:

1. Check bucket exists:

aws s3 ls s3://your-bucket-name

2. Check bucket region:

aws s3api get-bucket-location --bucket your-bucket-name

3. Check bucket policy:

  • S3 Console → Bucket → Permissions → Bucket policy
  • Look for "Deny" statements

4. Check IAM permissions:

  • Need s3:GetObject, s3:PutObject, s3:ListBucket, etc.

5. Check Block Public Access settings:

  • S3 Console → Bucket → Permissions → Block public access
  • May need to disable for public buckets

Issue: S3 High Error Rates

Check the Service Health Dashboard (health.aws.amazon.com/health/status) for open S3 events.

Implement retry logic:

  • See Fix #4 above

Optimize request patterns:

  • Distribute across key prefixes (avoid sequential keys)
  • Use CloudFront for frequently accessed objects
  • Enable S3 Transfer Acceleration for uploads
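
One way to distribute sequentially named objects across key prefixes is to prepend a short, stable hash-derived prefix. This is a sketch of the idea. The shard count and the `NN/` prefix format are arbitrary choices, not an AWS convention.

```python
import hashlib


def sharded_key(key, shards=16):
    """Prepend a stable hash-derived prefix so keys spread over `shards` prefixes."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    shard = int(digest, 16) % shards
    return f"{shard:02d}/{key}"


# Sequential names are mapped to hash-derived prefixes
for name in ["log-0001.gz", "log-0002.gz", "log-0003.gz"]:
    print(sharded_key(name))
```

The trade-off is that listing objects in their original order now requires querying every prefix, so this fits best when objects are fetched by exact key.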

Lambda Not Working?

Issue: Lambda Timeouts

Causes:

  • Function timeout too short (default: 3 sec, max: 15 min)
  • Slow dependencies (database, API calls)
  • Cold starts
  • VPC networking delays

Quick fixes:

1. Increase timeout:

  • Lambda Console → Function → Configuration → General
  • Set timeout higher (but find root cause)

2. Check CloudWatch Logs:

  • Lambda Console → Function → Monitor → View logs in CloudWatch
  • See exactly where function is slow

3. Optimize function:

  • Reduce package size
  • Increase memory (also increases CPU)
  • Remove VPC if not needed (VPC adds latency)
  • Use provisioned concurrency for critical functions

Issue: Lambda "Function Not Found"

Causes:

  • Function in wrong region
  • Function deleted
  • Wrong function name

Quick fixes:

  1. Check region (top-right dropdown)
  2. List functions: aws lambda list-functions
  3. Verify function ARN
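
A quick way to verify an ARN is to pull out its region field and compare it with the region your client is using. Below is a minimal sketch based on the general ARN layout `arn:partition:service:region:account-id:resource`. The function name and account ID in the example are placeholders.

```python
def arn_region(arn):
    """Return the region field of an ARN (empty string for global resources)."""
    parts = arn.split(":", 5)  # arn:partition:service:region:account-id:resource
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"Not an ARN: {arn!r}")
    return parts[3]


# Hypothetical function ARN
arn = "arn:aws:lambda:us-west-2:123456789012:function:my-function"
print(arn_region(arn))  # us-west-2
```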

When AWS Actually Goes Down

What Happens

Major AWS outages (recent):

  • December 2021: us-east-1 outage (7 hours) - networking issue
  • July 2022: us-east-2 power issue (2 hours)
  • June 2023: us-east-1 Lambda and related API issues (3 hours)

Typical causes:

  1. Power issues at data centers
  2. Networking failures
  3. Software deployment bugs
  4. Rare: DDoS attacks

Impact:

  • Regional (usually just one region)
  • Service-specific (EC2 down, but S3 works)
  • Cascading failures (one service depends on another)

How AWS Responds

Communication:

Timeline:

  1. 0-15 min: Users report issues on Twitter
  2. 15-30 min: AWS acknowledges on dashboard
  3. 30-90 min: Regular updates
  4. Resolution: Hours to days for major outages
  5. Post-mortem: Detailed PIR published weeks later

What to Do During Outages

1. Activate failover (if configured):

  • Switch to different region
  • Use read replicas for databases
  • Activate standby resources

2. Monitor Personal Health Dashboard:

  • Shows YOUR affected resources
  • Provides specific guidance

3. Communicate with stakeholders:

  • Update status page
  • Notify customers
  • Set expectations

4. Document incident:

  • Screenshot error messages
  • Save CloudWatch graphs
  • Note timeline
  • Use for post-mortem

5. Consider SLA credits:

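  • AWS SLAs vary by service (for example, 99.99% for EC2 at the region level and 99.9% for S3 Standard)
  • If missed, request service credits
  • Submit the claim before the deadline in that service's SLA (typically the end of the second billing cycle after the incident)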
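
To see whether an incident could qualify, convert the SLA percentage into allowed downtime and compare it with what you measured. This is back-of-envelope arithmetic assuming a 30-day month. Each service's SLA defines unavailability in its own way, so the published terms are what count.

```python
def allowed_downtime_minutes(sla_percent, days=30):
    """Minutes of downtime per billing period that still meet the SLA."""
    total_minutes = days * 24 * 60
    return total_minutes * (100 - sla_percent) / 100


def uptime_percent(downtime_minutes, days=30):
    """Uptime percentage for the period given total downtime in minutes."""
    total_minutes = days * 24 * 60
    return 100 * (total_minutes - downtime_minutes) / total_minutes


print(round(allowed_downtime_minutes(99.99), 2))  # 4.32
print(round(uptime_percent(120), 3))              # 99.722
```

A two-hour outage therefore falls well below a 99.99% commitment for that month, which is why documenting the timeline precisely matters.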

AWS Down Checklist

Follow these steps in order:

Step 1: Verify it's actually AWS

Step 2: Service-specific checks

  • EC2: Check instance status, security groups
  • S3: Test bucket access, check error rates
  • Lambda: Check CloudWatch logs, metrics
  • RDS: Test connection, check instance status
  • CloudFront: Check origin health
  • Route 53: Test DNS resolution

Step 3: Configuration troubleshooting

  • Check security groups/NACLs
  • Verify IAM permissions
  • Check service quotas/limits
  • Review CloudWatch metrics
  • Check CloudTrail for errors

Step 4: Implement workarounds

  • Add retry logic with exponential backoff
  • Failover to different region (if multi-region)
  • Use alternate service (e.g., S3 → CloudFront)
  • Scale resources if capacity issue

Step 5: Contact AWS (if needed)

  • Open AWS Support case
  • Include request IDs, error messages
  • Attach CloudWatch graphs
  • Escalate if critical

Prevent Future Issues

1. Design for Failure

AWS Best Practices:

Multi-AZ deployment:

Single AZ = single point of failure
Multi-AZ = survives data center failure

Multi-Region for critical workloads:

  • Active-active or active-passive
  • Route 53 health checks + failover
  • Cross-region replication (S3, RDS, DynamoDB)

Example architecture:

┌─────────────────┐         ┌─────────────────┐
│   us-east-1     │         │   us-west-2     │
│  (Primary)      │◄────────┤   (Backup)      │
│                 │         │                 │
│  EC2 Auto Scale │         │  EC2 Auto Scale │
│  RDS Multi-AZ   │         │  RDS Read Rep   │
│  S3 (CRR→)      │         │  S3 (←CRR)      │
└─────────────────┘         └─────────────────┘
         ▲                           ▲
         │                           │
    Route 53 (health check + failover)
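
The failover decision in this architecture can be modeled in a few lines. This is a simplified illustration of the idea, not how Route 53 is implemented. The threshold of three consecutive failed checks is an assumed setting, and the region names are the ones in the diagram.

```python
def pick_endpoint(primary_checks, primary, secondary, failure_threshold=3):
    """Return secondary once the last `failure_threshold` primary checks all failed.

    primary_checks is a list of booleans, oldest first (True = check passed).
    """
    recent = primary_checks[-failure_threshold:]
    failed_over = len(recent) == failure_threshold and not any(recent)
    return secondary if failed_over else primary


print(pick_endpoint([True, False, False, False], "us-east-1", "us-west-2"))  # us-west-2
print(pick_endpoint([False, False, True], "us-east-1", "us-west-2"))         # us-east-1
```

Requiring several consecutive failures avoids flapping between regions on a single dropped health check, at the cost of a slightly slower failover.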

2. Implement Monitoring and Alerts

CloudWatch Alarms:

Critical alarms to set up:

  • EC2 StatusCheckFailed
  • RDS DatabaseConnections > threshold
  • Lambda Errors > threshold
  • S3 4xxErrors or 5xxErrors spike
  • ALB TargetResponseTime > threshold

Example alarm (CLI):

aws cloudwatch put-metric-alarm \
  --alarm-name ec2-cpu-high \
  --alarm-description "Alert if CPU exceeds 80%" \
  --metric-name CPUUtilization \
  --namespace AWS/EC2 \
  --statistic Average \
  --period 300 \
  --threshold 80 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=InstanceId,Value=i-1234567890abcdef0 \
  --evaluation-periods 2 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:my-topic

Third-party monitoring:

  • API Status Check - External monitoring
  • Datadog, New Relic, Dynatrace - APM
  • PagerDuty - Incident management

3. Use AWS Health API

Automate health check monitoring:

import boto3

health = boto3.client('health', region_name='us-east-1')

# Get all open issues
events = health.describe_events(
    filter={
        'eventStatusCodes': ['open', 'upcoming']
    }
)

for event in events['events']:
    print(f"Service: {event['service']}")
    print(f"Region: {event.get('region', 'GLOBAL')}")
    print(f"Status: {event['eventStatusCode']}")
    print(f"Description: {event['eventTypeCode']}")

Set up SNS notifications:

  • Personal Health Dashboard → Preferences
  • Configure email/SMS for events

4. Regular DR Drills

Disaster Recovery testing:

Quarterly exercises:

  1. Simulate region failure
  2. Failover to backup region
  3. Test recovery time
  4. Document issues found
  5. Update runbooks

GameDay exercises:

  • AWS hosts GameDay events
  • Simulate real outage scenarios
  • Practice incident response
  • Improve team coordination

5. Keep Service Quotas Ahead

Proactive limit increases:

Before Black Friday, product launches, etc.:

  1. Review current usage
  2. Project peak demand
  3. Request quota increases 2-4 weeks early
  4. Confirm increases before event

Auto-scaling quotas:

  • Make sure auto-scaling limits match instance quotas
  • Request limits 2x peak demand (headroom)

Key Takeaways

Before assuming AWS is down:

  1. ✅ Check AWS Service Health Dashboard
  2. ✅ Check Personal Health Dashboard
  3. ✅ Verify correct region selected
  4. ✅ Search Twitter for "AWS down [region]"
  5. ✅ Test specific service (EC2, S3, Lambda, etc.)

Common fixes:

  • Check security groups (most connectivity issues)
  • Verify IAM permissions (most access denied errors)
  • Check service quotas (hit limits)
  • Implement retry logic with exponential backoff
  • Review CloudWatch metrics and logs

Service-specific issues:

  • EC2: Security groups, status checks, instance capacity
  • S3: Bucket policies, retry logic, key distribution
  • Lambda: Timeouts, concurrency limits, CloudWatch logs
  • RDS: Security groups, connection limits, Multi-AZ
  • CloudFront: Origin health, SSL certificates
  • Route 53: DNS records, health checks (rarely down)

If AWS is actually down:

  • Monitor Health Dashboard for updates
  • Activate failover to different region (if configured)
  • Communicate with stakeholders
  • Document incident for post-mortem
  • Consider requesting SLA credits

Prevent future issues:

  • Design multi-AZ/multi-region architecture
  • Set up CloudWatch alarms
  • Use Personal Health Dashboard API
  • Practice DR drills quarterly
  • Request service quota increases proactively

Remember: Most AWS issues are configuration errors or hitting limits, not actual AWS outages. Check security groups, IAM permissions, and quotas first.


Need real-time AWS status monitoring? Track AWS uptime with API Status Check - Get instant alerts when AWS services go down.

