Is Azure Down? Complete Status Check Guide + Quick Fixes
Azure Portal not loading?
VMs unresponsive?
App Service deployment failing?
Before panicking, verify whether Azure is actually down or whether it's a configuration issue, quota problem, or regional outage. Here's your complete guide to checking Azure status and fixing common cloud infrastructure issues.
Quick Check: Is Azure Actually Down?
Don't assume it's Azure. Roughly half of "Azure down" reports are actually configuration errors, quota limits, or subscription issues, not platform outages.
1. Check Official Sources
Azure Status Page:
🔗 status.azure.com
What to look for:
- ✅ "No current issues" = Azure is fine
- ⚠️ "Active event" = Some services/regions affected
- 🔴 "Outage" = Azure is down
Real-time updates:
- Azure Portal availability
- Virtual Machines status
- App Service health
- Azure Active Directory (Entra ID)
- Storage Accounts
- Azure Functions
- Regional outages
- Service-specific incidents
Pro tip: Filter by service and region you're using.
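If you want to script this check, Azure Status also publishes an RSS feed of current incidents. A minimal sketch; the feed path below is an assumption based on the commonly published URL, so confirm it against the feed link shown on status.azure.com before relying on it:
# Pull recent incident titles from the Azure Status RSS feed (feed path is an assumption; verify it)
curl -s https://status.azure.com/en-us/status/feed/ | grep -oE "<title>[^<]*</title>" | head -10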
API Status Check:
🔗 apistatuscheck.com/api/azure
Why use it:
- Real-time monitoring (checks every 5 minutes)
- Historical uptime data
- Instant alerts (Slack, Discord, email)
- Tracks Portal, VMs, App Service separately
- Third-party verification
Twitter/X Search:
🔍 Search "Azure down" on Twitter
Why it works:
- Users report issues instantly
- See if others experiencing same problem
- Regional patterns emerge
- Microsoft responds here: @Azure, @AzureSupport
Pro tip: If 1,000+ tweets in the last hour mention "Azure down," it's likely a real outage.
DownDetector:
🔗 downdetector.com/status/windows-azure
Shows:
- Real-time user reports
- Heatmap of affected areas
- Most reported problems (portal, VMs, storage)
2. Check Service-Specific Status
Azure has 200+ services that can fail independently:
| Service | What It Does | Common Issues |
|---|---|---|
| Azure Portal | Web management interface | Portal not loading, timeouts |
| Virtual Machines | IaaS compute | VM not starting, connectivity lost |
| App Service | PaaS web hosting | Deployment fails, apps down |
| Azure AD (Entra ID) | Identity/authentication | Login failures, token errors |
| Azure Storage | Blob/file/queue storage | Upload fails, access denied |
| Azure Functions | Serverless compute | Function not triggering, timeouts |
| Azure SQL | Managed databases | Connection failures, performance |
| Azure DevOps | CI/CD platform | Pipeline failures, repo access |
Your service might be down while Azure globally is up.
3. Check Regional Status
Azure has 60+ regions worldwide. Outages are often regional.
Check your region:
- Go to status.azure.com
- Filter by your region (e.g., "East US", "West Europe")
- See if active incidents in your region
Find your resource region:
- Azure Portal → Your resource → Overview → Location
Multi-region strategy:
- If East US is down, try deploying to West US temporarily
- Production apps should span multiple regions
4. Test Different Access Methods
If Azure Portal works but Azure CLI doesn't, it's likely tool-specific.
| Platform | Test Method |
|---|---|
| Azure Portal | portal.azure.com |
| Azure CLI | az login && az account show |
| Azure PowerShell | Connect-AzAccount |
| Azure Mobile App | Launch Azure app (iOS/Android) |
Decision tree:
Portal works + CLI fails → CLI auth/config issue
Portal fails + CLI works → Browser/network issue
Nothing works → Azure likely down (or subscription issue)
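A minimal triage sketch of that decision tree, assuming the Azure CLI is installed and was previously logged in (nothing here is specific to your environment):
# 1. Network/DNS: can we reach the portal endpoint at all?
curl -sS -o /dev/null -w "portal.azure.com -> HTTP %{http_code}\n" https://portal.azure.com
# 2. CLI auth: is there still a valid cached session?
if az account show --output none 2>/dev/null; then
  echo "CLI session OK -> likely a browser/network issue if the portal fails"
else
  echo "No CLI session -> run 'az login' (auth/config issue, not necessarily an outage)"
fi
# 3. Control plane: can we make a real management call?
az group list --output table || echo "Control-plane call failed -> check status.azure.com"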
Common Azure Error Messages (And What They Mean)
"This site can't be reached" (Azure Portal)
What it means: Can't connect to portal.azure.com.
Causes:
- Internet connection issue
- Firewall blocking Azure domains
- DNS resolution failure
- Rare: Azure Portal outage
Quick fixes:
- Test internet connection (visit google.com)
- Check DNS: nslookup portal.azure.com
- Try different browser
- Try incognito/private mode
- Disable VPN temporarily (test)
- Check firewall settings
- Try Azure CLI (bypass portal entirely)
For corporate networks:
- Whitelist *.azure.com and *.microsoft.com
- Check proxy configuration
- Contact IT admin
"Subscription not found" or "No subscriptions found"
What it means: Can't access Azure subscription.
Causes:
- Signed in with wrong account
- Subscription expired/disabled
- No subscriptions associated with account
- Permissions revoked
Quick fixes:
- Verify signed-in account: Portal → Profile icon → Check email
- Switch directory: Portal → Settings → Directories + subscriptions
- Check subscription status: Account portal
- Verify payment method current (credit card not expired)
- Contact subscription admin (may need access granted)
Check subscription status via CLI:
az account list --output table
az account show
"The subscription is disabled and therefore marked as read only"
What it means: Subscription suspended.
Causes:
- Payment method failed
- Spending limit reached
- Trial expired
- Credit card expired
- Account under review
Quick fixes:
- Go to Azure Account Center
- Update payment method
- Check for outstanding invoices
- Remove spending limit (if applicable)
- Contact Azure Support (may be fraud hold)
For free trial:
- Trial typically 30 days or $200 credit
- Must upgrade to pay-as-you-go to continue
"Quota exceeded" or "Operation could not be completed as it results in exceeding quota limits"
What it means: Hit subscription or regional quota limit.
Causes:
- Too many VMs in region
- Too many cores requested
- Too many storage accounts
- Public IP address limit reached
Quick fixes:
1. Check current quota:
- Portal → Subscriptions → Your subscription → Usage + quotas (a CLI alternative is sketched below)
- Filter by region and service
2. Request quota increase:
- Portal → Help + support → New support request
- Issue type: Service and subscription limits (quotas)
- Provide justification and desired limit
3. Clean up unused resources:
# List all VMs
az vm list --output table
# Delete unused VM
az vm delete --name MyVM --resource-group MyRG --yes
# List unused disks
az disk list --query "[?managedBy==null]" --output table
4. Use different region:
- Some regions have higher limits
- Try deploying to less-congested region
Common quotas:
- Standard VMs: 10-20 per region (default)
- vCPUs: 10-20 per region (default)
- Storage accounts: 250 per region
- Public IPs: 10-20 per region
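If you prefer the command line over the portal, current usage against quota can be listed per region. A sketch (the region name is an example):
# Compute quotas (VMs, vCPUs per family) for a region
az vm list-usage --location eastus --output table
# Networking quotas (public IPs, NSGs, load balancers) for the same region
az network list-usages --location eastus --output table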
"Allocation failed" (Virtual Machines)
What it means: Azure can't allocate hardware for your VM.
Causes:
- Datacenter capacity constraints
- Specific VM size unavailable in region
- Availability zone full
- Hardware generation not available
Quick fixes:
1. Try different region:
# Check VM size availability in regions
az vm list-sizes --location eastus --output table
az vm list-sizes --location westus --output table
2. Try different VM size:
- Use similar size (e.g., D2s_v3 instead of D2_v3)
- Older generation may have availability
3. Stop and redeploy VM:
- Stop (deallocate) VM
- Wait a few minutes
- Start VM again (may allocate to different hardware)
az vm deallocate --name MyVM --resource-group MyRG
az vm start --name MyVM --resource-group MyRG
4. Create new VM in availability set:
- Provides better allocation guarantees
5. Contact Azure Support:
- For critical workloads, support can help with allocation
"Authentication failed" or "AADSTS" errors (Azure AD/Entra ID)
What it means: Can't authenticate to Azure AD.
Causes:
- Password incorrect
- MFA issue
- Conditional access blocking
- Token expired
- Service principal credentials invalid
Quick fixes:
1. Verify credentials:
- Double-check username/password
- Try signing in to portal.azure.com directly
2. Clear token cache (Azure CLI):
az account clear
az login
3. Check MFA:
- Complete MFA challenge
- Verify authentication app working (Microsoft Authenticator)
4. Service principal authentication:
# Test service principal
az login --service-principal \
--username <app-id> \
--password <password-or-cert> \
--tenant <tenant-id>
5. Review conditional access policies:
- Portal → Azure AD → Security → Conditional Access
- May be blocking from certain locations/devices
Common AADSTS error codes:
- AADSTS50126: Invalid username or password
- AADSTS50076: MFA required
- AADSTS50053: Account locked
- AADSTS700016: Application not found in directory
"ResourceNotFound" or "NotFound" (404 errors)
What it means: Resource doesn't exist.
Causes:
- Resource was deleted
- Wrong resource group/subscription
- Wrong region
- Typo in resource name
Quick fixes:
1. Verify resource exists:
# List all resources in subscription
az resource list --output table
# Search for specific resource
az resource list --query "[?name=='MyResource']"
# Check specific resource group
az resource list --resource-group MyRG --output table
2. Check subscription context:
# Show current subscription
az account show
# List all subscriptions
az account list --output table
# Switch subscription
az account set --subscription "My Subscription"
3. Check resource group:
- Resource may be in different RG than expected
- Portal → Resource groups → Browse all
"StorageAccountAlreadyTaken"
What it means: Storage account name already in use.
Causes:
- Storage account names are globally unique
- Someone else using that name
- You deleted account (name reserved 24-48 hours)
Quick fixes:
1. Choose different name:
- Add a random suffix: mystorageacct12345
- Use a company/project prefix
2. Check name availability:
az storage account check-name --name mystorageacct
3. Wait if recently deleted:
- Names reserved up to 48 hours after deletion
- Use different name meanwhile
Naming rules:
- 3-24 characters
- Lowercase letters and numbers only
- Globally unique across all Azure
"NetworkSecurityGroupCannotBeAttachedToGatewaySubnet"
What it means: NSG not allowed on gateway subnet.
Causes:
- Trying to attach NSG to subnet containing VPN/ExpressRoute gateway
- Azure restriction for gateway subnets
Quick fixes:
- Don't attach NSG to gateway subnet (by design)
- Use NSG on other subnets
- Use Azure Firewall for gateway subnet security
Note: This is expected behavior, not a bug.
"PublicIPAddressCannotBeDeleted" or resource locked
What it means: Resource can't be deleted while in use.
Causes:
- Resource attached to another resource (e.g., NIC, load balancer)
- Resource locked explicitly
- Resource in use by service
Quick fixes:
1. Check resource dependencies:
- Portal → Resource → Overview → See what it's attached to
- Must detach/delete dependent resources first
2. Check for locks:
# List locks on resource
az lock list --resource-group MyRG
# Delete lock
az lock delete --name MyLock --resource-group MyRG
3. Deletion order (example for VM; a CLI sketch follows this list):
- Stop VM
- Delete VM
- Delete network interface
- Delete public IP
- Delete virtual network
- Delete resource group
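A CLI sketch of that deletion order. The NIC, public IP, and VNet names below are placeholders; check the actual names in the resource group before deleting anything:
az vm deallocate --name MyVM --resource-group MyRG
az vm delete --name MyVM --resource-group MyRG --yes
az network nic delete --name MyVMNic --resource-group MyRG
az network public-ip delete --name MyVMPublicIP --resource-group MyRG
az network vnet delete --name MyVNet --resource-group MyRG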
"DeploymentFailed" (ARM template / App Service)
What it means: Deployment error.
Causes:
- ARM template syntax error
- Invalid parameter values
- Quota exceeded
- Dependency failure
- App Service configuration issue
Quick fixes:
1. Check deployment logs:
- Portal → Resource group → Deployments → Failed deployment → Error details
2. Validate ARM template:
az deployment group validate \
--resource-group MyRG \
--template-file template.json \
--parameters @parameters.json
3. Check specific error message:
- Drill into error details in portal
- Google exact error code/message
- Check ARM template troubleshooting guide
For App Service:
- Check deployment logs: Portal → App Service → Deployment Center → Logs
- Verify build succeeded
- Check app settings/connection strings
- Review Kudu logs: https://<app-name>.scm.azurewebsites.net
"Function execution timeout" (Azure Functions)
What it means: Function took too long to execute.
Causes:
- Consumption plan timeout (default 5 minutes)
- Long-running operation
- External API slow
- Cold start delay
Quick fixes:
1. Check timeout setting:
- Portal → Function App → Configuration → Application settings
- functionTimeout setting (Consumption: max 10 min, Premium/Dedicated: unlimited)
2. Increase timeout (if on Premium/Dedicated plan):
// host.json
{
"functionTimeout": "00:10:00"
}
3. Optimize function:
- Reduce external API calls
- Use async/await properly
- Cache data when possible
- Break into smaller functions
4. Upgrade plan:
- Consumption → Premium (no timeout limit)
- Use Durable Functions for long-running workflows
"Storage account access denied" or "403 Forbidden"
What it means: Don't have permission to access storage.
Causes:
- SAS token expired
- Firewall blocking your IP
- RBAC permissions insufficient
- Public access disabled
Quick fixes:
1. Check firewall rules:
- Portal → Storage account → Networking
- Add your IP to allowed list
- Or enable "Allow access from all networks" (testing only)
2. Verify SAS token:
# Generate new SAS token
az storage account generate-sas \
--account-name mystorageacct \
--services b \
--resource-types co \
--permissions r \
--expiry 2026-12-31
3. Check RBAC:
- Portal → Storage account → Access Control (IAM)
- Verify you have "Storage Blob Data Reader" or a similar role (a CLI sketch for granting it follows below)
4. Check public access:
- Portal → Storage account → Configuration → Allow Blob public access
- Must be enabled for anonymous access
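A sketch of granting and verifying the data-plane role from the CLI (the assignee, subscription ID, and resource names are placeholders):
# Grant blob read access scoped to one storage account
az role assignment create \
--assignee user@example.com \
--role "Storage Blob Data Reader" \
--scope "/subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Storage/storageAccounts/mystorageacct"
# List existing role assignments on that scope
az role assignment list \
--scope "/subscriptions/<sub-id>/resourceGroups/MyRG/providers/Microsoft.Storage/storageAccounts/mystorageacct" \
--output table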
Quick Fixes: Azure Not Working?
Fix #1: Clear Azure Portal Cache
Why it works: Cached portal data can cause errors.
How to clear:
- Azure Portal → Settings (gear icon) → Sign out all other sessions
- Clear browser cache (Ctrl+Shift+Del / Cmd+Shift+Del)
- Try incognito/private mode
- Hard refresh: Ctrl+Shift+R (Windows) / Cmd+Shift+R (Mac)
Portal-specific cache:
- Portal → Settings → Reset all settings
- Restores portal to defaults
Fix #2: Check Subscription Status and Credits
Subscription issues are common.
How to check:
- Go to Azure Account Center
- Verify subscription status: "Active"
- Check payment method valid
- Check credits remaining (for free trial/MSDN)
Fix payment issues:
- Update credit card
- Pay outstanding invoices
- Remove spending limit (if applicable)
Fix #3: Verify Region and Service Availability
Not all services available in all regions.
Check service availability:
- Azure Products by Region
- Filter by region and service
Example:
- Some VM sizes only in specific regions
- Azure Bastion not in all regions
Solution:
- Deploy to region with service availability
- Or request service expansion (limited cases)
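To check availability from the CLI instead of the web page, the SKU listings show what a region actually offers (region and VM size below are examples):
# Regions your subscription can deploy to
az account list-locations --output table
# Is a specific VM size offered (and unrestricted) in a region?
az vm list-skus --location eastus --size Standard_D4s_v3 --output table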
Fix #4: Use Azure CLI/PowerShell as Backup
Portal down? Use command line.
Azure CLI:
# Install Azure CLI
# macOS: brew install azure-cli
# Windows: Download from https://aka.ms/installazurecliwindows
# Login
az login
# Create resource group
az group create --name MyRG --location eastus
# Create VM
az vm create \
--resource-group MyRG \
--name MyVM \
--image Ubuntu2204 \
--admin-username azureuser \
--generate-ssh-keys
Azure PowerShell:
# Install Azure PowerShell
Install-Module -Name Az -AllowClobber -Scope CurrentUser
# Login
Connect-AzAccount
# Create resource group
New-AzResourceGroup -Name MyRG -Location "East US"
Pro tip: Learn CLI basics. The portal is convenient, but the CLI is faster and scriptable.
Fix #5: Check Resource Locks
Locks prevent accidental deletion/modification.
Check for locks:
# List all locks in subscription
az lock list --output table
# List locks on specific resource group
az lock list --resource-group MyRG --output table
Remove lock (if appropriate):
az lock delete --name MyLock --resource-group MyRG
Lock types:
- ReadOnly: Can view, but can't modify or delete
- CanNotDelete: Can modify, but can't delete
Common scenario:
- Production resources often locked by governance policy
- Contact admin to unlock temporarily
Fix #6: Review Activity Log
Activity log shows what happened.
Check activity log:
- Portal → Resource → Activity log
- Filter by time range and operation
- Look for failed operations
Via CLI:
# Get activity log for resource group
az monitor activity-log list \
--resource-group MyRG \
--max-events 20 \
--output table
What to look for:
- Who made changes (correlation ID)
- What failed (error messages)
- When it happened (timestamp)
Fix #7: Check Service Health and Planned Maintenance
Azure announces planned maintenance.
Check Service Health:
- Portal → Service Health → Planned maintenance
- See upcoming maintenance windows
- Can affect VM availability
RDP/SSH unavailable during maintenance:
- VMs may reboot
- Plan accordingly
- Use availability sets/zones for HA
Fix #8: Restart or Redeploy Resource
Turn it off and on again.
Restart VM:
# Restart (keeps allocation)
az vm restart --name MyVM --resource-group MyRG
# Stop (deallocate) and start (new allocation)
az vm deallocate --name MyVM --resource-group MyRG
az vm start --name MyVM --resource-group MyRG
Restart App Service:
az webapp restart --name MyApp --resource-group MyRG
Restart Function App:
az functionapp restart --name MyFunctionApp --resource-group MyRG
When to restart:
- Unresponsive service
- After configuration change
- Random errors
- Performance degradation
Azure Portal Not Working?
Issue: Portal Loading Forever or "Unexpected error occurred"
Causes:
- Browser cache corrupted
- Browser extension interference
- Network/proxy issue
- Portal outage (rare)
Troubleshoot:
1. Try incognito/private mode:
- Bypasses cache and extensions
- If works, cache/extension is the issue
2. Clear browser cache:
- Chrome: Settings → Privacy → Clear browsing data
- Edge: Settings → Privacy → Choose what to clear
- Firefox: Settings → Privacy → Clear Data
3. Disable browser extensions:
- Ad blockers can interfere
- Try disabling all extensions
4. Try different browser:
- Chrome, Edge, Firefox, Safari
5. Check network:
- Disable VPN
- Try different network
- Check firewall/proxy
6. Use Azure CLI:
- Bypass portal entirely if down
Issue: Can't Find Resource in Portal
Causes:
- Wrong subscription selected
- Resource in different resource group
- Resource deleted
- No permissions to view
Troubleshoot:
1. Search all resources:
- Portal → Search bar (top) → Type resource name
- Shows resources across all subscriptions
2. Check subscription filter:
- Portal → Settings (gear) → Directories + subscriptions
- Verify correct subscriptions selected
3. Check resource group:
- Portal → Resource groups → Browse all
- Look for resource
4. Use Azure CLI:
# Search all subscriptions
az account list --output table
az account set --subscription "My Subscription"
# Find resource
az resource list --name MyResource --output table
Azure Virtual Machines Not Working?
Issue: Can't RDP or SSH to VM
Causes:
- VM not running
- NSG blocking port 3389/22
- Public IP not assigned
- VM agent not running
- Password incorrect
Troubleshoot:
1. Check VM status:
az vm get-instance-view --name MyVM --resource-group MyRG --query instanceView.statuses
2. Start VM if stopped:
az vm start --name MyVM --resource-group MyRG
3. Check NSG rules:
# List NSG rules
az network nsg rule list --nsg-name MyNSG --resource-group MyRG --output table
# Add RDP rule (port 3389)
az network nsg rule create \
--nsg-name MyNSG \
--resource-group MyRG \
--name AllowRDP \
--priority 1000 \
--source-address-prefixes '*' \
--destination-port-ranges 3389 \
--access Allow \
--protocol Tcp
4. Check public IP:
# Get VM public IP
az vm show --name MyVM --resource-group MyRG --show-details --query publicIps -o tsv
5. Reset password:
az vm user update \
--resource-group MyRG \
--name MyVM \
--username azureuser \
--password NewP@ssw0rd123
6. Use Serial Console (emergency access):
- Portal → VM → Support + troubleshooting → Serial console
- Works even if network broken
Issue: VM Running Slow or Unresponsive
Causes:
- High CPU/memory usage
- Disk throttling (I/O limits)
- VM size too small
- Software issue
Troubleshoot:
1. Check metrics:
- Portal → VM → Metrics
- Check CPU, memory, disk IOPS, network
2. Resize VM:
# List available sizes
az vm list-sizes --location eastus --output table
# Resize VM (requires restart)
az vm resize --resource-group MyRG --name MyVM --size Standard_D4s_v3
3. Check disk performance:
- Standard HDD: Low IOPS (500 IOPS)
- Standard SSD: Medium IOPS (500-6000 IOPS)
- Premium SSD: High IOPS (120-20000 IOPS)
4. Upgrade disk:
az disk update --resource-group MyRG --name MyDisk --sku Premium_LRS
Azure App Service Not Working?
Issue: App Service Not Starting or "503 Service Unavailable"
Causes:
- Application error on startup
- Configuration issue
- Insufficient App Service Plan size
- Deployment failed
Troubleshoot:
1. Check application logs:
- Portal → App Service → Log stream
- Or download logs: Monitoring → App Service logs (a CLI sketch follows this list)
2. Check Kudu console:
- Navigate to https://<app-name>.scm.azurewebsites.net
- Debug Console → Check logs under LogFiles
3. Verify deployment succeeded:
- Portal → Deployment Center → Logs
- Check for build/deploy errors
4. Check app settings:
- Portal → Configuration → Application settings
- Verify connection strings correct
- Check environment variables
5. Scale up App Service Plan:
- Portal → App Service Plan → Scale up
- Upgrade to higher tier if running out of resources
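The same checks can be done from the CLI, which is handy when the portal itself is misbehaving (a sketch; the app and resource group names are placeholders):
# Turn on application and web-server logging to the App Service filesystem
az webapp log config \
--name MyApp \
--resource-group MyRG \
--application-logging filesystem \
--web-server-logging filesystem
# Stream logs live while reproducing the 503
az webapp log tail --name MyApp --resource-group MyRG
# Confirm the app is actually in the Running state
az webapp show --name MyApp --resource-group MyRG --query state --output tsv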
Issue: Deployment Failing
See "DeploymentFailed" error section above.
Additional checks:
- Verify source control credentials
- Check build logs
- Test locally first
- Review deployment slots (use staging slot)
Azure Functions Not Working?
Issue: Function Not Triggering
Causes:
- Trigger configuration incorrect
- Function disabled
- Binding issue
- Permission issue (e.g., storage account access)
Troubleshoot:
1. Check function status:
- Portal → Function App → Functions → Your function
- Verify "Enabled"
2. Check trigger configuration:
- HTTP trigger: Correct HTTP method? Authorization level?
- Timer trigger: CRON expression correct?
- Queue trigger: Storage account accessible?
3. Test manually:
- Portal → Function → Code + Test → Run
- See immediate error messages
4. Check application logs:
- Portal → Function App → Monitor → Logs
5. Verify storage account connection:
- Function Apps require a storage account (the AzureWebJobsStorage setting)
- Check the connection string is valid (a CLI sketch follows)
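A quick CLI sketch for those checks (the Function App and resource group names are placeholders):
# Is the Function App running?
az functionapp show --name MyFunctionApp --resource-group MyRG --query state --output tsv
# Inspect app settings (AzureWebJobsStorage must point at a reachable storage account)
az functionapp config appsettings list \
--name MyFunctionApp \
--resource-group MyRG \
--output table
# Restart after correcting settings
az functionapp restart --name MyFunctionApp --resource-group MyRG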
Azure Storage Not Working?
Issue: Can't Upload or Download Blobs
See "Storage account access denied" error section above.
Additional checks:
- Check storage account firewall
- Verify SAS token not expired
- Check CORS settings (for browser uploads)
- Verify connection string correct
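A minimal end-to-end test using your signed-in identity rather than keys or SAS tokens (requires a data-plane role such as Storage Blob Data Contributor; account and container names are placeholders):
# Upload a test file, then list the container to confirm access
echo "hello" > hello.txt
az storage blob upload \
--account-name mystorageacct \
--container-name test-container \
--name hello.txt \
--file hello.txt \
--auth-mode login
az storage blob list \
--account-name mystorageacct \
--container-name test-container \
--auth-mode login \
--output table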
When Azure Actually Goes Down
What Happens
Recent major outages:
- July 2024: Global Azure outage (DDoS attack on Azure infrastructure) - 10+ hours
- January 2024: Azure AD outage (authentication failures) - 4 hours
- September 2023: West Europe region outage (power issues) - 6 hours
Typical causes:
- Datacenter infrastructure failures (power, cooling, network)
- Azure AD/authentication platform issues
- Regional outages (weather, power grid)
- Software deployment bugs
- DDoS attacks
- Cascading failures
Impact:
- Portal inaccessible
- VMs unreachable
- Services stopped
- Authentication failures
- Data temporarily unavailable (but not lost)
How Microsoft Responds
Communication channels:
- Azure Status
- Service Health in Portal
- @Azure and @AzureSupport on Twitter
- Email to subscription admins (for severe incidents)
- Azure mobile app notifications
Timeline:
- 0-30 min: Users report issues on Twitter/DownDetector
- 30-90 min: Microsoft posts investigating message
- 90-180 min: Regular updates (every 30-60 min)
- Resolution: Usually 2-12 hours for major outages
- Post-incident review (PIR): Posted to Service Health within 2 weeks
What to Do During Outages
1. Implement failover (if multi-region):
- Traffic Manager: Automatic failover
- Manual: Update DNS to secondary region
- Activate DR (disaster recovery) plan
2. Communicate status:
- Update status page
- Email customers proactively
- Tweet/social media updates
3. Monitor status:
- Follow @AzureSupport
- Check Service Health in portal
- Set up API Status Check alerts
4. Document impact:
- Screenshot errors
- Note affected resources
- Track downtime duration
- Use for SLA credit request
5. Don't make changes:
- Wait for resolution
- Don't try to "fix" during outage (may make worse)
- Don't delete/recreate resources
Azure SLA credits:
- VMs: 99.9% (single instance), 99.95% (availability set)
- Storage: 99.9%
- If SLA breached, request credit: Portal → Help + support → Service request
Azure Down Checklist
Follow these steps in order:
Step 1: Verify it's actually Azure
- Check Azure Status
- Check Service Health in Portal
- Check API Status Check
- Search Twitter: "Azure down"
- Check DownDetector: downdetector.com/status/windows-azure
Step 2: Isolate the issue
- Check if specific service or all Azure
- Check if regional or global
- Try Azure Portal in incognito/different browser
- Try Azure CLI (bypass portal)
Step 3: Quick fixes (if Azure is up)
- Clear browser cache and try portal again
- Check subscription status (active? payment method valid?)
- Verify signed in to correct account
- Check quota limits
- Review activity log for failed operations
Step 4: Service-specific troubleshooting
- VMs: Check if running, verify NSG rules, check public IP
- App Service: Check logs, verify deployment succeeded
- Functions: Check trigger config, test manually
- Storage: Check firewall, verify SAS token, check RBAC
Step 5: Advanced troubleshooting
- Check resource locks
- Review Service Health for planned maintenance
- Restart/redeploy resource
- Try different region (if possible)
- Check for underlying service dependencies (e.g., Azure AD for auth)
Step 6: Contact support (if still not working)
- Create support request: Portal → Help + support → New support request
- Include: Subscription ID, resource names, error messages, correlation IDs
- For production outages: Use Severity A (critical)
Prevent Future Issues
1. Implement Multi-Region Architecture
Don't put all eggs in one basket.
Best practices:
- Deploy critical apps to 2+ regions
- Use Azure Traffic Manager for automatic failover (a CLI sketch follows below)
- Replicate storage (GRS/RA-GRS)
- Test failover regularly
Example architectures:
- Active-active: Traffic split between regions
- Active-passive: Failover to secondary only when primary down
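A sketch of an active-passive setup with Traffic Manager, assuming two already-deployed endpoints (the DNS label and target resource IDs are placeholders):
# Priority routing = active-passive failover
az network traffic-manager profile create \
--name MyTMProfile \
--resource-group MyRG \
--routing-method Priority \
--unique-dns-name my-app-unique-name
# Primary (priority 1) and secondary (priority 2) endpoints
az network traffic-manager endpoint create \
--name primary \
--profile-name MyTMProfile \
--resource-group MyRG \
--type azureEndpoints \
--target-resource-id "<primary-app-resource-id>" \
--priority 1
az network traffic-manager endpoint create \
--name secondary \
--profile-name MyTMProfile \
--resource-group MyRG \
--type azureEndpoints \
--target-resource-id "<secondary-app-resource-id>" \
--priority 2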
2. Set Up Azure Monitor and Alerts
Know about issues before customers do.
Key monitors:
- VM availability and performance
- App Service response times
- Function execution failures
- Storage account throttling
Create alerts:
# Create alert for VM CPU > 80%
az monitor metrics alert create \
--name HighCPU \
--resource-group MyRG \
--scopes /subscriptions/.../resourceGroups/MyRG/providers/Microsoft.Compute/virtualMachines/MyVM \
--condition "avg Percentage CPU > 80" \
--window-size 5m \
--evaluation-frequency 1m \
--action-group MyActionGroup
3. Implement Auto-Scaling
Handle load spikes automatically.
For App Service:
- Portal → App Service Plan → Scale out
- Set rules: CPU > 70% → add instance (a CLI sketch follows this list)
- Set max instances (budget control)
For VMs:
- Use Virtual Machine Scale Sets (VMSS)
- Auto-scale based on CPU, memory, or custom metrics
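A sketch of the equivalent autoscale setup from the CLI for an App Service Plan (plan name and thresholds are examples):
# Autoscale the plan between 1 and 5 instances
az monitor autoscale create \
--resource-group MyRG \
--name MyAutoscale \
--resource MyPlan \
--resource-type Microsoft.Web/serverfarms \
--min-count 1 \
--max-count 5 \
--count 1
# Scale out by 1 when average CPU exceeds 70% over 5 minutes
az monitor autoscale rule create \
--resource-group MyRG \
--autoscale-name MyAutoscale \
--condition "CpuPercentage > 70 avg 5m" \
--scale out 1
# Scale back in when CPU drops below 30%
az monitor autoscale rule create \
--resource-group MyRG \
--autoscale-name MyAutoscale \
--condition "CpuPercentage < 30 avg 5m" \
--scale in 1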
4. Use Azure Backup and Site Recovery
Protect against data loss.
Azure Backup:
- VMs: Automatic backups
- Files: Azure Files backup
- Databases: SQL backup
Azure Site Recovery (ASR):
- VM replication to secondary region
- Automated failover
- RTO: 2-4 hours, RPO: 5 minutes
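Enabling VM backup from the CLI is a short two-step sketch (vault name is a placeholder; DefaultPolicy is the policy created with a new vault):
# Create a Recovery Services vault, then protect the VM with the vault's default policy
az backup vault create --name MyRecoveryVault --resource-group MyRG --location eastus
az backup protection enable-for-vm \
--resource-group MyRG \
--vault-name MyRecoveryVault \
--vm MyVM \
--policy-name DefaultPolicy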
5. Monitor Service Health and Subscribe to Alerts
Be proactive.
Set up Service Health alerts:
- Portal → Service Health → Health alerts → Add service health alert
- Filter by services you use
- Get notified of incidents affecting your resources
6. Implement Infrastructure as Code (IaC)
Recreate resources quickly.
Tools:
- ARM templates (Azure-native)
- Terraform (multi-cloud)
- Bicep (ARM simplified)
Benefits:
- Version control infrastructure
- Quick disaster recovery (redeploy from code)
- Consistent environments
7. Review and Optimize Costs
Avoid surprise shutdowns due to budget.
Cost management:
- Portal → Cost Management + Billing
- Set budgets and alerts
- Right-size resources (don't over-provision)
- Use reserved instances for predictable workloads
- Stop dev/test resources when not in use
8. Keep Access Credentials Secure and Updated
Avoid lockouts.
Best practices:
- Use Azure Key Vault for secrets
- Rotate service principal credentials regularly
- Use managed identities (no credentials to manage)
- Enable MFA for admin accounts
- Review and remove stale service principals
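A sketch of wiring a web app to Key Vault with a managed identity (app and vault names are placeholders; vaults using RBAC authorization would need a role assignment instead of set-policy):
# Give the web app a system-assigned identity and capture its principal ID
PRINCIPAL_ID=$(az webapp identity assign \
--name MyApp \
--resource-group MyRG \
--query principalId --output tsv)
# Allow that identity to read secrets (access-policy permission model)
az keyvault set-policy \
--name MyKeyVault \
--object-id "$PRINCIPAL_ID" \
--secret-permissions get list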
Key Takeaways
Before assuming Azure is down:
- ✅ Check Azure Status
- ✅ Check Service Health in Portal
- ✅ Check API Status Check
- ✅ Search Twitter for "Azure down"
- ✅ Try Azure CLI (bypass portal)
Common fixes:
- Clear browser cache (portal issues)
- Check subscription status and payment method
- Verify quota limits (common blocker)
- Check regional availability (not all services in all regions)
- Restart or redeploy resource
- Review activity log for specific error details
Configuration issues (NOT Azure down):
- "Subscription disabled" = payment/billing issue
- "Quota exceeded" = hit limits, request increase
- "Allocation failed" = try different region/VM size
- "Authentication failed" = verify credentials, check Azure AD
- "ResourceNotFound" = verify subscription, resource group, region
VM issues:
- Can't RDP/SSH = check NSG rules, verify VM running, check public IP
- VM slow = check metrics, resize VM, upgrade disk tier
- Start failed = allocation issue, try different region
App Service / Functions issues:
- 503 errors = check logs, verify deployment, check app settings
- Deployment failed = review logs, validate configuration
- Function not triggering = check trigger config, test manually
If Azure is actually down:
- Implement failover to secondary region (if multi-region setup)
- Communicate with customers proactively
- Monitor status page for updates
- Document impact for SLA credit request
- Don't make changes during outage
Prevent future issues:
- Implement multi-region architecture
- Set up Azure Monitor and alerts
- Use auto-scaling for resilience
- Enable Azure Backup and Site Recovery
- Subscribe to Service Health alerts
- Use Infrastructure as Code (ARM/Terraform)
- Monitor costs and set budgets
- Use managed identities and Key Vault
Remember: Most "Azure down" issues are configuration errors, quota limits, or subscription problems, not actual Azure outages. Check subscription status, quotas, and resource-specific logs before assuming a platform outage.
Need real-time Azure status monitoring? Track Azure uptime with API Status Check - Get instant alerts when Azure goes down.
Related Resources
- Is Azure Down Right Now? – Live status check
- Azure Outage History – Past incidents and timeline
- Azure vs AWS Uptime – Which cloud provider is more reliable?
- Multi-Region Azure Architecture Guide – Build resilient cloud infrastructure
Monitor Your APIs
Check the real-time status of 100+ popular APIs used by developers.
View API Status →