
Cloud Monitoring Guide: AWS, GCP & Azure Observability (2026)

Cloud infrastructure fails differently than on-premises hardware. Resources appear and disappear in seconds, traffic spikes by 10x overnight, and failure modes you never anticipated emerge from managed services you don't control. This guide covers how to build effective cloud monitoring across AWS, GCP, and Azure.

Published: April 2026·18 min read

☁️ The Cloud Monitoring Challenge

Cloud monitoring is harder than traditional monitoring in one key way: you don't control the infrastructure. A managed RDS instance can have an AWS-side issue you can't see. A Lambda cold start can inflate your P99 latency. A NAT gateway can get overwhelmed. Effective cloud monitoring watches your resources AND the cloud services they depend on.

What Cloud Monitoring Covers

Cloud monitoring spans several distinct categories:

| Category | What It Covers | Key Tools |
|---|---|---|
| Infrastructure monitoring | VMs, containers, serverless functions | CloudWatch, Datadog, Zabbix |
| Application monitoring (APM) | Response times, error rates, traces | Datadog APM, New Relic, Dynatrace |
| Uptime monitoring | External availability checks | Better Stack, Pingdom, API Status Check |
| Log monitoring | Log aggregation and analysis | CloudWatch Logs, Better Stack Logs, Splunk |
| Cost monitoring | Spend anomalies, budget alerts | AWS Cost Explorer, Infracost, CloudHealth |

AWS Monitoring: Key Metrics by Service

EC2 Instances

| Metric | Warning | Critical |
|---|---|---|
| CPUUtilization | > 70% for 10 min | > 90% for 5 min |
| StatusCheckFailed | Any value > 0 | Sustained > 0 |
| NetworkIn / NetworkOut | > 80% of instance limit | > 95% of instance limit |
| DiskReadOps / DiskWriteOps | Baseline + 3σ | Baseline + 5σ |
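
The "Baseline + 3σ" and "Baseline + 5σ" thresholds in the EC2 table can be worked out from a sample of recent metric values. The sketch below is one way to do that with awk, assuming you have already exported baseline datapoints (one number per line); the function name `sigma_thresholds` and the sample values are hypothetical, and it uses the population standard deviation, which is an assumption rather than something the table specifies.

```shell
#!/usr/bin/env bash
# Compute warning (mean + 3 sigma) and critical (mean + 5 sigma) thresholds
# from baseline samples read on stdin, one numeric value per line.
sigma_thresholds() {
  awk '
    NF { n++; sum += $1; sumsq += $1 * $1 }
    END {
      if (n == 0) exit 1                 # no samples: nothing to compute
      mean = sum / n
      var = sumsq / n - mean * mean      # population variance
      if (var < 0) var = 0               # guard against float rounding
      sd = sqrt(var)
      printf "warning=%.2f critical=%.2f\n", mean + 3 * sd, mean + 5 * sd
    }'
}

# Hypothetical baseline: DiskReadOps totals for four 5-minute periods
printf '%s\n' 90 100 110 100 | sigma_thresholds
# prints: warning=121.21 critical=135.36
```

A longer baseline window (days rather than minutes) usually gives steadier thresholds, since a short sample can understate normal variation.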

RDS / Aurora

| Metric | Alert Threshold |
|---|---|
| FreeStorageSpace | Alert at < 20% remaining; page at < 10% |
| DatabaseConnections | Alert at 80% of the max_connections parameter |
| ReadLatency / WriteLatency | Alert if P99 exceeds 2x normal baseline |
| ReplicaLag | Alert > 30s; page > 5min |
| CPUUtilization | Alert > 80% sustained; investigate query patterns |
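
As an illustration of the DatabaseConnections rule, the sketch below turns a `max_connections` value into an alarm threshold and prints the matching `put-metric-alarm` command without running it. The function name, the instance identifier `orders-db`, the limit of 500, and the choice of three 5-minute evaluation periods are all assumptions for the example; look up the real `max_connections` value in your own parameter group.

```shell
#!/usr/bin/env bash
# Dry run: print a CloudWatch alarm command for RDS DatabaseConnections
# at 80% of max_connections. Review the output before executing it.
rds_connections_alarm() {
  local db_instance="$1" max_connections="$2" sns_topic_arn="$3"
  local threshold=$(( max_connections * 80 / 100 ))   # integer 80% of the limit
  cat <<EOF
aws cloudwatch put-metric-alarm \\
  --alarm-name "rds-${db_instance}-connections-high" \\
  --metric-name DatabaseConnections \\
  --namespace AWS/RDS \\
  --statistic Average \\
  --period 300 \\
  --evaluation-periods 3 \\
  --threshold ${threshold} \\
  --comparison-operator GreaterThanOrEqualToThreshold \\
  --dimensions Name=DBInstanceIdentifier,Value=${db_instance} \\
  --alarm-actions ${sns_topic_arn}
EOF
}

# Hypothetical instance with max_connections = 500, so the threshold is 400
rds_connections_alarm orders-db 500 arn:aws:sns:us-east-1:123456789012:alerts
```

Printing the command instead of executing it keeps the threshold arithmetic easy to review and lets you apply the same function across many instances.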

Lambda / Serverless

Lambda monitoring requires different thinking — there are no servers to monitor, only function invocations:

```shell
# CloudWatch alarm: fires when my-function records more than 5 errors
# in a single 5-minute period (an error count, not a percentage)
aws cloudwatch put-metric-alarm \
  --alarm-name "lambda-high-error-rate" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=FunctionName,Value=my-function \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts  # placeholder 12-digit account ID
```
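
The alarm above compares a raw error count against a fixed number, so a busy function and a quiet one are held to the same absolute limit. If you want a true error rate, CloudWatch metric math can divide Errors by Invocations. The sketch below generates the JSON for the `--metrics` option; the helper name `lambda_error_rate_metrics`, the file path, and the 5% threshold in the commented command are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Emit the --metrics JSON for an alarm on 100 * Errors / Invocations.
lambda_error_rate_metrics() {
  local function_name="$1"
  cat <<EOF
[
  {"Id": "errors", "ReturnData": false,
   "MetricStat": {"Stat": "Sum", "Period": 300,
     "Metric": {"Namespace": "AWS/Lambda", "MetricName": "Errors",
       "Dimensions": [
         {"Name": "FunctionName", "Value": "${function_name}"}
       ]}}},
  {"Id": "invocations", "ReturnData": false,
   "MetricStat": {"Stat": "Sum", "Period": 300,
     "Metric": {"Namespace": "AWS/Lambda", "MetricName": "Invocations",
       "Dimensions": [
         {"Name": "FunctionName", "Value": "${function_name}"}
       ]}}},
  {"Id": "error_rate", "Expression": "100 * errors / invocations",
   "Label": "Error rate (%)", "ReturnData": true}
]
EOF
}

lambda_error_rate_metrics my-function

# To create the alarm, save the JSON to a file and reference it:
#   lambda_error_rate_metrics my-function > lambda-error-rate.json
#   aws cloudwatch put-metric-alarm \
#     --alarm-name "lambda-error-rate-percent" \
#     --metrics file://lambda-error-rate.json \
#     --threshold 5 \
#     --comparison-operator GreaterThanThreshold \
#     --evaluation-periods 1 \
#     --treat-missing-data notBreaching \
#     --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts
```

Setting `--treat-missing-data notBreaching` keeps the alarm quiet during periods with no invocations, when the division produces no datapoint.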

GCP Cloud Monitoring

Google Cloud Monitoring (formerly Stackdriver) is GCP's native observability platform. Key metrics by service:

| GCP Service | Key Metrics |
|---|---|
| GCE (VM) | cpu/utilization, disk/read_bytes_count, network/received_bytes_count |
| Cloud SQL | database/cpu/utilization, database/memory/utilization, database/disk/utilization, database/replication/replica_lag |
| GKE | container/cpu/core_usage_time, container/memory/used_bytes, pod/volume/used_bytes |
| Cloud Functions | function/execution_count, function/execution_times, function/user_memory_bytes |
| Cloud Run | run/request_count, run/request_latencies, run/container/cpu/utilization |
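
The metric names in the table are short forms. In alerting policies and API queries, Cloud Monitoring expects a fully qualified metric type plus a monitored resource type. The sketch below shows how two of the rows might be expanded into filter strings; the helper name `gcp_metric_filter` and the service keys `gce` and `cloudsql` are hypothetical, and only these two services are covered.

```shell
#!/usr/bin/env bash
# Build a Cloud Monitoring filter string from a service key and a short
# metric name as listed in the table.
gcp_metric_filter() {
  local service="$1" metric="$2" prefix resource
  case "$service" in
    gce)      prefix="compute.googleapis.com/instance"; resource="gce_instance" ;;
    cloudsql) prefix="cloudsql.googleapis.com";         resource="cloudsql_database" ;;
    *) echo "unknown service: $service" >&2; return 1 ;;
  esac
  printf 'metric.type="%s/%s" AND resource.type="%s"\n' "$prefix" "$metric" "$resource"
}

gcp_metric_filter gce cpu/utilization
# prints: metric.type="compute.googleapis.com/instance/cpu/utilization" AND resource.type="gce_instance"
gcp_metric_filter cloudsql database/replication/replica_lag
# prints: metric.type="cloudsql.googleapis.com/database/replication/replica_lag" AND resource.type="cloudsql_database"
```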

Azure Monitor

Azure Monitor is Microsoft's unified monitoring platform for Azure infrastructure.

Multi-Cloud Monitoring Strategy

Many organizations run workloads on more than one cloud, and native tools don't span cloud boundaries: a CloudWatch dashboard can't show your GCP Cloud SQL metrics. For multi-cloud teams, the options are:

| Approach | Pros | Cons |
|---|---|---|
| Native tools per cloud | Free, deep integration | Siloed, with no cross-cloud correlation |
| Datadog / New Relic | Unified view, powerful alerting | Expensive at scale ($15-25/host/mo) |
| Prometheus + Grafana | Free, flexible, powerful | Self-hosted operational burden |
| Better Stack (uptime layer) | Simple, cloud-agnostic external checks | Uptime focus, not deep infra metrics |
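
To make the "cloud-agnostic external check" idea concrete, here is a minimal sketch of a probe that treats every endpoint the same way regardless of which cloud hosts it. The function names, the 2-second "degraded" cutoff, and the example URLs are assumptions; a hosted uptime service would add multiple probe locations, retries, and alert routing on top of this.

```shell
#!/usr/bin/env bash
# Classify one probe result. The 2.0-second slow threshold is illustrative.
classify_probe() {
  local http_code="$1" seconds="$2" slow_after="${3:-2.0}"
  if [ "$http_code" -lt 200 ] || [ "$http_code" -ge 400 ]; then
    echo "down"
  elif awk -v t="$seconds" -v limit="$slow_after" 'BEGIN { exit !(t > limit) }'; then
    echo "degraded"
  else
    echo "up"
  fi
}

# Probe a URL with curl. A failed connection reports HTTP code 000,
# which classify_probe treats as down.
probe() {
  local url="$1" result
  result=$(curl -s -o /dev/null --max-time 10 -w '%{http_code} %{time_total}' "$url")
  # shellcheck disable=SC2086  # intentional split into code and seconds
  classify_probe $result
}

# Usage (hypothetical endpoints, one per cloud):
#   probe https://aws-api.example.com/health
#   probe https://gcp-api.example.com/health
#   probe https://azure-api.example.com/health
```

Separating classification from the network call keeps the alerting rule easy to adjust and to check without touching a live endpoint.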

Cloud Cost Monitoring: The Hidden Dimension

Cloud monitoring isn't just about availability and performance; it's also about cost. A misconfigured auto-scaling group or a forgotten development environment can generate thousands of dollars in unexpected cloud spend.
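
One simple way to catch that kind of surprise is to compare the latest day's spend with the trailing average. The sketch below assumes you already have daily cost figures (for example, exported from your provider's billing tools) as one number per line, oldest first; the function name `spend_anomaly`, the default 1.5x factor, and the sample figures are hypothetical.

```shell
#!/usr/bin/env bash
# Flag the most recent day's spend when it exceeds a multiple of the
# average of all earlier days. Input: one daily cost per line, oldest first.
spend_anomaly() {
  local factor="${1:-1.5}"
  awk -v factor="$factor" '
    NF { cost[++n] = $1 }
    END {
      if (n < 2) exit 2                  # need at least one earlier day
      for (i = 1; i < n; i++) sum += cost[i]
      avg = sum / (n - 1)
      status = (cost[n] > factor * avg) ? "ANOMALY" : "ok"
      printf "%s latest=%.2f trailing_avg=%.2f\n", status, cost[n], avg
    }'
}

# Hypothetical week where a forgotten environment inflates the final day
printf '%s\n' 120 118 125 117 410 | spend_anomaly
# prints: ANOMALY latest=410.00 trailing_avg=120.00
```

A fixed multiplier is crude; workloads with weekly cycles may need a comparison against the same weekday instead.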
