What is cloud monitoring?

Cloud monitoring is the practice of collecting, analyzing, and alerting on metrics, logs, and traces from cloud infrastructure — including virtual machines, containers, serverless functions, managed databases, load balancers, and cloud-native services. Unlike traditional infrastructure monitoring, cloud monitoring must track ephemeral resources that scale dynamically and span multiple providers and regions.

What metrics should I monitor in AWS?

Key AWS metrics to monitor include: EC2 (CPU utilization, network I/O, disk I/O, status check failures), RDS (CPU, FreeStorageSpace, DatabaseConnections, ReadLatency, WriteLatency), Lambda (Duration, Errors, Throttles, ConcurrentExecutions), ALB/ELB (RequestCount, TargetResponseTime, HTTPCode_ELB_5XX_Count, UnHealthyHostCount), SQS (ApproximateNumberOfMessagesVisible, NumberOfMessagesSent), and overall CloudWatch billing alerts.

What is the difference between native cloud monitoring and third-party tools?

Native cloud monitoring tools (AWS CloudWatch, GCP Cloud Monitoring, Azure Monitor) are tightly integrated with their respective cloud services and provide the deepest metric coverage. However, they work in silos — CloudWatch cannot see your GCP resources. Third-party tools (Datadog, New Relic, Better Stack, Dynatrace) provide a unified view across multiple clouds, often with better alerting, dashboards, and correlation capabilities, but at additional cost.

How do I monitor serverless functions (Lambda, Cloud Functions)?

Serverless monitoring focuses on: invocation count, duration (P50/P95/P99), error rate, cold start rate, throttle count, and concurrent execution limits. Use structured logging (JSON) in your functions so you can query logs effectively. For distributed tracing across Lambda invocations, AWS X-Ray, Datadog APM, or OpenTelemetry with a compatible backend provide end-to-end trace visibility.
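
As a rough illustration of the structured-logging advice above, the sketch below emits one JSON object per log line from a Lambda-style Python handler. The `log_event` helper, the `handler` signature, and the `order_id` field are hypothetical names chosen for this example, not part of any AWS API.

```python
import json
import time


def log_event(level, message, **fields):
    """Serialize one log record as a single JSON line, print it, and return it."""
    record = {"level": level, "message": message, "timestamp": time.time(), **fields}
    line = json.dumps(record, sort_keys=True)
    print(line)  # Lambda forwards stdout to the function's CloudWatch log stream
    return line


def handler(event, context=None):
    """Hypothetical handler: logs the request and does no real work."""
    started = time.perf_counter()
    order_id = event.get("order_id")
    # ... business logic would run here ...
    duration_ms = round((time.perf_counter() - started) * 1000, 3)
    log_event("INFO", "order processed", order_id=order_id, duration_ms=duration_ms)
    return {"status": "ok", "order_id": order_id}
```

Because every line is valid JSON with stable keys, a log query can filter on fields such as `level` or `order_id` instead of matching free text.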

What is the best cloud monitoring tool in 2026?

For single-cloud AWS teams, CloudWatch + AWS Managed Grafana provides excellent coverage at low cost. For multi-cloud or teams needing unified observability, Datadog and New Relic are the leading commercial options. For uptime and synthetic monitoring across cloud-hosted services, Better Stack provides global check locations, on-call alerting, and status pages in a single product.

Cloud Monitoring Guide: AWS, GCP & Azure Observability in 2026

What Cloud Monitoring Covers

Cloud monitoring spans several distinct categories:

Category	What It Covers	Key Tools
Infrastructure monitoring	VMs, containers, serverless functions	CloudWatch, Datadog, Zabbix
Application monitoring (APM)	Response times, error rates, traces	Datadog APM, New Relic, Dynatrace
Uptime monitoring	External availability checks	Better Stack, Pingdom, API Status Check
Log monitoring	Log aggregation and analysis	CloudWatch Logs, Better Stack Logs, Splunk
Cost monitoring	Spend anomalies, budget alerts	AWS Cost Explorer, Infracost, CloudHealth

AWS Monitoring: Key Metrics by Service

EC2 Instances

Metric	Warning	Critical
CPUUtilization	> 70% for 10min	> 90% for 5min
StatusCheckFailed	Any value > 0	Sustained > 0
NetworkIn/Out	> 80% of instance limit	> 95% of instance limit
DiskReadOps/WriteOps	Baseline + 3σ	Baseline + 5σ
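
The "Baseline + 3σ" and "Baseline + 5σ" entries can be read as the mean of recent samples plus a multiple of their standard deviation. The sketch below shows one way to compute them; the function names and the choice of population standard deviation are assumptions for illustration, and a production baseline would likely also account for daily or weekly seasonality.

```python
from statistics import mean, pstdev


def sigma_thresholds(samples, warn_k=3.0, crit_k=5.0):
    """Return (warning, critical) thresholds as baseline mean + k standard deviations."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a baseline")
    baseline = mean(samples)
    sigma = pstdev(samples)  # population standard deviation of the baseline window
    return baseline + warn_k * sigma, baseline + crit_k * sigma


def classify(value, warning, critical):
    """Map a new observation to 'ok', 'warning', or 'critical'."""
    if value > critical:
        return "critical"
    if value > warning:
        return "warning"
    return "ok"
```

For example, `sigma_thresholds([100, 110, 90, 105, 95])` gives a baseline of 100 and a standard deviation of about 7.07, so the warning threshold sits near 121 and the critical threshold near 135.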

RDS / Aurora

Metric	Alert Threshold
FreeStorageSpace	Alert at < 20% remaining; page at < 10%
DatabaseConnections	Alert at 80% of max_connections parameter
ReadLatency / WriteLatency	Alert if P99 exceeds 2x normal baseline
ReplicaLag	Alert > 30s; page > 5min
CPUUtilization	Alert > 80% sustained; investigate query patterns
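
The storage, connection, and replica-lag rows above can be sketched as a small evaluation function. CloudWatch reports FreeStorageSpace in bytes, so the percentage check needs the instance's allocated storage, which is assumed here to come from your own inventory or the instance configuration. The function name and the "alert"/"page" labels are illustrative.

```python
def rds_alerts(free_storage_bytes, allocated_storage_bytes,
               connections, max_connections, replica_lag_seconds):
    """Return (metric, severity) pairs for the thresholds in the table above."""
    alerts = []

    # FreeStorageSpace: alert below 20% remaining, page below 10%.
    free_pct = 100.0 * free_storage_bytes / allocated_storage_bytes
    if free_pct < 10:
        alerts.append(("FreeStorageSpace", "page"))
    elif free_pct < 20:
        alerts.append(("FreeStorageSpace", "alert"))

    # DatabaseConnections: alert at 80% of the max_connections parameter.
    if connections / max_connections >= 0.8:
        alerts.append(("DatabaseConnections", "alert"))

    # ReplicaLag: alert above 30 seconds, page above 5 minutes.
    if replica_lag_seconds > 300:
        alerts.append(("ReplicaLag", "page"))
    elif replica_lag_seconds > 30:
        alerts.append(("ReplicaLag", "alert"))

    return alerts
```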

Lambda / Serverless

Lambda monitoring requires different thinking — there are no servers to monitor, only function invocations:

Error rate: Alert when errors > 1% of invocations in a 5-minute window
Duration P99: Alert if P99 approaches your function timeout (leaves no headroom)
Throttles: Any throttle means invocations were rejected because a concurrency limit was reached. Raise the function's reserved concurrency or request a higher account-level concurrency quota
Cold starts: High cold start rate (> 10%) degrades user-facing latency — use provisioned concurrency for critical functions

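# CloudWatch alarm on Lambda error count: fires when my-function records more
# than 5 errors in a single 5-minute period. This is an absolute count, not a
# percentage; a true error-rate alarm needs metric math over Errors / Invocations.
# Replace the placeholder account ID and SNS topic with your own.
aws cloudwatch put-metric-alarm \
  --alarm-name "lambda-high-error-rate" \
  --metric-name Errors \
  --namespace AWS/Lambda \
  --statistic Sum \
  --period 300 \
  --threshold 5 \
  --comparison-operator GreaterThanThreshold \
  --dimensions Name=FunctionName,Value=my-function \
  --evaluation-periods 1 \
  --alarm-actions arn:aws:sns:us-east-1:123456789012:alerts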
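
The CLI alarm above thresholds on the Sum of Errors, while the 1% rule in the list depends on Errors divided by Invocations. The sketch below shows that ratio, plus the P99 headroom and throttle checks, as plain arithmetic over one 5-minute window. The function name, the input shape, and the 80% headroom factor are assumptions for illustration; the guide only says P99 should not "approach" the timeout.

```python
def lambda_window_alerts(invocations, errors, throttles, p99_ms, timeout_ms,
                         max_error_rate=0.01, headroom=0.8):
    """Evaluate one 5-minute window of Lambda statistics against the rules above."""
    alerts = []

    # Guard against division by zero when the function was not invoked.
    error_rate = errors / invocations if invocations else 0.0
    if error_rate > max_error_rate:
        alerts.append("error_rate")

    # Assumed rule: flag P99 once it reaches 80% of the configured timeout.
    if p99_ms >= headroom * timeout_ms:
        alerts.append("duration_p99")

    # Any throttle at all is worth an alert.
    if throttles > 0:
        alerts.append("throttles")

    return alerts
```

With 1,000 invocations, 5 errors is a 0.5% error rate and stays quiet, while 20 errors (2%) trips the rule. The same 5 errors on a function invoked only 50 times would be a 10% error rate, which a fixed count threshold would miss.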

GCP Cloud Monitoring

Google Cloud Monitoring (formerly Stackdriver) is GCP's native observability platform. Key metrics by service:

GCP Service	Key Metrics
GCE (VM)	instance/cpu/utilization, instance/disk/read_bytes_count, instance/network/received_bytes_count
Cloud SQL	database/cpu/utilization, database/memory/utilization, database/disk/utilization, database/replication/replica_lag
GKE	container/cpu/core_usage_time, container/memory/used_bytes, pod/volume/used_bytes
Cloud Functions	function/execution_count, function/execution_times, function/user_memory_bytes
Cloud Run	request_count, request_latencies, container/cpu/utilizations

Azure Monitor

Azure Monitor is Microsoft's unified monitoring platform for Azure infrastructure. Key concepts:

Azure Metrics: Time-series data from all Azure resources, stored 93 days, queryable via Metrics Explorer
Log Analytics (Kusto): Centralized log ingestion and KQL (Kusto Query Language) querying
Application Insights: APM for web applications — request rates, dependency tracking, exceptions, custom events
Azure Alerts: Metric, log search, and activity log alerts with Action Groups for notifications

Multi-Cloud Monitoring Strategy

Many organizations run workloads on more than one cloud, and native tools generally don't span cloud boundaries: a CloudWatch dashboard won't show your GCP Cloud SQL metrics out of the box. For multi-cloud teams, the options are:

Approach	Pros	Cons
Native tools per cloud	No extra vendor, deep integration, basic metrics included (custom metrics, logs, and alarms are billed by usage)	Siloed, with no cross-cloud correlation
Datadog / New Relic	Unified view, powerful alerting	Expensive at scale ($15-25/host/mo)
Prometheus + Grafana	Free, flexible, powerful	Self-hosted operational burden
Better Stack (uptime layer)	Simple, cloud-agnostic external checks	Uptime focus, not deep infra metrics
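
Whichever approach you pick, cross-cloud correlation comes down to mapping each provider's metric names and units onto one shared schema. The sketch below shows the idea for CPU utilization only. The `Sample` type, the `cpu_utilization_pct` name, and the lookup table are hypothetical; the GCP key is the shortened form of `compute.googleapis.com/instance/cpu/utilization`, and "Percentage CPU" is the Azure VM platform metric.

```python
from dataclasses import dataclass

# Provider-specific CPU metric names mapped to one common (hypothetical) name.
COMMON_NAMES = {
    ("aws", "CPUUtilization"): "cpu_utilization_pct",
    ("gcp", "cpu/utilization"): "cpu_utilization_pct",
    ("azure", "Percentage CPU"): "cpu_utilization_pct",
}


@dataclass(frozen=True)
class Sample:
    provider: str
    resource: str
    metric: str
    value: float


def normalize(provider, resource, metric, value):
    """Translate a provider metric into the shared schema, or None if unmapped."""
    name = COMMON_NAMES.get((provider, metric))
    if name is None:
        return None
    # GCP reports CPU utilization as a fraction (0.0-1.0); AWS and Azure use percent.
    if provider == "gcp":
        value = value * 100.0
    return Sample(provider, resource, name, value)
```

The unit conversion is the part that is easy to miss: a 0.42 reading from GCP and a 42.0 reading from AWS describe the same load.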

Cloud Cost Monitoring: The Hidden Dimension

Cloud monitoring isn't just about availability and performance; it's also about cost. A misconfigured auto-scaling group or a forgotten development environment can generate thousands of dollars in unexpected cloud spend. Best practices:

Set billing alerts early. Configure budget alerts in AWS Budgets, Google Cloud Billing budgets, and Azure Cost Management as soon as you create an account. Alert at 80% of expected monthly spend.
Tag everything. Resource tagging enables cost attribution by team, service, and environment. Without tags, you can't diagnose where spend spikes come from.
Watch data transfer costs. Cross-region and egress data transfer costs are often the surprise line item. Monitor monthly egress volumes.
Audit reserved capacity. Unused reserved instances waste money. Review utilization monthly, then right-size or, where the provider allows it (for example, the AWS Reserved Instance Marketplace), sell unused reservations.
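
Two of the practices above, the 80% budget alert and tag-based cost attribution, reduce to simple arithmetic once you have billing data exported. The sketch below assumes line items shaped as dictionaries with a `cost` number and an optional `tags` mapping; that shape, the `team` tag key, and both function names are hypothetical and would need adapting to your provider's billing export.

```python
from collections import defaultdict


def budget_status(spend_to_date, expected_monthly_spend, alert_fraction=0.8):
    """Return 'over', 'alert', or 'ok' for month-to-date spend."""
    if spend_to_date >= expected_monthly_spend:
        return "over"
    if spend_to_date >= alert_fraction * expected_monthly_spend:
        return "alert"
    return "ok"


def cost_by_tag(line_items, tag_key="team"):
    """Sum cost per tag value; resources missing the tag land in 'untagged'."""
    totals = defaultdict(float)
    for item in line_items:
        owner = item.get("tags", {}).get(tag_key, "untagged")
        totals[owner] += item["cost"]
    return dict(totals)
```

A large `untagged` total is itself a useful signal: it is the share of spend you cannot yet attribute to a team, service, or environment.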

☁️ The Cloud Monitoring Challenge