What is the best tool for Azure monitoring?

Azure Monitor is the default monitoring platform built into Azure and covers most use cases. Application Insights provides deep APM for web apps and APIs. For production workloads, teams often add Datadog, New Relic, or Grafana Cloud for better cross-platform visibility and advanced alerting. Better Stack is excellent for uptime monitoring and on-call management alongside Azure Monitor.

How do I monitor Azure Functions?

Monitor Azure Functions using Application Insights (automatically enabled on creation) for request traces, exceptions, dependencies, and performance metrics. Key metrics: execution count, failures, duration, and throttled runs. Set alerts on failure rate and P99 duration in Azure Monitor. Use the Live Metrics stream in Application Insights for real-time debugging.

What metrics should I monitor in Azure?

Critical Azure metrics to monitor: VMs (Percentage CPU, Network In/Out, Disk Read/Write Bytes), AKS (node CPU/memory, pod restart count, pending pod count), Azure SQL (DTU/vCore percentage, deadlocks, connection failures, storage used), App Service (CPU percentage, memory, HTTP 5xx errors, response time), and Azure Load Balancer (health probe status, SNAT port exhaustion).

Is Azure Monitor free?

Azure Monitor has a partial free tier. Basic platform metrics are free for all Azure resources. Log Analytics ingestion is billed per GB, at about $2.76/GB on pay-as-you-go (the rate varies by region), with the first 5 GB per billing account each month free. Application Insights is priced per GB of telemetry ingested ($2.30/GB after 5GB/month free). Metric alert rules cost about $0.10 per monitored time series per month. A typical mid-size production environment costs $100-500/month for Azure Monitor.

What is the difference between Azure Monitor and Application Insights?

Azure Monitor is the umbrella platform that collects and analyzes telemetry from all Azure services — it includes metrics, logs, alerts, and more. Application Insights is a feature of Azure Monitor specifically designed for application performance monitoring (APM) — it instruments your code to capture requests, exceptions, dependencies, page views, and custom events. Think of Application Insights as the developer-facing layer and Azure Monitor as the infrastructure-facing layer.

Azure Monitoring Guide 2026: Azure Monitor, App Insights & Best Practices

Azure is the second-largest cloud provider, powering millions of production workloads. Microsoft gives you a rich monitoring stack (Azure Monitor, Application Insights, Log Analytics, Microsoft Defender for Cloud, and more), but knowing which tool fits which problem, and how to avoid spending a fortune on telemetry ingestion, takes real production experience. This guide distills Azure monitoring into actionable best practices.

The Azure Monitoring Stack: What's Available

Azure provides several native monitoring services, each covering a different layer of observability:

Service	What It Does	Best For
Azure Monitor Metrics	Collect and graph metrics from all Azure services	CPU, memory, latency, error rates
Log Analytics Workspace	Centralize and query logs via KQL	Log aggregation, complex analysis
Azure Monitor Alerts	Alert on metrics, logs, or activity events	Proactive alerting, auto-scaling
Application Insights	Full APM for web apps and APIs	Request tracing, exceptions, dependency tracking
Azure Activity Log	Audit log of all Azure control-plane operations	Security, compliance, change tracking
Azure Service Health	Azure service health events affecting your subscription	Outage awareness, incident tracking
Network Watcher	Network topology, traffic analytics, packet capture	Network debugging, security
Microsoft Defender for Cloud (formerly Azure Security Center)	Security posture and threat detection	Compliance, vulnerability assessment

Azure Monitor: Core Infrastructure Monitoring

Understanding Azure Monitor Metrics

Azure Monitor Metrics is the foundation of Azure infrastructure monitoring. Every Azure resource — VMs, AKS clusters, Functions, SQL databases, App Services, and hundreds more — automatically publishes metrics. Metrics are organized by resource type and metric namespace.

Key Azure Monitor Metrics concepts:

Platform metrics: Free metrics published automatically by all Azure services (no configuration needed)
Custom metrics: Publish application metrics via the Azure Monitor REST API, SDK, or OpenTelemetry
Metric dimensions: Filter and split metrics by dimensions (e.g., HTTP status code, operation name)
Metric retention: 93 days for most metrics; high-resolution (sub-minute) retained 3 days
Diagnostic settings: Required to route metrics and logs to Log Analytics or a storage account

Critical Metrics by Azure Service

Virtual Machines (VMs)

Percentage CPU: Alert at >80% sustained for 5+ minutes
Network In/Out: Baseline and alert on anomalies
Disk Read/Write Bytes: I/O saturation on managed disks
OS Disk IOPS Consumed Percentage: Alert at >90%
Memory (via Azure Monitor Agent): Not available by default — install AMA extension
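
The "sustained for 5+ minutes" condition above can be sketched as a check for consecutive breaching samples. This is a minimal illustration, assuming one sample per minute; the function name and the sample values are hypothetical, and Azure Monitor evaluates alert windows in its own way.

```python
from typing import Sequence


def breaches_sustained(samples: Sequence[float], threshold: float, min_consecutive: int) -> bool:
    """Return True if at least `min_consecutive` samples in a row exceed `threshold`."""
    run = 0
    for value in samples:
        # Extend the current run on a breach, reset it otherwise.
        run = run + 1 if value > threshold else 0
        if run >= min_consecutive:
            return True
    return False


# Made-up one-minute "Percentage CPU" samples for illustration.
cpu = [72, 85, 91, 88, 83, 86, 79]
print(breaches_sustained(cpu, threshold=80, min_consecutive=5))  # True
```

Requiring a run of breaches rather than a single spike is what keeps short bursts from paging anyone.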

Azure Kubernetes Service (AKS)

Node CPU usage percentage: Alert at >80% — scale node pool before saturation
Node memory working set percentage: Alert at >80%
Pod count by phase: Watch Pending/Failed counts — indicates scheduling issues
Unschedulable pod count: Immediate alert — pods waiting for capacity
Restarting container count: CrashLoopBackOff detection

Azure SQL Database

DTU percentage (DTU model): Alert at >80% — scale tier before saturation
CPU percentage (vCore model): Alert at >70%
Storage percentage: Alert when >80% to prevent autogrow issues
Deadlocks: Alert on any deadlocks — indicates application logic issues
Connection failures: Alert on sustained failures — connection pool exhaustion

Azure App Service

CPU percentage: Alert at >80% — scale out instances
Memory percentage: Alert at >85% to prevent OOM kills
Http5xx: Error rate — alert on >1% of requests
Response time: Alert on P95 > your SLA threshold
Requests: Traffic baseline — drops indicate upstream issues
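
The 1% rule for Http5xx is a ratio of two metrics, not a raw count. A small sketch of that arithmetic, assuming you already have the Http5xx and Requests totals for the same time window (the function names and example counts are hypothetical):

```python
def error_rate_percent(http_5xx: int, requests: int) -> float:
    """Share of requests that returned a 5xx status, as a percentage."""
    if requests == 0:
        # No traffic in the window: report 0% rather than dividing by zero.
        return 0.0
    return 100.0 * http_5xx / requests


def should_alert(http_5xx: int, requests: int, threshold_percent: float = 1.0) -> bool:
    """True when the 5xx share is above the threshold (1% by default)."""
    return error_rate_percent(http_5xx, requests) > threshold_percent


# 42 server errors out of 3000 requests is 1.4%, which is above the 1% threshold.
print(should_alert(42, 3000))  # True
```

A window with zero requests never alerts here, so pair this with the Requests baseline if a traffic drop should also page someone.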

Azure Functions

Execution count: Traffic baseline and anomaly detection
Function errors: Alert on sustained error rate >1%
Function execution duration: Watch P99 against the function timeout, which is configurable and defaults to 5 minutes on the Consumption plan
Throttled function runs: Hitting concurrency limits on Consumption plan
Active instance count: Scale monitoring for Premium/Dedicated plans

Azure Monitor Alerts: Getting Effective Alerts

Azure Monitor Alerts fire when a condition is met on a metric, log query, or activity log event. Azure supports three alert types — metric alerts (fastest, near real-time), log search alerts (flexible KQL queries on Log Analytics data), and activity log alerts (service health, resource changes).

Alert Best Practices

Use dynamic thresholds: Azure Monitor can learn a metric's normal pattern and alert on deviations — better than static thresholds for seasonal workloads
Set evaluation frequency thoughtfully: Metric alerts can evaluate as often as every 1 minute; evaluate non-critical alerts every 5 minutes instead to cut noise
Use Action Groups: Route alerts to email, SMS, webhook, Logic App, or ITSM integrations via Action Groups — define once, reuse across alerts
Alert on symptom, not cause: Alert on HTTP 5xx rate (symptom) rather than CPU (cause) — high CPU doesn't always mean user impact
Severity levels: Use Sev 0 (Critical) for production outages, Sev 1 (Error) for degraded performance, Sev 2 (Warning) for early warnings, and Sev 3-4 (Informational, Verbose) for low-priority signals
Enable service health alerts: Free alerts when Azure itself has issues in your region — these should be Sev 1
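
To make the idea behind dynamic thresholds concrete, here is a deliberately simple sketch: a band of mean plus or minus a multiple of the standard deviation, computed from recent history. This is not the algorithm Azure Monitor uses (its model also accounts for seasonality); the function names, the `sensitivity` parameter, and the sample data are all assumptions for illustration.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def dynamic_band(history: Sequence[float], sensitivity: float = 3.0) -> Tuple[float, float]:
    """Return (lower, upper) bounds as mean +/- sensitivity * sample standard deviation."""
    center = mean(history)
    spread = stdev(history)
    return center - sensitivity * spread, center + sensitivity * spread


def is_anomalous(value: float, history: Sequence[float], sensitivity: float = 3.0) -> bool:
    """True when `value` falls outside the band learned from `history`."""
    lower, upper = dynamic_band(history, sensitivity)
    return value < lower or value > upper


# Made-up requests-per-minute history with mean 100 and sample stdev 2.
history = [100, 102, 98, 101, 99, 100, 103, 97]
print(is_anomalous(105, history))  # inside the band
print(is_anomalous(120, history))  # outside the band
```

A static threshold would need retuning whenever the baseline shifts; a band derived from history moves with it, which is the property the dynamic-threshold advice relies on.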

Application Insights: APM for Azure Apps

Application Insights is Microsoft's application performance monitoring (APM) service. It instruments your code — web apps, APIs, Functions, and workers — to capture telemetry automatically. Unlike Azure Monitor Metrics (infrastructure layer), Application Insights operates at the application layer: requests, exceptions, dependencies, page views, custom events, and traces.

Auto-Instrumented vs SDK Instrumentation

Application Insights offers two instrumentation approaches:

Auto-Instrumentation (Codeless)

Available for Azure App Service (.NET, Java, Node.js, Python), Azure Functions, and AKS via the Application Insights Kubernetes agent. No code changes required — enable via the Azure Portal.

Captures: HTTP requests/responses, SQL queries, HTTP dependencies, exceptions, performance counters

SDK Instrumentation (Code-Based)

Add the Application Insights SDK to your code for full control. Available for .NET, Java, Node.js, Python, JavaScript, and more.

Adds: Custom events, custom metrics, custom traces, user journey tracking, custom dimensions on all telemetry

Key Application Insights Features

Application Map: Visual graph of your app's component dependencies — instantly see where failures propagate
Live Metrics Stream: Real-time telemetry with sub-second latency — invaluable during incident response
Smart Detection: AI-powered anomaly detection — alerts on unusual failure rates, response time degradation, and dependency failures automatically
Availability Tests: Run URL ping tests or multi-step web tests from global locations every 1-5 minutes
Distributed Tracing: End-to-end request correlation across microservices using correlation IDs
Profiler: CPU profiling in production — capture call stacks to find hot paths
Snapshot Debugger: Capture exception snapshots with local variables in production without stopping the app

Log Analytics: Querying Logs with KQL

Log Analytics Workspace is where Azure routes all diagnostic logs — VM syslog, AKS container logs, App Service application logs, Activity Logs, and Application Insights telemetry. You query it using Kusto Query Language (KQL).

Essential KQL Queries

// Top failing requests in App Insights

requests
| where success == false
| summarize count() by name, resultCode
| order by count_ desc
| take 20

// P95 response time by operation in last 1 hour

requests
| where timestamp > ago(1h)
| summarize percentile(duration, 95) by name
| order by percentile_duration_95 desc

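// AKS pod restart count (last 24h)
// ContainerRestartCount is a running total repeated on every inventory record,
// so take the max per pod instead of summing the records. Name is the pod name.

KubePodInventory
| where TimeGenerated > ago(24h)
| where ContainerRestartCount > 0
| summarize Restarts = max(ContainerRestartCount) by Name, Namespace
| order by Restarts desc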

// Exception volume by type (last 6h)

exceptions
| where timestamp > ago(6h)
| summarize count() by type, outerMessage
| order by count_ desc

Log Ingestion Cost Optimization

Log Analytics can become expensive quickly. Strategies to control costs:

Use Commitment Tiers: If ingesting >100GB/day, Commitment Tiers offer 15-65% savings over pay-as-you-go
Set daily cap: Configure a daily ingestion cap per workspace to prevent runaway costs from verbose log sources
Filter at the source: Use diagnostic setting filters to exclude noisy log categories (e.g., Azure Firewall flow logs can be massive)
Separate workspaces by tier: High-retention compliance data in one workspace, short-retention debug data in another
Use Basic Logs: New tier at $0.65/GB (vs. $2.76/GB Analytics) for verbose logs you rarely query
Application Insights sampling: Enable adaptive sampling in App Insights SDK to reduce telemetry volume by 50-90%
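
The effect of these strategies is easy to estimate before changing anything. The sketch below uses the per-GB rates quoted in this guide ($2.76 Analytics, $0.65 Basic) as assumptions; actual rates vary by region and change over time, and the function name, the 30-day month, and the example volumes are hypothetical.

```python
ANALYTICS_PRICE_PER_GB = 2.76  # rate quoted in this guide; check current regional pricing
BASIC_PRICE_PER_GB = 0.65      # rate quoted in this guide; check current regional pricing


def monthly_ingestion_cost(gb_per_day: float, price_per_gb: float,
                           days: int = 30, sampling_keep_ratio: float = 1.0) -> float:
    """Estimated monthly cost for a steady daily volume.

    sampling_keep_ratio is the fraction of telemetry kept after sampling
    (0.5 means sampling drops half of the volume).
    """
    return gb_per_day * sampling_keep_ratio * days * price_per_gb


# Baseline: 20 GB/day, everything in the Analytics tier.
baseline = monthly_ingestion_cost(20, ANALYTICS_PRICE_PER_GB)

# Same volume, with 12 GB/day of rarely queried logs moved to Basic Logs.
split = (monthly_ingestion_cost(8, ANALYTICS_PRICE_PER_GB)
         + monthly_ingestion_cost(12, BASIC_PRICE_PER_GB))

print(round(baseline, 2))  # 1656.0
print(round(split, 2))     # 896.4
```

The estimate ignores free allowances, retention charges, and query costs on Basic Logs, so treat it as a rough comparison between options, not a bill forecast.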

Azure Workbooks & Dashboards

Azure Monitor Workbooks are interactive reports that combine metrics, logs, parameters, and visualizations. They're the preferred way to build custom Azure dashboards beyond the basic Azure Portal charts.

Azure Dashboard: Pin charts from any Azure service — quick at-a-glance operational view
Workbooks: Full interactive reports with drill-downs, parameters (time range, subscription, resource), and mixed data sources
Insights: Pre-built Workbooks for specific services (VM Insights, Container Insights, Network Insights, SQL Insights)
Grafana with Azure Data Source: Use managed Grafana (Azure Managed Grafana) or self-hosted Grafana with the Azure Monitor plugin for Grafana-style dashboards

Third-Party Azure Monitoring Tools

Native Azure Monitor is powerful but has limitations: KQL has a steep learning curve, the alert UI is complex, and cross-cloud visibility is limited. Third-party tools are common in Azure production environments:

Tool	Azure Integration	Best For
Datadog	Native Azure integration — 800+ metrics, log forwarding	Full-stack observability, multi-cloud
New Relic	Azure polling + Log Forwarding integration	APM + infrastructure in one platform
Grafana Cloud	Azure Monitor data source plugin	Teams already using Grafana/Prometheus
Better Stack	Webhook integration with Azure Monitor Alerts	Uptime monitoring + on-call management
Dynatrace	Auto-discovery via Azure API	Enterprise APM with AI-powered root cause
Elastic	Azure Metricbeat + Filebeat + Functionbeat	Log search + APM on self-managed stack

Azure Monitoring Best Practices

1. Enable Diagnostic Settings on Every Resource

By default, only platform metrics are collected. Enable diagnostic settings on each resource to route logs and additional metrics to Log Analytics. Use Azure Policy to enforce this at scale across subscriptions.

2. Use Tags for Cost Allocation and Alert Routing

Tag resources with environment, team, and service. Use tags in Action Groups to route alerts to the right team and in Cost Management to allocate monitoring spend.

3. Set Up Azure Service Health Alerts

Configure Service Health alerts (free) to notify your team when Azure has incidents, planned maintenance, or health advisories in your regions. Integrate these with your on-call tool via webhook or Logic App.

4. Define SLOs and Use Error Budgets

Define SLOs and chart them in Azure Monitor Workbooks backed by Log Analytics queries (e.g., 99.9% uptime leaves an error budget of about 43 minutes of downtime per 30-day month). Track the error budget burn rate and alert when it is burning too fast. See our SLO guide for implementation details.
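
The error budget arithmetic is worth seeing once. A minimal sketch, assuming a 30-day window and downtime measured in minutes; the function names and the example figures are hypothetical:

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed downtime in minutes for an availability SLO over the window."""
    return (1.0 - slo) * window_days * 24 * 60


def burn_rate(bad_minutes: float, elapsed_days: float, slo: float, window_days: int = 30) -> float:
    """Budget consumed so far, relative to an even spend across the window.

    1.0 means the budget runs out exactly at the end of the window;
    higher values mean it runs out early.
    """
    allowed_so_far = error_budget_minutes(slo, window_days) * (elapsed_days / window_days)
    return bad_minutes / allowed_so_far


# 99.9% over 30 days: 0.1% of 43,200 minutes.
print(round(error_budget_minutes(0.999), 1))   # 43.2

# 21.6 bad minutes after 5 days is three times the even-spend pace.
print(round(burn_rate(21.6, 5, 0.999), 2))     # 3.0
```

A burn rate that stays above 1.0 means the budget will be exhausted before the window ends, which is the condition worth alerting on.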

5. Implement Distributed Tracing for Microservices

If you run multiple services on Azure (AKS + Functions + App Service + Service Bus), enable Application Insights distributed tracing or OpenTelemetry to correlate requests across service boundaries. Without it, debugging multi-hop failures is guesswork.
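
Correlation across services works by passing a shared trace ID with every outbound call. The sketch below assumes the W3C Trace Context `traceparent` header, which OpenTelemetry uses by default; in practice the SDK or agent handles this for you, and the helper names here are hypothetical.

```python
import re
import secrets

# traceparent layout: version "00", 32-hex trace ID, 16-hex parent span ID, 2-hex flags.
TRACEPARENT_RE = re.compile(r"^00-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})$")


def new_traceparent() -> str:
    """Start a new trace: fresh trace ID and span ID, sampled flag set."""
    return f"00-{secrets.token_hex(16)}-{secrets.token_hex(8)}-01"


def child_traceparent(incoming: str) -> str:
    """Header for an outbound call: keep the trace ID, mint a new span ID."""
    match = TRACEPARENT_RE.match(incoming)
    if match is None:
        # Missing or malformed header: start a new trace instead of failing.
        return new_traceparent()
    trace_id, _parent_span_id, flags = match.groups()
    return f"00-{trace_id}-{secrets.token_hex(8)}-{flags}"


root = new_traceparent()
downstream = child_traceparent(root)
# Both headers carry the same trace ID, so the backend can join the two hops.
print(root.split("-")[1] == downstream.split("-")[1])  # True
```

Every hop that forwards the trace ID unchanged stays queryable as one end-to-end operation; a hop that drops the header splits the trace in two.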

6. Monitor Your Monitoring: Alert on Data Gaps

Create a "Heartbeat absent" alert in Log Analytics to detect when an agent stops reporting. If the Azure Monitor Agent on a VM crashes, you want to learn that from the alert, not discover the missing data in the middle of a real incident.
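
The logic of a heartbeat-absent check is simply "last seen too long ago". A minimal sketch, assuming you already have the latest heartbeat timestamp per machine; the function name, the 10-minute window, and the host names are hypothetical:

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def silent_agents(last_heartbeat: Dict[str, datetime], now: datetime,
                  max_silence: timedelta = timedelta(minutes=10)) -> List[str]:
    """Names of machines whose most recent heartbeat is older than `max_silence`."""
    return sorted(name for name, seen in last_heartbeat.items() if now - seen > max_silence)


now = datetime(2026, 1, 15, 12, 0, tzinfo=timezone.utc)
last_seen = {
    "vm-web-01": now - timedelta(minutes=1),
    "vm-web-02": now - timedelta(minutes=25),  # stopped reporting
    "vm-db-01": now - timedelta(minutes=3),
}
print(silent_agents(last_seen, now))  # ['vm-web-02']
```

Pick a silence window a few times longer than the agent's reporting interval so that one delayed heartbeat does not trigger the alert.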

Azure vs AWS Monitoring: Key Differences

If you're coming from AWS, here's how Azure's monitoring stack maps:

AWS Service	Azure Equivalent	Key Difference
CloudWatch Metrics	Azure Monitor Metrics	Azure uses diagnostic settings instead of automatic log routing
CloudWatch Logs	Log Analytics Workspace	Azure uses KQL; AWS uses the CloudWatch Logs Insights query language
CloudWatch Alarms	Azure Monitor Alerts	Azure has dynamic thresholds; CloudWatch's counterpart is anomaly detection alarms
AWS X-Ray	Application Insights (distributed tracing)	App Insights also does full APM; X-Ray is tracing only
CloudTrail	Azure Activity Log	Very similar; both audit control-plane operations
AWS Health	Azure Service Health	Both offer subscription-level health events
CloudWatch Dashboards	Azure Workbooks / Dashboards	Workbooks are more powerful; Azure Portal Dashboards are simpler

Getting Started: Azure Monitoring Quickstart Checklist

Day 1 Setup

✓Create a central Log Analytics Workspace in each region
✓Enable diagnostic settings on all critical resources (route to Log Analytics)
✓Enable Application Insights on all web apps and APIs
✓Configure Azure Service Health alerts (free, high value)
✓Set Action Groups for email/SMS/webhook routing
✓Create metric alerts for CPU, memory, error rates on production resources
✓Enable Container Insights on AKS clusters
✓Set a daily ingestion cap on Log Analytics to avoid bill shock
