Prometheus Alertmanager: Complete Setup & Configuration Guide 2026
Alertmanager is where Prometheus alerts go to either save your night or get lost in noise. This guide covers installation, routing trees, grouping, silences, inhibition, and receiver config for Slack, PagerDuty, and webhooks — everything you need to go from zero to production-grade alerting.
TL;DR
- Prometheus evaluates alert rules → sends to Alertmanager → Alertmanager routes to receivers
- Configure alertmanager.yml with a route tree, receivers, and inhibition rules
- Group related alerts to avoid notification storms
- Route critical (SEV1) → PagerDuty, warnings → Slack
- Use inhibition to suppress child alerts when a parent is firing
How Prometheus + Alertmanager Work Together
The alerting pipeline has two distinct stages that engineers often conflate:
Stage 1: Prometheus (Alert Evaluation)
- Evaluates PromQL expressions every 15–60s (evaluation_interval)
- An alert fires when its expression has been true for the entire for: duration
- Sends firing alerts to Alertmanager via HTTP push
- Re-sends firing alerts on each evaluation (throttled by --rules.alert.resend-delay, default 1m) until they resolve
- Config: rule files such as alerting_rules.yml, plus the alerting section of prometheus.yml
Stage 2: Alertmanager (Notification Pipeline)
- Receives alerts via POST /api/v2/alerts
- Deduplicates identical alerts (same labels)
- Groups related alerts into one notification (group_by)
- Applies silences and inhibition rules
- Routes to receivers: Slack, PagerDuty, email, webhook
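You can poke this stage directly. As a quick sketch (the alert name and instance label here are made up), POST a hand-crafted alert to the v2 API and it will show up in the web UI like any Prometheus-sent alert:

# Push a synthetic alert into Alertmanager (assumes localhost:9093)
curl -X POST http://localhost:9093/api/v2/alerts \
  -H 'Content-Type: application/json' \
  -d '[{
        "labels": {"alertname": "TestAlert", "severity": "warning", "instance": "demo-host"},
        "annotations": {"summary": "Synthetic alert for pipeline testing"}
      }]'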
Point Prometheus at this pipeline by configuring alertmanagers in prometheus.yml with the Alertmanager address.
Installing Alertmanager
Alertmanager ships as a single binary. Download from the Prometheus GitHub releases page, or use Docker:
# Binary install (Linux amd64)
wget https://github.com/prometheus/alertmanager/releases/download/v0.27.0/alertmanager-0.27.0.linux-amd64.tar.gz
tar xvfz alertmanager-0.27.0.linux-amd64.tar.gz
cd alertmanager-0.27.0.linux-amd64
./alertmanager --config.file=alertmanager.yml

# Docker
docker run -d -p 9093:9093 \
  -v /path/to/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  prom/alertmanager

# Kubernetes (Helm)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack
Alertmanager runs on port 9093 by default. The web UI is available at http://localhost:9093 and shows current alerts, silences, and firing state.
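Before wiring anything else up, confirm the instance is actually serving. Alertmanager exposes the standard Prometheus-style health endpoints:

# Liveness and readiness checks (local instance assumed)
curl http://localhost:9093/-/healthy   # 200 OK while the process is alive
curl http://localhost:9093/-/ready     # 200 OK once it is ready to serve traffic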
Wiring Prometheus to Alertmanager
In prometheus.yml, add the alerting block and point it to your Alertmanager instance:
# prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093   # or localhost:9093

rule_files:
  - "alerting_rules/*.yml"    # glob — load all rule files
  - "recording_rules/*.yml"

scrape_configs:
  # ... your scrape targets
Writing Prometheus Alerting Rules
Alerting rules live in separate .yml files referenced by rule_files. Each rule file has a list of groups, and each group has a list of rules:
# alerting_rules/infrastructure.yml
groups:
  - name: infrastructure
    interval: 1m   # optional override, defaults to evaluation_interval
    rules:
      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 1m
        labels:
          severity: critical
          team: platform
        annotations:
          summary: "Node {{ $labels.instance }} is down"
          description: "Node {{ $labels.instance }} has been unreachable for more than 1 minute."
          runbook_url: "https://runbooks.example.com/node-down"

      - alert: HighCPU
        expr: avg by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100 > 85
        for: 10m
        labels:
          severity: warning
          team: platform
        annotations:
          summary: "High CPU on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanize }}% on {{ $labels.instance }}."
Common Alerting Rule Examples
Quick reference: severity and PromQL expression for the most common infrastructure alerts.

| Alert | Severity | Expression |
|---|---|---|
| High CPU Usage | warning | avg by(instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m])) * 100 > 85 |
| Node Down | critical | up{job="node"} == 0 |
| High Memory Usage | warning | (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100 < 10 |
| High Error Rate (HTTP) | critical | sum by(service) (rate(http_requests_total{status=~"5.."}[5m])) / sum by(service) (rate(http_requests_total[5m])) * 100 > 5 |
| Disk Space Low | warning | (node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"}) * 100 < 15 |
Alertmanager Configuration Reference
The alertmanager.yml file has four main top-level sections (plus optional time_intervals for time-based routing, used later in this guide):
- global: default settings — SMTP host, Slack API URL, PagerDuty URL, resolve_timeout
- route: the routing tree — defines which receiver handles which alerts based on label matchers
- receivers: notification integrations — Slack, PagerDuty, email, OpsGenie, webhook
- inhibit_rules: silence lower-priority alerts when a higher-priority alert is already firing
Complete alertmanager.yml Example
# alertmanager.yml
global:
  resolve_timeout: 5m
  slack_api_url: 'https://hooks.slack.com/services/YOUR/SLACK/WEBHOOK'

route:
  # Default receiver for unmatched alerts
  receiver: slack-warnings
  group_by: ['alertname', 'cluster', 'service']
  group_wait: 30s        # Wait before sending first notification
  group_interval: 5m     # Wait before sending updated notification for same group
  repeat_interval: 12h   # Re-notify if alert is still firing
  routes:
    # Critical alerts → PagerDuty (immediate)
    - match:
        severity: critical
      receiver: pagerduty-critical
      group_wait: 10s
      repeat_interval: 1h
      continue: false    # Stop routing after match
    # Platform team alerts → dedicated Slack channel
    - match:
        team: platform
      receiver: slack-platform
      continue: true     # Also check further routes
    # Route noisy storage alerts only during business hours (optional)
    - match_re:
        alertname: ^(StorageNearCapacity|DiskLatencyHigh)$
      receiver: slack-warnings
      active_time_intervals:
        - business_hours

receivers:
  - name: slack-warnings
    slack_configs:
      - channel: '#alerts-warnings'
        title: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
        text: '{{ range .Alerts }}{{ .Annotations.description }}\n{{ end }}'
        send_resolved: true
  - name: slack-platform
    slack_configs:
      - channel: '#alerts-platform'
        title: '[{{ .Status | toUpper }}] {{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}*{{ .Labels.instance }}*: {{ .Annotations.description }}\n{{ end }}'
        send_resolved: true
  - name: pagerduty-critical
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_ROUTING_KEY'
        description: '{{ .CommonAnnotations.summary }}'
        details:
          firing: '{{ template "pagerduty.default.instances" .Alerts.Firing }}'
        send_resolved: true

inhibit_rules:
  # If a node is down, suppress all other alerts from that node
  - source_match:
      alertname: NodeDown
      severity: critical
    target_match_re:
      severity: ^(warning|info)$
    equal: ['instance']
  # If a cluster is down, suppress all services in that cluster
  - source_match:
      alertname: ClusterDown
    target_match:
      team: platform
    equal: ['cluster']

time_intervals:
  - name: business_hours
    time_intervals:
      - weekdays: ['monday:friday']
        times:
          - start_time: '09:00'
            end_time: '18:00'
Understanding the Routing Tree
The route tree is evaluated top-down. Each incoming alert starts at the root route and traverses child routes until a match is found (or no match → uses root receiver).
group_by: Labels used to group alerts into a single notification. Alerts with the same group_by labels are batched. Use ["alertname", "cluster"] to avoid 100 separate notifications for the same outage.
group_wait: Time to wait before sending the first notification for a new group, allowing related alerts to arrive before notifying (30s default). Lower it for critical routes, raise it for warnings.
group_interval: Minimum time between notifications for the same group after the first one. Prevents notification spam from flapping alerts (5m default).
repeat_interval: How often to re-send if an alert is still firing. Use 1h for critical (maintain urgency), 12h for warnings (avoid fatigue).
continue: If true, routing continues to the next sibling route after a match. If false (default), routing stops on the first match.
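To sanity-check how your own tree resolves, amtool can render and dry-run it from the config file alone, without a running Alertmanager. A sketch against the example config above:

# Render the routing tree as ASCII from a local config file
amtool config routes show --config.file=alertmanager.yml

# Dry-run a labeled alert to see which receiver would handle it
amtool config routes test --config.file=alertmanager.yml \
  alertname=HighCPU severity=warning team=platform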
Receiver Types & When to Use Each
| Receiver | Best For | Setup |
|---|---|---|
| Slack | Dev teams — low-severity alerts, chat-first culture | Incoming Webhook URL from slack.com/apps |
| PagerDuty | On-call escalation — SEV1/SEV2 critical alerts | Integration key from PagerDuty service settings |
| Email | Low-urgency notifications, audit trails | SMTP host, auth credentials, from/to addresses |
| OpsGenie | On-call with advanced scheduling, mobile push | API key from OpsGenie integration |
| Webhook | Custom integrations, automation, ChatOps bots | Any HTTP endpoint accepting POST JSON |
| VictorOps / Splunk On-Call | Enterprise teams already using VictorOps | Routing key from Splunk On-Call service |
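For the webhook case, a minimal receiver sketch (the receiver name and endpoint URL are hypothetical). Alertmanager POSTs a JSON payload with the group status, labels, and the batched alerts to whatever URL you give it:

receivers:
  - name: automation-webhook
    webhook_configs:
      - url: 'http://alert-bot.internal:8080/hooks/alertmanager'  # hypothetical endpoint
        send_resolved: true
        max_alerts: 10   # cap alerts per POST; 0 means unlimited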
Slack Receiver: Full Config
receivers:
  - name: slack-oncall
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/T.../B.../...'
        channel: '#oncall-alerts'
        username: 'Alertmanager'
        icon_emoji: ':fire:'
        title: |-
          [{{ .Status | toUpper }}{{ if eq .Status "firing" }}: {{ .Alerts.Firing | len }}{{ end }}]
          {{ .CommonLabels.alertname }}
        title_link: 'https://grafana.example.com/d/...'
        text: |-
          {{ range .Alerts }}
          *Alert:* {{ .Annotations.summary }}
          *Description:* {{ .Annotations.description }}
          *Instance:* {{ .Labels.instance }}
          *Runbook:* {{ .Annotations.runbook_url }}
          {{ end }}
        color: '{{ if eq .Status "firing" }}danger{{ else }}good{{ end }}'
        send_resolved: true
amtool CLI: Essential Commands
amtool ships with Alertmanager and lets you manage silences, check routing, and query the API from the command line:
# Set Alertmanager URL (or use ALERTMANAGER_URL env var)
export ALERTMANAGER_URL=http://localhost:9093

# List all firing alerts
amtool alert

# List alerts filtered by label matchers
amtool alert query alertname=HighCPU severity=critical

# Create a silence (maintenance window)
amtool silence add --duration=2h \
  --comment="Planned maintenance 2026-05-03" \
  alertname="HighCPU" instance="prod-server-01"

# List active silences
amtool silence query

# Expire a silence immediately
amtool silence expire <silence-id>

# Validate config file (critical before deploy!)
amtool check-config alertmanager.yml

# Test routing: which receiver handles this alert?
amtool config routes test \
  --config.file=alertmanager.yml \
  alertname=NodeDown severity=critical team=platform

# Trigger Alertmanager to reload config (hot reload)
curl -X POST http://localhost:9093/-/reload
Avoiding Alert Fatigue: Best Practices
✅ Alert on symptoms, not causes
Alert on "users are experiencing errors" (high 5xx rate) not "CPU is 70%". Causes are for dashboards.
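As an illustration, a symptom-first rule pages on the user-visible error ratio rather than any one resource metric (this reuses the error-rate expression from the table above; the 5% threshold and group name are illustrative):

groups:
  - name: symptoms
    rules:
      - alert: HighErrorRate
        expr: sum(rate(http_requests_total{status=~"5.."}[5m])) / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "More than 5% of HTTP requests are failing"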
✅ Set meaningful for durations
A for duration prevents false positives from transient spikes. Use shorter windows for critical alerts (1-2m) and longer ones for warnings (10-15m).
✅ Every alert needs a runbook
Add runbook_url to every alert annotation. An alert without a runbook is an alert without a fix path.
✅ Group aggressively
Use group_by: [alertname, cluster] so 50 pods failing fires ONE notification, not 50.
✅ Use inhibition for parent/child relationships
If a node is down, inhibit disk/CPU/memory alerts from that node. One root cause = one page.
✅ Review silence usage monthly
If an alert is silenced more than it fires, fix the alert rule. Permanent silences are tech debt.
High Availability Alertmanager
Alertmanager supports native clustering with the gossip protocol (using --cluster.peer). A cluster of 3 nodes ensures alerting continues if one node fails, with automatic deduplication across instances:
# Node 1
./alertmanager \
  --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager-2:9094 \
  --cluster.peer=alertmanager-3:9094

# Node 2
./alertmanager \
  --config.file=alertmanager.yml \
  --cluster.listen-address=0.0.0.0:9094 \
  --cluster.peer=alertmanager-1:9094 \
  --cluster.peer=alertmanager-3:9094

# In prometheus.yml, list ALL Alertmanager nodes
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager-1:9093
            - alertmanager-2:9093
            - alertmanager-3:9093
Prometheus sends alerts to ALL Alertmanager instances; the cluster deduplicates notifications via gossip, so even with three nodes you should see each notification only once. Note that deduplication is best-effort, and duplicates can slip through during network partitions.
Alertmanager Limitations & When to Supplement It
No built-in on-call scheduling
Alertmanager routes to receivers but has no rotation schedules. Use PagerDuty or OpsGenie for on-call management, not raw Alertmanager webhooks.
Config file changes require reload
While reload is hot (SIGHUP or /-/reload), it still requires a manual step. Tools like Grafana Alerting or Better Stack have UI-based rule management.
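A common guard, combining two commands shown earlier, is to validate before reloading so a broken file never reaches the running instance:

# Only reload if the new config parses cleanly
amtool check-config alertmanager.yml && curl -X POST http://localhost:9093/-/reload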
No built-in incident grouping/timeline
Alertmanager shows firing alerts but doesn't build incident timelines. For post-mortems and incident tracking, you need a dedicated incident management tool.
Alert history not persisted by default
Alertmanager keeps its working state in memory and snapshots silences and the notification log to --storage.path. If that path is not durable (for example, an ephemeral container filesystem), silences and deduplication state are lost on restart, and there is no long-term alert history either way.
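A sketch of making that storage durable (the volume name is hypothetical; /alertmanager is the data directory the official Docker image uses):

# Binary: point --storage.path at a durable directory
./alertmanager --config.file=alertmanager.yml --storage.path=/var/lib/alertmanager

# Docker: mount a named volume over the image's data directory
docker run -d -p 9093:9093 \
  -v alertmanager-data:/alertmanager \
  -v /path/to/alertmanager.yml:/etc/alertmanager/alertmanager.yml \
  prom/alertmanager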
Frequently Asked Questions
What is Prometheus Alertmanager?
Alertmanager is the dedicated alerting component of the Prometheus ecosystem. Prometheus evaluates alerting rules and sends firing alerts to Alertmanager, which then handles deduplication, grouping, silencing, inhibition, and routing to receivers like Slack, PagerDuty, or email. It runs as a separate binary (alertmanager) and is typically deployed alongside Prometheus.
What is the difference between Prometheus alerting rules and Alertmanager?
Prometheus alerting rules (in .rules.yml files) define WHEN an alert fires — using PromQL expressions and a for duration. Alertmanager handles WHAT HAPPENS after an alert fires — routing it to the right team, grouping related alerts into one notification, silencing during maintenance, and deduplicating repeated firings. Both are required for a functional alerting pipeline.
How do I silence alerts in Alertmanager?
Silences can be created via the Alertmanager web UI (default port 9093), the amtool CLI (amtool silence add alertname=HighCPU --duration=2h --comment="Planned maintenance"), or the Alertmanager HTTP API (POST /api/v2/silences). Silences match alerts by label matchers and expire after the specified duration. They do not affect alert evaluation in Prometheus — only notification delivery.
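For the HTTP API route, a sketch of the equivalent silence (timestamps are illustrative; matchers follow the v2 schema):

# Create a two-hour silence via the v2 API
curl -X POST http://localhost:9093/api/v2/silences \
  -H 'Content-Type: application/json' \
  -d '{
        "matchers": [{"name": "alertname", "value": "HighCPU", "isRegex": false}],
        "startsAt": "2026-05-03T10:00:00Z",
        "endsAt": "2026-05-03T12:00:00Z",
        "createdBy": "oncall@example.com",
        "comment": "Planned maintenance"
      }'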
What is alert inhibition in Alertmanager?
Inhibition rules suppress lower-priority alerts when a higher-priority alert is already firing. Example: if a NodeDown alert fires, inhibit all alerts from that node so you get one notification instead of dozens. Configured under inhibit_rules in alertmanager.yml. Source labels define the inhibiting alert, target labels define which alerts to suppress, and equal labels must match on both.
How do I test Alertmanager configuration without restarting?
Use amtool check-config alertmanager.yml to validate syntax. To reload without restart, send a SIGHUP (kill -HUP <pid>) or POST to /-/reload endpoint. To test routing, use amtool config routes test --config.file=alertmanager.yml alertname=HighCPU severity=critical to see which receiver would handle that alert without firing anything.