
Container Monitoring Guide: Docker & Kubernetes Observability (2026)

Which metrics to track, which tools to use, and how to alert on container issues before they become outages.

By API Status Check · Updated April 2026 · 11 min read

Containers fundamentally change how you monitor infrastructure. Traditional server monitoring — CPU, memory, disk on a fixed host — only covers part of the picture. Container workloads are ephemeral, horizontally scaled, and managed by orchestrators like Kubernetes that make scheduling decisions you need visibility into.

This guide covers the complete container monitoring picture: Docker metrics for single-host setups, Kubernetes metrics for orchestrated clusters, alerting thresholds, and the best tools for each layer.

Why Container Monitoring Is Different

Three things make container monitoring distinct from traditional infrastructure monitoring:

  • Ephemerality — containers start, stop, and get rescheduled constantly, so there is no fixed host to pin metrics to; you monitor workloads, not machines.
  • Scale and density — a single host runs many containers and deployments scale horizontally, so per-host dashboards stop being meaningful on their own.
  • The orchestration layer — schedulers like Kubernetes make placement, restart, and eviction decisions you need visibility into, on top of the container metrics themselves.

Essential Docker Container Metrics

Metric | What It Measures | Alert Threshold | Page Threshold
CPU % | CPU used vs. allocated limit | > 80% sustained 5 min | > 95% (throttling)
Memory usage | Bytes used vs. memory limit | > 80% of limit | > 90% (OOM risk)
Restart count | Container restarts since creation | > 1 restart in 15 min | > 5 (CrashLoop)
Net I/O | Bytes sent/received | Baseline × 5 | Baseline × 20
Block I/O | Disk read/write bytes | Depends on workload | Sustained saturation
# Check all running containers at once
docker stats --no-stream

# Output: container metrics snapshot
CONTAINER ID   NAME        CPU %   MEM USAGE / LIMIT     MEM %   NET I/O
a1b2c3d4e5f6   web         12.3%   256MiB / 512MiB      50.0%   1.2GB / 890MB
f1e2d3c4b5a6   worker      0.8%    128MiB / 256MiB      50.0%   45MB / 12MB

# Continuous monitoring with formatting
docker stats --format "table {{.Name}}\t{{.CPUPerc}}\t{{.MemPerc}}\t{{.MemUsage}}\t{{.NetIO}}"

# Check restart count for specific container
docker inspect --format='{{.RestartCount}}' web
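These thresholds can be scripted directly against docker stats. Below is a minimal sketch — the check_mem helper and the 80% default are our own choices, not a Docker feature — that flags containers above a memory-usage percentage:

```shell
# check_mem reads "NAME MEM%" lines on stdin and prints containers
# whose memory percentage exceeds the threshold (default 80).
check_mem() {
  threshold="${1:-80}"
  awk -v t="$threshold" '{
    pct = $2
    sub(/%/, "", pct)              # strip the trailing % sign
    if (pct + 0 > t) print $1, "memory at " pct "%"
  }'
}

# Live usage (requires a running Docker daemon):
#   docker stats --no-stream --format '{{.Name}} {{.MemPerc}}' | check_mem 80
```

Run from cron, this is the crudest possible alerting loop; the Prometheus rules later in this guide are the production-grade version of the same checks.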

Kubernetes Monitoring: The Full Picture

Kubernetes adds an orchestration layer above individual containers. You need visibility at four levels:

1. Pod-Level Metrics

Metric | Prometheus Query | Alert On
Pod restarts | kube_pod_container_status_restarts_total | > 3 in 15 minutes
Pod CPU usage | container_cpu_usage_seconds_total | > 80% of request
Memory pressure | container_memory_working_set_bytes | > 80% of limit
OOMKilled | kube_pod_container_status_last_terminated_reason | Any OOMKilled event
Pod phase | kube_pod_status_phase | Pending > 10 min
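The OOMKilled row above translates into a rule like the following — a sketch in the same style as the alerts below; the rule name is our own, and it assumes kube-state-metrics is exporting the metric:

```yaml
# Fire whenever a container's most recent termination was an OOM kill
alert: ContainerOOMKilled
expr: kube_pod_container_status_last_terminated_reason{reason="OOMKilled"} == 1
for: 0m
labels:
  severity: critical
annotations:
  summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} was OOMKilled"
```

Because this metric only reflects the most recent termination, teams often pair it with an increase() over the restart counter to catch repeated OOM kills.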

2. Deployment-Level Metrics

# Prometheus: Alert when deployment has unavailable replicas
alert: DeploymentUnavailableReplicas
expr: kube_deployment_status_replicas_unavailable > 0
for: 5m
labels:
  severity: warning
annotations:
  summary: "Deployment {{ $labels.deployment }} has {{ $value }} unavailable replicas"

# Alert when rollout stalls (requested != ready)
alert: DeploymentRolloutStuck
expr: |
  kube_deployment_status_replicas_updated != kube_deployment_spec_replicas
  and kube_deployment_spec_replicas > 0
for: 15m
labels:
  severity: critical

3. Node-Level Metrics

# Check node conditions
kubectl describe nodes | grep -A5 "Conditions:"

# Prometheus: Alert on node memory pressure
alert: NodeMemoryPressure
expr: kube_node_status_condition{condition="MemoryPressure",status="true"} == 1
for: 2m
labels:
  severity: critical
annotations:
  summary: "Node {{ $labels.node }} is under memory pressure — pod evictions may start"

4. HPA (Autoscaler) Metrics

If you use Horizontal Pod Autoscaler, monitor both the current and desired replica counts, and alert when HPA is stuck at its maximum replica count — that means demand is outpacing your configured maximum scale.

# HPA at max replicas — can't scale further
alert: HPAMaxedOut
expr: |
  kube_horizontalpodautoscaler_status_current_replicas
  == kube_horizontalpodautoscaler_spec_max_replicas
for: 10m
labels:
  severity: warning
annotations:
  summary: "HPA {{ $labels.horizontalpodautoscaler }} is at max replicas — increase max or investigate traffic spike"
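The same condition can be spot-checked from the CLI. A small sketch — the hpa_maxed helper name and the jsonpath pipeline are our own; they assume kubectl access and the autoscaling/v2 fields status.currentReplicas and spec.maxReplicas:

```shell
# hpa_maxed reads "NAME CURRENT MAX" lines on stdin and prints
# any HPA whose current replica count is pinned at its maximum.
hpa_maxed() {
  awk '$2 == $3 && $3 != "" { print $1, "at max (" $3 " replicas)" }'
}

# Live usage:
#   kubectl get hpa -A -o jsonpath='{range .items[*]}{.metadata.name} {.status.currentReplicas} {.spec.maxReplicas}{"\n"}{end}' | hpa_maxed
```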

Container Monitoring Tool Comparison

Tool | Best For | Docker Support | K8s Support | Pricing
Prometheus + Grafana | Open-source DIY | Via cAdvisor | Native (kube-state-metrics) | Free (self-hosted)
Datadog | Enterprise, multi-cloud | Auto-discovery | Deep K8s integration | $18/host/mo + infra
New Relic | Full-stack observability | Via agent | K8s cluster explorer | Free 100GB/mo
Grafana Cloud | Managed Prometheus/Loki | Via scrape configs | K8s monitoring bundle | Free tier / $29+/mo
Dynatrace | Auto-discovery, AIOps | OneAgent auto | Excellent K8s support | $0.08/hour/host
Better Stack | Uptime + endpoint health | HTTP/TCP checks | External checks | Free tier / $20+/mo

Setting Up Prometheus for Kubernetes

The standard open-source Kubernetes monitoring stack uses Prometheus for metrics collection, kube-state-metrics for cluster state, node-exporter for node metrics, and Grafana for dashboards.

# Install kube-prometheus-stack via Helm (includes Prometheus + Grafana + AlertManager)
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update

helm install kube-prometheus-stack \
  prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.enabled=true \
  --set alertmanager.enabled=true

# Verify installation
kubectl get pods -n monitoring

# Access Grafana (default: admin/prom-operator)
kubectl port-forward -n monitoring svc/kube-prometheus-stack-grafana 3000:80

The kube-prometheus-stack Helm chart includes 40+ pre-built alerting rules and 20+ Grafana dashboards for Kubernetes monitoring out of the box — covering pod health, node resources, API server latency, and etcd metrics.
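The chart's defaults are tunable through a values file. For example — key paths taken from the chart's values.yaml; verify them against the chart version you install — raising metric retention:

```yaml
# values.yaml — override Prometheus retention, then:
#   helm upgrade --install kube-prometheus-stack ... -f values.yaml
prometheus:
  prometheusSpec:
    retention: 15d
    retentionSize: 20GB
```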

Container Monitoring Best Practices

  • Set CPU and memory limits on every container — the percentage thresholds above are meaningless without a limit to measure against.
  • Alert on rates and durations, not raw counts — one restart is noise; three restarts in 15 minutes is a signal.
  • Monitor all four layers — pod, deployment, node, and autoscaler problems can look identical from the outside but need different fixes.
  • Pair internal metrics with external checks — Prometheus tells you why a service is unhealthy; a synthetic HTTP check tells you whether users can reach it at all.
  • Start from the pre-built rules — the kube-prometheus-stack defaults cover most of the thresholds in this guide; tune them rather than writing rules from scratch.

FAQ

What is OOMKilled in Kubernetes?

OOMKilled (Out of Memory Killed) means the Linux kernel's OOM killer terminated your container because it exceeded its memory limit; Kubernetes surfaces this as the container's termination reason. Check kubectl describe pod [name] — the Last State section will show Reason: OOMKilled. Increase your memory limit or investigate memory leaks.
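For scripting, the same check can be reduced to a one-liner. A sketch — the last_term_reason helper is our own; it parses the human-readable kubectl describe output, which is not a stable interface, so the jsonpath alternative below is more robust:

```shell
# last_term_reason reads `kubectl describe pod` output on stdin and
# prints the Reason line from the Last State block.
last_term_reason() {
  awk '/Last State:/ { ls = 1 } ls && /Reason:/ { print $2; ls = 0 }'
}

# Live usage:
#   kubectl describe pod web | last_term_reason
# Robust alternative:
#   kubectl get pod web -o jsonpath='{.status.containerStatuses[0].lastState.terminated.reason}'
```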

How is container CPU throttling different from high CPU usage?

High CPU % means your container is using a lot of CPU. CPU throttling means the container's CPU usage is being actively capped by cgroups because it hit its CPU limit. Throttled containers respond to requests slowly even though they're not at 100% CPU — they're being held back. Monitor container_cpu_cfs_throttled_periods_total alongside CPU usage.
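The throttling signal is most useful as a ratio of throttled periods to total periods. A sketch in the style of the earlier rules — the alert name and the 25% threshold are our own choices:

```yaml
# Fraction of CFS scheduling periods in which the container was throttled
alert: ContainerCPUThrottled
expr: |
  rate(container_cpu_cfs_throttled_periods_total[5m])
  / rate(container_cpu_cfs_periods_total[5m]) > 0.25
for: 10m
labels:
  severity: warning
```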

What's the difference between CrashLoopBackOff and OOMKilled?

OOMKilled is one cause of CrashLoopBackOff. CrashLoopBackOff is Kubernetes's state for “this container keeps crashing and I'm applying exponential backoff before retrying.” The underlying cause could be OOMKilled, a runtime error, a missing environment variable, or a failed liveness probe. Check kubectl logs [pod] --previous to see the last crash output.
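To triage quickly across a namespace, filter the pod listing. A sketch — the crashlooping helper is our own; it assumes the default kubectl get pods column order of NAME, READY, STATUS, RESTARTS, AGE:

```shell
# crashlooping reads `kubectl get pods` output on stdin and prints
# pods stuck in CrashLoopBackOff along with their restart counts.
crashlooping() {
  awk '$3 == "CrashLoopBackOff" { print $1, "(" $4 " restarts)" }'
}

# Live usage:
#   kubectl get pods -n production | crashlooping
```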
