
Database Monitoring Guide: Metrics, Tools & Best Practices (2026)

Database outages are expensive: a widely cited Gartner estimate puts the average cost of IT downtime at $5,600 per minute. This guide covers the metrics, tools, and practices you need to monitor database health, catch slow queries early, and stop problems before they reach your users.

Published: April 2026 · 18 min read

🚨 The Hidden Database Failure Pattern

Most database outages don't announce themselves. They begin as gradual degradation — connection pool saturation creeping up, replication lag growing slowly, a single slow query getting slower. By the time your application starts throwing errors, you're already minutes or hours into an incident. Monitoring catches this before users do.

Why Database Monitoring Is Different from API Monitoring

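Application and API monitoring is relatively straightforward: check whether the endpoint responds, measure response time, and alert on errors. Database monitoring is more nuanced, because a database can keep answering simple health checks while connection saturation, lock contention, or replication lag quietly degrade the application that depends on it.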

The Core Database Metrics to Monitor

These seven metrics cover the most common failure modes behind database incidents:

1. Query Response Time (P95/P99)

This is the most important performance metric. Track query latency at the 95th and 99th percentiles, not the average. Averages hide tail latency: when a small fraction of queries hit a missing index, a full table scan, or a lock wait and take seconds, the mean blends them with the fast majority into a number that can still look acceptable.

Percentile | Good | Warning | Critical
P50 | < 5 ms | 5-20 ms | > 20 ms
P95 | < 50 ms | 50-200 ms | > 200 ms
P99 | < 200 ms | 200-500 ms | > 500 ms
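To see why percentiles matter, here is a minimal Python sketch that computes nearest-rank percentiles from a batch of query latencies and grades them against the thresholds in the table. The sample data and the function names (`percentile`, `grade`) are hypothetical; in practice the latencies would come from your APM tool or query logs.

```python
import math

# Thresholds from the table above, in milliseconds: (warning_from, critical_above)
THRESHOLDS_MS = {
    50: (5, 20),
    95: (50, 200),
    99: (200, 500),
}


def percentile(samples_ms, pct):
    """Nearest-rank percentile: smallest sample such that >= pct% of samples are <= it."""
    ordered = sorted(samples_ms)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def grade(pct, value_ms):
    """Map a percentile value to good / warning / critical using the table."""
    warning_from, critical_above = THRESHOLDS_MS[pct]
    if value_ms > critical_above:
        return "critical"
    if value_ms >= warning_from:
        return "warning"
    return "good"


# Hypothetical batch: 980 fast queries at 4 ms and 20 slow ones at 1.5 s.
latencies = [4] * 980 + [1500] * 20
mean_ms = sum(latencies) / len(latencies)

print(f"mean = {mean_ms:.2f} ms")
for pct in (50, 95, 99):
    value = percentile(latencies, pct)
    print(f"P{pct} = {value} ms -> {grade(pct, value)}")
```

In this batch, P50 and P95 are both 4 ms and the mean is about 34 ms, which gives no hint that 20 queries took a full 1.5 s each; P99 exposes the tail immediately.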

2. Connection Pool Utilization

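Every database has a maximum connection limit. When you hit it, new application requests fail immediately with connection errors. Track how many connections are in use relative to that limit, and how many requests are waiting for a connection.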

Alert at 80% pool utilization. Page at 95%. If you see waiting connections, you're already in an incident.
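These rules can be encoded in a few lines. This Python sketch assumes you already collect three numbers from your pool or database (connections in use, the configured maximum, and requests waiting for a connection); the function name `pool_severity` is hypothetical.

```python
def pool_severity(in_use, max_size, waiting):
    """Classify connection pool health using the 80% / 95% / waiting rules."""
    if max_size <= 0:
        raise ValueError("max_size must be positive")
    if waiting > 0:
        # Requests are already queuing for a connection: treat as an incident.
        return "page"
    utilization = in_use / max_size
    if utilization >= 0.95:
        return "page"
    if utilization >= 0.80:
        return "alert"
    return "ok"


print(pool_severity(in_use=42, max_size=100, waiting=0))   # ok
print(pool_severity(in_use=85, max_size=100, waiting=0))   # alert
print(pool_severity(in_use=100, max_size=100, waiting=3))  # page
```

Checking `waiting` first means a saturated pool pages even if a momentary utilization reading happens to look moderate.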

3. Replication Lag

For any primary/replica database setup, replication lag is the time delay between a write on the primary and its appearance on the replica. Applications routing reads to replicas will serve stale data during lag spikes.

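Alert thresholds by criticality: under 1s is good, 1-10s is a warning, above 10s needs attention, and above 30s is critical for most applications. Lag above 5 minutes is also a potential data-loss window: with asynchronous replication, writes the replica has not yet received are lost if the primary fails.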
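On a PostgreSQL replica, a common way to estimate lag is `now() - pg_last_xact_replay_timestamp()`. The Python sketch below assumes you have already fetched that replay timestamp; the helper names are hypothetical, and the 10-30s band is labeled `elevated`. One caveat: if the primary receives no writes, this measure keeps growing even though the replica is fully caught up, so pair it with a periodic heartbeat write or with the lag columns in `pg_stat_replication` on the primary.

```python
from datetime import datetime, timedelta, timezone


def replica_lag_seconds(last_replay_at, now):
    """Seconds since the replica last replayed a transaction from the primary."""
    # Clamp at zero in case of small clock differences.
    return max(0.0, (now - last_replay_at).total_seconds())


def grade_lag(lag_s):
    """Grade replication lag in seconds."""
    if lag_s > 30:
        return "critical"
    if lag_s > 10:
        return "elevated"
    if lag_s >= 1:
        return "warning"
    return "good"


now = datetime(2026, 4, 1, 12, 0, 0, tzinfo=timezone.utc)
for behind in (0.4, 7, 45):
    lag = replica_lag_seconds(now - timedelta(seconds=behind), now)
    print(f"{lag:.1f}s -> {grade_lag(lag)}")
```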

4. Lock Waits and Deadlocks

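Long-running transactions hold their row locks until they commit or roll back, blocking other queries that need the same rows. A deadlock is a circular lock dependency (transaction A waits on B while B waits on A); database engines resolve it by aborting one of the transactions. Both degrade application throughput.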

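-- PostgreSQL: find sessions waiting for locks, longest-running first
SELECT
  pid,
  pg_blocking_pids(pid) AS blocked_by,  -- PIDs holding the locks this session waits on (9.6+)
  now() - query_start AS duration,       -- time since the current query started
  query,
  state,
  wait_event_type,
  wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY duration DESC;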
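When many sessions are waiting, the useful question is which session sits at the head of the chain. PostgreSQL 9.6 and later expose `pg_blocking_pids(pid)`, which returns the PIDs blocking a given backend. Assuming you have collected that into a mapping of waiting PID to blocking PIDs, this hypothetical Python helper finds the root blockers: sessions that block others but are not waiting themselves. An empty result while sessions are still waiting points to a cycle, which is what a deadlock is; PostgreSQL detects those on its own after `deadlock_timeout` and aborts one transaction.

```python
def root_blockers(blocked_by):
    """Return PIDs that block other sessions but are not waiting themselves.

    blocked_by maps each waiting PID to the list of PIDs blocking it,
    e.g. as collected from pg_blocking_pids(pid).
    """
    blockers = {pid for pids in blocked_by.values() for pid in pids}
    waiting = {pid for pid, pids in blocked_by.items() if pids}
    return sorted(blockers - waiting)


# Hypothetical snapshot: 100 holds a lock; 101 and 103 wait on it; 102 waits on 101.
chain = {101: [100], 102: [101], 103: [100]}
print(root_blockers(chain))  # [100]

# Two sessions waiting on each other: a deadlock cycle has no root blocker.
cycle = {201: [202], 202: [201]}
print(root_blockers(cycle))  # []
```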

5. Buffer/Cache Hit Rate

A healthy database serves most reads from memory (buffer cache), not disk. Low cache hit rates mean expensive disk I/O on every query.

Target: >95% cache hit rate for OLTP workloads. Below 90% indicates your working set doesn't fit in memory — consider increasing RAM or caching at the application layer.
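In PostgreSQL, `pg_stat_database` exposes cumulative `blks_hit` and `blks_read` counters per database, and the hit rate is `blks_hit / (blks_hit + blks_read)`. Because the counters only grow, a ratio computed from lifetime totals reacts slowly, so this Python sketch (helper names are hypothetical) computes the rate over an interval from two samples. Note that `blks_read` counts blocks not found in shared buffers, some of which may still be served from the operating system's page cache rather than disk.

```python
def hit_rate(hits, reads):
    """Fraction of block requests served from the buffer cache."""
    total = hits + reads
    # No traffic in the interval: report 1.0 rather than raise a false alert.
    return hits / total if total else 1.0


def interval_hit_rate(earlier, later):
    """Hit rate between two (blks_hit, blks_read) samples of cumulative counters."""
    hits = later[0] - earlier[0]
    reads = later[1] - earlier[1]
    return hit_rate(hits, reads)


# Hypothetical samples taken one minute apart.
before = (1_000_000, 20_000)
after = (1_090_000, 30_000)
rate = interval_hit_rate(before, after)
print(f"{rate:.1%}")  # 90.0%
```

For the same counters, the lifetime ratio is still about 97%, which would mask the drop to 90% during the last minute.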

6. Disk I/O and Storage

7. Slow Query Rate

Enable the slow query log and track the count of queries exceeding your latency threshold (typically >100ms or >1s). A sudden spike in slow query count is often the first signal of a missing index, lock contention, or a bad query introduced in a new deployment.
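A simple way to turn "a sudden spike" into an alert rule is to compare each interval's slow-query count with a rolling baseline. The Python sketch below is one hypothetical rule (median of the previous five intervals, a 3x factor, and a minimum count of 5); all three values are assumptions to tune for your own traffic. The per-interval counts could come from the slow query log or, on MySQL, from the difference between successive readings of the cumulative `Slow_queries` status counter.

```python
from statistics import median


def slow_query_spikes(counts, window=5, factor=3.0, min_count=5):
    """Return indexes of intervals whose slow-query count jumps above baseline.

    counts: slow queries per interval (e.g. per minute), oldest first.
    The baseline is the median of the previous `window` intervals.
    """
    spikes = []
    for i in range(window, len(counts)):
        baseline = median(counts[i - window:i])
        # max(baseline, 1) keeps a near-zero baseline from flagging tiny blips.
        if counts[i] >= min_count and counts[i] > factor * max(baseline, 1):
            spikes.append(i)
    return spikes


per_minute = [2, 3, 2, 4, 3, 3, 2, 20, 25]
print(slow_query_spikes(per_minute))  # [7, 8]
```

Using the median rather than the mean keeps one earlier spike from inflating the baseline and hiding the next one.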


Database Monitoring by Type

PostgreSQL Monitoring

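PostgreSQL ships with powerful built-in statistics views, including pg_stat_activity (current sessions and their queries), pg_stat_replication (replica connections and lag, viewed from the primary), and pg_stat_user_tables (per-table scans, row activity, and vacuum status).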

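Enable pg_stat_statements. It ships with PostgreSQL but is not loaded by default: add it to shared_preload_libraries, restart the server, and run CREATE EXTENSION pg_stat_statements; in the database you want to inspect. This single extension unlocks per-query performance data that is essential for finding slow queries.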
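Once pg_stat_statements is enabled, a common first step is to rank statements by total execution time rather than mean time, because a moderately slow query that runs thousands of times can cost more than one rare slow query. This Python sketch assumes rows already fetched from the view with the PostgreSQL 13+ column names `query`, `calls`, and `total_exec_time` (milliseconds; older versions call it `total_time`); the sample rows and the `top_statements` helper are hypothetical.

```python
def top_statements(rows, limit=3):
    """Rank pg_stat_statements rows by total execution time, highest first."""
    ranked = sorted(rows, key=lambda r: r["total_exec_time"], reverse=True)
    return [
        {
            "query": r["query"],
            "calls": r["calls"],
            "total_ms": r["total_exec_time"],
            "mean_ms": r["total_exec_time"] / r["calls"] if r["calls"] else 0.0,
        }
        for r in ranked[:limit]
    ]


rows = [
    {"query": "SELECT * FROM orders WHERE id = $1", "calls": 50_000, "total_exec_time": 150_000.0},
    {"query": "SELECT count(*) FROM events", "calls": 10, "total_exec_time": 40_000.0},
    {"query": "UPDATE users SET last_seen = now() WHERE id = $1", "calls": 20_000, "total_exec_time": 60_000.0},
]

for stmt in top_statements(rows):
    print(f"{stmt['total_ms']:>10.0f} ms total  {stmt['mean_ms']:>8.1f} ms mean  {stmt['query']}")
```

Here the `count(*)` query has by far the highest mean (4,000 ms), yet the 3 ms lookup on `orders` consumes the most total database time because it runs 50,000 times.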

MySQL / MariaDB Monitoring

Key MySQL monitoring queries:

-- Active connections and queries
SHOW PROCESSLIST;

-- Global status counters
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Cache hit rate = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)

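-- Replication lag (run on the replica; MySQL 8.0.22+ syntax)
SHOW REPLICA STATUS\G
-- Look for Seconds_Behind_Source
-- Older MySQL: SHOW SLAVE STATUS\G and Seconds_Behind_Master (also the field name on MariaDB)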
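The cache hit formula in the comments above can be applied to two readings of `SHOW GLOBAL STATUS` taken some time apart, so the result reflects recent behavior rather than everything since the server started. This Python sketch assumes you have already parsed each reading into a dict of status variable names to integers; the helper name is hypothetical.

```python
def innodb_hit_rate(earlier, later):
    """InnoDB buffer pool hit rate between two SHOW GLOBAL STATUS readings."""
    requests = (later["Innodb_buffer_pool_read_requests"]
                - earlier["Innodb_buffer_pool_read_requests"])
    disk_reads = later["Innodb_buffer_pool_reads"] - earlier["Innodb_buffer_pool_reads"]
    if requests <= 0:
        return 1.0  # no logical reads in the interval
    return 1 - disk_reads / requests


t0 = {"Innodb_buffer_pool_read_requests": 5_000_000, "Innodb_buffer_pool_reads": 10_000}
t1 = {"Innodb_buffer_pool_read_requests": 5_200_000, "Innodb_buffer_pool_reads": 12_000}
print(f"{innodb_hit_rate(t0, t1):.2%}")  # 99.00%
```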

MongoDB Monitoring

MongoDB key metrics differ from relational databases:

Redis Monitoring

Redis monitoring focuses on memory and throughput:


Best Database Monitoring Tools in 2026

For Managed Databases (RDS, Cloud SQL, Supabase)

For Self-Hosted Databases

For Query-Level Insights

Setting Up Your Database Monitoring Stack

Step 1: Enable Slow Query Logging

This is the single highest-leverage action for database observability. Enable it everywhere, always.

# PostgreSQL (postgresql.conf)
log_min_duration_statement = 100  # Log queries > 100ms
log_statement = 'none'            # Don't log all statements

# MySQL (my.cnf)
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.1             # 100ms threshold

# MongoDB (mongod.conf)
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
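With `log_min_duration_statement` set, PostgreSQL writes a line containing `duration: <n> ms` for each slow statement. Tools such as pgBadger do the full analysis, but for a quick count this Python sketch pulls the durations out with a regular expression. It assumes the `duration: ... ms` text appears on a single line; the prefix before it depends on your `log_line_prefix`, and the sample lines are invented.

```python
import re

DURATION_RE = re.compile(r"duration: (\d+(?:\.\d+)?) ms")


def slow_durations(log_lines, threshold_ms=100.0):
    """Extract statement durations (ms) at or above threshold_ms from log lines."""
    durations = []
    for line in log_lines:
        match = DURATION_RE.search(line)
        if match:
            ms = float(match.group(1))
            if ms >= threshold_ms:
                durations.append(ms)
    return durations


sample = [
    "2026-04-01 12:00:01 UTC [4242] LOG:  duration: 153.210 ms  statement: SELECT 1",
    "2026-04-01 12:00:02 UTC [4243] LOG:  connection authorized: user=app",
    "2026-04-01 12:00:03 UTC [4244] LOG:  duration: 1024.5 ms  statement: UPDATE t SET x = 1",
]
found = slow_durations(sample)
print(len(found), max(found))  # 2 1024.5
```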

Step 2: Set Up Connection Pool Monitoring

If you use PgBouncer, Vitess, or application-level pooling (HikariCP, pg-pool), monitor the pool itself, not just the database: active connections, idle connections, and clients waiting for a connection.
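For PgBouncer, the admin console command `SHOW POOLS` reports per-pool columns including `cl_active`, `cl_waiting`, `sv_active`, `sv_idle`, and `maxwait` (how long the oldest waiting client has waited, in seconds). This Python sketch assumes you have fetched those rows as dicts; the helper name is hypothetical, and the 1-second `maxwait` annotation threshold is an arbitrary example, not a PgBouncer recommendation.

```python
def pgbouncer_problems(pools, max_wait_s=1):
    """Return (database, user, reason) for pools with clients queued for a server."""
    problems = []
    for p in pools:
        if p["cl_waiting"] > 0:
            reason = f"{p['cl_waiting']} clients waiting"
            if p["maxwait"] >= max_wait_s:
                reason += f", oldest for {p['maxwait']}s"
            problems.append((p["database"], p["user"], reason))
    return problems


pools = [
    {"database": "app", "user": "app", "cl_active": 40, "cl_waiting": 0,
     "sv_active": 20, "sv_idle": 5, "maxwait": 0},
    {"database": "reports", "user": "bi", "cl_active": 10, "cl_waiting": 12,
     "sv_active": 10, "sv_idle": 0, "maxwait": 3},
]
for problem in pgbouncer_problems(pools):
    print(problem)
```

A pool with waiting clients and no idle servers, like `reports` here, is the pool-level equivalent of the "waiting connections" incident signal.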

Step 3: Create a Runbook for Each Alert

Every database alert should have a corresponding runbook. When your on-call engineer gets paged at 3am for "Connection pool at 92%", they should have a document explaining: what this means, what queries to run to diagnose it, and what actions to take (kill connections, scale the database, add replicas). See our runbook guide for templates.

Database Monitoring Checklist

  • Slow query logging enabled with threshold ≤ 100ms
  • pg_stat_statements or performance_schema enabled for query-level metrics
  • Connection pool monitoring — alert at 80%, page at 95%
  • Replication lag monitoring — alert > 10s, page > 60s
  • Disk usage alert at 75%, page at 90%
  • Deadlock rate monitoring — alert on any increase above baseline
  • Buffer cache hit rate alert below 90%
  • Backup verification — confirm backups completed within expected window
  • Runbook for every database alert type
  • Read replica health monitoring separate from primary
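Several checklist items are numeric thresholds, so they translate directly into alert rules. This Python sketch encodes the connection pool, replication lag, disk usage, and cache hit rate thresholds from the list; the metric names and the `evaluate` helper are hypothetical, and the cache hit rule never pages because the checklist gives it no paging threshold.

```python
import operator
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    metric: str
    breached: Callable[[float, float], bool]  # compares value against a threshold
    alert: float
    page: Optional[float]  # None means this rule never pages


RULES = [
    Rule("pool_utilization_pct", operator.ge, alert=80, page=95),  # alert at 80%, page at 95%
    Rule("replication_lag_s", operator.gt, alert=10, page=60),     # alert > 10s, page > 60s
    Rule("disk_used_pct", operator.ge, alert=75, page=90),         # alert at 75%, page at 90%
    Rule("cache_hit_rate_pct", operator.lt, alert=90, page=None),  # alert below 90%
]


def evaluate(metrics):
    """Return {metric: "ok" | "alert" | "page"} for every collected metric with a rule."""
    results = {}
    for rule in RULES:
        value = metrics.get(rule.metric)
        if value is None:
            continue  # metric not collected in this snapshot
        if rule.page is not None and rule.breached(value, rule.page):
            results[rule.metric] = "page"
        elif rule.breached(value, rule.alert):
            results[rule.metric] = "alert"
        else:
            results[rule.metric] = "ok"
    return results


snapshot = {
    "pool_utilization_pct": 83,
    "replication_lag_s": 75,
    "disk_used_pct": 40,
    "cache_hit_rate_pct": 88.5,
}
print(evaluate(snapshot))
```

Storing the comparison operator on each rule keeps "at" thresholds (>=), "over" thresholds (>), and "below" thresholds (<) exactly as the checklist words them.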

Frequently Asked Questions

What is database monitoring?

Database monitoring is the ongoing collection and analysis of metrics from your database system — including query performance, connection usage, replication lag, and resource utilization — to detect and resolve issues before they cause application downtime.

What are the most important database metrics to monitor?

The most critical metrics are query response time (P95/P99), connection pool saturation, replication lag, slow query count, lock waits, disk I/O, and buffer cache hit rate. Tracking all seven lets you catch the most common database problems early.

How do I monitor PostgreSQL performance?

Enable pg_stat_statements and query pg_stat_activity, pg_stat_replication, and pg_stat_user_tables. Use pgBadger for slow query log analysis. Connect an APM tool (Datadog, New Relic) for automated metric collection and alerting.

What is a good alert threshold for database connection pool?

Alert at 80% pool utilization to give you time to respond. Page at 95%. Alert immediately if you see any waiting connections, as that indicates the pool is already exhausted.

What is the best database monitoring tool in 2026?

For managed databases, Better Stack or Datadog provide excellent cross-service visibility. For self-hosted databases, Prometheus + Grafana with database-specific exporters is the most flexible option. For query-level insights, your APM tool (New Relic, Datadog APM) is most valuable.
