Why Database Monitoring Is Different from API Monitoring
Application and API monitoring is relatively straightforward: check if the endpoint responds, measure response time, alert on errors. Database monitoring is more nuanced because:
- Databases are stateful. A slow query doesn't just affect that one request — it can hold locks that cascade to block hundreds of other queries.
- Resource contention is invisible from the outside. Your API returns 200, but behind it a database query is taking 8 seconds instead of 8 milliseconds.
- Degradation is gradual. Databases don't typically go from healthy to down instantly. They degrade — slowly — until they tip over.
- Replication adds complexity. Primary/replica setups create a new failure mode: replication lag, where reads return stale data without any error.
The Core Database Metrics to Monitor
These seven metrics cover the failure modes that cause the majority of database incidents:
1. Query Response Time (P95/P99)
The most important performance metric. Track query latency at the 95th and 99th percentile — not the average. Database averages lie because a small percentage of extremely slow queries (N+1 problems, missing indexes, table scans) can have an outsized impact without moving the mean much.
| Percentile | Good | Warning | Critical |
|---|---|---|---|
| P50 | < 5ms | 5-20ms | > 20ms |
| P95 | < 50ms | 50-200ms | > 200ms |
| P99 | < 200ms | 200-500ms | > 500ms |
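To see why the table above tracks percentiles rather than the average, here's a minimal sketch using synthetic numbers and a nearest-rank percentile: a 2% tail of table scans leaves the mean looking merely elevated while P99 surfaces the two-second queries.

```python
import math

def percentile(values, p):
    """Nearest-rank percentile: the value at rank ceil(p/100 * N)."""
    ordered = sorted(values)
    rank = math.ceil(p / 100 * len(ordered))
    return ordered[rank - 1]

# Synthetic sample: 98 fast queries at 5 ms, 2 table scans at 2000 ms
latencies_ms = [5.0] * 98 + [2000.0] * 2

mean = sum(latencies_ms) / len(latencies_ms)
print(f"mean={mean:.1f}ms "
      f"p95={percentile(latencies_ms, 95)}ms "
      f"p99={percentile(latencies_ms, 99)}ms")
# mean=44.9ms p95=5.0ms p99=2000.0ms
```

A 44.9 ms mean barely hints that some queries take two full seconds; the P99 makes it unmissable.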
2. Connection Pool Utilization
Every database has a maximum connection limit. When you hit it, new application requests fail immediately with connection errors. Track:
- Active connections — queries currently executing
- Idle connections — connections in the pool, waiting
- Waiting connections — requests waiting for a connection (bad sign)
- Max connections — your configured limit
Alert at 80% pool utilization. Page at 95%. If you see waiting connections, you're already in an incident.
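The 80%/95% rule above can be expressed as a simple check. This is a sketch only: the stats and function name are hypothetical, not any specific driver's or pooler's API.

```python
def pool_alert_level(active, waiting, max_connections):
    """Map connection-pool stats to an alert level (hypothetical helper)."""
    utilization = active / max_connections
    if waiting > 0 or utilization >= 0.95:
        return "page"   # waiting requests mean the pool is effectively exhausted
    if utilization >= 0.80:
        return "alert"  # headroom shrinking; investigate before saturation
    return "ok"

print(pool_alert_level(active=85, waiting=0, max_connections=100))  # alert
```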
3. Replication Lag
For any primary/replica database setup, replication lag is the time delay between a write on the primary and its appearance on the replica. Applications routing reads to replicas will serve stale data during lag spikes.
Alert thresholds by criticality: <1s (good), 1-10s (warning), >30s (critical for most applications), >5 min (potential data loss window if primary fails).
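As a sketch, those thresholds map to a small classifier; assumptions here: lag is measured in seconds, and the 10-30s band (unstated above) is treated as a warning.

```python
def replication_lag_severity(lag_seconds):
    """Classify replica lag per the thresholds above (illustrative only)."""
    if lag_seconds > 300:
        return "data-loss-window"  # > 5 min behind: risky if the primary fails
    if lag_seconds > 30:
        return "critical"
    if lag_seconds >= 1:
        return "warning"
    return "good"

print(replication_lag_severity(45))  # critical
```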
4. Lock Waits and Deadlocks
Long-running transactions hold row locks, blocking other queries. Deadlocks are circular lock dependencies that database engines resolve by killing one of the transactions. Both degrade application throughput.
```sql
-- PostgreSQL: find queries waiting for locks
SELECT
    pid,
    now() - pg_stat_activity.query_start AS duration,
    query,
    state,
    wait_event_type,
    wait_event
FROM pg_stat_activity
WHERE wait_event_type = 'Lock'
ORDER BY duration DESC;
```
5. Buffer/Cache Hit Rate
A healthy database serves most reads from memory (buffer cache), not disk. Low cache hit rates mean expensive disk I/O on every query.
Target: >95% cache hit rate for OLTP workloads. Below 90% indicates your working set doesn't fit in memory — consider increasing RAM or caching at the application layer.
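The hit rate is derived from two counters the database already exposes (for InnoDB, the buffer pool read counters shown in the MySQL section below). A sketch with synthetic numbers:

```python
def cache_hit_rate(read_requests, disk_reads):
    """Fraction of logical reads served from the buffer cache.

    Mirrors the InnoDB formula:
    hit_rate = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
    """
    if read_requests == 0:
        return 1.0  # no reads yet; treat as fully cached
    return 1 - disk_reads / read_requests

# Synthetic counters: one million logical reads, 30k of them hit disk
rate = cache_hit_rate(read_requests=1_000_000, disk_reads=30_000)
print(f"{rate:.1%}")  # 97.0% — above the 95% OLTP target
```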
6. Disk I/O and Storage
- Disk throughput (MB/s) — alert when approaching disk bandwidth limits
- IOPS utilization — especially important on provisioned IOPS storage (AWS RDS gp3, io1)
- Disk usage percentage — alert at 75%, page at 90% (full disk = database crash)
- Write-ahead log (WAL) growth — PostgreSQL WAL accumulating faster than it can replay is a replication warning sign
7. Slow Query Rate
Enable the slow query log and track the count of queries exceeding your latency threshold (typically >100ms or >1s). A sudden spike in slow query count is often the first signal of a missing index, lock contention, or a bad query introduced in a new deployment.
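Counting slow queries from the log is straightforward once logging is enabled. A minimal sketch against synthetic PostgreSQL-style log lines (real lines would come from your configured log file):

```python
import re

# PostgreSQL's log_min_duration_statement emits lines like:
#   LOG:  duration: 1523.004 ms  statement: SELECT ...
# Synthetic sample lines for illustration:
log_lines = [
    "LOG:  duration: 3.214 ms  statement: SELECT 1",
    "LOG:  duration: 1523.004 ms  statement: SELECT * FROM orders",
    "LOG:  duration: 250.780 ms  statement: UPDATE users SET active = true",
]

DURATION_RE = re.compile(r"duration: ([\d.]+) ms")

def count_slow(lines, threshold_ms=100.0):
    """Count logged statements exceeding the slow-query threshold."""
    durations = (float(m.group(1)) for m in map(DURATION_RE.search, lines) if m)
    return sum(1 for d in durations if d > threshold_ms)

print(count_slow(log_lines))  # 2
```

Track this count over time; a sudden jump against its baseline is the alertable signal.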
Database Monitoring by Type
PostgreSQL Monitoring
PostgreSQL ships with powerful built-in statistics views:
- pg_stat_activity — active connections, running queries, wait events
- pg_stat_statements — query-level performance statistics (requires extension)
- pg_stat_replication — replication lag per replica
- pg_stat_bgwriter — buffer writes, checkpoints
- pg_stat_user_tables — table-level scan/fetch/insert/update/delete counts
Enable pg_stat_statements (it's not on by default) — this single extension unlocks per-query performance data that's essential for finding slow queries.
MySQL / MariaDB Monitoring
Key MySQL monitoring queries:
```sql
-- Active connections and queries
SHOW PROCESSLIST;

-- Global status counters
SHOW GLOBAL STATUS LIKE 'Threads_connected';
SHOW GLOBAL STATUS LIKE 'Slow_queries';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read_requests';
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_reads';
-- Cache hit rate = 1 - (Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)

-- Replication lag (on replica)
SHOW REPLICA STATUS\G
-- Look for Seconds_Behind_Source
```
MongoDB Monitoring
MongoDB key metrics differ from relational databases:
- Current operations (db.currentOp()) — long-running operations in progress
- WiredTiger cache usage — MongoDB's storage engine cache hit rate
- Replication oplog window — how much time the oplog covers; if a secondary falls behind beyond the oplog window, it needs a full resync
- Index miss rate — queries performing full collection scans instead of using indexes
Redis Monitoring
Redis monitoring focuses on memory and throughput:
- Memory usage vs maxmemory — when Redis hits maxmemory, it evicts keys (or rejects writes, depending on policy)
- Keyspace hit/miss rate — cache miss rate should stay below 5-10% for a healthy cache layer
- Connected clients — Redis has a client limit (default 10,000)
- Commands/second — throughput; alert on sudden drops (may indicate connection issues)
- Replication offset lag — distance between master and replica in bytes
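The keyspace miss rate comes straight from two counters in Redis's `INFO stats` output. A sketch with synthetic values:

```python
# Fields as returned by Redis `INFO stats`; the values here are synthetic.
info_stats = {"keyspace_hits": 950_000, "keyspace_misses": 50_000}

def keyspace_miss_rate(stats):
    """Miss rate = misses / (hits + misses); target below 5-10% per above."""
    total = stats["keyspace_hits"] + stats["keyspace_misses"]
    return stats["keyspace_misses"] / total if total else 0.0

print(f"{keyspace_miss_rate(info_stats):.1%}")  # 5.0%
```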
Best Database Monitoring Tools in 2026
For Managed Databases (RDS, Cloud SQL, Supabase)
- Better Stack: Cross-service alerting — combine database metrics with API and uptime monitoring in a single dashboard. Integrates with RDS, Cloud SQL, and Supabase via CloudWatch/log forwarding.
- DataDog: Deep RDS and CloudSQL integrations. Per-query performance tracking, anomaly detection, and correlation with application traces.
- Cloud provider native: AWS CloudWatch for RDS, Google Cloud Monitoring for Cloud SQL. Good for basic metrics but limited for cross-service correlation.
For Self-Hosted Databases
- Prometheus + Grafana: The most flexible open-source stack. Use database-specific exporters: postgres_exporter, mysqld_exporter, mongodb_exporter, redis_exporter.
- pgBadger: PostgreSQL slow query log analyzer. Generates beautiful reports from your pg_log with query categorization and timing breakdowns.
- Percona Monitoring and Management (PMM): Free, purpose-built for MySQL/MongoDB/PostgreSQL. Excellent slow query analysis.
For Query-Level Insights
- New Relic APM: Traces individual application queries to database calls. Identifies the exact code path generating slow queries.
- Scout APM: Lightweight APM focused on query performance. Good for Rails/Django/Laravel applications with heavy database usage.
- Metabase: Business intelligence tool that doubles as a query analysis dashboard when pointed at your production database replica.
Setting Up Your Database Monitoring Stack
Step 1: Enable Slow Query Logging
This is the single highest-leverage action for database observability. Enable it everywhere, always.
```ini
# PostgreSQL (postgresql.conf)
log_min_duration_statement = 100   # Log queries > 100ms
log_statement = 'none'             # Don't log all statements
```

```ini
# MySQL (my.cnf)
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 0.1   # 100ms threshold
```

```yaml
# MongoDB (mongod.conf)
operationProfiling:
  mode: slowOp
  slowOpThresholdMs: 100
```
Step 2: Set Up Connection Pool Monitoring
If you use PgBouncer, Vitess, or application-level pooling (HikariCP, pg-pool), monitor the pool itself — not just the database:
- Pool size: total connections provisioned
- Active: connections currently servicing a query
- Idle: available connections waiting
- Wait queue: requests waiting for a connection (should be zero at steady state)
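The wait-queue check above can be automated; this sketch assumes a snapshot shaped like a PgBouncer `SHOW POOLS` row (cl_waiting = clients queued for a server connection), with synthetic values.

```python
# Hypothetical snapshot shaped like one PgBouncer SHOW POOLS row.
pool_row = {"cl_active": 42, "cl_waiting": 0, "sv_active": 42, "sv_idle": 8}

def pool_healthy(row):
    """At steady state the wait queue should be empty."""
    return row["cl_waiting"] == 0

print(pool_healthy(pool_row))  # True
```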
Step 3: Create a Runbook for Each Alert
Every database alert should have a corresponding runbook. When your on-call engineer gets paged at 3am for "Connection pool at 92%", they should have a document explaining: what this means, what queries to run to diagnose it, and what actions to take (kill connections, scale the database, add replicas). See our runbook guide for templates.
Database Monitoring Checklist
- ☐ Slow query logging enabled with threshold ≤ 100ms
- ☐ pg_stat_statements or performance_schema enabled for query-level metrics
- ☐ Connection pool monitoring — alert at 80%, page at 95%
- ☐ Replication lag monitoring — alert > 10s, page > 60s
- ☐ Disk usage alert at 75%, page at 90%
- ☐ Deadlock rate monitoring — alert on any increase above baseline
- ☐ Buffer cache hit rate alert below 90%
- ☐ Backup verification — confirm backups completed within expected window
- ☐ Runbook for every database alert type
- ☐ Read replica health monitoring separate from primary
Frequently Asked Questions
What is database monitoring?
Database monitoring is the ongoing collection and analysis of metrics from your database system — including query performance, connection usage, replication lag, and resource utilization — to detect and resolve issues before they cause application downtime.
What are the most important database metrics to monitor?
The most critical metrics are: query response time (P95/P99), connection pool saturation, replication lag, slow query count, lock waits, disk I/O, and buffer cache hit rate. Track all seven and you'll catch the vast majority of database problems early.
How do I monitor PostgreSQL performance?
Enable pg_stat_statements and query pg_stat_activity, pg_stat_replication, and pg_stat_user_tables. Use pgBadger for slow query log analysis. Connect an APM tool (DataDog, New Relic) for automated metric collection and alerting.
What is a good alert threshold for database connection pool?
Alert at 80% pool utilization to give you time to respond. Page at 95%. Alert immediately if you see any waiting connections, as that indicates the pool is already exhausted.
What is the best database monitoring tool in 2026?
For managed databases, Better Stack or DataDog provide excellent cross-service visibility. For self-hosted databases, Prometheus + Grafana with database-specific exporters is the most flexible option. For query-level insights, your APM tool (New Relic, DataDog APM) is most valuable.