MySQL powers over 40% of the world's production databases. When MySQL slows down or goes down, your entire application follows. The challenge: MySQL exposes hundreds of status variables, and knowing which ones actually predict problems (and which are noise) takes real production experience. This guide cuts through the noise and gives you a practical MySQL monitoring playbook.
MySQL Status Variables: Your Monitoring Foundation
MySQL exposes performance data through status variables (cumulative counters since startup) and global variables (configuration). The primary commands:
-- Real-time status snapshot
SHOW GLOBAL STATUS;

-- Current connections and queries
SHOW PROCESSLIST;

-- InnoDB engine metrics
SHOW ENGINE INNODB STATUS;

-- Current configuration
SHOW GLOBAL VARIABLES;
Status variables are cumulative: to calculate rates (queries per second, errors per minute), you need to sample them at intervals and compute deltas. This is what monitoring tools like mysqld_exporter do automatically.
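A minimal shell sketch of the delta approach (assuming a mysql client that can authenticate, e.g. via ~/.my.cnf):

#!/usr/bin/env bash
# Sample the cumulative Questions counter twice, then compute queries/sec.
INTERVAL=10
q1=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Questions';" | awk '{print $2}')
sleep "$INTERVAL"
q2=$(mysql -N -B -e "SHOW GLOBAL STATUS LIKE 'Questions';" | awk '{print $2}')
echo "QPS over ${INTERVAL}s: $(( (q2 - q1) / INTERVAL ))"

The same two-sample pattern works for any cumulative counter (Slow_queries, Com_select, Bytes_sent).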
Critical MySQL Metrics to Monitor
Throughput and Traffic
| Metric | What It Measures | Alert Threshold |
|---|---|---|
| Questions/sec | Total queries per second (includes all query types) | Alert on sudden drop (traffic lost) or 2x spike (unexpected load) |
| Com_select/sec | SELECT queries per second | Baseline and alert on 3x deviation |
| Com_insert + Com_update + Com_delete | Write queries per second | Alert on sustained spike; watch for runaway background jobs |
| Slow_queries (rate) | Queries exceeding long_query_time threshold | Alert on any increase; indicates query regressions |
| Bytes_received + Bytes_sent | Network throughput in/out | Alert on saturation approaching NIC bandwidth |
Connection Pool Health
Connection exhaustion is one of the most common MySQL production failures. When Threads_connected hits max_connections, MySQL refuses new connections with "Too many connections" errors, and every app request fails immediately.
Connection Metrics (Alert These)
- Threads_connected: Current open connections; alert at >80% of max_connections
- Connection_errors_max_connections: Connections refused; this counter is cumulative, so alert on any increase
- Max_used_connections: Peak connections since restart; your high-water mark
- Threads_running: Queries actively executing; >20 on a busy server indicates contention
- Aborted_connects: Failed connection attempts; elevated values indicate network or auth issues
-- Check current connection usage vs max
SELECT
@@max_connections AS max_connections,
(SELECT COUNT(*) FROM information_schema.processlist) AS current_connections,
ROUND(
(SELECT COUNT(*) FROM information_schema.processlist) / @@max_connections * 100, 1
) AS pct_used;

InnoDB Buffer Pool (The Most Important MySQL Metric)
The InnoDB buffer pool caches data and index pages in memory. When the buffer pool hit rate is high (>99%), most reads come from memory (fast). When it drops, MySQL reads from disk (slow). This single metric explains most MySQL performance degradation.
-- Calculate buffer pool hit rate
SELECT
(1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests) * 100
AS buffer_pool_hit_rate_pct
FROM (
SELECT
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_reads') AS Innodb_buffer_pool_reads,
(SELECT VARIABLE_VALUE FROM performance_schema.global_status
WHERE VARIABLE_NAME = 'Innodb_buffer_pool_read_requests') AS Innodb_buffer_pool_read_requests
) t;

InnoDB Buffer Pool Metrics
- Innodb_buffer_pool_read_requests: Logical read requests (from buffer pool)
- Innodb_buffer_pool_reads: Physical disk reads (cache misses); alert if >1% of requests
- Innodb_buffer_pool_pages_dirty: Pages modified but not yet flushed to disk
- Innodb_buffer_pool_wait_free: Non-zero means operations are waiting for free pages in the buffer pool; a serious performance issue
- innodb_buffer_pool_size: Configuration; set to 70-80% of available RAM on dedicated MySQL servers
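Because these counters accumulate since startup, the hit-rate query above can mask a recent regression behind months of good history. A sketch of a windowed check (assumes a configured mysql client; the 60-second window is illustrative):

#!/usr/bin/env bash
# InnoDB buffer pool hit rate over a recent window rather than since startup.
get() { mysql -N -B -e "SHOW GLOBAL STATUS LIKE '$1';" | awk '{print $2}'; }

reads1=$(get Innodb_buffer_pool_reads)
reqs1=$(get Innodb_buffer_pool_read_requests)
sleep 60
reads2=$(get Innodb_buffer_pool_reads)
reqs2=$(get Innodb_buffer_pool_read_requests)

# Hit rate = 1 - (physical reads / logical read requests) over the window.
awk -v r=$((reads2 - reads1)) -v q=$((reqs2 - reqs1)) \
  'BEGIN { if (q > 0) printf "buffer pool hit rate: %.2f%%\n", (1 - r / q) * 100 }'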
Lock Contention and Waits
InnoDB row-level locking is usually efficient, but high-concurrency workloads can create lock contention. Deadlocks are automatically resolved by MySQL (one transaction is rolled back), but frequent deadlocks indicate application logic problems.
Lock Metrics
- Innodb_row_lock_waits: Lock wait count; a rising trend indicates contention
- Innodb_row_lock_time_avg: Average lock wait time in ms; alert if >50ms
- Innodb_deadlocks: Deadlock count (MariaDB/Percona; on stock MySQL use the lock_deadlocks counter in information_schema.INNODB_METRICS); alert on any sustained rate above 1/min
- innodb_lock_wait_timeout: Configuration; defaults to 50s. Lower it to 5-10s for web apps to fail fast
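When lock waits spike, the next question is which session is blocking which. On MySQL 5.7+ the sys schema answers that directly; a sketch, assuming sys is installed (it is by default):

# Show live lock waits: the blocked query, the blocker, and how long it has waited.
mysql -e "SELECT waiting_pid, waiting_query, blocking_pid, blocking_query, wait_age
          FROM sys.innodb_lock_waits\G"

If a long-running blocker must be removed, KILL <blocking_pid> ends its connection and rolls back its transaction.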
Slow Query Monitoring
The MySQL slow query log records queries that take longer than long_query_time seconds. It's the most direct way to find queries degrading your application.
Enable and Configure the Slow Query Log
-- Enable slow query log (no restart needed)
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 1;                    -- Log queries > 1 second
SET GLOBAL log_queries_not_using_indexes = 'ON';   -- Also log full table scans
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';
For persistent configuration, add to my.cnf:
[mysqld]
slow_query_log = 1
slow_query_log_file = /var/log/mysql/slow.log
long_query_time = 1
log_queries_not_using_indexes = 1
log_slow_replica_statements = 1
Analyzing Slow Queries
Option 1: mysqldumpslow (built-in)
# Top 10 slowest queries by total execution time
mysqldumpslow -s t -t 10 /var/log/mysql/slow.log

# Top 10 by count (most frequently slow)
mysqldumpslow -s c -t 10 /var/log/mysql/slow.log
Option 2: pt-query-digest (Percona Toolkit; recommended)
# Install Percona Toolkit
sudo apt install percona-toolkit

# Analyze slow query log with full stats
pt-query-digest /var/log/mysql/slow.log

# Output: ranked by total time, with avg/max/stddev per query fingerprint
pt-query-digest groups queries by fingerprint (normalizing literal values) and gives you total time, average time, call count, and worst case per unique query pattern, which is far more actionable than mysqldumpslow's output.
Option 3: Performance Schema (no log parsing)
-- Top 10 slowest queries by total execution time
-- (timer columns are in picoseconds; divide by 1e12 for seconds)
SELECT
  DIGEST_TEXT,
  COUNT_STAR AS exec_count,
  ROUND(SUM_TIMER_WAIT / 1000000000000, 2) AS total_exec_sec,
  ROUND(AVG_TIMER_WAIT / 1000000000000, 4) AS avg_exec_sec,
  ROUND(MAX_TIMER_WAIT / 1000000000000, 4) AS max_exec_sec
FROM performance_schema.events_statements_summary_by_digest
ORDER BY SUM_TIMER_WAIT DESC
LIMIT 10;
MySQL Replication Monitoring
MySQL replication lag is a leading indicator for replica promotion failures and stale reads. Monitor replication health continuously; silent replication failures have caused major data incidents.
-- Check replication status (run on replica)
-- MySQL 8.0+
SHOW REPLICA STATUS\G

-- MySQL < 8.0
SHOW SLAVE STATUS\G

-- Key fields to check:
--   Replica_IO_Running: Yes       (must be Yes)
--   Replica_SQL_Running: Yes      (must be Yes)
--   Seconds_Behind_Source: 0      (lag in seconds; alert if > 30)
--   Last_SQL_Error:               (empty = healthy)
--   Last_IO_Error:                (empty = healthy)
Replication Alerts (Critical)
- Replica_IO_Running = No: Replication I/O thread stopped; data is no longer flowing from the primary
- Replica_SQL_Running = No: SQL thread stopped, often due to a replication error (check Last_SQL_Error)
- Seconds_Behind_Source > 60: Replica is falling behind; failing over now would lose data equal to the lag
- Seconds_Behind_Source = NULL: Replication is not running; highest-severity alert
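A minimal cron-able sketch of these checks, run on the replica (assumes MySQL 8.0+ and a configured client; the echo lines are placeholders for your real paging command):

#!/usr/bin/env bash
# Page if replication threads are stopped or lag exceeds 30 seconds.
STATUS=$(mysql -e "SHOW REPLICA STATUS\G")
io=$(grep -oP 'Replica_IO_Running:\s*\K\S+' <<< "$STATUS")
sql=$(grep -oP 'Replica_SQL_Running:\s*\K\S+' <<< "$STATUS")
lag=$(grep -oP 'Seconds_Behind_Source:\s*\K\S+' <<< "$STATUS")

if [[ "$io" != "Yes" || "$sql" != "Yes" || "$lag" == "NULL" ]]; then
  echo "CRITICAL: replication not running"   # replace with your pager command
elif (( lag > 30 )); then
  echo "WARNING: replication lag ${lag}s"
fi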
Setting Up MySQL Monitoring with Prometheus
The standard open-source stack for MySQL monitoring is Prometheus + mysqld_exporter + Grafana. The mysqld_exporter connects to MySQL and exposes hundreds of status variables as Prometheus metrics.
Install and Configure mysqld_exporter
# 1. Create MySQL monitoring user
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'strong_password' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
FLUSH PRIVILEGES;
# 2. Install mysqld_exporter
# Pick the current release from https://github.com/prometheus/mysqld_exporter/releases
# (wget does not expand wildcards, so the exact version must be in the URL)
VERSION=0.15.1
wget https://github.com/prometheus/mysqld_exporter/releases/download/v${VERSION}/mysqld_exporter-${VERSION}.linux-amd64.tar.gz
tar xvf mysqld_exporter-${VERSION}.linux-amd64.tar.gz
sudo mv mysqld_exporter-${VERSION}.linux-amd64/mysqld_exporter /usr/local/bin/

# Create credentials file
sudo tee /etc/.mysqld_exporter.cnf > /dev/null << EOF
[client]
user=exporter
password=strong_password
host=localhost
EOF
# 3. Run exporter (or add to systemd)
mysqld_exporter \
  --config.my-cnf=/etc/.mysqld_exporter.cnf \
  --collect.info_schema.processlist \
  --collect.info_schema.innodb_metrics \
  --collect.global_status \
  --collect.global_variables \
  --collect.slave_status
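Before pointing Prometheus at the exporter, verify it responds; mysqld_exporter listens on port 9104 by default:

# mysql_up 1 means the exporter can reach MySQL; 0 means check credentials/host.
curl -s http://localhost:9104/metrics | grep '^mysql_up'

Then add a Prometheus scrape job targeting localhost:9104, and the metrics used in the alert rules below will start flowing.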
Key Prometheus Alert Rules
groups:
  - name: mysql
    rules:
      - alert: MySQLDown
        expr: mysql_up == 0
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "MySQL instance {{ $labels.instance }} is down"

      - alert: MySQLTooManyConnections
        expr: max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections > 0.8
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MySQL connections at {{ $value | humanizePercentage }} of max"

      - alert: MySQLHighQPS
        expr: rate(mysql_global_status_questions[2m]) > 2 * avg_over_time(rate(mysql_global_status_questions[2m])[1h:5m])
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL QPS is 2x above baseline"

      - alert: MySQLSlowQueries
        expr: rate(mysql_global_status_slow_queries[5m]) > 0
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "MySQL slow queries detected: {{ $value }} per second"

      - alert: MySQLReplicationLag
        expr: mysql_slave_status_seconds_behind_master > 30
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication lag is {{ $value }}s on {{ $labels.instance }}"

      - alert: MySQLReplicationNotRunning
        expr: mysql_slave_status_slave_sql_running != 1 or mysql_slave_status_slave_io_running != 1
        for: 0m
        labels:
          severity: critical
        annotations:
          summary: "MySQL replication stopped on {{ $labels.instance }}"

      - alert: MySQLBufferPoolHitRate
        expr: (1 - rate(mysql_global_status_innodb_buffer_pool_reads[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests[5m])) < 0.95
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "MySQL buffer pool hit rate is {{ $value | humanizePercentage }} (below 95%)"
MySQL Monitoring Tools Comparison
| Tool | Type | Best For | Cost |
|---|---|---|---|
| Prometheus + Grafana | Open source | Self-managed MySQL on VMs/K8s | Free (infra cost) |
| Percona PMM | Open source | On-prem MySQL with deep query analytics | Free (self-hosted) |
| Datadog | SaaS | Full-stack visibility, teams already on Datadog | $15-23/host/month |
| New Relic | SaaS | APM + database in one platform | Free tier; pay by GB |
| Better Stack | SaaS | MySQL uptime + on-call alerting | Free tier; $20+/month |
| AWS CloudWatch | Managed | RDS/Aurora monitoring on AWS | Included with RDS |
| Azure Monitor | Managed | Azure Database for MySQL | Included with Azure DB |
MySQL Monitoring Best Practices
1. Set max_connections Correctly
The default max_connections = 151 is too low for most apps. A rough formula for InnoDB: max_connections ≈ RAM_GB × 200. But more important: ensure your app uses a connection pool (the PgBouncer equivalent for MySQL is ProxySQL or MySQL Router) so persistent connections don't exhaust the limit.
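On MySQL 8.0+ you can raise the limit at runtime and persist it across restarts in one step; the 500 below is illustrative, so size it to your actual pool counts:

# SET PERSIST writes the change to mysqld-auto.cnf so it survives restarts (8.0+).
mysql -e "SET PERSIST max_connections = 500;"
mysql -e "SHOW GLOBAL VARIABLES LIKE 'max_connections';"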
2. Monitor Binary Log Size on Primary
Binary logs grow until purged. If the disk fills on the primary, MySQL stops, which also stops replication. Alert when the partition holding binary logs is >70% full. Set binlog_expire_logs_seconds (MySQL 8.0+) to auto-purge after a safe retention window (7 days is typical).
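A quick way to see how much disk the binary logs currently hold, and to set the 7-day retention (604800 seconds; MySQL 8.0+):

# File_size is the second column of SHOW BINARY LOGS; sum it in MB.
mysql -N -B -e "SHOW BINARY LOGS;" | awk '{ mb += $2 / 1024 / 1024 } END { printf "binlogs: %.1f MB\n", mb }'

# Auto-purge binary logs older than 7 days.
mysql -e "SET PERSIST binlog_expire_logs_seconds = 604800;"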
3. Use EXPLAIN on Every New Query
Run EXPLAIN SELECT ... on every new query before deploying to production. Look for type = ALL (full table scan) or rows > 1000 without an index; these will be slow at scale. EXPLAIN ANALYZE (MySQL 8.0+) shows actual execution stats.
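For example, against a hypothetical shop.orders table (note that EXPLAIN ANALYZE actually executes the query, so point it at a replica for expensive statements):

# Estimated plan: look for type=ALL or a large rows estimate in the output.
mysql -e "EXPLAIN SELECT * FROM shop.orders WHERE customer_id = 42\G"

# MySQL 8.0+: actual per-node timings and row counts.
mysql -e "EXPLAIN ANALYZE SELECT * FROM shop.orders WHERE customer_id = 42;"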
4. Alert on table_open_cache Misses
If Open_tables sits at the table_open_cache limit while the Opened_tables rate keeps climbing, MySQL is constantly opening and closing table file descriptors. Increase table_open_cache when you see this pattern.
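A quick way to spot the pattern from the shell (assumes a configured mysql client):

# Open_tables pinned at table_open_cache while Opened_tables rises = cache thrashing.
mysql -e "SHOW GLOBAL VARIABLES LIKE 'table_open_cache';
          SHOW GLOBAL STATUS WHERE Variable_name IN ('Open_tables', 'Opened_tables');"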
5. Monitor Disk IOPS on the Data Directory
InnoDB stores its data files (ibdata1, *.ibd) on disk. When IOPS are saturated, all queries slow down. Monitor I/O wait on the partition holding MySQL's data directory; it is often the first indicator before query latency spikes.
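On Linux, iostat from the sysstat package shows per-device saturation; find the device behind the datadir, then watch %util and await:

# Identify the device/partition backing the MySQL data directory.
df /var/lib/mysql

# Extended I/O stats every 5 seconds; %util near 100 or rising await means saturation.
iostat -x 5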
6. Test Failover Regularly
Monitoring replication lag is only useful if you actually practice failover. Run a controlled failover drill quarterly: promote a replica to primary, verify the app reconnects, then restore replication. A 30-second lag you never noticed becomes critical when you need to promote under incident pressure.
MySQL Monitoring Quick Reference
Essential Queries
Connection usage: SHOW STATUS LIKE 'Threads_connected';
Running queries: SHOW PROCESSLIST;
Slow queries: SHOW STATUS LIKE 'Slow_queries';
InnoDB status: SHOW ENGINE INNODB STATUS\G
Replication lag: SHOW REPLICA STATUS\G
Table sizes: SELECT table_schema, table_name, ROUND((data_length + index_length) / 1024 / 1024, 2) AS size_mb FROM information_schema.tables ORDER BY size_mb DESC;
Related MySQL and Database Monitoring Resources
- Database Monitoring Guide: PostgreSQL, Redis, MongoDB & More
- Redis Monitoring Guide: Key Metrics and Alerting
- Kafka Monitoring Guide: Brokers, Consumers & Lag
- AWS RDS Monitoring: CloudWatch Metrics for MySQL on RDS
- Azure Monitoring Guide: Azure Database for MySQL
- OpenTelemetry Guide: Database Span Instrumentation
- Is MySQL Down? Diagnose MySQL Outages