Redis Monitoring Guide: Key Metrics, Alerts & Tools (2026)
Redis failures are silent killers. Memory fills up, evictions spike, your cache hit ratio tanks — and your database suddenly gets 10× the load. This guide covers the metrics that matter, how to instrument them, and how to set alerts before your application degrades.
TL;DR — Redis Monitoring Checklist
- ✅ Monitor used_memory vs maxmemory: alert at 80%
- ✅ Track evicted_keys: any non-zero value is a problem
- ✅ Calculate cache hit ratio, hits / (hits + misses): alert below 90%
- ✅ Monitor replication lag: alert above 1MB offset gap
- ✅ Enable slow log (slowlog-log-slower-than 10000) to catch bad commands
- ✅ Watch connected_clients: a spike usually means a connection leak
- ✅ Use redis-exporter for Prometheus/Grafana integration
Why Redis Monitoring Is Different
Redis executes commands on a single thread and keeps its data in RAM. This makes it extremely fast, and extremely unforgiving when you get the configuration wrong. Unlike a database where a slow query just slows down that query, a Redis problem cascades:
1. Memory fills up → evictions start. Redis deletes cache keys to free memory. Your hit ratio drops.
2. Hit ratio drops → database gets slammed. Every cache miss hits the backing database. Load spikes 5-20×.
3. Database overwhelmed → application latency spikes. The DB can't keep up, queries queue, response times blow up.
4. Application latency → timeout errors → users see failures. What started as Redis running out of memory ends as a user-visible outage.
Good Redis monitoring catches problems at step 1 — before the cascade. The goal is alerting on memory pressure and evictions, not waiting to see database CPU spike.
Core Redis Metrics
Memory Metrics
| Metric | Command | What It Means | Alert Threshold |
|---|---|---|---|
| used_memory | INFO memory | Bytes allocated by Redis for data | > 80% of maxmemory |
| used_memory_rss | INFO memory | Bytes the OS has allocated to the Redis process (includes fragmentation) | used_memory_rss / used_memory > 1.5 (fragmentation) |
| evicted_keys | INFO stats | Keys deleted to free memory (cumulative) | Rate > 0/sec (any eviction) |
| mem_fragmentation_ratio | INFO memory | used_memory_rss / used_memory: overhead from fragmentation | > 1.5 warning, > 2.0 critical |
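The thresholds in the table above can be sketched as a small check. This is a minimal illustration, assuming the INFO memory fields have already been collected into a dict of integers; memory_alerts and its usage_threshold parameter are hypothetical names, not part of Redis or any client library.

```python
from typing import Dict, List


def memory_alerts(info: Dict[str, int], usage_threshold: float = 0.8) -> List[str]:
    """Return alert messages for a dict of INFO memory fields (values in bytes)."""
    alerts = []
    used = info["used_memory"]
    rss = info["used_memory_rss"]
    maxmemory = info.get("maxmemory", 0)

    # maxmemory == 0 means "no limit", so a usage ratio is undefined.
    if maxmemory > 0 and used / maxmemory > usage_threshold:
        alerts.append(f"memory usage {used / maxmemory:.0%} of maxmemory")

    # Fragmentation ratio as defined in the table: RSS divided by used memory.
    ratio = rss / used if used else 0.0
    if ratio > 2.0:
        alerts.append(f"critical fragmentation ratio {ratio:.2f}")
    elif ratio > 1.5:
        alerts.append(f"warning fragmentation ratio {ratio:.2f}")
    return alerts
```

Skipping the usage check when maxmemory is 0 avoids a division by zero on instances that have no memory limit configured.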
Cache Performance Metrics
| Metric | Command | What It Means | Target |
|---|---|---|---|
| keyspace_hits | INFO stats | Successful key lookups | As high as possible |
| keyspace_misses | INFO stats | Failed key lookups (key not in cache) | Alert if miss rate > 10% |
| hit_rate | Calculated | hits / (hits + misses) × 100 | Alert below 90% |
| expired_keys | INFO stats | Keys removed by TTL expiration (normal) | Normal — just monitor trend |
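Because keyspace_hits and keyspace_misses are cumulative counters, a ratio computed from their raw values reflects the whole time since the counters were last reset rather than current behavior. A hedged sketch of a windowed calculation from two samples follows; hit_ratio and its argument names are hypothetical, and the reset handling is one reasonable choice rather than a standard.

```python
from typing import Optional


def hit_ratio(hits_then: int, misses_then: int,
              hits_now: int, misses_now: int) -> Optional[float]:
    """Hit ratio for the window between two samples of the cumulative counters."""
    hits = hits_now - hits_then
    misses = misses_now - misses_then
    # A negative delta means the counters were reset (restart or CONFIG RESETSTAT),
    # so fall back to the values accumulated since the reset.
    if hits < 0 or misses < 0:
        hits, misses = hits_now, misses_now
    total = hits + misses
    if total == 0:
        return None  # no lookups in the window, so the ratio is undefined
    return hits / total
```

Returning None for an idle window keeps a quiet cache from being reported as a 0% hit rate.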
Connection & Throughput Metrics
| Metric | Command | What It Means | Alert Threshold |
|---|---|---|---|
| connected_clients | INFO clients | Active client connections right now | > maxclients × 0.9 |
| blocked_clients | INFO clients | Clients waiting on BLPOP/BRPOP/WAIT | Unexpected spike > baseline |
| instantaneous_ops_per_sec | INFO stats | Commands processed per second | Alert on 50%+ drop from baseline |
| rejected_connections | INFO stats | Connections rejected because maxclients reached | Any non-zero value |
The INFO Command: Your First Diagnostic Tool
redis-cli INFO is the fastest way to see Redis health. Run it with a section name for focused output:
# Get all stats
redis-cli INFO
# Memory section only
redis-cli INFO memory
# Stats section (hits, misses, evictions, connections)
redis-cli INFO stats
# Replication section
redis-cli INFO replication
# Keyspace section (key counts per DB)
redis-cli INFO keyspace
# Example memory output:
# used_memory:1234567890
# used_memory_human:1.15G
# used_memory_rss:1456789012
# mem_fragmentation_ratio:1.18
# maxmemory:2147483648
# maxmemory_human:2.00G
# maxmemory_policy:allkeys-lru
Pro tip: Run redis-cli --stat for a live rolling view of ops/sec, used memory, keys, blocked clients, and requests every second. It is useful for watching trends in real time during an incident.
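For scripted checks, the same INFO output can be turned into a dictionary. This is a minimal sketch, assuming the raw text from redis-cli INFO is already available as a string; parse_info is a hypothetical helper name, and values are left as strings so the caller decides which ones to convert.

```python
from typing import Dict


def parse_info(text: str) -> Dict[str, str]:
    """Parse the text returned by INFO into a flat dict of field -> value."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# Section" headers.
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields
```

With the sample output above, int(parse_info(text)["used_memory"]) would give 1234567890.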
Prometheus Setup with redis-exporter
The oliver006/redis_exporter is the standard Prometheus exporter for Redis. It scrapes INFO and exposes 100+ metrics on port 9121.
# Docker Compose example
version: '3'
services:
  redis-exporter:
    image: oliver006/redis_exporter:v1.62
    environment:
      REDIS_ADDR: "redis://redis:6379"
      REDIS_PASSWORD: "$REDIS_PASSWORD"
    ports:
      - "9121:9121"
    depends_on:
      - redis
# Prometheus scrape config
scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets: ['localhost:9121']
    scrape_interval: 15s
# Key metrics exposed:
# redis_memory_used_bytes
# redis_memory_max_bytes
# redis_keyspace_hits_total
# redis_keyspace_misses_total
# redis_evicted_keys_total
# redis_connected_clients
# redis_blocked_clients
# redis_replication_lag
# redis_up (1 = healthy)
Essential Prometheus Alert Rules
groups:
  - name: redis
    rules:
      # Redis is down
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis instance is not responding"
      # Memory pressure (the max_bytes > 0 guard skips instances with no maxmemory set)
      - alert: RedisMemoryHigh
        expr: |
          redis_memory_used_bytes / redis_memory_max_bytes > 0.8
          and redis_memory_max_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 80%"
      # Evictions happening (cache too small)
      - alert: RedisEvictions
        expr: rate(redis_evicted_keys_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis is evicting keys, cache may be undersized"
      # Cache hit ratio below 90%
      - alert: RedisLowHitRate
        expr: |
          rate(redis_keyspace_hits_total[5m]) /
          (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Redis cache hit rate below 90%"
      # Connection limit approaching
      - alert: RedisTooManyConnections
        expr: redis_connected_clients > (redis_config_maxclients * 0.9)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis approaching max client connections"
      # Replication lag
      # Confirm this metric exists in your exporter version; if it does not, derive the
      # gap from redis_master_repl_offset and redis_connected_slave_offset_bytes.
      - alert: RedisReplicationLag
        expr: redis_replication_lag > 1048576 # 1MB
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis replica is lagging behind primary"
Slow Query Log: Finding Bad Commands
Redis is single-threaded. One slow command blocks every other client. The slow log captures commands that exceed your threshold — it's your first stop when Redis latency spikes.
# Configure slow log (10ms threshold, keep 128 entries)
redis-cli CONFIG SET slowlog-log-slower-than 10000
redis-cli CONFIG SET slowlog-max-len 128
# View the 25 most recent slow commands
redis-cli SLOWLOG GET 25
# Output format:
# 1) 1) (integer) 14            # Entry ID
#    2) (integer) 1714500000    # Unix timestamp
#    3) (integer) 28000         # Execution time in microseconds (28ms)
#    4) 1) "KEYS"               # Command + arguments
#       2) "*"
#    5) "127.0.0.1:42321"       # Client address
#    6) ""                      # Client name (empty if not set)
# Reset the slow log
redis-cli SLOWLOG RESET
Common Slow Command Offenders
| Command | Problem | Fix |
|---|---|---|
| KEYS * | Full keyspace scan — blocks all other commands | Use SCAN with cursor + COUNT |
| LRANGE key 0 -1 | Reading entire list (could be millions of items) | Use pagination: LRANGE key 0 99 |
| SMEMBERS key | Returns all members of a large set | Use SSCAN for large sets |
| SORT key | Sorting large lists is O(N+M log M) | Pre-sort at write time or use sorted sets (ZADD) |
| HGETALL key | Returns entire hash with hundreds of fields | Use HMGET for specific fields |
Never use KEYS in production. Even with 10,000 keys, KEYS * holds Redis for the entire scan. With 1M keys and a busy server, it can lock Redis for hundreds of milliseconds.
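The SCAN replacement for KEYS is a cursor loop. Below is a minimal sketch, assuming a redis-py style client whose scan(cursor=..., match=..., count=...) method returns a (next_cursor, keys) pair; iter_keys is a hypothetical helper name, and redis-py already ships a similar scan_iter method that can be used instead.

```python
from typing import Iterator


def iter_keys(client, pattern: str = "*", count: int = 1000) -> Iterator[str]:
    """Yield keys matching pattern using incremental SCAN calls instead of KEYS."""
    cursor = 0
    while True:
        # Each SCAN call does a bounded amount of work, so other clients
        # are served between iterations.
        cursor, keys = client.scan(cursor=cursor, match=pattern, count=count)
        yield from keys
        if cursor == 0:  # a returned cursor of 0 means the iteration is complete
            break
```

COUNT is only a hint for how much work each call does, and SCAN may return the same key more than once, so callers that need uniqueness should deduplicate. Keys arrive as bytes unless the client was created with decode_responses=True.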
Replication Monitoring
In production Redis setups, you typically have one primary and one or more replicas. Monitoring replication health is critical — if a replica falls too far behind and the primary fails, you lose data.
# Check replication status on the primary
redis-cli INFO replication
# Key output fields:
# role: master
# connected_slaves: 2
# slave0: ip=10.0.1.5,port=6379,state=online,offset=1234567,lag=0
# slave1: ip=10.0.1.6,port=6379,state=online,offset=1234500,lag=0
# master_repl_offset: 1234570
# repl_backlog_size: 1048576
# Lag calculation:
# slave lag = master_repl_offset - slave_offset
# slave0 lag = 1234570 - 1234567 = 3 bytes (healthy)
# slave1 lag = 1234570 - 1234500 = 70 bytes (tiny, normal)
# Check on a replica
redis-cli -h replica-host INFO replication
# role: slave
# master_host: 10.0.1.4
# master_link_status: up # Should be "up"
# master_sync_in_progress: 0 # 1 = full resync in progress (expensive)
Watch for full resyncs: If master_sync_in_progress is 1, the replica is doing a full resync, reloading the entire dataset from the primary. This is expensive because it transfers the full RDB snapshot. It happens when a replica reconnects after falling further behind than the replication backlog can cover. Increase repl-backlog-size to make full resyncs less likely.
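The lag calculation shown earlier (master_repl_offset minus each replica's offset) can be automated. This is a minimal sketch, assuming the text of INFO replication from the primary is available as a string; replica_lag_bytes is a hypothetical helper name, and the parsing tolerates the optional space after the colon used in the listing above.

```python
import re
from typing import Dict

_REPLICA_LINE = re.compile(r"^slave(\d+):(.*)$")


def replica_lag_bytes(info_text: str) -> Dict[str, int]:
    """Map each replica (ip:port) to master_repl_offset minus its offset, in bytes."""
    master_offset = None
    replica_offsets = {}
    for line in info_text.splitlines():
        line = line.strip()
        if line.startswith("master_repl_offset:"):
            master_offset = int(line.split(":", 1)[1])
            continue
        match = _REPLICA_LINE.match(line)
        if match:
            # Value looks like "ip=10.0.1.5,port=6379,state=online,offset=1234567,lag=0".
            pairs = match.group(2).strip().split(",")
            attrs = dict(pair.split("=", 1) for pair in pairs)
            replica_offsets[f"{attrs['ip']}:{attrs['port']}"] = int(attrs["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found; run INFO replication on the primary")
    return {addr: master_offset - off for addr, off in replica_offsets.items()}
```

Any replica whose value exceeds 1048576 would cross the 1MB alert threshold used in this guide.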
Redis Sentinel vs. Cluster Monitoring
Redis Sentinel
Monitors primary/replica pairs. Promotes a replica if primary fails.
Key checks:
- → sentinel masters: list monitored masters
- → sentinel slaves <name>: replica health
- → sentinel sentinels <name>: quorum count
- → Alert on num-other-sentinels < 2 (can't achieve quorum)
Redis Cluster
Shards data across multiple nodes. Built-in HA without Sentinel.
Key checks:
- → CLUSTER INFO: cluster_state must be "ok"
- → cluster_slots_fail must be 0
- → CLUSTER NODES: all nodes connected
- → Alert if any shard has no healthy replica
Redis Monitoring Tools (2026)
| Tool | Type | Best For | Cost |
|---|---|---|---|
| redis-cli + INFO | Built-in CLI | Quick manual diagnostics, incident investigation | Free |
| oliver006/redis_exporter | Prometheus exporter | Teams already running Prometheus/Grafana | Free (OSS) |
| Better Stack | SaaS monitoring | TCP/HTTP monitoring + on-call alerting, fast setup | Free tier, $25/mo+ |
| Grafana Cloud | SaaS observability | Full metrics/logs/traces stack, pre-built Redis dashboards | Free tier (10k series), $8/mo+ |
| Datadog | Enterprise APM | Enterprises wanting Redis + app correlation | $15-23/host/mo |
| RedisInsight | Redis GUI | Visual key browser, slow log viewer, memory analysis | Free (by Redis Ltd) |
Frequently Asked Questions
What are the most important Redis metrics to monitor?
The six critical metrics: (1) used_memory vs maxmemory — alert at 80%, (2) evicted_keys rate — any non-zero value means your cache is too small, (3) cache hit ratio — alert below 90%, (4) connected_clients — spike indicates a connection leak, (5) replication_lag — alert above 1MB offset gap, (6) instantaneous_ops_per_sec — drop indicates Redis is struggling.
How do I check Redis memory usage?
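Run redis-cli INFO memory. Focus on used_memory (bytes Redis has allocated), used_memory_rss (what the OS has given the process, including fragmentation), and mem_fragmentation_ratio. A ratio above 1.5 means significant fragmentation overhead: consider enabling active defragmentation (activedefrag yes), running MEMORY PURGE (both require the jemalloc allocator), or restarting during a maintenance window.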
What is a good Redis cache hit ratio?
Aim for 90%+ for most caching workloads. Calculate as: keyspace_hits / (keyspace_hits + keyspace_misses). Below 80% usually means keys are expiring too aggressively, your cache is undersized (evictions before TTL), or keys are being written but never looked up.
How do I monitor Redis replication lag?
Run INFO replication on the primary. Each replica shows its offset — subtract from master_repl_offset to get lag in bytes. Alert when lag exceeds 1MB. Also monitor master_link_status on replicas — "down" means the replica is disconnected.
How do I find slow queries in Redis?
Enable the slow log: CONFIG SET slowlog-log-slower-than 10000 (10ms). Then run SLOWLOG GET 25. The most common offenders: KEYS * (full scan; never use in production), LRANGE on giant lists, SMEMBERS on huge sets. Replace KEYS with SCAN, page through lists with bounded LRANGE ranges, and iterate large sets with SSCAN.
What should I set as my Redis eviction policy?
For pure caching: allkeys-lru. For mixed data (some persistent, some cached): volatile-lru. For data that must never be evicted (queues, counters): noeviction with very aggressive memory alerts. Configure with CONFIG SET maxmemory-policy allkeys-lru. Monitor evicted_keys — any eviction means your cache is undersized.