Redis Monitoring Guide: Key Metrics, Alerts & Tools (2026)

Q: What are the most important Redis metrics to monitor?

The six most critical Redis metrics are: (1) used_memory vs maxmemory — how close you are to running out of RAM, (2) evicted_keys — keys being deleted because memory is full, (3) keyspace_hits vs keyspace_misses — cache hit ratio, (4) connected_clients — active client connections, (5) replication_lag — how far replicas are behind the primary, (6) instantaneous_ops_per_sec — commands processed per second. Track all six together; a high eviction rate combined with a dropping hit ratio is a clear sign Redis is undersized for your workload.

Q: How do I check Redis memory usage?

Run `redis-cli INFO memory` to see used_memory, used_memory_rss (OS allocation), maxmemory (your configured limit), and mem_fragmentation_ratio. A fragmentation ratio above 1.5 means Redis is using 50% more RAM than your keys actually need — usually caused by deleted keys leaving fragmented allocations. Use `MEMORY DOCTOR` for an automated diagnosis. Alert when used_memory exceeds 80% of maxmemory to get ahead of evictions.

Q: What is a good Redis cache hit ratio?

A healthy cache hit ratio is 90% or higher for most use cases. Calculate it as: keyspace_hits / (keyspace_hits + keyspace_misses) × 100. A ratio below 80% often means your keys are expiring too aggressively, your cache is too small (causing evictions before keys expire), or your application is looking up keys that were never written. Session caches typically hit 95%+; write-through caches may be lower by design.

Q: How do I monitor Redis replication lag?

Check replication status with `redis-cli INFO replication`. The master_repl_offset shows how many bytes the primary has written to the replication stream; each replica reports its own offset. Lag = master_repl_offset - replica offset. In Redis Sentinel or Cluster mode, also monitor failover events and node connectivity. Alert when replication lag exceeds 1MB (the replica is falling behind) or when a replica shows as disconnected. High lag during write bursts is normal; persistent lag indicates network or disk I/O issues.

Q: How do I find slow queries in Redis?

Redis has a built-in slow log. Configure it with `CONFIG SET slowlog-log-slower-than 10000` (the value is in microseconds, so this is a 10ms threshold) and `CONFIG SET slowlog-max-len 128`. Then view slow queries with `SLOWLOG GET 25`. The slow log captures the command, its arguments, and the execution time. The most common culprits: KEYS * (full keyspace scan), SCAN with a very large COUNT, large LRANGE or SMEMBERS calls on big data structures, and SORT commands. Always use SCAN instead of KEYS in production.

Q: What should I set as my Redis eviction policy?

For session/cache workloads, allkeys-lru (evict any key using LRU) is the safest choice. If you have a mix of persistent and cache keys, use volatile-lru (only evict keys with a TTL set). For rate limiting or counters that must survive, use noeviction (reject writes when full), but alert loudly when approaching maxmemory. Avoid allkeys-random or volatile-random unless your access pattern is truly uniform. Configure with `CONFIG SET maxmemory-policy allkeys-lru`. Monitor the evicted_keys counter: it is cumulative since the last restart, so watch its rate, and treat sustained evictions as a sign your cache is too small.

Redis failures are silent killers. Memory fills up, evictions spike, your cache hit ratio tanks — and your database suddenly gets 10× the load. This guide covers the metrics that matter, how to instrument them, and how to set alerts before your application degrades.

Updated April 2026 • 14 min read • SRE / Backend Engineering

TL;DR — Redis Monitoring Checklist

✅ Monitor used_memory vs maxmemory — alert at 80%
✅ Track the evicted_keys rate: any sustained increase in this cumulative counter is a problem
✅ Calculate cache hit ratio: hits / (hits + misses) — alert below 90%
✅ Monitor replication lag — alert above 1MB offset gap
✅ Enable slow log (slowlog-log-slower-than 10000) to catch bad commands
✅ Watch connected_clients — a spike usually means a connection leak
✅ Use redis-exporter for Prometheus/Grafana integration

Why Redis Monitoring Is Different

Redis executes commands on a single thread and keeps its data in RAM. This makes it extremely fast, and extremely unforgiving when you get the configuration wrong. Unlike a database where a slow query just slows down that query, a Redis problem cascades:

1. Memory fills up → evictions start. Redis deletes cache keys to free memory. Your hit ratio drops.

2. Hit ratio drops → database gets slammed. Every cache miss hits the backing database. Load spikes 5-20×.

3. Database overwhelmed → application latency spikes. The database can't keep up, queries queue, and response times blow up.

4. Application latency → timeout errors → users see failures. What started as Redis running out of memory ends as a user-visible outage.

Good Redis monitoring catches problems at step 1 — before the cascade. The goal is alerting on memory pressure and evictions, not waiting to see database CPU spike.

Core Redis Metrics

Memory Metrics

Metric	Command	What It Means	Alert Threshold
used_memory	INFO memory	Bytes allocated by Redis for data	> 80% of maxmemory
used_memory_rss	INFO memory	Bytes allocated by OS (includes fragmentation)	rss/used > 1.5 (fragmentation)
evicted_keys	INFO stats	Keys deleted to free memory (cumulative)	Rate > 0/sec (any eviction)
mem_fragmentation_ratio	INFO memory	used_memory_rss / used_memory; overhead from fragmentation	> 1.5 warning, > 2.0 critical
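
As a rough illustration of the thresholds in the table above, the Python sketch below classifies a single memory sample. The function name `memory_status` and the keys it returns are hypothetical; the inputs are the raw INFO memory fields, and the sample values are taken from the example INFO output in this guide.

```python
from typing import Optional


def memory_status(used_memory: int, used_memory_rss: int, maxmemory: int) -> dict:
    """Classify memory pressure using the 80% and 1.5 / 2.0 thresholds."""
    # maxmemory == 0 means "no limit" in Redis, so a usage ratio is undefined.
    usage: Optional[float] = used_memory / maxmemory if maxmemory > 0 else None
    fragmentation: Optional[float] = (
        used_memory_rss / used_memory if used_memory > 0 else None
    )

    if fragmentation is None or fragmentation <= 1.5:
        frag_level = "ok"
    elif fragmentation <= 2.0:
        frag_level = "warning"
    else:
        frag_level = "critical"

    return {
        "usage_ratio": usage,
        "memory_alert": usage is not None and usage > 0.80,
        "fragmentation_ratio": fragmentation,
        "fragmentation_level": frag_level,
    }


# Sample values: used_memory, used_memory_rss, maxmemory
status = memory_status(1234567890, 1456789012, 2147483648)
print(status)
```

Returning `None` for the usage ratio when maxmemory is 0 is a deliberate choice in this sketch: an instance with no limit cannot be "80% full", so it should be handled separately, not silently treated as healthy or alerting.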

Cache Performance Metrics

Metric	Command	What It Means	Target
keyspace_hits	INFO stats	Successful key lookups	As high as possible
keyspace_misses	INFO stats	Failed key lookups (key not in cache)	Alert if miss rate > 10%
hit_rate	Calculated	hits / (hits + misses) × 100	Alert below 90%
expired_keys	INFO stats	Keys removed by TTL expiration (normal)	Normal — just monitor trend
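
Because keyspace_hits and keyspace_misses are cumulative counters, a hit rate computed from a single INFO snapshot reflects the whole time since the last restart. The sketch below, with a hypothetical `hit_ratio` helper, computes the rate over a window from two snapshots instead. It assumes each snapshot is a dict holding the two counters as integers.

```python
from typing import Optional


def hit_ratio(prev: dict, curr: dict) -> Optional[float]:
    """Hit rate in percent between two INFO stats snapshots, or None if undefined."""
    hits = curr["keyspace_hits"] - prev["keyspace_hits"]
    misses = curr["keyspace_misses"] - prev["keyspace_misses"]

    # Counters restart from zero when Redis restarts; a negative delta
    # means the two snapshots straddle a restart and cannot be compared.
    if hits < 0 or misses < 0:
        return None

    total = hits + misses
    if total == 0:
        return None  # no lookups in the window, so no meaningful ratio

    return 100.0 * hits / total


earlier = {"keyspace_hits": 1000, "keyspace_misses": 100}
later = {"keyspace_hits": 1900, "keyspace_misses": 200}
ratio = hit_ratio(earlier, later)
if ratio is not None and ratio < 90.0:
    print(f"hit rate {ratio:.1f}% is below the 90% target")
```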

Connection & Throughput Metrics

Metric	Command	What It Means	Alert Threshold
connected_clients	INFO clients	Active client connections right now	> maxclients × 0.9
blocked_clients	INFO clients	Clients waiting on BLPOP/BRPOP/WAIT	Unexpected spike > baseline
instantaneous_ops_per_sec	INFO stats	Commands processed per second	Alert on 50%+ drop from baseline
rejected_connections	INFO stats	Connections rejected because maxclients reached	Any non-zero value

The INFO Command — Your First Diagnostic Tool

redis-cli INFO is the fastest way to see Redis health. Run it with a section name for focused output:

# Get all stats
redis-cli INFO

# Memory section only
redis-cli INFO memory

# Stats section (hits, misses, evictions, connections)
redis-cli INFO stats

# Replication section
redis-cli INFO replication

# Keyspace section (key counts per DB)
redis-cli INFO keyspace

# Example memory output:
# used_memory:1234567890
# used_memory_human:1.15G
# used_memory_rss:1456789012
# mem_fragmentation_ratio:1.18
# maxmemory:2147483648
# maxmemory_human:2.00G
# maxmemory_policy:allkeys-lru

Pro tip: Run redis-cli --stat for a live rolling view of ops/sec, used memory, keys, blocked clients, and requests every second — useful for watching trends in real time during an incident.
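
For scripting, the INFO output is simple enough to parse by hand: each line is a `key:value` pair, and section headers start with `#`. The following is a minimal sketch, assuming the text was captured from `redis-cli INFO memory`; the `parse_info` name is hypothetical and the sample mirrors the example output above.

```python
def parse_info(text: str) -> dict:
    """Turn raw INFO output into a dict of string values."""
    info = {}
    for line in text.splitlines():
        line = line.strip()  # also removes the trailing \r that Redis sends
        if not line or line.startswith("#"):
            continue  # skip blank lines and section headers such as "# Memory"
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


sample = """\
# Memory
used_memory:1234567890
used_memory_human:1.15G
used_memory_rss:1456789012
mem_fragmentation_ratio:1.18
maxmemory:2147483648
maxmemory_policy:allkeys-lru
"""

info = parse_info(sample)
used_pct = 100 * int(info["used_memory"]) / int(info["maxmemory"])
print(f"{used_pct:.1f}% of maxmemory, policy={info['maxmemory_policy']}")
```

Values are left as strings on purpose, since INFO mixes integers, floats, and free text; convert only the fields you actually use.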

Prometheus Setup with redis-exporter

The oliver006/redis_exporter is the standard Prometheus exporter for Redis. It scrapes INFO and exposes 100+ metrics on port 9121.

# Docker Compose example
version: '3'
services:
  redis-exporter:
    image: oliver006/redis_exporter:v1.62
    environment:
      REDIS_ADDR: "redis://redis:6379"
      REDIS_PASSWORD: "$REDIS_PASSWORD"
    ports:
      - "9121:9121"
    depends_on:
      - redis  # assumes a "redis" service is defined in the same Compose file

# Prometheus scrape config
scrape_configs:
  - job_name: 'redis'
    static_configs:
      - targets: ['localhost:9121']
    scrape_interval: 15s

# Key metrics exposed:
# redis_memory_used_bytes
# redis_memory_max_bytes
# redis_keyspace_hits_total
# redis_keyspace_misses_total
# redis_evicted_keys_total
# redis_connected_clients
# redis_blocked_clients
# redis_master_repl_offset
# redis_connected_slave_offset_bytes
# redis_up (1 = healthy)

Essential Prometheus Alert Rules

groups:
  - name: redis
    rules:
      # Redis is down
      - alert: RedisDown
        expr: redis_up == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Redis instance is not responding"

      # Memory pressure (skips instances where maxmemory is 0, i.e. unlimited)
      - alert: RedisMemoryHigh
        expr: |
          redis_memory_used_bytes / redis_memory_max_bytes > 0.8
          and redis_memory_max_bytes > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis memory usage above 80%"

      # Evictions happening (cache too small)
      - alert: RedisEvictions
        expr: rate(redis_evicted_keys_total[5m]) > 0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis is evicting keys — cache may be undersized"

      # Cache hit ratio below 90%
      - alert: RedisLowHitRate
        expr: |
          rate(redis_keyspace_hits_total[5m]) /
          (rate(redis_keyspace_hits_total[5m]) + rate(redis_keyspace_misses_total[5m])) < 0.9
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Redis cache hit rate below 90%"

      # Connection limit approaching
      - alert: RedisTooManyConnections
        expr: redis_connected_clients > (redis_config_maxclients * 0.9)
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis approaching max client connections"

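      # Replication lag: byte gap between the primary and each replica (1MB = 1048576 bytes).
      # Metric names assume a recent redis_exporter; confirm them on your /metrics page.
      - alert: RedisReplicationLag
        expr: |
          redis_master_repl_offset - on(instance) group_right
          redis_connected_slave_offset_bytes > 1048576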
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "Redis replica is lagging behind primary"

Slow Query Log — Finding Bad Commands

Redis is single-threaded. One slow command blocks every other client. The slow log captures commands that exceed your threshold — it's your first stop when Redis latency spikes.

# Configure slow log (10ms threshold, keep 128 entries)
redis-cli CONFIG SET slowlog-log-slower-than 10000
redis-cli CONFIG SET slowlog-max-len 128

# View the 25 most recent slow commands
redis-cli SLOWLOG GET 25

# Output format:
# 1) 1) (integer) 14          # Entry ID
#    2) (integer) 1714500000  # Unix timestamp
#    3) (integer) 28000       # Execution time in microseconds (28ms)
#    4) 1) "KEYS"             # Command + arguments
#       2) "*"
#    5) "127.0.0.1:42321"     # Client address
#    6) ""                    # Client name (empty if not set)

# Reset the slow log
redis-cli SLOWLOG RESET

Common Slow Command Offenders

Command	Problem	Fix
KEYS *	Full keyspace scan — blocks all other commands	Use `SCAN` with cursor + COUNT
LRANGE key 0 -1	Reading entire list (could be millions of items)	Use pagination: `LRANGE key 0 99`
SMEMBERS key	Returns all members of a large set	Use `SSCAN` for large sets
SORT key	Sorting large lists is O(N+M log M)	Pre-sort at write time or use sorted sets (ZADD)
HGETALL key	Returns entire hash with hundreds of fields	Use `HMGET` for specific fields

Never use KEYS in production. Even with 10,000 keys, KEYS * holds Redis for the entire scan. With 1M keys and a busy server, it can lock Redis for hundreds of milliseconds.
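
The KEYS-to-SCAN fix from the table above can be sketched as a cursor loop. This example assumes a redis-py style client whose `scan(cursor=..., match=..., count=...)` method returns a `(next_cursor, keys)` pair; the `iter_keys` helper and the `session:*` pattern are hypothetical names used for illustration.

```python
from typing import Iterator


def iter_keys(client, pattern: str = "*", count: int = 500) -> Iterator[str]:
    """Yield matching keys in small batches instead of one blocking KEYS call."""
    cursor = 0
    while True:
        # Each SCAN call does a bounded amount of work, so other clients
        # get served between batches. COUNT is a hint, not a hard limit.
        cursor, batch = client.scan(cursor=cursor, match=pattern, count=count)
        yield from batch
        if cursor == 0:  # a returned cursor of 0 means the iteration is complete
            break


# Usage, assuming the redis-py package and a reachable server:
#   import redis
#   for key in iter_keys(redis.Redis(), "session:*"):
#       ...
```

SCAN can return the same key more than once if the keyspace changes during iteration, so deduplicate downstream if that matters. redis-py also ships a `scan_iter` method that wraps the same loop.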

Replication Monitoring

In production Redis setups, you typically have one primary and one or more replicas. Monitoring replication health is critical — if a replica falls too far behind and the primary fails, you lose data.

# Check replication status on the primary
redis-cli INFO replication

# Key output fields:
# role: master
# connected_slaves: 2
# slave0: ip=10.0.1.5,port=6379,state=online,offset=1234567,lag=0
# slave1: ip=10.0.1.6,port=6379,state=online,offset=1234500,lag=0
# master_repl_offset: 1234570
# repl_backlog_size: 1048576

# Lag calculation:
# slave lag = master_repl_offset - slave_offset
# slave0 lag = 1234570 - 1234567 = 3 bytes (healthy)
# slave1 lag = 1234570 - 1234500 = 70 bytes (tiny, normal)

# Check on a replica
redis-cli -h replica-host INFO replication
# role: slave
# master_host: 10.0.1.4
# master_link_status: up   # Should be "up"
# master_sync_in_progress: 0  # 1 = full resync in progress (expensive)

Watch for full resyncs: If master_sync_in_progress is 1, a replica is doing a full resync, reloading its entire dataset from scratch. This is expensive because the primary transfers a full RDB snapshot. It happens when a replica reconnects after falling further behind than the replication backlog can cover. Increase repl-backlog-size to make full resyncs less likely.
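
The lag calculation shown earlier (master_repl_offset minus each replica's offset) is easy to automate. Here is a minimal sketch, assuming the text comes from `INFO replication` run on the primary; the `replica_lags` helper is a hypothetical name, and the sample reuses the offsets from the example output above.

```python
def replica_lags(info_text: str) -> dict:
    """Map each replica's ip:port to its byte lag behind the primary."""
    master_offset = None
    replica_offsets = {}
    for raw in info_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue
        value = value.strip()
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            # slaveN lines look like: ip=...,port=...,state=...,offset=...,lag=...
            fields = dict(item.split("=", 1) for item in value.split(","))
            replica_offsets[f"{fields['ip']}:{fields['port']}"] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found; run this against the primary")
    return {addr: master_offset - off for addr, off in replica_offsets.items()}


sample = """\
role:master
connected_slaves:2
slave0:ip=10.0.1.5,port=6379,state=online,offset=1234567,lag=0
slave1:ip=10.0.1.6,port=6379,state=online,offset=1234500,lag=0
master_repl_offset:1234570
"""

lags = replica_lags(sample)
lagging = {addr: lag for addr, lag in lags.items() if lag > 1_048_576}  # 1MB
print(lags, lagging)
```

Note that the `lag=` field inside each slaveN line is the number of seconds since the replica's last acknowledgement, not a byte count, which is why this sketch ignores it and works from offsets.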

Redis Sentinel vs. Cluster Monitoring

Redis Sentinel

Monitors primary/replica pairs. Promotes a replica if primary fails.

Key checks:

→ sentinel masters — list monitored masters
→ sentinel slaves <name> — replica health
→ sentinel sentinels <name>: lists the other Sentinels watching this master
→ Alert on num-other-sentinels < 2 (with three Sentinels, one more failure would leave no majority for failover)

Redis Cluster

Shards data across multiple nodes. Built-in HA without Sentinel.

Key checks:

→ CLUSTER INFO — cluster_state must be "ok"
→ cluster_slots_fail must be 0
→ CLUSTER NODES — all nodes connected
→ Alert if any shard has no healthy replica
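
The first two Cluster checks above can be scripted against the text returned by `CLUSTER INFO`. This is a small sketch under that assumption; the `cluster_problems` helper is a hypothetical name, and it covers only cluster_state and cluster_slots_fail, not per-node connectivity or replica coverage.

```python
def cluster_problems(cluster_info_text: str) -> list:
    """Return human-readable problems found in CLUSTER INFO output."""
    fields = {}
    for line in cluster_info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if sep:
            fields[key] = value.strip()

    problems = []
    state = fields.get("cluster_state")
    if state != "ok":
        problems.append(f"cluster_state is {state!r}, expected 'ok'")
    slots_fail = int(fields.get("cluster_slots_fail", "0"))
    if slots_fail != 0:
        problems.append(f"{slots_fail} slots in FAIL state")
    return problems


healthy = "cluster_state:ok\ncluster_slots_assigned:16384\ncluster_slots_fail:0\n"
degraded = "cluster_state:fail\ncluster_slots_assigned:16384\ncluster_slots_fail:5461\n"
print(cluster_problems(healthy))
print(cluster_problems(degraded))
```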

Redis Monitoring Tools (2026)

Tool	Type	Best For	Cost
redis-cli + INFO	Built-in CLI	Quick manual diagnostics, incident investigation	Free
oliver006/redis_exporter	Prometheus exporter	Teams already running Prometheus/Grafana	Free (OSS)
Better Stack	SaaS monitoring	TCP/HTTP monitoring + on-call alerting, fast setup	Free tier, $25/mo+
Grafana Cloud	SaaS observability	Full metrics/logs/traces stack, pre-built Redis dashboards	Free tier (10k series), $8/mo+
Datadog	Enterprise APM	Enterprises wanting Redis + app correlation	$15-23/host/mo
RedisInsight	Redis GUI	Visual key browser, slow log viewer, memory analysis	Free (by Redis Ltd)

Frequently Asked Questions

What are the most important Redis metrics to monitor?

The six critical metrics: (1) used_memory vs maxmemory — alert at 80%, (2) evicted_keys rate — any non-zero value means your cache is too small, (3) cache hit ratio — alert below 90%, (4) connected_clients — spike indicates a connection leak, (5) replication_lag — alert above 1MB offset gap, (6) instantaneous_ops_per_sec — drop indicates Redis is struggling.

How do I check Redis memory usage?

Run redis-cli INFO memory. Focus on used_memory (actual data size), used_memory_rss (OS-level allocation including fragmentation), and mem_fragmentation_ratio. A ratio above 1.5 means fragmentation overhead — consider MEMORY PURGE or restart during a maintenance window.

What is a good Redis cache hit ratio?

Aim for 90%+ for most caching workloads. Calculate as: keyspace_hits / (keyspace_hits + keyspace_misses). Below 80% usually means keys are expiring too aggressively, your cache is undersized (evictions before TTL), or your application is looking up keys that were never written.

How do I monitor Redis replication lag?

Run INFO replication on the primary. Each replica shows its offset — subtract from master_repl_offset to get lag in bytes. Alert when lag exceeds 1MB. Also monitor master_link_status on replicas — "down" means the replica is disconnected.

How do I find slow queries in Redis?

Enable the slow log: CONFIG SET slowlog-log-slower-than 10000 (10ms). Then run SLOWLOG GET 25. The most common offenders: KEYS * (full scan; never use in production), LRANGE on giant lists, SMEMBERS on huge sets. Replace KEYS with SCAN, paginate LRANGE with bounded ranges, and replace SMEMBERS with SSCAN.

What should I set as my Redis eviction policy?

For pure caching: allkeys-lru. For mixed data (some persistent, some cached): volatile-lru. For data that must never be evicted (queues, counters): noeviction with very aggressive memory alerts. Configure with CONFIG SET maxmemory-policy allkeys-lru. Monitor evicted_keys — any eviction means your cache is undersized.
