Redis is the world's most popular in-memory data store, used for caching, session storage, message queuing, and real-time leaderboards across millions of applications. When Redis goes down, the downstream effects are immediate: cache miss rates spike, API latency jumps as requests fall through to the database, sessions stored in Redis become unavailable, and queue-dependent features stall. Because Redis is often invisible until it fails, diagnosing outages quickly is a critical engineering skill.
Redis Down: Cloud vs. Self-Hosted Diagnosis
The first step is determining what kind of Redis deployment you're dealing with, because the diagnosis path is completely different.
| Deployment | First Check | Common Cause |
|---|---|---|
| Redis Cloud (Redis Ltd) | status.redis.io | Cloud provider outage, cluster failover |
| AWS ElastiCache | AWS Health Dashboard | AZ failure, maintenance event |
| Azure Cache for Redis | Azure Status (status.azure.com) | Regional outage, scale operations |
| GCP Memorystore | Google Cloud Service Health (status.cloud.google.com) | Maintenance, zone failure |
| Self-Hosted (server) | redis-cli ping | Process crash, OOM killer, disk full |
How to Diagnose Self-Hosted Redis Down
Step 1: Test Connectivity with redis-cli
# Basic local ping
redis-cli ping
# Remote host with auth
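redis-cli -h your-redis-host -p 6379 -a YOUR_PASSWORD ping
# (-a exposes the password in the process list; redis-cli also reads it from the REDISCLI_AUTH environment variable)
# With TLS (Redis 6+, redis-cli built with TLS support)
redis-cli -h your-redis-host -p 6380 --tls ping
Step 2: Check if Redis Process is Running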
# Check process
ps aux | grep redis-server
# Check systemd service status (the unit is usually named redis on RHEL/Fedora)
sudo systemctl status redis-server
# Check port is listening
ss -tlnp | grep 6379
# or
netstat -tlnp | grep 6379
Step 3: Read Redis Logs
# Default log location (Ubuntu/Debian)
sudo tail -100 /var/log/redis/redis-server.log
# Common error patterns (in the log or returned to clients):
# "Can't save in background: fork: Cannot allocate memory" → not enough memory to fork a background save (often vm.overcommit_memory = 0)
# "MISCONF Redis is configured to save RDB snapshots" → snapshots are failing, usually a full disk or bad permissions
# "Warning: 32 bit instance detected" → 32-bit build with limited addressable memory
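The three patterns above can be checked in one pass. The sketch below is a hypothetical helper (`diagnose_redis_log` is not part of Redis); it assumes the Debian/Ubuntu log path shown above and that the patterns appear verbatim in the file. The cause strings are rough hints, not definitive diagnoses.

```shell
#!/usr/bin/env bash
# Hypothetical helper: scan a redis-server log for known error patterns
# and print a likely cause for each one found.
# Default path assumes the Ubuntu/Debian location; pass another path as $1.
diagnose_redis_log() {
  local log_file="${1:-/var/log/redis/redis-server.log}"
  local -a patterns=(
    "Can't save in background: fork: Cannot allocate memory"
    "MISCONF Redis is configured to save RDB snapshots"
    "Warning: 32 bit instance detected"
  )
  local -a causes=(
    "background save could not fork: not enough memory"
    "RDB snapshots are failing: check free disk space and permissions"
    "32-bit build: limited addressable memory"
  )
  local i matched=0
  if [[ ! -r "$log_file" ]]; then
    echo "cannot read $log_file" >&2
    return 2
  fi
  for i in "${!patterns[@]}"; do
    # -F matches the pattern as a fixed string rather than a regex
    if grep -qF -- "${patterns[$i]}" "$log_file"; then
      echo "${causes[$i]}"
      matched=1
    fi
  done
  if (( matched == 0 )); then
    echo "no known error pattern found"
  fi
}
```

Run it as a user that can read the log, for example `diagnose_redis_log /var/log/redis/redis-server.log`; a return code of 2 means the file could not be read.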
Step 4: Check Redis INFO for Health Metrics
# Get comprehensive stats
redis-cli info server | grep -E "redis_version|uptime_in_seconds|os"
redis-cli info memory | grep -E "used_memory_human|maxmemory_human"
redis-cli info stats | grep -E "rejected_connections|evicted_keys"
redis-cli info replication | grep -E "role|connected_slaves"
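Raw INFO values are easier to alert on as a ratio. This is a minimal sketch, assuming the output of `redis-cli info memory` is piped in; `memory_usage_pct` is a hypothetical name. It compares the byte-valued `used_memory` and `maxmemory` fields and treats a `maxmemory` of 0 (no limit) as an error, since a percentage is meaningless then.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `INFO memory` output on stdin and print
# used_memory as an integer percentage of maxmemory.
# INFO lines end in \r\n, so carriage returns are stripped first.
memory_usage_pct() {
  tr -d '\r' | awk -F: '
    $1 == "used_memory" { used = $2 }
    $1 == "maxmemory"   { max = $2 }
    END {
      # maxmemory 0 (or missing) means "no limit": report and fail
      if (max + 0 == 0) { print "maxmemory not set"; exit 1 }
      printf "%d\n", used * 100 / max
    }'
}
```

For example, `pct=$(redis-cli info memory | memory_usage_pct) && [ "$pct" -ge 80 ] && echo "memory at ${pct}% of maxmemory"` warns once usage crosses 80%.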
Common Redis Failure Modes & Fixes
1. Connection Refused (ECONNREFUSED)
The most common Redis error. Causes:
- Redis process is not running → sudo systemctl start redis-server
- Redis bound to localhost only → in redis.conf, change "bind 127.0.0.1" to the server's private IP or "bind 0.0.0.0" (with firewall protection); protected mode may also reject remote clients when no password is set
- Firewall blocking port 6379 → open the port in iptables/ufw/security group
- Wrong host/port in app config → double-check the Redis URL environment variable
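To tell "not running" apart from "bound to localhost only", you can inspect the listening sockets. The sketch below is a hypothetical helper (`redis_bind_scope`) that reads `ss -tln` output on stdin; it assumes the default header line is present (no `-H`) and that the local address is the fourth column. It only reports the bind address; a firewall can still block a non-loopback listener.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `ss -tln` output on stdin and report whether
# anything listens on the Redis port ($1, default 6379) and whether it is
# bound only to loopback addresses. Returns 1 if nothing listens.
redis_bind_scope() {
  local port="${1:-6379}"
  awk -v port="$port" '
    NR == 1 { next }                      # skip the ss header line
    {
      addr = $4                           # "Local Address:Port" column
      n = split(addr, parts, ":")
      if (parts[n] != port) next
      found = 1
      host = substr(addr, 1, length(addr) - length(port) - 1)
      if (host != "127.0.0.1" && host != "[::1]") external = 1
    }
    END {
      if (!found) { print "not listening on port " port; exit 1 }
      print (external ? "bound to a non-loopback address" : "loopback only")
    }'
}
```

Usage: `ss -tln | redis_bind_scope 6379`.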
2. Out of Memory (OOM)
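Redis has run out of memory, which shows up in one of two ways. If maxmemory is set and the eviction policy cannot free space (for example with noeviction), writes fail with "OOM command not allowed when used memory > 'maxmemory'". If no maxmemory is set, Redis can keep growing until the Linux OOM killer terminates the process.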
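If redis-server disappeared without an error in its own log, the kernel log is the place to look for an OOM kill. This is a minimal sketch: `redis_oom_kills` is a hypothetical name, and it assumes the kernel's usual "Killed process <pid> (<name>)" wording, which can vary between kernel versions.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read kernel log text on stdin (for example from
# `sudo dmesg` or `journalctl -k`) and print the PID of every redis-server
# process the OOM killer terminated. Returns 1 if none is found.
redis_oom_kills() {
  grep -oE 'Killed process [0-9]+ \(redis-server\)' |
    awk '{ print $3; found = 1 } END { exit (found ? 0 : 1) }'
}
```

Usage: `sudo dmesg | redis_oom_kills`.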
# Check current memory usage
redis-cli info memory | grep used_memory_human
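# Cap Redis memory so it evicts or rejects writes instead of exhausting system RAM
redis-cli config set maxmemory 2gb
# Set eviction policy for cache use case
redis-cli config set maxmemory-policy allkeys-lru
# CONFIG SET changes are lost on restart unless written back to redis.conf
redis-cli config rewrite
3. Redis Cluster Split-Brain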
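In Redis Cluster mode, a network partition can leave a primary isolated on the minority side while the majority promotes one of its replicas. Until cluster-node-timeout expires, the isolated primary keeps accepting writes, so two nodes briefly act as primary for the same slots. This manifests as inconsistent reads and writes across different application servers, and writes sent to the isolated primary are lost when it rejoins as a replica.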
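The `redis-cli cluster nodes` listing used in this section can be reduced to a pass/fail check. The sketch below is a hypothetical helper (`cluster_problem_nodes`); it relies on the documented field order (node ID, address, flags, primary ID, ping-sent, pong-recv, config epoch, link state, slots). It only surfaces partition symptoms such as failed or unreachable nodes; on its own it cannot prove that two primaries are serving the same slots.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `redis-cli cluster nodes` output on stdin and
# print nodes flagged fail or fail? (suspected failure) or whose link state
# (field 8) is disconnected. Returns 1 if any such node is found.
cluster_problem_nodes() {
  awk '
    $3 ~ /(^|,)fail[?]?(,|$)/ || $8 == "disconnected" {
      print $1, $2, $3, $8               # node ID, address, flags, link state
      bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}
```

Usage: `redis-cli cluster nodes | cluster_problem_nodes || echo "cluster has unhealthy nodes"`.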
# Check cluster state
redis-cli cluster info | grep cluster_state
# Should return: cluster_state:ok
# Check node roles (prints node ID, flags, link state)
redis-cli cluster nodes | awk '{print $1, $3, $8}'
4. Persistence Blocking Writes (BGSAVE / AOF)
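Persistence problems can block writes. If a background save (BGSAVE) fails, for example because the disk is full, Redis rejects writes with a MISCONF error while stop-writes-on-bgsave-error is enabled (the default). With AOF enabled, a slow disk can also stall writes while fsync catches up: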
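Besides whether a save is in progress, `INFO persistence` records whether the last save or AOF operation failed. This is a minimal sketch with a hypothetical helper name (`persistence_errors`); it checks only the `rdb_last_bgsave_status`, `aof_last_bgrewrite_status`, and `aof_last_write_status` fields and treats any value other than `ok` as a failure.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `INFO persistence` output on stdin and print
# each last-operation status field that is not "ok".
# Returns 1 if any persistence operation reported an error.
persistence_errors() {
  tr -d '\r' | awk -F: '
    $1 ~ /^(rdb_last_bgsave_status|aof_last_bgrewrite_status|aof_last_write_status)$/ && $2 != "ok" {
      print $1 "=" $2
      bad = 1
    }
    END { exit (bad ? 1 : 0) }'
}
```

Usage: `redis-cli info persistence | persistence_errors`.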
# Check if background save is in progress
redis-cli info persistence | grep rdb_bgsave_in_progress
# Manually trigger save (use carefully in production)
redis-cli bgsave
# Disable persistence temporarily if disk is the issue
# (nothing is persisted until you re-enable it, so a crash in this window loses data)
redis-cli config set save ""
redis-cli config set appendonly no
Restart Redis Safely
⚠️ Before restarting: check persistence configuration
If Redis is used as a cache only (no persistence), a restart clears all data; that's expected. If Redis holds session data or queue messages that must survive, confirm that RDB or AOF is enabled and that the last save succeeded, so the data is reloaded on startup.
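Whichever restart method you use, it helps to confirm Redis is actually answering before sending traffic back to it. The sketch below is a hypothetical helper (`wait_for_redis`); `REDIS_HOST` and `REDIS_PORT` are assumed environment variables, not Redis settings. If the server requires a password, redis-cli can read it from the `REDISCLI_AUTH` environment variable.

```shell
#!/usr/bin/env bash
# Hypothetical helper: after a restart, poll Redis with PING once per second
# until it answers PONG or the timeout ($1 seconds, default 30) expires.
# REDIS_HOST / REDIS_PORT are assumed environment variables.
wait_for_redis() {
  local timeout="${1:-30}"
  local host="${REDIS_HOST:-127.0.0.1}" port="${REDIS_PORT:-6379}"
  local waited=0 reply
  while (( waited < timeout )); do
    # Any reply other than PONG (refused, error, auth failure) means "not ready yet"
    reply="$(redis-cli -h "$host" -p "$port" ping 2>/dev/null)"
    if [[ "$reply" == "PONG" ]]; then
      echo "redis ready after ${waited}s"
      return 0
    fi
    sleep 1
    waited=$((waited + 1))
  done
  echo "redis not ready after ${timeout}s" >&2
  return 1
}
```

For example, `sudo systemctl restart redis-server && wait_for_redis 60` fails with exit code 1 if Redis has not answered within a minute.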
# Systemd (recommended)
sudo systemctl restart redis-server
# Graceful shutdown (saves RDB snapshot)
redis-cli shutdown save
# Docker
docker restart redis-container-name
# Force restart if unresponsive (last resort: SIGKILL skips the shutdown save, so unsaved writes are lost)
sudo kill -9 $(pidof redis-server)
sudo systemctl start redis-server
Monitoring Redis in Production
Redis failures are often silent: your application just starts serving slower responses and higher error rates without a clear "Redis is down" error message. Production Redis monitoring should track:
- Connection availability: ping every 30 seconds from outside the app
- Memory usage: alert at 80% of maxmemory to prevent OOM
- Connected clients: a rising rejected_connections count means the maxclients limit has been hit
- Eviction rate: a sudden rise in evicted_keys indicates memory pressure
- Replication lag: replica lag above 5 seconds warrants investigation
- Command latency: p99 latency above 10 ms for simple GET/SET suggests CPU or I/O contention
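The replication-lag rule can be checked from a primary's `INFO replication` output, where each replica appears on a `slaveN:` line with an `ip=...,port=...,lag=N` list. This is a minimal sketch: `replica_lag_check` is a hypothetical name, the 5-second default mirrors the threshold above, and it assumes `lag` is the number of seconds since the replica's last acknowledgement.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `INFO replication` output from a primary on stdin
# and print each replica whose lag exceeds $1 seconds (default 5).
# Returns 1 if any replica is over the limit.
replica_lag_check() {
  local max_lag="${1:-5}"
  tr -d '\r' | awk -v max="$max_lag" '
    /^slave[0-9]+:/ {
      line = $0
      sub(/^slave[0-9]+:/, "", line)      # keep only "ip=...,port=...,lag=N"
      n = split(line, kv, ",")
      ip = ""; port = ""; lag = 0
      for (i = 1; i <= n; i++) {
        eq = index(kv[i], "=")
        key = substr(kv[i], 1, eq - 1)
        val = substr(kv[i], eq + 1)
        if (key == "ip") ip = val
        else if (key == "port") port = val
        else if (key == "lag") lag = val + 0
      }
      if (lag > max + 0) { print ip ":" port " lag=" lag "s"; bad = 1 }
    }
    END { exit (bad ? 1 : 0) }'
}
```

Usage: `redis-cli info replication | replica_lag_check 5`. Stripping the `slaveN:` prefix with `sub` rather than splitting on `:` keeps IPv6 replica addresses intact.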