Elasticsearch powers search and analytics for thousands of production applications, from e-commerce search to log management and security analytics. When Elasticsearch goes down or a cluster enters a red state, search functionality fails, log ingestion stops, and application errors cascade. Understanding Elasticsearch's health model is the key to fast diagnosis and recovery.
Elasticsearch Cluster Status: Green, Yellow, Red Explained
Elasticsearch reports cluster health in three states:
Green: All Shards Assigned
All primary and replica shards are successfully allocated to nodes. The cluster is fully operational and redundant.
Yellow: Replicas Unassigned
All primary shards are assigned but some replica shards are not. Data is accessible (no data loss yet), but redundancy is reduced: a further node failure could cause data loss.
Red: Primary Shards Unassigned
One or more primary shards are unassigned. Searches against affected indices fail. Data in those shards is inaccessible until the primary is recovered or restored from replica/snapshot.
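When the cluster reports red, the first question is which indices are affected. A minimal sketch, assuming a cluster reachable at localhost:9200:

```shell
# List only the indices whose health is red (i.e. with unassigned primaries)
curl -s "http://localhost:9200/_cat/indices?v&health=red"

# Block until the cluster reaches at least yellow, or give up after 60s
curl -s "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=60s&pretty"
```

The `wait_for_status` variant is handy in recovery scripts: it returns as soon as the cluster improves instead of requiring you to poll.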
How to Diagnose Elasticsearch Being Down
1. Check the Cluster Health API
# Basic health check
curl -s "http://localhost:9200/_cluster/health?pretty"
# Output includes:
# - status: green/yellow/red
# - number_of_nodes: total nodes
# - unassigned_shards: shards needing recovery
# - active_primary_shards: primary shards serving data
# For Elastic Cloud (requires authentication)
curl -s -u username:password \
"https://YOUR_CLUSTER.es.io/_cluster/health?pretty"2. Check Node Availability
# List all nodes (shows which are online)
curl -s "http://localhost:9200/_cat/nodes?v&h=name,ip,heap.percent,disk.used_percent,load_1m,node.role"
# Check if master node is elected
curl -s "http://localhost:9200/_cat/master?v"๐ก Monitor Elasticsearch uptime every 30 seconds โ get alerted in under a minute
Trusted by 100,000+ websites ยท Free tier available
3. Diagnose Unassigned Shards
# List unassigned shards with reasons
curl -s "http://localhost:9200/_cat/shards?h=index,shard,prirep,state,unassigned.reason&s=state"
# Get detailed allocation explanation
curl -s -XGET "http://localhost:9200/_cluster/allocation/explain?pretty" \
-H 'Content-Type: application/json' \
-d '{"index": "your-index", "shard": 0, "primary": true}'
Common Elasticsearch Outage Causes and Fixes
Out of Memory (JVM Heap Exhaustion)
Elasticsearch is Java-based and requires adequate heap memory. Symptoms: OOM errors in logs, nodes crashing under load.
# Check heap usage
curl -s "http://localhost:9200/_nodes/stats/jvm?pretty" | \
jq '.nodes | to_entries[] | {name: .key, heap_used_percent: .value.jvm.mem.heap_used_percent}'
# Fix: Increase heap in jvm.options (set Xms and Xmx to same value, max 50% of RAM)
# -Xms8g
# -Xmx8g
Disk Full: Indices in Read-Only Mode
When disk usage on a node exceeds 95% (the default flood_stage watermark), Elasticsearch automatically puts indices with shards on that node into read-only mode to prevent data corruption. (At 90%, the high watermark, it starts relocating shards away from the node.)
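While you free up space, the watermarks can be raised temporarily through the cluster settings API. A sketch using the standard disk allocation settings; the percentages here are illustrative, not recommendations:

```shell
# Temporarily raise the disk watermarks while cleaning up
# (transient settings reset when the cluster restarts)
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{
    "transient": {
      "cluster.routing.allocation.disk.watermark.low": "90%",
      "cluster.routing.allocation.disk.watermark.high": "94%",
      "cluster.routing.allocation.disk.watermark.flood_stage": "97%"
    }
  }'
```

Remember to set these back to null once disk space is recovered so the defaults apply again.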
# Check disk usage per node
curl -s "http://localhost:9200/_cat/allocation?v"
# Check for read-only indices
curl -s "http://localhost:9200/_cat/indices?v&h=index,status,health" | grep -E "open|close"
# Remove read-only lock after freeing disk space
curl -XPUT "http://localhost:9200/_all/_settings" \
-H 'Content-Type: application/json' \
-d '{"index.blocks.read_only_allow_delete": null}'
Split Brain / Master Election Failure
Before Elasticsearch 7.x, a network partition could cause multiple nodes to each believe they were the master. Symptoms: cluster health API unreachable, or conflicting answers depending on which node you query.
- On pre-7.x clusters, ensure discovery.zen.minimum_master_nodes is set to (N/2)+1 (quorum)
- Elasticsearch 7+ uses a built-in quorum-based consensus algorithm for master election, so split brain is largely eliminated
- Check cluster UUID consistency across nodes to detect split brain
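The UUID check can be scripted: each node's root endpoint reports the cluster_uuid it believes it belongs to. A sketch, where node1/node2/node3 are placeholder hostnames for your cluster nodes:

```shell
# Differing cluster_uuid values across nodes indicate a split brain
for node in node1 node2 node3; do
  printf '%s: ' "$node"
  curl -s "http://$node:9200/" | jq -r .cluster_uuid
done
```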
Elasticsearch vs. Elastic Cloud: Different Failure Modes
| Deployment Type | Status Source | Who Manages Recovery |
|---|---|---|
| Self-hosted | Your own /_cluster/health API | Your engineering team |
| Elastic Cloud | cloud-status.elastic.co | Elastic SRE team |
| AWS OpenSearch | health.aws.amazon.com | AWS (managed) + you (config) |
Frequently Asked Questions
How do I restart Elasticsearch safely?
Before restarting, restrict shard allocation to prevent unnecessary shard movement: PUT /_cluster/settings with "persistent": {"cluster.routing.allocation.enable": "primaries"}. Then restart one node at a time, waiting for each node to rejoin the cluster (and for health to return to at least yellow) before restarting the next.
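Put together, a safe single-node restart looks roughly like this; a sketch assuming systemd and a node reachable on localhost:9200:

```shell
# 1. Allocate only primaries during the restart to avoid replica shuffling
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": "primaries"}}'

# 2. Flush in-memory segments to disk so recovery after restart is faster
curl -XPOST "http://localhost:9200/_flush"

# 3. Restart the node
sudo systemctl restart elasticsearch

# 4. Wait for the node to rejoin and health to reach at least yellow
curl -s "http://localhost:9200/_cluster/health?wait_for_status=yellow&timeout=120s&pretty"

# 5. Re-enable full shard allocation
curl -XPUT "http://localhost:9200/_cluster/settings" \
  -H 'Content-Type: application/json' \
  -d '{"persistent": {"cluster.routing.allocation.enable": "all"}}'
```

Repeat steps 3–4 for each node, re-enabling allocation only after the last node rejoins.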
What is the difference between Elasticsearch and OpenSearch?
OpenSearch is an open-source fork of Elasticsearch created by AWS in 2021 after Elastic changed its license. Both share the same core API, making migration feasible. AWS OpenSearch is the managed version available in AWS; Elastic Cloud is the official managed Elasticsearch from Elastic.
How do I check Elasticsearch logs for errors?
Elasticsearch logs are typically at /var/log/elasticsearch/[cluster-name].log (Linux). Look for OutOfMemoryError, NoShardAvailableException, ClusterBlockException, or NodeDisconnectedException as indicators of failure causes.
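A quick way to scope a failure is to count those signatures in the log. The sketch below runs against a tiny hypothetical excerpt; in practice, point the grep at your real log file:

```shell
# Hypothetical excerpt standing in for /var/log/elasticsearch/[cluster-name].log
cat > /tmp/es-sample.log <<'EOF'
[2024-01-01T00:00:00,000][WARN ][o.e.m.j.JvmGcMonitorService] gc overhead
[2024-01-01T00:00:01,000][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] OutOfMemoryError: Java heap space
[2024-01-01T00:00:02,000][INFO ][o.e.n.Node] started
EOF

# Count lines matching the common failure signatures
grep -cE "OutOfMemoryError|NoShardAvailableException|ClusterBlockException|NodeDisconnectedException" /tmp/es-sample.log
```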
Why is my Elasticsearch slow instead of down?
Elasticsearch performance degradation (not full outage) is often caused by: high JVM garbage collection frequency, hot shards with unbalanced indexing load, too many open shards per node, or running search and indexing concurrently on undersized nodes. Use /_nodes/hot_threads to identify CPU bottlenecks.
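A sketch of those checks, assuming a cluster on localhost:9200 and jq installed:

```shell
# Top CPU-consuming threads per node (look for "cpu usage" lines in the output)
curl -s "http://localhost:9200/_nodes/hot_threads?threads=3"

# GC collector stats; high collection time relative to uptime suggests heap pressure
curl -s "http://localhost:9200/_nodes/stats/jvm?pretty" | \
  jq '.nodes | to_entries[] | {name: .key, gc: .value.jvm.gc.collectors}'

# Shard count per node; heavily unbalanced counts point at hot nodes
curl -s "http://localhost:9200/_cat/shards" | awk '{print $NF}' | sort | uniq -c
```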