Is New Relic Down? How to Check New Relic Status in Real-Time

Quick Answer: To check if New Relic is down, visit apistatuscheck.com/api/new-relic for real-time monitoring, or check the official status.newrelic.com page. Common signs include agent connectivity failures, data ingestion lag, NRQL query timeouts, missing metrics, alert condition failures, and Synthetics monitor issues.

When your observability platform goes dark, you're flying blind. New Relic monitors your entire infrastructure, applications, and business metrics—making any downtime a critical incident. Whether you're seeing agents disconnected, queries timing out, or alerts failing to fire, knowing how to quickly verify New Relic's status can mean the difference between rapid incident resolution and hours of misdirected troubleshooting.

How to Check New Relic Status in Real-Time

1. API Status Check (Fastest Method)

The fastest way to verify New Relic's operational status is through apistatuscheck.com/api/new-relic. This real-time monitoring service:

  • Tests actual API endpoints every 60 seconds
  • Monitors data ingestion and query performance
  • Tracks historical uptime over 30/60/90 days
  • Provides instant alerts when issues are detected
  • Monitors multiple regions (US, EU)

Unlike status pages that rely on manual updates, API Status Check performs active health checks against New Relic's production endpoints, including GraphQL API, REST API, and data ingestion pipelines, giving you the most accurate real-time picture of service availability.

2. Official New Relic Status Page

New Relic maintains status.newrelic.com as their official communication channel for service incidents. The page displays:

  • Current operational status for all products
  • Active incidents and investigations
  • Scheduled maintenance windows
  • Historical incident reports
  • Component-specific status (APM, Infrastructure, Browser, Synthetics, Alerts, NRDB, UI)

Pro tip: Subscribe to status updates via email, webhook, or RSS feed on the status page. You can filter by specific products and regions to receive only relevant notifications.
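
If you prefer to poll rather than watch the page, status.newrelic.com is hosted on a Statuspage-style service, and such pages conventionally expose a machine-readable /api/v2/status.json endpoint. A minimal sketch, assuming that convention holds (verify the URL before relying on it):

```python
import json
import urllib.request

STATUS_URL = "https://status.newrelic.com/api/v2/status.json"

def summarize_status(payload):
    """Reduce a Statuspage-style status.json payload to one line."""
    status = payload.get("status", {})
    indicator = status.get("indicator", "unknown")   # none / minor / major / critical
    description = status.get("description", "no description")
    return f"{indicator}: {description}"

def fetch_status(url=STATUS_URL, timeout=10):
    """Fetch and summarize the live status page (network call)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return summarize_status(json.load(resp))

# Offline example of the payload shape this assumes:
sample = {"status": {"indicator": "minor", "description": "Degraded Performance"}}
print(summarize_status(sample))  # minor: Degraded Performance
```

Wire `fetch_status()` into a cron job or health dashboard to get a programmatic first signal.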

3. Check Your New Relic One UI

If the New Relic One platform at one.newrelic.com is experiencing issues, you'll notice:

  • Login failures or authentication timeouts
  • Dashboard widgets stuck loading
  • NRQL queries timing out or returning errors
  • Entity explorer not populating
  • Alert policy pages failing to load
  • APM transaction traces unavailable

UI responsiveness is often the first indicator of backend database or API gateway issues.

4. Test API Endpoints Directly

For developers, making test API calls can quickly confirm connectivity and performance:

GraphQL API (NerdGraph) health check:

curl https://api.newrelic.com/graphql \
  -H 'Content-Type: application/json' \
  -H 'API-Key: YOUR_USER_KEY' \
  -d '{"query": "{ actor { user { name email } } }"}'

REST API health check:

curl -X GET 'https://api.newrelic.com/v2/applications.json' \
  -H 'Api-Key: YOUR_REST_API_KEY'

NRQL query via API:

curl -X GET "https://insights-api.newrelic.com/v1/accounts/YOUR_ACCOUNT_ID/query?nrql=SELECT%20count(*)%20FROM%20Transaction%20SINCE%201%20hour%20ago" \
  -H "Accept: application/json" \
  -H "X-Query-Key: YOUR_QUERY_KEY"

Look for HTTP response codes outside the 2xx range, timeout errors (>30s), or error responses indicating service degradation.
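
The curl checks above are easy to script. Here is a sketch that classifies each response using the thresholds from this section (non-2xx codes, latency over 30 seconds); the endpoint and YOUR_USER_KEY placeholder come from the examples above:

```python
import time
import urllib.error
import urllib.request

def classify(status_code, elapsed_s, timeout_s=30.0):
    """Map an HTTP result to a coarse health label."""
    if status_code is None or elapsed_s >= timeout_s:
        return "down"
    if status_code in (401, 403):
        return "check-credentials"  # the API answered; your key is the problem
    if 200 <= status_code < 300:
        return "ok" if elapsed_s < 2.0 else "degraded"
    return "degraded" if status_code < 500 else "down"

def check(url, api_key):
    """Time one request and classify the outcome."""
    req = urllib.request.Request(url, headers={"Api-Key": api_key})
    start = time.time()
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            code = resp.status
    except urllib.error.HTTPError as e:
        code = e.code
    except (urllib.error.URLError, TimeoutError):
        code = None
    return classify(code, time.time() - start)

# check("https://api.newrelic.com/v2/applications.json", "YOUR_USER_KEY")
```

The credential case is separated out on purpose: a 401 during an outage scare is usually a rotated key, not a platform problem.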

5. Use New Relic Diagnostics CLI

New Relic provides a diagnostic tool that can help identify connectivity and configuration issues:

# Download and run New Relic Diagnostics
curl -O https://download.newrelic.com/nrdiag/nrdiag_latest.zip
unzip nrdiag_latest.zip
./nrdiag -t GUID -attach

This tool performs comprehensive checks including:

  • Network connectivity to New Relic collectors
  • Agent configuration validation
  • Proxy and firewall checks
  • SSL/TLS certificate validation
  • Local log file analysis

Common New Relic Issues and How to Identify Them

Agent Connectivity Failures

Symptoms:

  • Agents showing "disconnected" in Entity Explorer
  • No new data appearing in APM, Infrastructure, or Browser
  • Agent logs showing connection timeouts or 503 errors
  • Multiple applications across different hosts failing simultaneously

What it means: When agent connectivity fails across multiple hosts or regions, it typically indicates issues with New Relic's collector endpoints rather than your infrastructure. Single-agent failures are usually configuration or network issues on your end.

Diagnostic check:

import requests
import time

def check_agent_connectivity():
    """Test connectivity to New Relic collector endpoints"""
    collectors = [
        "https://collector.newrelic.com/status/mongrel",
        "https://rpm.newrelic.com/status/mongrel",
        "https://gov-collector.newrelic.com/status/mongrel"  # FedRAMP
    ]
    
    for collector in collectors:
        try:
            start = time.time()
            response = requests.get(collector, timeout=10)
            latency = (time.time() - start) * 1000
            
            if response.status_code == 200:
                print(f"✓ {collector}: OK ({latency:.0f}ms)")
            else:
                print(f"✗ {collector}: HTTP {response.status_code}")
        except requests.exceptions.Timeout:
            print(f"✗ {collector}: Timeout (>10s)")
        except requests.exceptions.RequestException as e:
            print(f"✗ {collector}: {str(e)}")

check_agent_connectivity()

Data Ingestion Lag

Indicators:

  • Metrics delayed by 5+ minutes (normal is 10-60 seconds)
  • "Data may be incomplete" warnings in dashboards
  • Recent time windows showing significantly fewer data points
  • Alert conditions not triggering despite threshold breaches
  • APM transaction traces missing for recent requests

Impact: Data ingestion lag means you're making decisions based on stale information. A 10-minute lag during an incident can cost precious troubleshooting time.

Detection query:

// Check data freshness across different data types
SELECT 
  latest(timestamp) as lastSeen,
  (now() - latest(timestamp)) / 1000 as lagSeconds
FROM Transaction 
SINCE 5 minutes ago

// Compare with other event types
SELECT 
  latest(timestamp) as lastMetric
FROM Metric 
WHERE metricName = 'apm.service.transaction.duration'
SINCE 5 minutes ago

If lagSeconds exceeds 300 (5 minutes) consistently, ingestion is degraded.
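
Since NRQL returns timestamp values in epoch milliseconds, the lagSeconds arithmetic can also be done client-side. A small helper applying the 300-second threshold above (the fixed "now" in the example is illustrative):

```python
import time

LAG_THRESHOLD_SECONDS = 300  # the 5-minute threshold used above

def ingestion_lag_seconds(latest_timestamp_ms, now_s=None):
    """Seconds between the newest ingested event and 'now'."""
    if now_s is None:
        now_s = time.time()
    return now_s - latest_timestamp_ms / 1000.0

def is_ingestion_degraded(latest_timestamp_ms, now_s=None):
    return ingestion_lag_seconds(latest_timestamp_ms, now_s) > LAG_THRESHOLD_SECONDS

# Event ingested 10 minutes before a fixed reference "now":
now = 1_700_000_000.0
ts_ms = int((now - 600) * 1000)
print(is_ingestion_degraded(ts_ms, now))  # True
```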

NRQL Query Timeouts

Common error patterns:

  • Queries that normally return in <2s timing out after 30-60s
  • "Query timeout" errors in dashboards
  • GraphQL queries returning 500 errors
  • NRDB (New Relic Database) performance degradation

When NRDB is impacted:

  • All queries slow down, not just complex ones
  • Simple SELECT count(*) FROM Transaction queries fail
  • Historical data queries (SINCE 30 days ago) especially affected

Diagnostic NRQL:

// Test query performance with progressively larger time windows
SELECT count(*) 
FROM Transaction 
SINCE 1 hour ago
// If this works but "SINCE 1 day ago" times out, NRDB is struggling

// Check for query performance issues
SELECT percentile(duration, 50, 95, 99) 
FROM NrdbQuery 
WHERE query LIKE '%Transaction%' 
SINCE 1 hour ago
FACET query
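
The escalating-window test can be automated through NerdGraph, whose documented query shape is actor.account.nrql.results. A sketch with placeholder account ID and key:

```python
import json
import time
import urllib.request

NERDGRAPH_URL = "https://api.newrelic.com/graphql"

def build_nrql_payload(account_id, since):
    """Build the NerdGraph request body for a simple count query."""
    nrql = f"SELECT count(*) FROM Transaction SINCE {since}"
    gql = f'{{ actor {{ account(id: {account_id}) {{ nrql(query: "{nrql}") {{ results }} }} }} }}'
    return {"query": gql}

def probe_windows(account_id, api_key):
    """Time the same query over growing windows; timeouts implicate NRDB."""
    for since in ("1 hour ago", "1 day ago", "7 days ago"):
        body = json.dumps(build_nrql_payload(account_id, since)).encode()
        req = urllib.request.Request(
            NERDGRAPH_URL,
            data=body,
            headers={"Content-Type": "application/json", "API-Key": api_key},
        )
        start = time.time()
        try:
            with urllib.request.urlopen(req, timeout=60) as resp:
                print(f"{since}: HTTP {resp.status} in {time.time() - start:.1f}s")
        except Exception as e:
            print(f"{since}: failed after {time.time() - start:.1f}s ({e})")

# probe_windows(1234567, "YOUR_USER_KEY")
```

If the 1-hour window responds quickly but the 7-day window times out, historical query paths are degraded even though recent-data queries work.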

Alert Condition Failures

Critical symptoms:

  • Alerts not firing despite threshold breaches visible in charts
  • Alert violations showing in UI but no notifications sent
  • Webhook and email integrations failing
  • Incident timelines missing expected violations
  • PagerDuty/Slack notifications not arriving

When this happens during a real incident: You lose your primary detection mechanism. Your production issues go undetected until customers report them.

Validation approach:

import requests
import time

def test_alert_evaluation():
    """Trigger a test alert to verify alert pipeline"""
    API_KEY = "YOUR_USER_KEY"
    ACCOUNT_ID = "YOUR_ACCOUNT_ID"
    
    # Send a custom metric that should trigger test alert
    payload = [{
        "eventType": "TestAlertMetric",
        "value": 1000,  # Above threshold
        "timestamp": int(time.time())
    }]
    
    response = requests.post(
        f"https://insights-collector.newrelic.com/v1/accounts/{ACCOUNT_ID}/events",
        headers={
            "Content-Type": "application/json",
            "Api-Key": API_KEY
        },
        json=payload
    )
    
    if response.status_code == 200:
        print("✓ Event ingestion working")
        print("Check if alert fires within 3-5 minutes")
        print("If data arrives but alert doesn't fire, alert pipeline is down")
    else:
        print(f"✗ Event ingestion failed: {response.status_code}")

test_alert_evaluation()

Synthetics Monitor Problems

Failure patterns:

  • All monitors showing failures simultaneously across locations
  • Monitors stuck in "pending" state
  • Monitor results not appearing in UI
  • Scripted browser monitors timing out
  • API monitors returning connection errors

Distinguishing between target and New Relic issues:

  • If monitors for DIFFERENT targets all fail → New Relic Synthetics issue
  • If monitors for SAME target from multiple locations fail → Your target is down
  • If public monitors (google.com, etc.) succeed but yours fail → Your target issue
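
The three rules above can be encoded as a small triage helper. The input shape (target, location, succeeded) is a hypothetical flattening of SyntheticCheck results, not a New Relic API:

```python
def triage_synthetics(results):
    """Classify failures as a Synthetics-platform issue vs. a target outage.

    `results` is a list of (target, location, succeeded) tuples.
    """
    targets = {t for t, _, _ in results}
    failed_targets = {t for t, _, ok in results if not ok}
    if not failed_targets:
        return "all-passing"
    if failed_targets == targets and len(targets) > 1:
        # Unrelated targets all failing points at the Synthetics platform
        return "new-relic-synthetics-issue"
    if len(failed_targets) == 1:
        target = next(iter(failed_targets))
        failing_locations = {loc for t, loc, ok in results if t == target and not ok}
        if len(failing_locations) > 1:
            return "target-down"
    return "investigate-further"

checks = [
    ("https://yourapp.com", "us-east", False),
    ("https://yourapp.com", "eu-west", False),
    ("https://google.com", "us-east", True),
]
print(triage_synthetics(checks))  # target-down
```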

Diagnostic script:

// Synthetics scripted browser test to verify Synthetics runtime
$browser.get("https://one.newrelic.com/");

$browser.wait($driver.until.elementLocated($driver.By.css("body")), 5000)
  .then(function() {
    console.log("✓ Synthetics can reach external sites");
    return $browser.findElement($driver.By.css("body")).getAttribute("innerHTML");
  })
  .then(function(html) {
    if (html.length > 100) {
      console.log("✓ Synthetics browser runtime healthy");
    } else {
      console.log("✗ Response too short, possible issue");
    }
  });

Browser Agent Issues

Symptoms:

  • JavaScript errors spike across all applications
  • Browser agent script (js-agent.newrelic.com) failing to load
  • PageView events not appearing in Browser data
  • Session traces unavailable
  • Core Web Vitals metrics missing

CDN vs data collection distinction:

  • Agent script fails to load → CDN issue
  • Agent loads but no data in UI → Data collection pipeline issue

Detection snippet:

// Add to your application to detect Browser agent health
window.addEventListener('load', function() {
  setTimeout(function() {
    if (typeof newrelic === 'undefined') {
      console.error('New Relic Browser agent failed to load');
      // Report to backup monitoring
      fetch('/api/monitoring/alert', {
        method: 'POST',
        body: JSON.stringify({
          severity: 'high',
          message: 'New Relic Browser agent unavailable'
        })
      });
    } else {
      console.log('✓ New Relic Browser agent loaded');
    }
  }, 3000);
});

The Real Impact When New Relic Goes Down

Observability Blind Spots

When New Relic is unavailable, you lose visibility into:

  • Application performance: No APM data means you can't see transaction response times, error rates, or throughput
  • Infrastructure health: Missing CPU, memory, disk, and network metrics
  • Business metrics: Custom events and metrics stop flowing
  • User experience: Browser monitoring and real user data unavailable
  • Synthetic monitoring: Proactive checks stop running

The compounding effect: If a production incident occurs WHILE New Relic is down, you're troubleshooting blind. You can't:

  • Identify which service is causing the issue
  • See error traces and stack traces
  • Analyze database query performance
  • Understand user impact geography
  • Correlate infrastructure metrics with application behavior

Increased Mean Time to Resolution (MTTR)

Without observability, MTTR skyrockets:

  • Normal MTTR with full observability: 15-45 minutes
  • MTTR without observability tools: 2-6 hours or more

Why the dramatic increase:

  1. You must manually SSH into servers to check logs
  2. No centralized error aggregation or filtering
  3. No transaction traces to pinpoint slow components
  4. Can't correlate issues across services
  5. Must rely on customer reports instead of proactive detection

Cost calculation: If your average incident costs $10,000/hour in lost revenue and team time, losing New Relic during a critical incident can add $20,000-$50,000 in additional costs from extended downtime.
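
As a rough sanity check on those figures, the added cost is simply (blind MTTR minus normal MTTR) times hourly incident cost, using the article's illustrative numbers rather than benchmarks:

```python
def extra_incident_cost(normal_mttr_h, blind_mttr_h, hourly_cost):
    """Added incident cost from extended downtime, in dollars."""
    return (blind_mttr_h - normal_mttr_h) * hourly_cost

low = extra_incident_cost(0.75, 2.0, 10_000)   # best case: 45 min becomes 2 h
high = extra_incident_cost(0.25, 6.0, 10_000)  # worst case: 15 min becomes 6 h
print(f"${low:,.0f} - ${high:,.0f}")  # $12,500 - $57,500
```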

Capacity Planning Gaps

New Relic outages create blind spots in capacity planning:

  • Missing trend data: Can't analyze growth patterns during outage window
  • Incomplete historical analysis: Gaps in 30/60/90-day reports
  • Auto-scaling failures: If infrastructure decisions rely on New Relic metrics
  • Inaccurate forecasting: Models trained on incomplete data

For businesses planning Black Friday, product launches, or other high-traffic events, even a 2-hour gap in historical data can impact capacity decisions worth millions.

SLA Reporting Failures

Many businesses rely on New Relic data for SLA reporting:

  • Customer-facing SLAs: Can't prove 99.9% uptime if monitoring was down
  • Internal SLIs/SLOs: Service Level Indicators incomplete
  • Compliance requirements: Audit trails with gaps
  • Financial implications: SLA breach penalties if you can't prove uptime

The double bind: If your service AND monitoring are both down, you can't definitively prove the duration of the outage, potentially triggering maximum SLA credits.

Alert Fatigue and Missed Incidents

When New Relic comes back online after an outage:

  • Alert storm: Backlog of alerts fire simultaneously
  • False positives: Transient issues during recovery trigger alerts
  • Missed incidents: Real issues buried in noise
  • Team burnout: Engineers overwhelmed by notification flood

This can lead to future alerts being ignored or deprioritized, reducing the effectiveness of your monitoring strategy for weeks after the incident.

What to Do When New Relic Goes Down

1. Implement Multi-Provider Observability

Never rely on a single observability platform. Implement defense in depth:

# Multi-provider metrics router with automatic failover
import time
import requests
from enum import Enum

class MetricsProvider(Enum):
    NEW_RELIC = "newrelic"
    DATADOG = "datadog"
    GRAFANA_CLOUD = "grafana"

class ObservabilityRouter:
    def __init__(self):
        self.providers = {
            MetricsProvider.NEW_RELIC: {
                "url": "https://insights-collector.newrelic.com/v1/accounts/{account}/events",
                "api_key": "YOUR_NR_KEY",
                "healthy": True,
                "last_check": 0
            },
            MetricsProvider.DATADOG: {
                "url": "https://api.datadoghq.com/api/v1/series",
                "api_key": "YOUR_DD_KEY",
                "healthy": True,
                "last_check": 0
            }
        }
        self.health_check_interval = 60  # seconds
    
    def send_metric(self, metric_name, value, tags=None):
        """Send metric to all healthy providers"""
        results = []
        
        for provider, config in self.providers.items():
            if self._is_healthy(provider):
                try:
                    self._send_to_provider(provider, metric_name, value, tags)
                    results.append((provider, True))
                except Exception as e:
                    print(f"Failed to send to {provider.value}: {e}")
                    self._mark_unhealthy(provider)
                    results.append((provider, False))
        
        # At least one provider must succeed
        if not any(success for _, success in results):
            raise Exception("All observability providers failed")
        
        return results
    
    def _is_healthy(self, provider):
        """Check if provider is healthy (with caching)"""
        config = self.providers[provider]
        now = time.time()
        
        # Re-check health every 60 seconds
        if now - config["last_check"] > self.health_check_interval:
            config["healthy"] = self._health_check(provider)
            config["last_check"] = now
        
        return config["healthy"]
    
    def _health_check(self, provider):
        """Perform actual health check against provider"""
        config = self.providers[provider]
        
        try:
            if provider == MetricsProvider.NEW_RELIC:
                # Test New Relic ingestion endpoint
                response = requests.post(
                    config["url"],
                    headers={"Api-Key": config["api_key"]},
                    json=[{"eventType": "HealthCheck", "value": 1}],
                    timeout=5
                )
                return response.status_code == 200
            
            elif provider == MetricsProvider.DATADOG:
                # Test Datadog API
                response = requests.get(
                    "https://api.datadoghq.com/api/v1/validate",
                    headers={"DD-API-KEY": config["api_key"]},
                    timeout=5
                )
                return response.status_code == 200
        
        except Exception as e:
            print(f"Health check failed for {provider.value}: {e}")
            return False
        
        return False  # unknown provider: treat as unhealthy
    
    def _mark_unhealthy(self, provider):
        """Mark provider as unhealthy"""
        self.providers[provider]["healthy"] = False
        self.providers[provider]["last_check"] = time.time()
    
    def _send_to_provider(self, provider, metric_name, value, tags):
        """Provider-specific metric sending logic"""
        config = self.providers[provider]
        
        if provider == MetricsProvider.NEW_RELIC:
            payload = [{
                "eventType": "CustomMetric",
                "metricName": metric_name,
                "value": value,
                "timestamp": int(time.time()),
                **(tags or {})
            }]
            
            response = requests.post(
                config["url"],
                headers={"Api-Key": config["api_key"]},
                json=payload,
                timeout=10
            )
            response.raise_for_status()
        
        elif provider == MetricsProvider.DATADOG:
            payload = {
                "series": [{
                    "metric": metric_name,
                    "points": [[int(time.time()), value]],
                    "type": "gauge",
                    "tags": [f"{k}:{v}" for k, v in (tags or {}).items()]
                }]
            }
            
            response = requests.post(
                config["url"],
                headers={"DD-API-KEY": config["api_key"]},
                json=payload,
                timeout=10
            )
            response.raise_for_status()

# Usage
router = ObservabilityRouter()

# Automatically routes to all healthy providers
router.send_metric("api.response_time", 145, tags={
    "endpoint": "/api/users",
    "status": "200"
})

Recommended backup observability stack:

  • Metrics: Datadog, Grafana Cloud, or Prometheus
  • Logs: Splunk, Elasticsearch, or Loki
  • Errors: Sentry for application errors
  • Uptime: PagerDuty for synthetic monitoring and alerting

2. Implement Local Metrics Collection

Don't send ALL metrics to the cloud. Maintain local collection for critical data:

# Local metrics collector with time-series database
import sqlite3
import json
import time

import requests

class LocalMetricsStore:
    """SQLite-based local metrics storage for New Relic outages"""
    
    def __init__(self, db_path="metrics.db"):
        self.conn = sqlite3.connect(db_path, check_same_thread=False)
        self._create_tables()
    
    def _create_tables(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS metrics (
                id INTEGER PRIMARY KEY AUTOINCREMENT,
                timestamp INTEGER NOT NULL,
                metric_name TEXT NOT NULL,
                value REAL NOT NULL,
                tags TEXT,
                synced_to_newrelic INTEGER DEFAULT 0
            )
        """)
        # SQLite doesn't support inline INDEX clauses; create indexes separately
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_timestamp ON metrics (timestamp)")
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_metric ON metrics (metric_name)")
        self.conn.execute("CREATE INDEX IF NOT EXISTS idx_synced ON metrics (synced_to_newrelic)")
        self.conn.commit()
    
    def record(self, metric_name, value, tags=None):
        """Record metric locally"""
        self.conn.execute(
            "INSERT INTO metrics (timestamp, metric_name, value, tags) VALUES (?, ?, ?, ?)",
            (int(time.time()), metric_name, value, json.dumps(tags or {}))
        )
        self.conn.commit()
    
    def sync_to_newrelic(self, api_key, account_id):
        """Backfill unsynced metrics to New Relic once it's back online"""
        cursor = self.conn.execute("""
            SELECT id, timestamp, metric_name, value, tags 
            FROM metrics 
            WHERE synced_to_newrelic = 0
            ORDER BY timestamp ASC
            LIMIT 1000
        """)
        
        unsynced = cursor.fetchall()
        
        if not unsynced:
            print("All metrics synced!")
            return 0
        
        # Batch send to New Relic
        events = []
        for row_id, ts, name, value, tags_json in unsynced:
            events.append({
                "eventType": "BackfilledMetric",
                "metricName": name,
                "value": value,
                "timestamp": ts,
                **json.loads(tags_json)
            })
        
        try:
            response = requests.post(
                f"https://insights-collector.newrelic.com/v1/accounts/{account_id}/events",
                headers={"Api-Key": api_key},
                json=events,
                timeout=30
            )
            
            if response.status_code == 200:
                # Mark as synced
                ids = [row[0] for row in unsynced]
                placeholders = ','.join('?' * len(ids))
                self.conn.execute(
                    f"UPDATE metrics SET synced_to_newrelic = 1 WHERE id IN ({placeholders})",
                    ids
                )
                self.conn.commit()
                print(f"✓ Synced {len(unsynced)} metrics to New Relic")
                return len(unsynced)
            else:
                print(f"✗ Sync failed: HTTP {response.status_code}")
                return 0
        
        except Exception as e:
            print(f"✗ Sync error: {e}")
            return 0
    
    def query(self, metric_name, start_time, end_time):
        """Query local metrics (for emergency dashboards)"""
        cursor = self.conn.execute("""
            SELECT timestamp, value, tags 
            FROM metrics 
            WHERE metric_name = ? 
            AND timestamp BETWEEN ? AND ?
            ORDER BY timestamp ASC
        """, (metric_name, start_time, end_time))
        
        return cursor.fetchall()

# Usage during New Relic outage
local_store = LocalMetricsStore()

# Record metrics locally
local_store.record("api.requests", 150, {"endpoint": "/api/users", "status": "200"})

# Once New Relic is back online, backfill
local_store.sync_to_newrelic("YOUR_NR_KEY", "YOUR_ACCOUNT_ID")

3. Fallback Alert Mechanisms

Don't rely solely on New Relic alerts. Implement multi-layer alerting:

#!/bin/bash
# emergency-monitor.sh - Runs when New Relic is down

while true; do
  # Check critical endpoint
  response_time=$(curl -o /dev/null -s -w '%{time_total}\n' https://api.yourapp.com/health)
  response_code=$(curl -o /dev/null -s -w '%{http_code}\n' https://api.yourapp.com/health)
  
  # Check server resources
  cpu_usage=$(top -bn1 | grep "Cpu(s)" | awk '{print $2}' | cut -d'%' -f1)
  mem_usage=$(free | grep Mem | awk '{print ($3/$2) * 100.0}')
  
  # Alert if thresholds breached
  if (( $(echo "$response_time > 2.0" | bc -l) )); then
    curl -X POST "https://api.pagerduty.com/incidents" \
      -H "Authorization: Token token=YOUR_PD_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{
        "incident": {
          "type": "incident",
          "title": "API response time > 2s (New Relic down, using fallback)",
          "service": {"id": "YOUR_SERVICE_ID", "type": "service_reference"},
          "urgency": "high",
          "body": {"type": "incident_body", "details": "Response time: '"$response_time"'s"}
        }
      }'
  fi
  
  if [ "$response_code" != "200" ]; then
    # Send to backup alerting
    echo "CRITICAL: API health check returned $response_code" | \
      mail -s "Emergency Alert" oncall@yourcompany.com
  fi
  
  sleep 60
done

4. Diagnostic NRQL Queries for New Relic Health

When you suspect New Relic issues, run these diagnostic queries:

// 1. Check data freshness across event types
SELECT 
  count(*) as events,
  latest(timestamp) as mostRecent,
  (now() - latest(timestamp)) / 1000 as lagSeconds
FROM Transaction 
SINCE 10 minutes ago

// If lagSeconds > 300, data ingestion is lagging

// 2. Identify gaps in metric reporting
SELECT 
  histogram(timestamp, 60000, 10) 
FROM Metric 
WHERE metricName = 'apm.service.transaction.duration'
SINCE 30 minutes ago
// Look for missing buckets indicating ingestion gaps

// 3. Check agent connectivity over time
SELECT 
  uniqueCount(entityGuid) as connectedAgents
FROM SystemSample 
SINCE 1 hour ago 
TIMESERIES 1 minute
// Sudden drops indicate agent connectivity issues

// 4. Verify alert condition evaluation
SELECT 
  count(*) as evaluations,
  filter(count(*), WHERE result = 'violation') as violations
FROM NrAiIncident 
WHERE conditionName = 'YOUR_CRITICAL_ALERT'
SINCE 1 hour ago 
TIMESERIES 5 minutes
// If evaluations = 0, alert pipeline is not running

// 5. Synthetics monitor health check
SELECT 
  percentage(count(*), WHERE result = 'SUCCESS') as successRate,
  average(duration) as avgDuration
FROM SyntheticCheck 
SINCE 30 minutes ago 
FACET monitorName
// If ALL monitors show low success rate, Synthetics platform issue

5. Create an Emergency Runbook

Document your New Relic outage response procedure:

Immediate actions (0-5 minutes):

  1. Verify outage via status.newrelic.com and API Status Check
  2. Enable fallback monitoring scripts
  3. Switch to backup observability platform dashboards
  4. Notify engineering team via Slack/PagerDuty
  5. Start incident timeline documentation

Short-term mitigation (5-30 minutes):

  1. Increase log verbosity on critical services
  2. Enable local metrics collection
  3. Set up emergency health check endpoints
  4. Brief on-call engineers about reduced observability
  5. Defer non-critical deployments until New Relic returns

Recovery actions (after restoration):

  1. Backfill local metrics to New Relic
  2. Review alert conditions for missed violations
  3. Check for data gaps in dashboards
  4. Validate agent connectivity across all hosts
  5. Document incident and improve runbook

Frequently Asked Questions

How often does New Relic go down?

New Relic maintains strong uptime, typically 99.9%+ availability. Major platform-wide outages are rare (2-4 times per year), though regional or component-specific issues (affecting only APM, or only EU region, etc.) may occur more frequently. Most businesses experience zero downtime from New Relic in a typical quarter.

What's the difference between New Relic status page and API Status Check?

The official New Relic status page (status.newrelic.com) is manually updated by New Relic's team during incidents, which can lag behind actual issues by 5-15 minutes. API Status Check performs automated health checks every 60 seconds against live endpoints (GraphQL API, REST API, data ingestion), often detecting issues before they're officially reported. Use both for comprehensive awareness.

Can I get SLA credits for New Relic outages?

New Relic offers SLA credits for eligible customers (typically Pro and Enterprise tiers) when uptime falls below 99.95% in a calendar month. Credits are calculated as a percentage of monthly fees based on achieved uptime. Review your specific contract or contact New Relic account team for your tier's SLA terms. Standard tier typically does not include SLA guarantees.

Should I rely on New Relic for critical production alerts?

While New Relic Alerts is highly reliable, best practice for mission-critical alerts is defense in depth: use New Relic as primary alerting but implement backup alerting via PagerDuty, Datadog, or custom scripts that directly monitor your services. This ensures alert redundancy if any single platform experiences issues.

How do I prevent duplicate metrics during New Relic outages?

When using local metrics collection with later backfill, mark backfilled events with a distinct eventType (e.g., "BackfilledMetric" instead of "Metric") and include a backfilled: true attribute. This allows you to filter them in queries and avoid double-counting in dashboards that aggregate both real-time and historical data.
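
A sketch of that convention; the event type and backfilled attribute are the naming suggestions above, not anything New Relic requires:

```python
def make_event(metric_name, value, timestamp, backfilled=False):
    """Build a custom event, tagged when it is replayed history."""
    event = {
        "eventType": "BackfilledMetric" if backfilled else "Metric",
        "metricName": metric_name,
        "value": value,
        "timestamp": timestamp,
    }
    if backfilled:
        # Enables NRQL like: ... WHERE backfilled IS NOT TRUE
        event["backfilled"] = True
    return event

live = make_event("api.requests", 150, 1700000000)
replayed = make_event("api.requests", 150, 1700000000, backfilled=True)
print(live["eventType"], replayed["eventType"])  # Metric BackfilledMetric
```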

What causes New Relic agent disconnections?

Agent disconnections can result from: (1) Network issues between your infrastructure and New Relic collectors, (2) Proxy or firewall blocking collector endpoints, (3) New Relic collector outages, (4) Agent configuration errors, (5) Certificate validation failures. Use New Relic Diagnostics CLI to distinguish between local vs platform issues.

How long does New Relic retain data during outages?

New Relic has built-in buffering and retry logic. Agents typically buffer data locally for 1-2 hours during collector outages. If the outage exceeds this window, data may be lost. Event data has different retention (Real-time: seconds, Standard: 1 minute, Extended: up to 1 hour). Plan for local persistence if data is business-critical.

Can I test if New Relic is working without affecting production data?

Yes. Create a test application in New Relic and send synthetic events:

curl -X POST "https://insights-collector.newrelic.com/v1/accounts/YOUR_ACCOUNT/events" \
  -H "Api-Key: YOUR_INSERT_KEY" \
  -H "Content-Type: application/json" \
  -d '[{"eventType":"HealthCheckTest","value":1}]'

Then query for these events in NRDB. This tests the full pipeline (ingestion, storage, query) without impacting production data.

Why do my NRQL queries work in EU but not US (or vice versa)?

New Relic operates separate data centers for US and EU regions. If queries work in one region but not another, it indicates regional infrastructure issues. Check status.newrelic.com for region-specific incident reports. Your account data resides in ONE region based on account creation; cross-region queries aren't supported.
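
A small helper that selects the regional NerdGraph endpoint; the EU hostname below matches New Relic's published regional endpoints at the time of writing, but confirm against current docs before hardcoding it:

```python
NERDGRAPH = {
    "US": "https://api.newrelic.com/graphql",
    "EU": "https://api.eu.newrelic.com/graphql",
}

def nerdgraph_url(region):
    """Return the NerdGraph endpoint for a US or EU account."""
    try:
        return NERDGRAPH[region.upper()]
    except KeyError:
        raise ValueError(f"Unknown region {region!r}; expected 'US' or 'EU'")

print(nerdgraph_url("eu"))  # https://api.eu.newrelic.com/graphql
```

Routing every API call through a helper like this prevents the classic mistake of debugging "missing data" against the wrong region's endpoint.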

Should I increase agent data collection during New Relic outages?

No—counter-intuitively, reduce data collection during outages. Agents buffer data locally, and excessive buffering can cause memory issues. Instead, maintain minimal critical metrics locally and reduce transaction trace collection, browser agent sampling, and custom event volume until New Relic service is restored.

Stay Ahead of New Relic Outages

Don't let observability blind spots derail your incident response. Subscribe to real-time New Relic alerts and get notified instantly when issues are detected—before your monitoring goes dark.

API Status Check monitors New Relic 24/7 with:

  • 60-second health checks across API, ingestion, and query endpoints
  • Instant alerts via email, Slack, Discord, or webhook
  • Historical uptime tracking and incident reports
  • Multi-API monitoring for your entire observability stack

Start monitoring New Relic now →


Last updated: February 4, 2026. New Relic status information is provided in real-time based on active monitoring. For official incident reports, always refer to status.newrelic.com.

Related guides:

Monitor Your APIs

Check the real-time status of 100+ popular APIs used by developers.

View API Status →