Grafana Cloud Outage History
50 incidents reported. Data sourced from the official Grafana Cloud status page.
Total Incidents: 50
Major/Critical: 22
Minor: 22
Resolved: 49
June 2026
IRM Degraded Performance
minor | Jun 8, 10:45 AM → Jun 8, 08:30 PM | resolved
Jun 8, 08:30 PM
resolved — This incident has been resolved.
Jun 8, 05:11 PM
monitoring — We've released a fix to the IRM app that should restore service for affected customers with issues related to labels. Thanks for your patience while investigating. We're continuing to monitor as we co...
Jun 8, 03:21 PM
identified — We are continuing to work on a fix for this. To further clarify, this issue is not about accessing IRM or alert ingestion/notification/delivery, but rather with handling labels.
+5 more updates
Brief Rule Evaluation Failures in prod-eu-west-3
major | Jun 7, 01:39 AM | monitoring
Jun 8, 09:36 PM
monitoring — The incident has been mitigated, and services are operating normally. We continue to monitor the service to ensure full stability.
Jun 7, 11:00 AM
monitoring — The incident has been mitigated, and services are operating normally. We are currently monitoring the service to ensure full stability.
Jun 7, 06:00 AM
investigating — We’re making ongoing progress on the investigation alongside our upstream provider.
+3 more updates
Permissions Issues with IRM
critical | Jun 5, 04:11 PM → Jun 5, 09:35 PM | resolved
Jun 5, 09:35 PM
resolved — This incident has been resolved.
Jun 5, 08:10 PM
monitoring — Continuing to monitor progress. Most customers affected should have all services restored, with a few remaining customers receiving updates as the rollout finishes out. Thanks again for your patience.
Jun 5, 07:34 PM
monitoring — A fix has been released to prod and is rolling out across the fleet for IRM, restoring access to affected customers. Thanks for your patience through this work. We're continuing to monitor to confirm we'...
+5 more updates
Silences not Working as Expected
major | Jun 5, 02:02 PM → Jun 5, 03:01 PM | resolved
Jun 5, 03:01 PM
resolved — This incident has been resolved.
Jun 5, 02:02 PM
identified — We have identified an issue causing Silences to not work as expected in the Cloud (Mimir) Alertmanager. Grafana Alertmanager is working normally; this only affects data source-managed alerts.
Grafana Assistant Skills Page Blank
major | Jun 4, 04:44 PM → Jun 4, 06:28 PM | resolved
Jun 4, 06:28 PM
resolved — This incident has been resolved.
Jun 4, 05:00 PM
identified — The issue has been identified, and we are working on a fix.
Jun 4, 04:56 PM
investigating — We are continuing to investigate this issue.
+1 more update
K6 Test Runs Degraded Performance
minor | Jun 3, 08:40 PM → Jun 4, 05:47 AM | resolved
Jun 4, 05:47 AM
resolved — This incident has been resolved.
Jun 3, 09:23 PM
monitoring — We have applied a fix, and are monitoring the results.
Jun 3, 08:40 PM
investigating — We are currently investigating an issue causing k6 test runs to take longer than expected to complete, or to time out within Grafana Cloud.
Synthetic Scripted/Browser checks failure
major | Jun 3, 11:38 AM → Jun 3, 06:22 PM | resolved
Jun 3, 06:22 PM
resolved — This incident has been resolved.
Jun 3, 05:07 PM
identified — We are in the process of deploying a fix for this issue.
Jun 3, 11:38 AM
investigating — We’re currently investigating an issue affecting Synthetic Monitoring where updates for Scripted/Browser checks might fail. Our team is actively working to identify the cause. Thank you for your patie...
tempo prod-25 write-path-down
minor | Jun 2, 11:15 PM → Jun 3, 07:46 AM | resolved
Jun 3, 07:46 AM
resolved — This incident has been resolved.
Jun 2, 11:15 PM
identified — Between 21:20 and 22:40 UTC, writes to tempo-prod-25 failed due to an outage. tempo-prod-24 was also affected during an overlapping window from 22:32 to 22:40 UTC.
Alert manager unavailable in prod-us-central-0
minor | Jun 1, 07:53 PM → Jun 1, 08:26 PM | resolved
Jun 1, 08:26 PM
resolved — This incident has been resolved.
Jun 1, 08:02 PM
monitoring — A fix has been implemented and we are monitoring the results.
Jun 1, 07:53 PM
identified — Starting at 18:30 UTC, we noticed alert manager unavailability limited to prod-us-central-0 which affects grafana-managed and datasource-managed alerting, causing disruption to updating alertmanager c...
May 2026
Grafana Loki Log Query Issues
major | May 29, 09:03 AM → May 29, 10:59 AM | resolved
May 29, 10:59 AM
resolved — This incident has been resolved.
May 29, 10:28 AM
monitoring — We have identified the cause of this incident and a fix has been applied. Normal functions are returning. We are currently monitoring the recovery process.
May 29, 09:03 AM
investigating — We’re currently investigating an issue affecting Loki queries in Grafana. We have had reports from customers showing the logs are not loading or showing missing logs. Our team is actively working to i...
Prometheus Datasource Errors/Outage in prod-us-east-0
major | May 27, 08:22 PM → May 27, 10:59 PM | resolved
May 27, 10:59 PM
resolved — This incident has been resolved. Thank you for your patience.
May 27, 09:57 PM
investigating — We are seeing recovery across affected Prometheus datasources, and error rates have significantly improved.
The service is recovering without any required customer action, and our team continues to m...
May 27, 09:36 PM
investigating — We continue to investigate an issue affecting Prometheus datasources causing intermittent timeouts and unexpected errors, primarily impacting alert rule evaluations.
Our team is actively working to ...
+1 more update
Grafana K6 metrics processing and test runs degradation
minor | May 18, 08:24 AM → May 18, 03:42 PM | resolved
May 18, 03:42 PM
resolved — This incident has been resolved.
May 18, 02:30 PM
monitoring — We've stabilized the system and test runs no longer result in timeout. There is a small delay (a few minutes) in processing metrics at the end of the test run, but most users shouldn't be too negative...
May 18, 10:27 AM
investigating — We have identified that test runs are timing out as a result of the issue.
This issue first occurred on May 15, 2026, at 8:00 PM UTC.
+1 more update
Intermittent Errors and High Latency Writing to Cloud Metrics, Cloud Logs and Cloud Traces
minor | May 13, 08:50 AM → May 14, 07:06 AM | resolved
May 14, 07:06 AM
resolved — We continue to observe an extended period of recovery and we're marking the incident as resolved at this point in time.
May 13, 09:10 PM
monitoring — We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.
May 13, 03:41 PM
monitoring — We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.
+4 more updates
"Failed to Load Dashboard" Errors
major | May 11, 09:38 PM → May 12, 04:35 PM | resolved
May 12, 04:35 PM
resolved — This incident has been resolved. Thank you for your patience.
May 12, 02:13 PM
identified — The fix is currently being rolled out to all impacted environments.
May 12, 11:11 AM
identified — Our teams continue working on a fix for this issue. We do not have additional information to share at this time, but we will continue to provide updates as progress is made.
+2 more updates
SSL/TLS Connectivity Issues
major | May 11, 08:49 PM → May 11, 10:40 PM | resolved
May 11, 10:40 PM
resolved — This incident has been resolved. Thank you for your patience.
May 11, 08:49 PM
investigating — We are currently investigating reports of service disruption affecting a subset of customers.
Customers may experience intermittent connectivity issues, degraded performance, or SSL/TLS certificate v...
Cloud Metrics - High Write Latency and Errors in prod-us-central-7
minor | May 8, 09:16 PM → May 8, 10:30 PM | resolved
May 8, 10:30 PM
resolved — We have continued to observe stability.
This incident is now being considered as resolved. Thank you for your patience.
May 8, 09:16 PM
monitoring — From approximately 20:40-21:00 UTC, we experienced an issue affecting Grafana Cloud Metrics in prod-us-central-7. Affected users may have experienced high latency and/or errors during ingestion and ru...
Metrics read errors in prod-ap-south-1 region
critical | May 7, 07:18 AM → May 7, 07:56 AM | resolved
May 7, 07:56 AM
resolved — At this time, we have confirmed that the query errors have stopped, and we are considering this issue resolved.
May 7, 07:53 AM
monitoring — Engineering has released a fix and as of 07:50 UTC, customers should no longer experience errors when querying metrics. We will continue to monitor for recurrence and provide updates accordingly.
May 7, 07:18 AM
investigating — From approximately 06:24 UTC, we were alerted to an issue with read errors in mimir-prod-43. Users with instances hosted in the prod-ap-south-1 region experiencing this issue may encounter an error me...
Hardware failure on CSP within prod-us-west-0
major | May 6, 05:30 AM → May 6, 05:30 AM | resolved
May 8, 11:04 AM
resolved — We observed an underlying hardware failure on our CSP which triggered an automatic live VM migration. The situation caused a degradation in write performance for Grafana Cloud Metrics on prod-us-west-...
Elevated Error Rate of Browser Checks in PoP Oregon
minor | May 5, 04:11 PM → May 5, 08:13 PM | resolved
May 5, 08:13 PM
resolved — This incident has been resolved. Thank you for your patience.
May 5, 07:44 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
May 5, 06:13 PM
identified — We’ve identified the cause of the issue impacting browser checks. Our team is currently implementing a fix.
+1 more update
k6 Partial Outage
major | May 4, 10:58 PM → May 5, 02:09 AM | resolved
May 5, 02:09 AM
resolved — This incident has been resolved. Thank you for your patience.
May 5, 12:04 AM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
May 4, 11:23 PM
investigating — After further investigation, this issue may also be affecting Synthetic Monitoring.
We are continuing to work on identifying the cause and will update as soon as we have more information.
+1 more update
Ingestion Errors for AWS Cloud Provider Observability Metric Streams in prod-us-central-7
major | May 1, 09:14 AM → May 1, 10:27 AM | resolved
May 1, 10:27 AM
resolved — This incident has been resolved.
May 1, 09:43 AM
monitoring — A fix has been implemented and we are monitoring the results.
May 1, 09:42 AM
investigating — We are continuing to investigate this issue.
+1 more update
April 2026
Gateway Slowness Detected in Prod (US-East-1)
minor | Apr 28, 09:20 AM → Apr 30, 03:11 PM | resolved
Apr 30, 03:11 PM
resolved — After further review, this was a false alarm and should not have affected any users.
This incident has been resolved. Thank you for your patience.
Apr 28, 09:20 AM
investigating — Successful requests have dropped, and users may not be able to access their instances. The issue is under investigation.
Investigating Issues Saving SQL Datasource Credentials
minor | Apr 28, 06:46 PM → Apr 29, 01:37 PM | resolved
Apr 29, 01:37 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 28, 06:59 PM
monitoring — We’ve identified the cause of the issue impacting SQL datasources. Our team is currently implementing a fix and monitoring the results to confirm the issue is fully resolved. Services may start to...
Apr 28, 06:46 PM
investigating — We are currently investigating reports of issues affecting SQL-based data sources where users are unable to save credentials.
This appears to impact a subset of customers and may be occurring across ...
Performance Testing – Degraded Service (Resolved)
none | Apr 29, 12:00 PM → Apr 29, 12:00 PM | resolved
Apr 29, 01:51 PM
resolved — We experienced degraded performance affecting Performance Testing from 13:10 UTC to 13:20 UTC. During this time, users may not have been able to start new test runs.
The issue has been resolved, and ...
Elevated write latency for AWS Metrics Streaming integration in us-east-3 region.
minor | Apr 29, 10:30 AM → Apr 29, 10:30 AM | resolved
Apr 29, 12:57 PM
resolved — We experienced an incident with the AWS Metrics Streaming integration in the us-east-3 region, manifesting as elevated ingestion latency. The incident started at around 10:45 UTC and was resolved at around 12:...
InfluxDB Datasource - Intermittent Failures
major | Apr 27, 05:08 PM → Apr 27, 11:24 PM | resolved
Apr 27, 11:24 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 27, 11:13 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
Apr 27, 06:01 PM
identified — We’ve identified the cause of the issue impacting the InfluxDB datasource. Our team is currently implementing a fix.
+1 more update
Restrictions on Alerts & Reports for Grafana Cloud Free/Trial Users
minor | Apr 20, 09:12 PM → Apr 24, 03:04 PM | resolved
Apr 24, 03:04 PM
resolved — Grafana Labs has taken steps to safeguard the Grafana Cloud platform against the distribution of unauthorized emails. We have implemented the following changes to new Grafana Cloud Free and Trial acco...
Apr 22, 03:03 PM
monitoring — Grafana Labs is implementing measures to safeguard the Grafana Cloud platform against ongoing unauthorized use while preserving the capabilities relied upon by our community. Effective immediately, we...
Apr 20, 10:07 PM
monitoring — We are continuing to monitor for any further issues.
+1 more update
CloudWatch Datasource Outage
major | Apr 23, 02:26 PM → Apr 23, 08:01 PM | resolved
Apr 23, 08:01 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 23, 02:39 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
Apr 23, 02:26 PM
investigating — We’re currently investigating an issue affecting CloudWatch datasources. Our team is actively working to identify the cause. Thank you for your patience.
Elevated 429 Errors Impacting Metrics Querying Across Multiple Regions
critical | Apr 20, 02:09 PM → Apr 20, 02:30 PM | resolved
Apr 20, 02:30 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 20, 02:21 PM
investigating — The issue is now confirmed to be widespread, affecting Prometheus across all regions.
Customers may continue to experience elevated 429 (rate limit) errors, particularly when querying metrics, with f...
Apr 20, 02:09 PM
investigating — We are currently experiencing a major incident causing elevated 429 (rate limit) errors across multiple regions, primarily impacting metrics querying.
This is a high-priority issue, and our engineeri...
Query Caching - Degraded Performance
minor | Apr 17, 09:23 PM → Apr 17, 10:58 PM | resolved
Apr 17, 10:58 PM
resolved — This incident has been resolved.
Apr 17, 10:09 PM
monitoring — Currently prod-us-east-0 and prod-eu-west-3 have recovered, and we are continuing to monitor prod-us-central-0 which is in the process of recovery.
Apr 17, 09:23 PM
investigating — As of 20:52 UTC, we are currently investigating degraded Query Caching performance in multiple regions. For datasources where query caching is configured, some queries may take longer than usual.
Our...
Issues on Stack creation
minor | Apr 16, 12:52 PM → Apr 16, 02:02 PM | resolved
Apr 16, 02:02 PM
resolved — This incident has been resolved.
Apr 16, 01:19 PM
monitoring — The issue is fixed and we are currently monitoring the service.
Apr 16, 12:52 PM
identified — Since ~12:11 UTC today (April 16), we have been seeing issues with stack creation across all our regions. Customers will see an error message when attempting to create a stack.
Our engineering team has identif...
Degraded Ticket Visibility in Support System
minor | Apr 15, 04:07 PM → Apr 15, 04:25 PM | resolved
Apr 15, 04:25 PM
resolved — This incident has been resolved and our ticketing system is fully operational. Thank you for your patience.
Apr 15, 04:07 PM
monitoring — We are currently experiencing an issue with our ticketing system provider that is affecting how tickets appear within our internal support views.
We are continuing to receive all new tickets successf...
K6 Sporadic DNS Issues
minor | Apr 14, 09:22 AM → Apr 15, 12:59 PM | resolved
Apr 15, 12:59 PM
resolved — This incident is now resolved. We had intermittent issues with a flaky DNS server that caused random tests to not start properly. Since the DNS server was fixed, we haven't been seeing the issue anymo...
Apr 14, 02:29 PM
monitoring — Our engineering team has deployed a fix and we are currently monitoring the behaviour of the system until full resolution.
Apr 14, 02:29 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
+1 more update
k6 Cloud Service Disruption
none | Apr 14, 11:30 AM → Apr 14, 11:30 AM | resolved
Apr 14, 01:44 PM
resolved — Between approximately 12:30 UTC and 13:15 UTC, k6 Cloud experienced a service disruption due to issues introduced in a recent API release. During this time, users were unable to access the k6 Cloud ap...
Loki write instability in prod-eu-west-2.loki-prod-012
none | Apr 13, 11:30 AM → Apr 13, 11:30 AM | resolved
Apr 14, 12:02 PM
resolved — There was a period of write instability yesterday, between ~13:30 and 17:30 UTC, due to scheduled maintenance.
Grafana Cloud Logs - Write degradation in us-east-3
major | Apr 10, 11:53 PM → Apr 11, 12:36 AM | resolved
Apr 11, 12:36 AM
resolved — This incident has been resolved.
Apr 11, 12:10 AM
monitoring — A fix has been implemented and we are monitoring the results.
Apr 10, 11:53 PM
investigating — We are seeing issues on the write path for Loki in a cluster in us-east-3, and we are actively investigating this issue.
Tempo Write Outage
major | Apr 10, 07:42 PM → Apr 10, 09:02 PM | resolved
Apr 10, 09:02 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 10, 07:53 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within an hour.
Apr 10, 07:42 PM
investigating — We are currently investigating a write outage affecting prod-us-east-3. The issue began at 18:50 UTC. Users may experience errors, timeouts, or unavailability while we work to identify the cause and r...
K6 Browser Testing/Timeline Not Available
minor | Apr 9, 05:34 PM → Apr 9, 06:50 PM | resolved
Apr 9, 06:50 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 9, 06:39 PM
identified — We’ve identified the cause of the issue impacting k6 browser testing/timeline. Our team is currently implementing a fix. We’ll provide another update in two hours or sooner if the situation changes.
Apr 9, 05:34 PM
investigating — We’re currently investigating an issue affecting browser testing.
Users running browser tests will not be able to see the browser timeline.
Our team is actively working to identify the cause and wi...
Stability Issues for Some Customers in the prod-gb-south-1 Region.
minor | Apr 8, 05:00 PM → Apr 8, 05:00 PM | resolved
Apr 8, 05:00 PM
resolved — We had a stability issue for a subset of customers in the prod-gb-south-1 region. The impact lasted from 15:20 to 16:30 UTC and affected roughly 30% of queries and rule evaluations. We've applied miti...
Unable to Edit Notification Policies
minor | Apr 7, 03:17 PM → Apr 7, 08:17 PM | resolved
Apr 7, 08:17 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 7, 06:03 PM
identified — We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.
Apr 7, 04:52 PM
identified — We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.
+1 more update
Notification Policies and Contact Points Missing in UI on the Slow Release Channel
minor | Apr 6, 02:48 PM → Apr 7, 12:26 PM | resolved
Apr 7, 12:26 PM
resolved — This incident has been resolved.
Apr 6, 11:58 PM
monitoring — We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within 2 hours.
Apr 6, 09:04 PM
identified — We’ve identified the cause of the issue impacting the Notification Policy and Contact Point UI. Our team is currently implementing a fix.
We’ll provide another update when the fix is deployed and we...
+2 more updates
Partial K6 Test Run Outage
major | Apr 3, 03:29 PM → Apr 3, 05:38 PM | resolved
Apr 3, 05:38 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 3, 03:29 PM
investigating — We're experiencing an outage affecting test runs that use k6 extensions. The issue prevents users from executing these types of test runs both locally and in Grafana Cloud.
Test runs that do not use ...
Query degradation and possible rule evaluation failure on prod-eu-west-0.cortex-prod-01
minor | Apr 1, 09:56 AM → Apr 1, 09:13 PM | resolved
Apr 1, 09:13 PM
resolved — This incident has been resolved.
Apr 1, 10:12 AM
monitoring — A fix has been implemented and we are monitoring the results.
Apr 1, 10:11 AM
investigating — We are continuing to investigate this issue.
+1 more update
AWS integration Degraded Performance
minor | Apr 1, 08:17 PM → Apr 1, 09:03 PM | resolved
Apr 1, 09:03 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 1, 08:17 PM
investigating — We are investigating a noticeable drop in active series for the AWS integration that began around 18:15 UTC.
This issue may cause scrapes to hit rate limits, which can result in individual data point...
March 2026
Prometheus writes in prod-eu-west-3 are degraded
critical | Mar 25, 02:11 PM → Apr 23, 08:07 PM | resolved
Apr 23, 08:07 PM
resolved — This incident has been resolved. Thank you for your patience.
Apr 20, 03:08 PM
monitoring — We are continuing to monitor for any further issues.
Apr 14, 08:11 PM
monitoring — We have deployed a mitigation and have seen improvement in write failures over the past week. We are still seeing intermittent spikes in latency and continue to monitor.
+7 more updates
k6 Cloud Degradation
none | Mar 31, 02:59 PM → Mar 31, 02:59 PM | resolved
Mar 31, 02:59 PM
resolved — From approximately 11:00 UTC - 15:00 UTC we had a degradation that caused test start errors for a large percentage of Cloud runs managed as scripts in the GCK6 app. This has since been resolved.
Synthetic Monitoring: Some Check Creations & Updates Might be Blocked.
none | Mar 31, 02:32 PM → Mar 31, 02:32 PM | resolved
Mar 31, 02:32 PM
resolved — This is a retroactive status page linked to the following incident: https://status.grafana.com/incidents/38wwbz50ggrp
This retroactive status page is meant to clarify the time of impact. This issue f...
Synthetic Monitoring: Some Check Creations & Updates Might be Blocked.
major | Mar 31, 02:01 PM → Mar 31, 02:25 PM | resolved
Mar 31, 02:25 PM
resolved — This incident has been resolved.
Mar 31, 02:01 PM
identified — Synthetic Monitoring check creation/update for scripted and browser checks might be blocked in the plugin app for some probes. The issue only impacts creating/updating checks from the plugin app. It d...
Some of the CloudWatch queries are failing
major | Mar 31, 09:48 AM → Mar 31, 10:24 AM | resolved
Mar 31, 10:24 AM
resolved — This incident has been resolved.
Mar 31, 09:49 AM
monitoring — We are continuing to monitor for any further issues.
Mar 31, 09:48 AM
monitoring — Some of the CloudWatch queries were failing.
Started at 08:37 UTC
Monitoring from 09:21 UTC
Tempo Reads Outage for Small Subset of Customers
none | Mar 30, 04:30 PM → Mar 30, 04:30 PM | resolved
Mar 30, 06:34 PM
resolved — We encountered an issue impacting only a small subset of customers in the prod-us-central-0 region. The incident occurred between 16:20 and 17:50 UTC on 3/30/26. This incident is now resolved.