Grafana Cloud Outage History
Past incidents and downtime events
Complete history of Grafana Cloud outages, incidents, and service disruptions. Showing 50 most recent incidents.
May 2026 (2 incidents)
k6 Partial Outage
4 updates
This incident has been resolved. Thank you for your patience.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
After further investigation, this issue may also be affecting Synthetic Monitoring. We continue to identify the cause and will update as soon as we have more information.
We’re currently investigating an issue affecting k6. Our team is actively working to identify the cause. Thank you for your patience.
Ingestion Errors for AWS Cloud Provider Observability Metric Streams in prod-us-central-7
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are investigating an issue with ingesting metrics for AWS Cloud Provider Observability with Metric Streams. Affected users may encounter ingestion errors in the prod-us-central-7 region only, starting from ~06:30 UTC. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
April 2026 (23 incidents)
Gateway Slowness Detected in Prod (US-East-1)
2 updates
After further review, this was a false alarm and should not have affected any users. This incident has been resolved. Thank you for your patience.
Successful requests have dropped, and users may not be able to access their instances. The issue is under investigation.
Investigating Issues Saving SQL Datasource Credentials
3 updates
This incident has been resolved. Thank you for your patience.
We’ve identified the cause of the issue impacting SQL datasources. Our team is currently implementing a fix and monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
We are currently investigating reports of issues affecting SQL-based data sources where users are unable to save credentials. This appears to impact a subset of customers and may be occurring across multiple regions. We are actively working to determine the scope and root cause. We will provide updates as more information becomes available.
Performance Testing – Degraded Service (Resolved)
1 update
We experienced degraded performance affecting Performance Testing from 13:10 UTC to 13:20 UTC. During this time, users may not have been able to start new test runs. The issue has been resolved, and the service is now operating normally. We apologize for any disruption this may have caused and appreciate your patience.
Elevated write latency for AWS Metrics Streaming integration in us-east-3 region.
1 update
We experienced an incident with the AWS Metrics Streaming integration in the us-east-3 region, manifesting as elevated ingestion latency. The incident started at around 10:45 UTC and was resolved at around 12:30 UTC. Some tenants may have seen elevated write latency, but all requests were processed and we don't expect any data loss during the incident. The incident is now resolved, but we continue to monitor the system's health.
InfluxDB Datasource - Intermittent Failures
4 updates
This incident has been resolved. Thank you for your patience.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
We’ve identified the cause of the issue impacting the InfluxDB datasource. Our team is currently implementing a fix.
We’re currently investigating an issue affecting the InfluxDB plugin. Some users may see intermittent failures. Our team is actively working to identify the cause. Thank you for your patience.
Restrictions on Alerts & Reports for Grafana Cloud Free/Trial Users
4 updates
Grafana Labs has taken steps to safeguard the Grafana Cloud platform against the distribution of unauthorized emails. We have implemented the following changes to new Grafana Cloud Free and Trial accounts, effective immediately: Users who open Grafana Cloud Free and Trial accounts can only send email alerts and reports to users within their Grafana instance. Other email recipients will be rejected. Additionally, the Cloud Alertmanager is no longer available for these instances, requiring all alerting to be configured via the native Grafana Alertmanager. All other integrations for alerting remain functional. If users would like to expand their email capabilities, they can upgrade their Grafana Cloud account to scale their use case, as needed. These changes to alerting and reporting will be applied to all new Grafana Cloud Free and Trial accounts. Existing Grafana Cloud Free and Trial accounts opened before April 20 and all other Grafana Cloud account types are unaffected by these changes.
Grafana Labs is implementing measures to safeguard the Grafana Cloud platform against ongoing unauthorized use while preserving the capabilities relied upon by our community. Effective immediately, we have made the following modifications to the platform:

Alerting: Email alerting has been disabled for new Grafana Cloud Free and Trial accounts; however, all other integrations such as webhooks remain functional. Additionally, Cloud Alertmanager is now disabled for Grafana instances in these accounts, requiring all configuration to occur via the native Grafana Alertmanager. Existing Grafana Cloud accounts and all other Grafana Cloud account types remain unaffected by these restrictions.

Reporting: To prevent the distribution of unauthorized emails, Grafana instances in new Grafana Cloud Free and Trial accounts are limited to sending reports exclusively to users within their Grafana instance. Standard reporting functionality continues unrestricted for existing Grafana Cloud accounts and all other Grafana Cloud account types.

We remain committed to further refining platform security to ensure a safe and open environment for our entire user base.
We are continuing to monitor for any further issues.
Grafana Labs is taking steps to safeguard our Grafana Cloud platform against unauthorized use while maintaining the Grafana Cloud Free and Trial tiers of service our users and the community have come to rely on. As of Monday, April 20, alerting and reporting capabilities have been disabled in new Grafana Cloud Free and Trial stacks. We are working towards deploying improvements and restoring those capabilities in a way that keeps our platform secure and open for all of our users.
Cloudwatch Datasource Outage
3 updates
This incident has been resolved. Thank you for your patience.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
We’re currently investigating an issue affecting Cloudwatch datasources. Our team is actively working to identify the cause. Thank you for your patience.
Elevated 429 Errors Impacting Metrics Querying Across Multiple Regions
3 updates
This incident has been resolved. Thank you for your patience.
The issue is now confirmed to be widespread, affecting Prometheus across all regions. Customers may continue to experience elevated 429 (rate limit) errors, particularly when querying metrics, with failures or inconsistent responses possible. Our engineering team remains fully engaged and is actively working on mitigation and resolution efforts with the highest priority.
We are currently experiencing a major incident causing elevated 429 (rate limit) errors across multiple regions, primarily impacting metrics querying. This is a high-priority issue, and our engineering team is actively engaged and working urgently to identify the root cause and restore full service as quickly as possible. Customers may experience widespread failures or delays when querying metrics during this time. We understand the significant impact this may have and will continue to provide updates as more information becomes available.
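During an incident like this, clients that retry a failed query immediately tend to amplify the rate-limit pressure. A minimal exponential-backoff sketch is shown below; the query callable is a stand-in for a real metrics query, not an actual Grafana Cloud API call, and the delay values are illustrative assumptions:

```python
import time

def with_backoff(query, max_retries=5, base_delay=0.5):
    """Call query(); on HTTP 429, wait with exponential backoff and retry.

    Returns the last (status, body) pair, whether or not it succeeded.
    """
    for attempt in range(max_retries):
        status, body = query()
        if status != 429:
            return status, body
        time.sleep(base_delay * (2 ** attempt))  # 0.5s, 1s, 2s, ...
    return status, body

# Stand-in for a metrics query: rate-limited twice, then succeeds.
calls = {"n": 0}
def fake_query():
    calls["n"] += 1
    return (429, "rate limited") if calls["n"] < 3 else (200, "ok")

status, body = with_backoff(fake_query, base_delay=0.01)
```

If the service returns a `Retry-After` header, honoring it instead of a fixed schedule is generally kinder to a recovering backend.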
Query Caching - Degraded Performance
3 updates
This incident has been resolved.
Currently prod-us-east-0 and prod-eu-west-3 have recovered, and we are continuing to monitor prod-us-central-0, which is in the process of recovering.
As of 20:52 UTC, we are currently investigating degraded Query Caching performance in multiple regions. For datasources where query caching is configured, some queries may take longer than usual. Our team is actively working to identify the cause. Thank you for your patience.
Issues on Stack creation
3 updates
This incident has been resolved.
The issue is fixed and we are currently monitoring the service.
Since ~12:11 UTC today (the 16th), we have been seeing issues with stack creation across all of our regions. Customers will see an error message when attempting to create a stack. Our engineering team has identified the source of the issue as external to Grafana (a provider issue) and is tracking its recovery.
Degraded Ticket Visibility in Support System
2 updates
This incident has been resolved and our ticketing system is fully operational. Thank you for your patience.
We are currently experiencing an issue with our ticketing system provider that is affecting how tickets appear within our internal support views. We are continuing to receive all new tickets successfully, and no requests are being lost at this time. Our team is actively monitoring the situation and working to ensure all incoming requests are reviewed, including those that may not be immediately visible in standard views. We will provide further updates as we receive more information from our provider. We appreciate your patience.
K6 Sporadic DNS Issues
4 updates
This incident is now resolved. We had intermittent issues with a flaky DNS server that caused random tests not to start properly. Since the DNS server was fixed, we haven't seen the issue anymore.
Our engineering team has deployed a fix and we are currently monitoring the behaviour of the system until full resolution.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.
We are experiencing sporadic DNS issues that occasionally affect the start of cloud test runs, causing them to abort. We are working to resolve the issue, which has been occurring since April 9.
k6 Cloud Service Disruption
1 update
Between approximately 12:30 UTC and 13:15 UTC, k6 Cloud experienced a service disruption due to issues introduced in a recent API release. During this time, users were unable to access the k6 Cloud application. The issue was mitigated by reverting the release, and service has since been fully restored.
Loki write instability in prod-eu-west-2.loki-prod-012
1 update
There was a period of write instability yesterday, between ~13:30 and 17:30 UTC, due to scheduled maintenance.
Grafana Cloud Logs - Write degradation in us-east-3
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are seeing issues on the write path for Loki in the us-east-3 cluster, and we are actively investigating.
Tempo Write Outage
3 updates
This incident has been resolved. Thank you for your patience.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within an hour.
We are currently investigating a write outage affecting prod-us-east-3. The issue began at 18:50 UTC. Users may experience errors, timeouts, or unavailability while we work to identify the cause and restore service.
K6 Browser Testing/Timeline Not Available
3 updates
This incident has been resolved. Thank you for your patience.
We’ve identified the cause of the issue impacting k6 browser testing/timeline. Our team is currently implementing a fix. We’ll provide another update in two hours or sooner if the situation changes.
We’re currently investigating an issue affecting browser testing. Users running browser tests will not be able to see the browser timeline. Our team is actively working to identify the cause and will share an update within two hours. Thank you for your patience.
Stability Issues for Some Customers in the prod-gb-south-1 Region.
1 update
We had a stability issue for a subset of customers in the prod-gb-south-1 region. The impact window was 15:20–16:30 UTC and affected roughly 30% of queries and rule evaluations. We've applied mitigations, and queries should be back to normal.
Unable to Edit Notification Policies
4 updates
This incident has been resolved. Thank you for your patience.
We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.
We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.
We’re currently investigating an issue affecting notification policies. Our team is actively working to identify the cause and will share an update within 2 hours. Thank you for your patience.
Notification Policies and Contact Points Missing in UI on the Slow Release Channel
5 updates
This incident has been resolved.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within 2 hours.
We’ve identified the cause of the issue impacting the Notification Policy and Contact Point UI. Our team is currently implementing a fix. We’ll provide another update when the fix is deployed and we monitor the expected improvement.
We’re continuing to investigate the issue with the alerting UI. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 2 hours.
We’re currently investigating an issue affecting notification policies and contact points for instances on the slow release channel. Alerting API calls for contact points and notification policies return data as expected, so this appears to be limited to the UI. Our team is actively working to identify the cause and will share an update within 1-2 hours. Thank you for your patience.
Partial K6 Test Run Outage
2 updates
This incident has been resolved. Thank you for your patience.
We're experiencing an outage affecting test runs that use k6 extensions. The issue prevents users from executing these types of test runs both locally and in Grafana Cloud. Test runs that do not use extensions are not affected by this incident.
Query degradation and possible rule evaluation failure on prod-eu-west-0.cortex-prod-01
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are currently observing delays in ingesting data, possibly causing partial query results and failed rule evaluations for prod-eu-west-0.cortex-prod-01 metrics cell.
AWS integration Degraded Performance
2 updates
This incident has been resolved. Thank you for your patience.
We are investigating a noticeable drop in active series for the AWS integration that began around 18:15 UTC. This issue may cause scrapes to hit rate limits, which can result in individual data points not being collected for the serverless integration. The impact is intermittent and may affect any customer using the AWS integration, regardless of region. We are currently working to identify the cause and will provide an update as soon as we have more information.
March 2026 (24 incidents)
Prometheus writes in prod-eu-west-3 are degraded
10 updates
This incident has been resolved. Thank you for your patience.
We are continuing to monitor for any further issues.
We have deployed mitigation and seen improvement in write failures over the past week. We are still seeing intermittent spikes in latency and continue to monitor.
We are still seeing intermittent issues and continue to seek a resolution
We are continuing to monitor for any further issues.
We are continuing to monitor this through the weekend.
We are continuing to monitor the previously impacted environments.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
The metric writes issue reported in https://status.grafana.com/incidents/gfshj17lxj5z is still ongoing. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.
k6 Cloud Degradation
1 update
From approximately 11:00 to 15:00 UTC, we had a degradation that caused test-start errors for a large percentage of Cloud runs managed as scripts in the GCK6 app. This has since been resolved.
Synthetic Monitoring: Some Check Creations & Updates Might be Blocked.
1 update
This is a retroactive status page linked to the following incident: https://status.grafana.com/incidents/38wwbz50ggrp It is meant to clarify the time of impact. The issue first started at ~2026-03-30 18:00 UTC. This is now resolved.
Synthetic Monitoring: Some Check Creations & Updates Might be Blocked.
2 updates
This incident has been resolved.
Synthetic Monitoring check creation/update for scripted and browser checks might be blocked in the plugin app for some probes. The issue only impacts creating/updating checks from the plugin app. It does not affect checks handled from Terraform or directly to the API. We’ve identified the cause of an issue impacting Synthetic Monitoring. Our team is currently implementing a fix. We’ll provide another update in 1-2 hours or sooner if the situation changes.
Some of the CloudWatch queries are failing
3 updates
This incident has been resolved.
We are continuing to monitor for any further issues.
Some CloudWatch queries were failing. The issue started at 08:37 UTC, and we have been monitoring since 09:21 UTC.
Tempo Reads Outage for Small Subset of Customers
1 update
We encountered an issue impacting only a small subset of customers in the prod-us-central-0 region. The incident occurred between 16:20 and 17:50 UTC on 3/30/26. This incident is now resolved.
Some Grafana Instances Unavailable
6 updates
This incident has been resolved. Thank you for your patience.
We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again in 1 hour.
We’ve identified the cause of the issue impacting the instances. Our team is currently implementing a fix. We’ll provide another update in 1–2 hours, or sooner, if the situation changes.
We’re continuing to investigate the issue with Grafana instances. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 1-2 hours.
We’re continuing to investigate the issue with Grafana instances. While we don’t have new information to share yet, our team is working to identify the root cause. Next update in 1-2 hours.
We’re currently investigating an issue which is affecting primarily users on the Free tier. Impacted users will be met with a "your Grafana instance is loading" message indefinitely. Our team is actively working to identify the cause and will share an update within 1-2 hours. Thank you for your patience.
Prometheus writes, Logs, and Synthetic Monitoring in prod-eu-west-3 are degraded
6 updates
This incident has been resolved.
This is also now impacting Logs and Synthetic Monitoring in prod-eu-west-3. For Synthetic Monitoring, users might observe errors pushing check execution metrics, which can eventually lead to missing data. In addition, users might observe errors in Synthetic Monitoring provisioned alert rule evaluations, which can lead to missed alerts. For Logs, there is no immediate impact on alerts; however, remote writes to Mimir are delayed, which means users may see gaps in their recording rules.
We are moving this back to 'Investigating' as we are now observing a substantial drop in successful ingestion and increase in write path errors, and elevated rule evaluation latency and error. Reads are mostly fine. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.
We have not observed any recent errors, but we will continue to monitor while we work with our CSP.
A fix has been implemented and we are monitoring the results.
We are currently experiencing degraded writes for mimir-prod-22 in prod-eu-west-3 since 08:45Z.
Service degradation on Dashboard loading in several clusters.
1 update
An issue affecting Grafana Cloud instances was diagnosed yesterday, March 24th, that prevented dashboards from loading correctly. The incident impacted the following clusters: GCP US Central (us-central-0) between 13:15 and 13:50 UTC; AWS US East (us-east-0) between 13:43 and 13:55 UTC; AWS US West (us-west-2) between 14:05 and 14:06 UTC. The issue has been identified and corrective measures have been applied.
Grafana Assistant Unavailable in prod-us-east-0
5 updates
This incident has been resolved.
The issue has been identified, and we are implementing a fix.
The impact extends beyond the TOS check. Assistant is completely unavailable in the impacted region.
We are continuing to investigate this issue.
We are aware of an issue currently impacting Grafana Assistant. Impacted users are met with a request to accept the TOS, however the plugin is failing upon accepting. Our engineering are currently investigating this issue.
Authentication API Database Down in prod-eu-west-2 and prod-eu-west-4
3 updates
This incident has been resolved.
We have observed impact in prod-eu-west-4 as well.
We are currently investigating an issue impacting the main database for Authentication API's in the prod-eu-west-2 region. Writes are currently failing, but reads are operational.
Various Datasource Issues
5 updates
This incident has been resolved.
We are continuing to monitor for any further issues.
We have observed recovery for the CloudWatch datasource. We are now seeing failures for the following datasources: Aurora, OpenSearch, X-Ray, Timestream, Redshift, and Sitewise. A fix for the above is being rolled out now, and we will monitor progress. We will also change the name of this incident from "Cloudwatch Datasource Issues" to "Various Datasource Issues" to more accurately reflect impact.
We have identified the issue, and are rolling out the fix. We are already seeing improvements and will continue to monitor progress.
We are currently investigating an issue impacting the CloudWatch Datasource causing failures.
Degraded performance of Grafana Cloud k6 test runs
2 updates
Our engineering team has deployed a fix and we continue to observe a sustained period of recovery. At this time, we consider this issue resolved. No further updates.
Some customers are seeing degraded performance and errors from certain v6 API endpoints. We are investigating the issue.
Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)
3 updates
We have been observing stability for a period of time and will mark the incident as resolved at this time.
We are continuing to investigate this issue with our CSP, and will provide updates as they become available.
We are seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). Impact will reflect in degradation of logs ingestion on that cluster. Our engineering team is already working on restoring the service.
Rule Evaluation Outage in prod-us-west-0
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue impacting rule evaluation for a subset of customers in the prod-us-west-0 region. We will provide updates as they become available.
Increased number of Aborted-by-System runs with k6 binary building errors
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
The issue has been identified and a fix is being implemented.
We are seeing an increased number of runs Aborted-by-System with a k6 binary building error, and we are investigating the issue. The first occurrence was back on March 9; it has now been identified as a blocking issue for some customers.
Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)
3 updates
This incident has been resolved.
We are also reporting impact to Faro performance in the same region. We are continuing to investigate this issue.
We are seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). Impact will reflect in degradation of logs ingestion on that cluster. Our engineering team is already working on restoring the service.
Some Write Failures in prod-eu-west-3.
5 updates
This incident has been resolved.
Things have been stable, and we have a potential mitigation should this issue arise again. We are monitoring the issue in the meantime.
There are ongoing intermittent elevated transient write failures. We will continue to provide additional updates as more information becomes available.
A fix has been implemented, and we are monitoring.
We are currently investigating an issue impacting a subset of users in the prod-eu-west-3 region. Impacted users are experiencing elevated transient write failures, with no degradation to the read path.
Metrics write path outage in prod-us-central-0 and prod-us-central-5
2 updates
This incident has been resolved.
From 15:30 to 15:45 UTC and from 16:53 to 17:03 UTC, the prod-us-central-0 and prod-us-central-5 regions saw elevated latency and error rates on the write path. We're monitoring now.
Fleet Management Elevated Rate of Errors
3 updates
This incident has been resolved.
Our engineering team continues to work towards a resolution for this issue.
Some users in prod-us-central-0 may be seeing an elevated rate of errors when fetching configurations. Our engineers are currently investigating this issue.
Service degradation on Logs Read path in AWS US West (us-west-0)
2 updates
This incident has been resolved.
There has been a recurrence of the issues on the read path of Loki services in AWS US West since yesterday, the 9th, around ~17:15 UTC. The issue has been identified, and resolution steps have been taken to restore full service. We are currently monitoring the service status. The impact includes timeouts and 5xx errors when querying logs for customers on this cluster.
Various Issues with HG Pages
2 updates
This incident has been resolved.
We are noticing issues with various HG pages. Our engineering team is actively looking into it.
Outage for prod-eu-central-0 due to AWS S3 outage.
5 updates
This incident has been resolved.
Since about 20:03 UTC we have seen AWS S3 recover and also our services are recovering, we are monitoring.
Since about 20:03 UTC we have seen AWS S3 recover and also our services are recovering, we are monitoring.
We are continuing to investigate this issue.
We are seeing elevated error rates and outages across many of our services in prod-eu-central-0, due to an ongoing AWS S3 outage in that region.
Service degradation on Logs Read path in AWS US West (us-west-0)
3 updates
We have observed a continued period of stability since 19:40 UTC. At this time, we consider this issue resolved.
Since 16:35 UTC we have experienced stability and services are recovering. We are actively monitoring and working to fully stabilize.
Our engineering team is investigating issues on the read path of Loki services in AWS US West since around ~13:25 UTC today. These issues can cause timeouts and 5xx errors when querying logs for customers on the cluster. The team is currently working to restore the service.
February 2026 (1 incident)
Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central-0, prod-us-central-5, and prod-eu-west-0
8 updates
This incident is now resolved. During the incident, the Cloud Metrics platform experienced intermittent latency spikes communicating with a backend cloud service in the prod-us-central-0 and prod-us-central-5 regions, and the internal CSP-facing issue was escalated to a P1. After determining that the scope of the latency spikes was limited to one availability zone, the team mitigated the situation by migrating all write traffic to the single unaffected availability zone. As the CSP service team attempted to remedy the situation, it became worse and began affecting the previously unaffected zone, so another mitigation path was needed. Switching Cloud Metrics to a different connection method was deployed to all environments, stabilizing the write path once again, as we found the new method was more reliable and not affected by these increases in latency. We have migrated all tenants back to multi-zone write paths and are confident in the current method of connectivity to the backend cloud service, which is the one we migrated to during the incident. We have no plans to return to the previous, problematic connectivity method for the foreseeable future.
We are rolling out a mitigation across the environments in these regions, and preemptively where possible to ensure it doesn’t spread elsewhere.
We have seen an increase in latency in our cloud providers services, and are rolling out a change to mitigate the issue. We are monitoring.
We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for tracking.
We are continuing to investigate this issue alongside the CSP. Any notable updates will continue to be shared here for tracking.
We've implemented a mitigation and are continuing to monitor and investigate this issue.
We have begun rolling out mitigation steps to reduce write latency in the prod-us-central-0 and prod-us-central-5 regions. While these measures are expected to improve performance, we are continuing to investigate the underlying root cause of the issue. We will provide additional updates as more information becomes available.
Since February 19, we have been investigating an intermittent issue causing increased write latency in the prod-us-central-0 and prod-us-central-5 regions. The issue does not affect all traffic but may result in delayed write operations for some customers. Our engineering team is actively working to identify the root cause and stabilize performance. We will share additional updates as progress is made.