Grafana Cloud Outage History
Past incidents and downtime events
Complete history of Grafana Cloud outages, incidents, and service disruptions. Showing the 50 most recent incidents.
March 2026 (11 incidents)
Metrics write path outage in prod-us-central-0 and prod-us-central-5
1 update
From 15:30 to 15:45 UTC and from 16:53 to 17:03 UTC, the prod-us-central-0 and prod-us-central-5 regions saw elevated latency and error rates on the write path. We're monitoring now.
Fleet Management Elevated Rate of Errors
1 update
Some users in prod-us-central-0 may be seeing an elevated rate of errors when fetching configurations. Our engineers are currently investigating this issue.
Outage for prod-eu-central-0 due to AWS S3 outage
4 updates
This incident has been resolved.
Since about 20:03 UTC we have seen AWS S3 recover, and our services are recovering as well. We are monitoring.
We are continuing to investigate this issue.
We are seeing an elevated error rate and outages across many of our services in prod-eu-central-0, due to an ongoing AWS S3 outage in that region.
Service degradation on Logs Read path in AWS US West (us-west-0)
3 updates
We have observed a continued period of stability since 19:40 UTC. At this time, we are considering this issue resolved.
Since 16:35 UTC we have seen stability return and services are recovering. We are actively monitoring and working to fully stabilize the service.
Our engineering team has been investigating issues on the read path of Loki services on AWS US West since approximately 13:25 UTC today. These issues can cause timeouts and 5xx errors when querying logs for customers on the cluster. The team is currently working to restore the service.
Some Grafana Instances Unavailable
2 updates
This incident has been resolved.
We have identified an issue which is causing some instances to become unavailable. Our engineering team is actively working on mitigating the issue. We will continue to share updates as they become available.
Write failures in prod-eu-west-0
3 updates
We have observed a continued period of recovery. At this time, we are considering this issue resolved. No further updates.
Engineering has released a fix and as of 22:00 UTC, customers should no longer experience write failures and delays in rule evaluation. We will continue to monitor for recurrence and provide updates accordingly.
A recent incident affecting the data write path and rule execution within prod-eu-west-0 began at ~21:05 UTC on March 5, 2026. Customers with instances in this region may experience write failures and delays in rule evaluation. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)
5 updates
This incident has been resolved.
We continue to monitor mitigation efforts and work with our CSP.
The impact has been reduced to slight intermittency. We continue to work with our CSP to reach a complete resolution.
Since 11:55 UTC today, we have been seeing issues on the write path for Loki in the Azure Netherlands cluster (eu-west-3). The impact is degraded log ingestion on that cluster. We are also reporting impact to Faro performance in the same region. Our engineering team is already working on restoring the service.
Since 11:55 UTC today, we have been seeing issues on the write path for Loki in the Azure Netherlands cluster (eu-west-3). The impact is degraded log ingestion on that cluster. Our engineering team is already working on restoring the service.
Elevated rate of errors for Fleet Management in prod-us-central-0
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently experiencing an issue with Fleet Management in prod-us-central-0. Users in prod-us-central-0 may observe an elevated rate of errors when fetching configurations.
Test Run Browser Screenshot Upload Failing
1 update
Test run browser screenshot upload experienced failures from 13:12 to 14:51 UTC. The issue has been resolved.
Write outage for logs in prod-eu-west-3
3 updates
This incident has been resolved.
We are now experiencing a write outage for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.
We are experiencing increased write latency for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.
Complete outage in prod-me-central-1
8 updates
We are actively monitoring the situation, but at this time there are no new updates to share. The next update will be provided once we have more information to share. Please reach out to our Support team if you have any questions.
We are continuing to investigate this issue.
Please continue to refer to the AWS status page for more detailed updates specific to AWS: https://health.aws.amazon.com/health/status. AWS is recommending that affected customers move workloads to alternate regions, and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:
1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region; if using Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards (see the sketch after this incident's updates): https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ and https://grafana.github.io/grafanactl/
We are continuing to work with our CSP at this time, and will provide updates as they are available.
AWS is recommending that affected customers move workloads to alternate regions (https://health.aws.amazon.com/health/status), and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:
1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region; if using Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ and https://grafana.github.io/grafanactl/
We will provide updates when we have them, but we do not have an expected resolution time at this point.
Customers are recommended to configure a new blank stack in an alternative Grafana Cloud region and to reconfigure their clients (such as Grafana Alloy) to send telemetry to that region; Fleet Management can be used for this purpose: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
We are updating this incident to reflect a complete outage in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.
We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.
We are seeing elevated write and read path errors in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.
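As a rough illustration of the `grafanactl` dashboard migration in step 3 of the update above, the sketch below pulls dashboards from a still-reachable stack and pushes them to a newly created stack in an alternate region. This is a minimal sketch based on the grafanactl workflow documentation linked in the update, not an official procedure: the stack URLs and tokens are placeholders, and the environment variable names and default local paths are assumptions to verify against the current grafanactl docs.

```sh
#!/bin/sh
# Minimal migration sketch following the grafanactl workflows linked above.
# All URLs and tokens are placeholders; the GRAFANA_SERVER/GRAFANA_TOKEN
# variable names and the default local resource directory are assumptions;
# check the grafanactl documentation before use.

# 1. Pull dashboards from the existing (still-reachable) stack into the
#    local working directory.
export GRAFANA_SERVER="https://old-stack.grafana.net"        # placeholder
export GRAFANA_TOKEN="<service-account-token-for-old-stack>" # placeholder
grafanactl resources pull dashboards

# 2. Re-point at the new stack created in an alternate region and push the
#    dashboards pulled in step 1.
export GRAFANA_SERVER="https://new-stack.grafana.net"        # placeholder
export GRAFANA_TOKEN="<service-account-token-for-new-stack>" # placeholder
grafanactl resources push dashboards
```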
February 2026 (31 incidents)
Trace querying issue in all Tempo clusters
3 updates
This incident has been resolved.
Our team has identified the issue and is in the process of testing a fix.
We're currently working on an issue where portions of data may be temporarily unretrievable, affecting a small percentage of tenants in all Tempo clusters.
Increased Latency for Small Subset of Customers
1 update
A recent rollout caused the AuthZ (RBAC) service to perform many redundant folder-tree fetches for each authorization check for a small number of tenants in the prod-us-east-0 and prod-eu-west-2 regions with very large folder trees. This added a few milliseconds to every check, which increased request latency. The approximate timeframe of the impact is 2026-02-26 17:24:43 UTC to 2026-02-27 14:33:53 UTC. This has now been resolved.
Incorrect pipeline assignment after custom attributes are assigned
3 updates
This incident has been resolved.
The issue has been identified and we are working on a fix.
We are investigating issues with incorrect pipeline assignment after custom attributes are assigned.
Grafana Cloud Faro: slowness listing and uploading sourcemaps in all regions
3 updates
This incident has been resolved.
Uploads should work without an issue now. However, listing might still result in occasional timeouts; we're actively addressing this problem.
We're experiencing an issue in all Grafana Cloud regions, which manifests as slowness when uploading and listing sourcemaps. The issue most significantly affects users who have large sourcemap files. We've identified the issue and our team is currently working on a fix.
Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central-0, prod-us-central-5, and prod-eu-west-0
7 updates
We are rolling out a mitigation across the environments in these regions, and preemptively where possible to ensure it doesn’t spread elsewhere.
We have seen an increase in latency in our cloud provider's services, and are rolling out a change to mitigate the issue. We are monitoring.
We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for tracking.
We are continuing to investigate this issue alongside the CSP. Any notable updates will continue to be shared here for tracking.
We have implemented a mitigation and are continuing to monitor and investigate this issue.
We have begun rolling out mitigation steps to reduce write latency in the prod-us-central-0 and prod-us-central-5 regions. While these measures are expected to improve performance, we are continuing to investigate the underlying root cause of the issue. We will provide additional updates as more information becomes available.
Since February 19, we have been investigating an intermittent issue causing increased write latency in the prod-us-central-0 and prod-us-central-5 regions. The issue does not affect all traffic but may result in delayed write operations for some customers. Our engineering team is actively working to identify the root cause and stabilize performance. We will share additional updates as progress is made.
Issues Loading Dashboards and Alert Folders in Hosted Grafana
5 updates
This incident has been resolved.
A fix has been implemented, and we are observing recovery across all impacted regions. We will continue to monitor progress.
The issue has been identified, and we are in the process of rolling out a fix.
While we work on narrowing down the scope, we can confirm that deployments in the prod-us-east-0 region are impacted.
Some users may be experiencing issues loading dashboard and alert folders in Hosted Grafana. We will provide more information as it becomes available to us.
Partial Write & Rule Evaluation Outage in prod-eu-west-3
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue which is causing a partial write and rule evaluation outage in the specified region. We will continue to provide updates as they become available.
Grafana Cloud Traces: wrong URL endpoint shown for traces ingestion in prod-eu-west-6 region (AWS Ireland)
3 updates
This incident has been resolved.
The fix has been deployed to all affected existing tenants, and newly created tenants will not encounter the issue either. We're monitoring the incident, but it should now be resolved.
We identified an issue with an incorrect URL endpoint being shown for traces ingestion in the prod-eu-west-6 region (AWS Ireland). Using the displayed URL will result in traces failing to be ingested. AWS PrivateLink ingestion, however, should work without issues. The issue affects all tenants in this region and our team is in the process of deploying a fix to address it.
Some Alert Rule Evaluations Failing
3 updates
This incident has been resolved.
A fix has been implemented, and we are monitoring results.
We are currently investigating an issue impacting a subset of users in the prod-us-east-0 region. Impacted customers will receive a "failed to execute query" error when evaluating alert rules.
Degraded performance of Grafana Cloud k6 test runs
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are seeing intermittent failures and slow start-up of test runs. We are currently investigating this issue.
Brief Disruption in Azure prod-us-central-7
1 update
We experienced an issue impacting a cell within the Azure prod-us-central-7 region, which occurred between 14:26 and 14:36. Affected users may have noticed increased errors with rule evaluations, as well as some read/write errors. We have resolved this issue, and will continue to monitor.
Grafana Cloud metrics degradation
3 updates
This incident has been resolved.
We are continuing to investigate this issue.
We've been alerted to issues with querying and are investigating.
Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule
2 updates
This incident has been resolved.
Alert instances for the Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during this maintenance might resolve and fire again in the next evaluation. Only the API is affected. Estimated time window is 15:00–16:00 UTC. Impacted clusters: prod-eu-west-5, prod-us-east-4, prod-eu-west-6, prod-sa-east-0, prod-ap-south-0, prod-ap-southeast-0, prod-me-central-0, prod-au-southeast-0, prod-ap-southeast-2.
Degradation of service on Synthetic Monitoring Public Probe AWS Canada (Calgary)
1 update
There was a service degradation today from ~12:09 UTC until ~12:35 UTC on the Calgary Public Probe for Synthetic Monitoring. Impact may include failed SM checks where the probe was used.
Self-Serve Users Unable to Sign Up
2 updates
This incident has been resolved.
We are currently investigating an issue which is preventing users from signing up for self-serve Grafana. We will continue to update with more information as we progress our investigation.
Loki Delete Endpoint Bug
4 updates
This incident has been resolved.
We are continuing to work on a fix for this issue.
A fix is being made to mitigate the issue. We will provide further updates accordingly.
As of 22:45 UTC, we have identified a serious bug affecting the delete endpoint for all Loki regions. As a precaution, the endpoint has been temporarily disabled. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Loki writes outage in prod-ca-east-0
3 updates
We have observed a continued period of recovery. At this time, we are considering this issue resolved.
We have scaled up to handle the increased traffic and are seeing marked improvement. We will continue to monitor and provide updates.
We have been alerted to an ongoing Loki writes outage in the prod-ca-east-0 region. Our Engineering team is actively investigating this.
Essential Maintenance for Faro Services
2 updates
This incident has been resolved.
We are undergoing essential maintenance for Faro services. Users may experience a short service outage of less than one minute during this time. We expect the maintenance to be finished within an hour.
Grafana Cloud Metrics elevated write and rule evaluation latency in prod-eu-west-2 region
4 updates
We no longer observe any problems with our services; this incident has been resolved.
The fix has been implemented and services are back to normal. We're currently monitoring the health of the services before resolving this incident.
The issue has been identified and our team is currently working on a fix.
Since 12:17 UTC, we're observing an increased latency for data ingestion and rule evaluation in Grafana Cloud Metrics, prod-eu-west-2 region. We're currently investigating the issue.
Unable to Install Slack Integration
4 updates
This incident has been resolved.
We are in the process of rolling out the fix.
We have identified the issue, and are working on a fix.
We are aware of an issue that is preventing the installation of the Slack integration. We are currently investigating this, and will provide updates as they become available.
Loki error response rate spike on prod-ap-southeast-1
3 updates
This incident has been resolved.
We have deployed temporary measures to mitigate the issue, but there was a write outage from 06:26 to 06:37 UTC.
Grafana Cloud Logs is facing write issues in this region; our team is looking into this.
Write failures in prod-us-central-0
2 updates
We have observed a continued period of recovery. At this time, we are considering this issue resolved.
As of 00:10, we are currently experiencing write failures in a single cell affecting customers in prod-us-central-0. Impacted customers may see failed or dropped writes. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Athena Queries Broken
4 updates
This incident has been resolved.
We are seeing recovery in impacted environments. We will continue to monitor the progress.
Our engineering team is still investigating this issue.
We are currently investigating an issue resulting in broken queries for the Athena data source.
Grafana Cloud Logs – Write Ingestion Degradation
3 updates
This incident has been resolved.
We are continuing to monitor for any further issues.
Between 09:47 and 10:14 UTC, Grafana Cloud Logs within a single cell residing in the prod-ap-southeast-1 region experienced an issue affecting write ingestion only. During this time, some log writes may have failed or been delayed. Log reads were not impacted and remained fully available throughout the incident. Our engineering team quickly identified the cause of the issue and is monitoring the service. The service has been operating normally since 10:14 UTC.
Multiple free-tier customers see "no fields to display" instead of labels and structured metadata when viewing logs
2 updates
This incident has been resolved.
We are currently investigating this issue.
Grafana Cloud Metrics – Write Ingestion Degradation
1 update
Between 18:32 and 18:46 UTC, Grafana Cloud Metrics within a single cell residing in the prod-us-west-0 region experienced an issue affecting write ingestion only. During this time, some metric writes may have failed or been delayed. Metric reads were not impacted and remained fully available throughout the incident. Our engineering team quickly identified the cause of the issue and implemented mitigation steps to restore normal write ingestion. The service has been operating normally since 18:46 UTC.
Tempo write path degradation in prod-us-west-0
1 update
From 17:43 UTC to 18:05 UTC, a subset of customers experienced elevated latency and a peak error rate of approximately 22% for trace ingestion.
Hosted Metrics partial outage of read path in us-central-0 region
3 updates
This incident has been resolved.
Services recovered and there's no active issue anymore. We're still monitoring the overall health.
We're experiencing an issue in the us-central-0 region for the Hosted Metrics offering; the issue manifests as failing rule evaluations and the possibility of queries returning stale data. We're actively investigating the cause of the issue.
Inconsistent threshold check results reported intermittently
4 updates
This incident has been resolved.
The issue causing the incident has been identified, and the fix has been deployed. All new test runs work consistently.
We are continuing to work on a fix for this issue.
We encountered a subtle bug which caused our test-run finalization process to read stale threshold status because of a synchronization issue. We have since resolved the bug, and new test runs will work properly. Impacted test runs will need to be fixed via further correction on our end. We will continue to provide updates on the progress of the fix for impacted test runs.
Grafana Cloud: k6 -> Cloud Output Test Runs and Result Analysis (degraded performance)
1 update
The HTTP response time in the Performance trend overview did not display for new test runs. After the fix, all data should display again.
IRM Pages Not Accessible
4 updates
This incident has been resolved.
A fix was implemented, and we are seeing recovery throughout the rollout. We will continue to monitor results.
The issue has been identified and we are implementing a fix.
As of 16:40 UTC, we are currently investigating an issue where IRM pages are not accessible. Users may experience errors or be unable to load IRM-related pages during this time. Our team is actively working to identify the root cause and restore full functionality as quickly as possible. We will provide updates as more information becomes available.
January 2026 (8 incidents)
Some Dashboards in prod-us-central-3 Unable to Load
3 updates
This incident has been resolved.
A fix has been implemented, and we are monitoring the results.
We are currently investigating an issue impacting dashboards for users in the prod-us-central-3 region. This is preventing impacted dashboards from loading as expected. It is also impacting a very small subset of users in the prod-us-central-0 region. We will provide more details regarding the scope as they become available.
Grafana OnCall and IRM Loading Issues
3 updates
We have observed a continued period of recovery. At this time, we are considering this issue resolved. No further updates.
As of 22:55 UTC, we have observed marked improvement with the incident impacting IRM and OnCall. We are still investigating and will continue to monitor and provide updates.
We are currently investigating an issue impacting some customers when accessing Grafana OnCall and IRM. Impacted customers may experience long load times, or even timeouts, when attempting to access these components. We'll provide more information as it becomes available.
Grafana Cloud instances unavailable
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
Some users are experiencing their Grafana Cloud instances as unavailable.
Increased write error rate for logs in prod-us-west-0
1 update
We were experiencing an increased write error rate for logs in prod-us-west-0 from 6:55 to 7:15 UTC. We have since observed continued stability and are marking this as resolved.
Upgrade from Free → Pro failing for users
3 updates
Engineering has released a fix and as of 00:13 UTC, customers should no longer experience issues upgrading from Free to Pro subscriptions. At this time, we are considering this issue resolved. No further updates.
Engineering has identified the issue and is currently exploring remediation options. At this time, users will continue to be unable to upgrade from Free to Pro subscriptions. We will continue to provide updates as more information is shared.
As of 20:05 UTC, our engineering team became aware of an issue related to subscription plan upgrades. Users experiencing this issue will not be able to upgrade from a Free plan to a Pro subscription. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.
Investigating Issues with Email Delivery
3 updates
This incident has been resolved.
We are noticing significant improvement, and things are stabilizing as expected. Our engineering teams will continue to monitor progress.
We are currently investigating an issue impacting email delivery for some services, including Alert Notifications.
Synthetic monitoring secrets - proxy URL changes
2 updates
The incident is resolved. We are in contact with customers affected by this change.
During the secrets migration in https://status.grafana.com/incidents/47d1q4sphrmj, secrets proxy URLs for some customers were updated in the following regions: prod-us-central-0, prod-us-east-0, and prod-eu-west-2. This was an unexpected breaking change affecting a subset of customers, specifically those using secrets on private probes behind a firewall. We are investigating. If your private probes are impacted, we ask you to update firewall rules for the secrets proxy to allow outbound connections to the updated hosts:
gsm-proxy-prod-eu-west-2.grafana.net -> gsm-proxy-prod-eu-west-4.grafana.net
gsm-proxy-prod-us-central-0.grafana.net -> gsm-proxy-prod-us-central-4.grafana.net
gsm-proxy-prod-us-east-0.grafana.net -> gsm-proxy-prod-us-east-2.grafana.net
Note that this URL change affects only a small subset of customers; the majority will not need to update firewall rules. For affected customers, private probes will show an error like the following in probe logs: Error during test execution: failed to get secret: Get "https://gsm-proxy-prod-us-east-2.grafana.net/api/v1/secrets/.../decrypt": Forbidden
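For customers updating firewall rules, a quick connectivity check such as the sketch below can confirm that a private probe's host can now reach the new proxy endpoints. This is an illustrative check under the assumption that curl is available on the probe host, not an official Grafana procedure; adjust the host list to your region mapping above.

```sh
#!/bin/sh
# Illustrative check: confirm outbound HTTPS reachability to the updated
# secrets proxy hosts from the machine running the private probe.
# Hosts are taken from the mapping in the update above; adjust per region.
for host in \
  gsm-proxy-prod-eu-west-4.grafana.net \
  gsm-proxy-prod-us-central-4.grafana.net \
  gsm-proxy-prod-us-east-2.grafana.net
do
  # curl exits 0 whenever the TCP/TLS connection and HTTP exchange succeed,
  # regardless of HTTP status, which is enough to prove egress is open.
  if curl --silent --output /dev/null --connect-timeout 5 "https://$host"; then
    echo "OK:      $host is reachable on port 443"
  else
    echo "BLOCKED: $host is not reachable; check firewall egress rules"
  fi
done
```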
Hosted Traces elevated write latency in prod-us-central-0 region
3 updates
We consider this incident resolved, since latency has not been elevated since the fix was applied. The issue was caused by a latency spike in a downstream dependency, causing increased backpressure on the Hosted Traces ingestion path, which degraded gateway performance and resulted in elevated write latency. After clearing the affected gateway services, the degraded state went away and normal operation was restored.
The issue was identified and a fix was applied. After applying the fix, latency went down to a regular and expected value. We're currently monitoring the component's health before resolving the incident.
We're currently investigating an issue with elevated write latency in the Hosted Traces prod-us-central-0 region, which has been experiencing sustained high write latency since 07:20 UTC. Only a small subset of requests are impacted.