Grafana Cloud Outage History

Past incidents and downtime events

Complete history of Grafana Cloud outages, incidents, and service disruptions. Showing the 50 most recent incidents.

March 2026 (21 incidents)

major | resolved | Mar 20, 03:00 PM — Resolved Mar 20, 03:41 PM

Authentication API Database Down in prod-eu-west-2 and prod-eu-west-4

3 updates

resolved | Mar 20, 03:41 PM

This incident has been resolved.

investigating | Mar 20, 03:08 PM

We have observed impact in prod-eu-west-4 as well.

investigating | Mar 20, 03:00 PM

We are currently investigating an issue impacting the main database for Authentication APIs in the prod-eu-west-2 region. Writes are currently failing, but reads are operational.

major | resolved | Mar 19, 04:46 PM — Resolved Mar 19, 06:44 PM

Various Datasource Issues

5 updates

resolved | Mar 19, 06:44 PM

This incident has been resolved.

monitoring | Mar 19, 05:56 PM

We are continuing to monitor for any further issues.

monitoring | Mar 19, 05:56 PM

We have observed recovery for the CloudWatch Datasource. We are now seeing failures for the following datasources: Aurora, OpenSearch, X-Ray, Timestream, Redshift, and SiteWise. A fix for the above is being rolled out now, and we will monitor progress. We will also change the name of this incident from "Cloudwatch Datasource Issues" to "Various Datasource Issues" to more accurately reflect impact.

monitoring | Mar 19, 05:13 PM

We have identified the issue, and are rolling out the fix. We are already seeing improvements and will continue to monitor progress.

investigating | Mar 19, 04:46 PM

We are currently investigating an issue impacting the CloudWatch Datasource, causing failures.

major | resolved | Mar 19, 11:17 AM — Resolved Mar 19, 06:11 PM

Degraded performance of Grafana Cloud k6 test runs

2 updates

resolved | Mar 19, 06:11 PM

Our engineering team has deployed a fix, and we have observed a continued period of recovery. At this time, we are considering this issue resolved. No further updates.

investigating | Mar 19, 11:17 AM

Some customers are seeing degraded performance and errors from certain v6 API endpoints. We are investigating the issue.

minor | resolved | Mar 13, 10:28 AM — Resolved Mar 18, 07:13 AM

Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

3 updates

resolved | Mar 18, 07:13 AM

We have been observing stability for a period of time and will mark the incident as resolved at this time.

investigating | Mar 13, 09:22 PM

We are continuing to investigate this issue with our CSP, and will provide updates as they become available.

investigating | Mar 13, 10:28 AM

We are seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). The impact will appear as degraded log ingestion on that cluster. Our engineering team is already working on restoring the service.

major | resolved | Mar 11, 05:10 PM — Resolved Mar 13, 06:15 PM

Rule Evaluation Outage in prod-us-west-0

3 updates

resolved | Mar 13, 06:15 PM

This incident has been resolved.

monitoring | Mar 11, 06:02 PM

A fix has been implemented and we are monitoring the results.

investigating | Mar 11, 05:10 PM

We are currently investigating an issue impacting rule evaluation for a subset of customers in the prod-us-west-0 region. We will provide updates as they become available.

major | resolved | Mar 13, 07:41 AM — Resolved Mar 13, 06:11 PM

Increased number of Aborted-by-System runs with k6 binary build errors

4 updates

resolved | Mar 13, 06:11 PM

This incident has been resolved.

monitoring | Mar 13, 12:49 PM

A fix has been implemented and we are monitoring the results.

identified | Mar 13, 08:45 AM

The issue has been identified and a fix is being implemented.

investigating | Mar 13, 07:41 AM

We are seeing an increased number of Aborted-by-System runs with a k6 binary build error, and we are investigating the issue. The first occurrence of this happened back on March 9; it has now been identified as a blocking issue for some customers.

minor | resolved | Mar 11, 08:31 AM — Resolved Mar 12, 01:18 PM

Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

3 updates

resolved | Mar 12, 01:18 PM

This incident has been resolved.

investigating | Mar 11, 09:13 AM

We are also reporting impact to Faro performance in the same region. We are continuing to investigate this issue.

investigating | Mar 11, 08:31 AM

We are seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). The impact will appear as degraded log ingestion on that cluster. Our engineering team is already working on restoring the service.

major | resolved | Mar 10, 06:00 PM — Resolved Mar 11, 09:48 PM

Some Write Failures in prod-eu-west-3

5 updates

resolved | Mar 11, 09:48 PM

This incident has been resolved.

monitoring | Mar 11, 03:51 PM

Things have been stable, and we have a potential mitigation should this issue arise again. We are monitoring the issue in the meantime.

identified | Mar 11, 01:35 AM

We are still seeing intermittent, elevated transient write failures. We will continue to provide additional updates as more information becomes available.

monitoring | Mar 10, 06:42 PM

A fix has been implemented, and we are monitoring.

investigating | Mar 10, 06:00 PM

We are currently investigating an issue impacting a subset of users in the prod-eu-west-3 region. Impacted users are experiencing elevated transient write failures, with no degradation to the read path.

minor | resolved | Mar 9, 06:03 PM — Resolved Mar 10, 09:17 PM

Metrics write path outage in prod-us-central-0 and prod-us-central-5

2 updates

resolved | Mar 10, 09:17 PM

This incident has been resolved.

monitoring | Mar 9, 06:03 PM

From 15:30 to 15:45 UTC and from 16:53 to 17:03 UTC, the prod-us-central-0 and prod-us-central-5 regions saw elevated latency and error rates on the write path. We're monitoring now.

minor | resolved | Mar 9, 02:20 PM — Resolved Mar 10, 08:54 PM

Fleet Management Elevated Rate of Errors

3 updates

resolved | Mar 10, 08:54 PM

This incident has been resolved.

investigating | Mar 10, 06:11 PM

Our engineering team continues to work towards a resolution for this issue.

investigating | Mar 9, 02:20 PM

Some users in prod-us-central-0 may be seeing an elevated rate of errors when fetching configurations. Our engineers are currently investigating this issue.

minor | resolved | Mar 10, 03:26 PM — Resolved Mar 10, 08:39 PM

Service degradation on Logs Read path in AWS US West (us-west-0)

2 updates

resolved | Mar 10, 08:39 PM

This incident has been resolved.

identified | Mar 10, 03:26 PM

There has been a recurrence of the issues on the Read path of Loki services on AWS US West since yesterday, March 9, around 17:15 UTC. The issue has been identified, and resolution steps have been taken to restore full service. We are currently monitoring the service status. The impact includes timeouts and 5xx errors when querying logs for customers on this cluster.

major | resolved | Mar 10, 06:06 PM — Resolved Mar 10, 07:17 PM

Various Issues with HG Pages

2 updates

resolved | Mar 10, 07:17 PM

This incident has been resolved.

investigating | Mar 10, 06:06 PM

We are noticing issues with various HG pages. Our engineering team is actively looking into it.

none | resolved | Mar 7, 08:07 PM — Resolved Mar 9, 08:59 AM

Outage for prod-eu-central-0 due to AWS S3 outage

5 updates

resolved | Mar 9, 08:59 AM

This incident has been resolved.

monitoring | Mar 8, 11:30 AM

Since about 20:03 UTC we have seen AWS S3 recover, and our services are recovering as well; we are monitoring.

investigating | Mar 7, 08:10 PM

Since about 20:03 UTC we have seen AWS S3 recover, and our services are recovering as well; we are monitoring.

investigating | Mar 7, 08:10 PM

We are continuing to investigate this issue.

investigating | Mar 7, 08:07 PM

We are seeing elevated error rates and outages across many of our services in prod-eu-central-0, due to an ongoing AWS S3 outage in that region.

minor | resolved | Mar 8, 02:17 PM — Resolved Mar 8, 08:31 PM

Service degradation on Logs Read path in AWS US West (us-west-0)

3 updates

resolved | Mar 8, 08:31 PM

We have observed a continued period of stability since 19:40 UTC. At this time, we are considering this issue resolved.

monitoring | Mar 8, 06:29 PM

Since 16:35 UTC we have experienced stability and services are recovering. We are actively monitoring and working to fully stabilize.

investigating | Mar 8, 02:17 PM

Our engineering team is investigating issues on the read path of Loki services on AWS US West since today around 13:25 UTC. These issues can cause timeouts and 5xx errors when querying logs for customers on the cluster. The team is currently working to restore the service.

critical | resolved | Mar 6, 03:03 PM — Resolved Mar 6, 04:31 PM

Some Grafana Instances Unavailable

2 updates

resolved | Mar 6, 04:31 PM

This incident has been resolved.

identified | Mar 6, 03:03 PM

We have identified an issue which is causing some instances to become unavailable. Our engineering team is actively working on mitigating the issue. We will continue to share updates as they become available.

major | resolved | Mar 5, 10:27 PM — Resolved Mar 5, 11:36 PM

Write failures in prod-eu-west-0

3 updates

resolved | Mar 5, 11:36 PM

We have observed a continued period of recovery. At this time, we are considering this issue resolved. No further updates.

monitoring | Mar 5, 10:41 PM

Engineering has released a fix and as of 22:00 UTC, customers should no longer experience write failures and delays in rule evaluation. We will continue to monitor for recurrence and provide updates accordingly.

investigating | Mar 5, 10:27 PM

An incident affecting the data write path and rule execution within prod-eu-west-0 began at ~21:05 UTC on March 5, 2026. Customers with instances in this region may experience write failures and delays in rule evaluation. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

minor | resolved | Mar 3, 12:07 PM — Resolved Mar 5, 06:31 PM

Grafana Cloud Logs - Write degradation in Azure Netherlands (eu-west-3)

5 updates

resolved | Mar 5, 06:31 PM

This incident has been resolved.

identified | Mar 4, 10:41 PM

We continue to monitor mitigation efforts and work with our CSP.

identified | Mar 3, 10:19 PM

The impact has been reduced to slight intermittency. We continue to work with our CSP to reach a complete resolution.

investigating | Mar 3, 02:15 PM

Since 11:55 UTC today, we have been seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). The impact will appear as degraded log ingestion on that cluster. We are also reporting impact to Faro performance in the same region. Our engineering team is already working on restoring the service.

investigating | Mar 3, 12:07 PM

Since 11:55 UTC today, we have been seeing issues on the write path for Loki in cluster Azure Netherlands (eu-west-3). The impact will appear as degraded log ingestion on that cluster. Our engineering team is already working on restoring the service.

minor | resolved | Mar 4, 07:47 AM — Resolved Mar 4, 09:29 AM

Elevated rate of errors for Fleet Management in prod-us-central-0

3 updates

resolved | Mar 4, 09:29 AM

This incident has been resolved.

monitoring | Mar 4, 08:46 AM

A fix has been implemented and we are monitoring the results.

investigating | Mar 4, 07:47 AM

We are currently experiencing an issue with Fleet Management in prod-us-central-0. Users in prod-us-central-0 may observe an elevated rate of errors when fetching configurations.

none | resolved | Mar 3, 01:00 PM — Resolved Mar 3, 01:00 PM

Test Run Browser Screenshot Upload Failing

1 update

resolved | Mar 3, 06:35 PM

Test run browser screenshot upload experienced failures from 13:12 to 14:51 UTC. The issue has been resolved.

critical | resolved | Mar 2, 07:37 AM — Resolved Mar 2, 03:48 PM

Write outage for logs in prod-eu-west-3

3 updates

resolved | Mar 2, 03:48 PM

This incident has been resolved.

investigating | Mar 2, 08:08 AM

We are now experiencing a write outage for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.

investigating | Mar 2, 07:37 AM

We are experiencing increased write latency for logs in prod-eu-west-3. Our Engineering team is aware and currently investigating this. We will provide further updates accordingly.

critical | investigating | Mar 2, 06:43 AM

Complete outage in prod-me-central-1

10 updates

investigating | Mar 19, 12:13 PM

We have not received any further updates from AWS at this time. However, we are actively monitoring the outage and will provide additional information as it becomes available. Please also continue to refer to the AWS status page (https://health.aws.amazon.com/health/status) for more detailed updates. All the guidance previously included about stack migration is still relevant. Please reach out to our Support team if you have any questions.

investigating | Mar 4, 10:22 PM

We are actively monitoring the situation, but at this time there are no new updates to share. We will provide the next update once we have more information. Please reach out to our Support team if you have any questions.

investigating | Mar 4, 10:28 AM

We are continuing to investigate this issue.

investigating | Mar 2, 10:18 PM

Please continue to refer to the AWS status page (https://health.aws.amazon.com/health/status) for more detailed updates specific to AWS. AWS are recommending that affected customers move workloads to alternate regions, and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:

1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region; if you are using Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/ (see the configuration sketch after this update).
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ and https://grafana.github.io/grafanactl/

We are continuing to work with our CSP at this time, and will provide updates as they are available.
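For step 2, the client-side change usually amounts to pointing the remote-write endpoint at the new stack. The snippet below is a minimal, illustrative Grafana Alloy configuration sketch, not part of the official guidance above: the push URL, the numeric instance ID, and the GCLOUD_RW_API_KEY environment variable are placeholders that would come from the new stack's connection details.

```alloy
// Minimal sketch: forward metrics from this Alloy instance to a new
// Grafana Cloud stack in an alternate region. All values are placeholders.
prometheus.remote_write "new_region" {
  endpoint {
    // Push URL shown in the new stack's Prometheus connection details.
    url = "https://prometheus-prod-XX-prod-us-east-0.grafana.net/api/prom/push"

    basic_auth {
      // Numeric instance ID of the new hosted-metrics instance.
      username = "123456"
      // Token with metrics write scope, read from the environment
      // rather than hard-coded.
      password = sys.env("GCLOUD_RW_API_KEY")
    }
  }
}

// Example scrape job forwarding into the remote write defined above.
prometheus.scrape "self" {
  targets    = [{"__address__" = "localhost:12345"}]
  forward_to = [prometheus.remote_write.new_region.receiver]
}
```

If the affected fleet is managed through Fleet Management, a change like this can be rolled out as a remote configuration pipeline rather than edited on each host.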

investigating | Mar 2, 10:31 AM

AWS are recommending that affected customers move workloads to alternate regions (https://health.aws.amazon.com/health/status), and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:

1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region; if you are using Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ and https://grafana.github.io/grafanactl/

We will provide updates when we have them, but we do not have an expected resolution time at this point.

investigating | Mar 2, 10:04 AM

Customers are recommended to configure a new blank stack in an alternative Grafana Cloud region and to reconfigure their clients (such as Grafana Alloy) to send telemetry to that region. Fleet Management can be used for this purpose: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/

investigating | Mar 2, 08:36 AM

We are updating this incident to reflect a complete outage in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigating | Mar 2, 08:21 AM

We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigating | Mar 2, 08:14 AM

We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigating | Mar 2, 06:43 AM

We are seeing elevated write and read path errors in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

February 2026 (29 incidents)

minor | resolved | Feb 25, 07:54 PM — Resolved Mar 17, 06:22 PM

Grafana Cloud Metrics - Intermittent Write Latency in prod-us-central-0, prod-us-central-5, and prod-eu-west-0

8 updates

resolved | Mar 17, 06:22 PM

This incident is now resolved. During the incident, the Cloud Metrics platform experienced intermittent latency spikes communicating with a backend cloud service in the prod-us-central-0 and prod-us-central-5 regions, and the internal CSP-facing issue was escalated to a P1. After determining that the scope of the latency spikes was limited to only one availability zone, the team mitigated the situation by migrating all write traffic to the single, nearly unaffected availability zone. As the CSP service team attempted to remedy the situation, it became worse and began affecting the previously unaffected zone, so another mitigation path was needed. A change switching Cloud Metrics to a different connection method was deployed to all environments, stabilizing the write path once again, as we found the new connection method to be more reliable and not affected by these increases in latency. We have migrated all tenants back to multi-zone write paths and are confident in the current method of connectivity to the backend cloud service, which is the one we migrated to during the course of the incident. We have no plans to return to the previous, problematic connectivity method for the foreseeable future.

monitoring | Mar 6, 09:44 PM

We are rolling out a mitigation across the environments in these regions, and preemptively where possible to ensure it doesn’t spread elsewhere.

monitoring | Mar 6, 08:53 PM

We have seen an increase in latency in our cloud provider's services, and are rolling out a change to mitigate the issue. We are monitoring.

monitoring | Mar 5, 10:22 PM

We are continuing to investigate this issue alongside the CSP, and have taken steps to escalate through the appropriate channels. The mitigation in place continues to work as expected, and any notable updates will continue to be shared here for tracking.

monitoring | Feb 27, 10:05 PM

We are continuing to investigate this issue alongside the CSP. Any notable updates will continue to be shared here for tracking.

monitoring | Feb 27, 02:55 PM

We've put a mitigation in place and are continuing to monitor and investigate this issue.

investigating | Feb 26, 04:23 PM

We have begun rolling out mitigation steps to reduce write latency in the prod-us-central-0 and prod-us-central-5 regions. While these measures are expected to improve performance, we are continuing to investigate the underlying root cause of the issue. We will provide additional updates as more information becomes available.

investigating | Feb 25, 07:54 PM

Since February 19, we have been investigating an intermittent issue causing increased write latency in the prod-us-central-0 and prod-us-central-5 regions. The issue does not affect all traffic but may result in delayed write operations for some customers. Our engineering team is actively working to identify the root cause and stabilize performance. We will share additional updates as progress is made.

minor | resolved | Feb 27, 01:46 PM — Resolved Feb 27, 11:38 PM

Trace querying issue in all Tempo clusters

3 updates

resolved | Feb 27, 11:38 PM

This incident has been resolved.

identified | Feb 27, 07:27 PM

Our team has identified the issue, and is in the process of testing a fix.

investigating | Feb 27, 01:46 PM

We're currently working on an issue where portions of data may be temporarily unretrievable, affecting a small percentage of tenants in all Tempo clusters.

minor | resolved | Feb 27, 04:25 PM — Resolved Feb 27, 04:25 PM

Increased Latency for Small Subset of Customers

1 update

resolved | Feb 27, 04:25 PM

A recent rollout caused the AuthZ (RBAC) service to perform many redundant folder-tree fetches for each authorization check. For a small number of tenants in the prod-us-east-0 and prod-eu-west-2 regions with very large folder trees, this added a few milliseconds to every check, which increased request latency. The approximate timeframe of the impact is 2026-02-26 17:24:43 UTC to 2026-02-27 14:33:53 UTC. This has now been resolved.

minor | resolved | Feb 27, 12:57 PM — Resolved Feb 27, 03:24 PM

Incorrect pipeline assignment after custom attributes are assigned

3 updates

resolved | Feb 27, 03:24 PM

This incident has been resolved.

identified | Feb 27, 01:39 PM

The issue has been identified and we are working on a fix.

investigating | Feb 27, 12:57 PM

We are investigating issues with incorrect pipeline assignment after custom attributes are assigned.

minor | resolved | Feb 26, 01:00 PM — Resolved Feb 27, 02:49 AM

Grafana Cloud Faro: slowness listing and uploading sourcemaps in all regions

3 updates

resolved | Feb 27, 02:49 AM

This incident has been resolved.

identified | Feb 26, 02:43 PM

Uploads should work without an issue now. However, listing might still result in occasional timeouts - we're actively addressing this problem.

identified | Feb 26, 01:00 PM

We're experiencing an issue in all Grafana Cloud regions, which manifests as slowness when uploading and listing sourcemaps. The issue most significantly affects users who have large sourcemap files. We've identified the issue and our team is currently working on a fix.

major | resolved | Feb 25, 05:44 PM — Resolved Feb 25, 07:51 PM

Issues Loading Dashboards and Alert Folders in Hosted Grafana

5 updates

resolved | Feb 25, 07:51 PM

This incident has been resolved.

monitoring | Feb 25, 06:46 PM

A fix has been implemented, and we are observing recovery across all impacted regions. We will continue to monitor progress.

identified | Feb 25, 06:31 PM

The issue has been identified, and we are in the process of rolling out a fix.

investigating | Feb 25, 05:49 PM

While we work on narrowing down the scope, we can confirm that deployments in the prod-us-east-0 region are impacted.

investigating | Feb 25, 05:44 PM

Some users may be experiencing issues loading dashboards and alert folders in Hosted Grafana. We will provide more information as it becomes available to us.

major | resolved | Feb 25, 03:05 PM — Resolved Feb 25, 05:20 PM

Partial Write & Rule Evaluation Outage in prod-eu-west-3

3 updates

resolved | Feb 25, 05:20 PM

This incident has been resolved.

monitoring | Feb 25, 03:55 PM

A fix has been implemented and we are monitoring the results.

investigating | Feb 25, 03:05 PM

We are currently investigating an issue which is causing a partial write and rule evaluation outage in the specified region. We will continue to provide updates as they are available.

major | resolved | Feb 25, 12:41 PM — Resolved Feb 25, 03:05 PM

Grafana Cloud Traces: wrong URL endpoint shown for traces ingestion in the prod-eu-west-6 region (AWS Ireland)

3 updates

resolved | Feb 25, 03:05 PM

This incident has been resolved.

monitoring | Feb 25, 12:53 PM

The fix was deployed to all affected existing tenants, and newly created tenants will not face the issue either. We're monitoring the incident, but it should now be resolved.

identified | Feb 25, 12:41 PM

We identified an issue with an incorrect URL endpoint being shown for traces ingestion in the prod-eu-west-6 region (AWS Ireland). Using the displayed URL will result in traces failing to be ingested; AWS private link ingestion, however, should work without issues. The issue affects all tenants in this region, and our team is in the process of deploying a fix to address it.

major | resolved | Feb 24, 02:31 PM — Resolved Feb 24, 05:09 PM

Some Alert Rule Evaluations Failing

3 updates

resolved | Feb 24, 05:09 PM

This incident has been resolved.

monitoring | Feb 24, 04:27 PM

A fix has been implemented, and we are monitoring results.

investigating | Feb 24, 02:31 PM

We are currently investigating an issue impacting a subset of users in the prod-us-east-0 region. Impacted customers will receive a "failed to execute query" error when evaluating alert rules.

minor | resolved | Feb 18, 08:27 AM — Resolved Feb 18, 09:17 PM

Degraded performance of Grafana Cloud k6 test runs

4 updates

resolved | Feb 18, 09:17 PM

This incident has been resolved.

monitoring | Feb 18, 12:57 PM

A fix has been implemented and we are monitoring the results.

investigating | Feb 18, 10:20 AM

We are continuing to investigate this issue.

investigating | Feb 18, 08:27 AM

We are seeing intermittent failures and slow start-up of test runs. We are currently investigating this issue.

minor | resolved | Feb 18, 02:00 PM — Resolved Feb 18, 02:00 PM

Brief Disruption in Azure prod-us-central-7

1 update

resolved | Feb 18, 02:56 PM

We experienced an issue impacting a cell within the Azure prod-us-central-7 region, which occurred between 14:26 and 14:36. Affected users may have noticed increased errors with rule evaluations, as well as some read/write errors. We have resolved this issue, and will continue to monitor.

minor | resolved | Feb 18, 03:43 AM — Resolved Feb 18, 05:31 AM

Grafana Cloud metrics degradation

3 updates

resolved | Feb 18, 05:31 AM

This incident has been resolved.

investigating | Feb 18, 03:47 AM

We are continuing to investigate this issue.

investigating | Feb 18, 03:43 AM

We've been alerted to issues with querying and are investigating.

minor | resolved | Feb 17, 02:53 PM — Resolved Feb 17, 04:27 PM

Maintenance task for Synthetic Monitoring ProbeFailedExecutionsTooHigh alert rule

2 updates

resolved | Feb 17, 04:27 PM

This incident has been resolved.

monitoring | Feb 17, 02:53 PM

Alert instances for the Synthetic Monitoring ProbeFailedExecutionsTooHigh provisioned alert rule that are firing during this maintenance might resolve and fire again in the next evaluation. Only the API is affected. The estimated time window is 15:00–16:00 UTC. Impacted clusters are: prod-eu-west-5, prod-us-east-4, prod-eu-west-6, prod-sa-east-0, prod-ap-south-0, prod-ap-southeast-0, prod-me-central-0, prod-au-southeast-0, and prod-ap-southeast-2.

none | resolved | Feb 17, 12:47 PM — Resolved Feb 17, 12:47 PM

Degradation of service on Synthetic Monitoring Public Probe AWS Canada (Calgary)

1 update

resolved | Feb 17, 12:50 PM

There was a service degradation today from ~12:09 UTC until ~12:35 UTC on the Calgary Public Probe for Synthetic Monitoring. Impact may include SM check failures where the probe was used.

major | resolved | Feb 13, 06:40 PM — Resolved Feb 13, 07:08 PM

Self-Serve Users Unable to Sign Up

2 updates

resolved | Feb 13, 07:08 PM

This incident has been resolved.

investigating | Feb 13, 06:40 PM

We are currently investigating an issue preventing users from signing up for self-serve Grafana. We will continue to share more information as our investigation progresses.

critical | resolved | Feb 12, 11:16 PM — Resolved Feb 13, 05:03 PM

Loki Delete Endpoint Bug

4 updates

resolved | Feb 13, 05:03 PM

This incident has been resolved.

identified | Feb 13, 07:56 AM

We are continuing to work on a fix for this issue.

identified | Feb 13, 07:56 AM

A fix is being made to mitigate the issue. We will provide further updates accordingly.

identified | Feb 12, 11:16 PM

As of 22:45 UTC, we have identified a serious bug affecting the delete endpoint for all Loki regions. As a precaution, the endpoint has been temporarily disabled. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

critical | resolved | Feb 13, 06:59 AM — Resolved Feb 13, 07:29 AM

Loki writes outage in prod-ca-east-0

3 updates

resolved | Feb 13, 07:29 AM

We have observed a continued period of recovery. At this time, we are considering this issue resolved.

monitoring | Feb 13, 07:09 AM

We have scaled up to handle the increased traffic and are seeing marked improvement. We will continue to monitor and provide updates.

investigating | Feb 13, 06:59 AM

We have been alerted to an ongoing Loki writes outage in the prod-ca-east-0 region. Our Engineering team is actively investigating this.

maintenance | resolved | Feb 12, 04:09 PM — Resolved Feb 12, 07:07 PM

Essential Maintenance for Faro Services

2 updates

resolved | Feb 12, 07:07 PM

This incident has been resolved.

monitoring | Feb 12, 04:09 PM

We are undergoing essential maintenance for Faro services. Users may experience a short service outage of <1 minute during this time. We expect this to be finished within an hour.

minor | resolved | Feb 12, 12:33 PM — Resolved Feb 12, 02:30 PM

Grafana Cloud Metrics elevated write and rule evaluation latency in the prod-eu-west-2 region

4 updates

resolved | Feb 12, 02:30 PM

We no longer observe any problems with our services; this incident has been resolved.

monitoring | Feb 12, 12:50 PM

The fix has been implemented and services are back to normal. We're currently monitoring the health of the services before resolving this incident.

identified | Feb 12, 12:40 PM

The issue has been identified and our team is currently working on a fix.

investigating | Feb 12, 12:33 PM

Since 12:17 UTC, we've been observing increased latency for data ingestion and rule evaluation in Grafana Cloud Metrics in the prod-eu-west-2 region. We're currently investigating the issue.

major | resolved | Feb 11, 02:21 PM — Resolved Feb 11, 09:47 PM

Unable to Install Slack Integration

4 updates

resolved | Feb 11, 09:47 PM

This incident has been resolved.

monitoring | Feb 11, 06:20 PM

We are in the process of rolling out the fix.

identified | Feb 11, 04:22 PM

We have identified the issue, and are working on a fix.

investigating | Feb 11, 02:21 PM

We are aware of an issue that is preventing the installation of the Slack integration. We are currently investigating this, and will provide updates as they become available.

minor | resolved | Feb 11, 06:51 AM — Resolved Feb 11, 07:25 AM

Loki error response rate spike on prod-ap-southeast-1

3 updates

resolved | Feb 11, 07:25 AM

This incident has been resolved.

monitoring | Feb 11, 06:54 AM

We have deployed temporary measures to mitigate the issue, but there was a write outage from 06:26 to 06:37 UTC.

investigating | Feb 11, 06:51 AM

Cloud logging is facing write issues in this region; our team is looking into this.

major | resolved | Feb 10, 12:39 AM — Resolved Feb 10, 01:45 AM

Write failures in prod-us-central-0

2 updates

resolved | Feb 10, 01:45 AM

We have observed a continued period of recovery. At this time, we are considering this issue resolved.

investigating | Feb 10, 12:39 AM

As of 00:10, we are experiencing write failures in a single cell, affecting customers in prod-us-central-0. Impacted customers may see failed or dropped writes. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

minor | resolved | Feb 9, 03:35 PM — Resolved Feb 9, 07:07 PM

Athena Queries Broken

4 updates

resolved | Feb 9, 07:07 PM

This incident has been resolved.

monitoring | Feb 9, 05:01 PM

We are seeing recovery in impacted environments. We will continue to monitor the progress.

investigating | Feb 9, 04:23 PM

Our engineering team is still investigating this issue.

investigating | Feb 9, 03:35 PM

We are currently investigating an issue resulting in broken queries for the Athena data source.

minor | resolved | Feb 9, 10:32 AM — Resolved Feb 9, 11:21 AM

Grafana Cloud Logs – Write Ingestion Degradation

3 updates

resolved | Feb 9, 11:21 AM

This incident has been resolved.

monitoring | Feb 9, 10:36 AM

We are continuing to monitor for any further issues.

monitoring | Feb 9, 10:32 AM

Between 09:47 and 10:14 UTC, Grafana Cloud Logs within a single cell residing in the prod-ap-southeast-1 region experienced an issue affecting write ingestion only. During this time, some log writes may have failed or been delayed. Log reads were not impacted and remained fully available throughout the incident. Our engineering team quickly identified the cause of the issue and is monitoring the service. The service has been operating normally since 10:14 UTC.

minor | resolved | Feb 6, 11:29 AM — Resolved Feb 6, 05:57 PM

Multiple free tier customers are getting "no fields to display" when viewing logs instead of labels and structured metadata

2 updates

resolved | Feb 6, 05:57 PM

This incident has been resolved.

investigating | Feb 6, 11:29 AM

We are currently investigating this issue.

none | resolved | Feb 5, 06:30 PM — Resolved Feb 5, 06:30 PM

Grafana Cloud Metrics – Write Ingestion Degradation

1 update

resolved | Feb 5, 09:10 PM

Between 18:32 and 18:46 UTC, Grafana Cloud Metrics within a single cell residing in the prod-us-west-0 region experienced an issue affecting write ingestion only. During this time, some metric writes may have failed or been delayed. Metric reads were not impacted and remained fully available throughout the incident. Our engineering team quickly identified the cause of the issue and implemented mitigation steps to restore normal write ingestion. The service has been operating normally since 18:46 UTC.

none | resolved | Feb 5, 06:00 PM — Resolved Feb 5, 06:00 PM

Tempo write path degradation in prod-us-west-0

1 update

resolved | Feb 10, 11:44 AM

From 17:43 UTC to 18:05 UTC, a subset of customers experienced elevated latency and a peak error rate of approximately 22% for trace ingestion.

major | resolved | Feb 5, 02:14 PM — Resolved Feb 5, 05:41 PM

Hosted Metrics partial outage of read path in us-central-0 region

3 updates

resolved | Feb 5, 05:41 PM

This incident has been resolved.

monitoring | Feb 5, 02:40 PM

Services have recovered and there is no active issue anymore. We're still monitoring overall health.

investigating | Feb 5, 02:14 PM

We're experiencing an issue in the us-central-0 region for the Hosted Metrics offering. The issue manifests as failing rule evaluations and the possibility of queries returning stale data. We're actively investigating the cause of the issue.

minor | resolved | Feb 4, 05:20 PM — Resolved Feb 5, 03:31 PM

Inconsistent threshold check results reported intermittently

4 updates

resolved | Feb 5, 03:31 PM

This incident has been resolved.

monitoring | Feb 5, 09:36 AM

The issue causing the incident has been identified, and the fix has been deployed. All new test runs work consistently.

identified | Feb 4, 07:57 PM

We are continuing to work on a fix for this issue.

identified | Feb 4, 05:20 PM

We encountered a subtle bug which caused our test-run finalization process to read stale threshold statuses because of a synchronization issue. We have since resolved the bug, and new test runs will work properly. Impacted test runs will need to be fixed via further correction on our end. We will continue to provide updates on the progress of the fix for impacted test runs.