
Grafana Cloud Outage History

Past incidents and downtime events

Complete history of Grafana Cloud outages, incidents, and service disruptions. Showing 50 most recent incidents.

June 2026 (17 incidents)

major | resolved | Jun 18, 05:01 PM | Resolved Jun 18, 06:50 PM

Issues with actions in the Grafana IRM mobile app

3 updates
resolved | Jun 18, 06:50 PM

This incident has been resolved. Thank you for your patience.

monitoring | Jun 18, 06:09 PM

We've verified a fix in our staging environment to restore functionality to the mobile app. The fix is currently being deployed to production. Thanks for your patience as we continue to roll this out and monitor the resolution.

identified | Jun 18, 05:01 PM

We're noticing an uptick in users being unable to respond to actions on the mobile app (acknowledging and silencing alerts, for example). Users working in the web UI should not be affected. Ingestion and notification delivery are working as expected. We have a fix in place and are in the process of deploying.

critical | resolved | Jun 18, 11:25 AM | Resolved Jun 18, 04:21 PM

Degraded k6 cloud UI performance

5 updates
resolved | Jun 18, 04:21 PM

This incident has been resolved. Thank you for your patience.

monitoring | Jun 18, 02:04 PM

We are continuing to monitor for any further issues.

monitoring | Jun 18, 01:15 PM

The root cause of the issue has been identified and a fix has been successfully deployed. We are observing widespread improvements across all systems. Our team is currently monitoring the environment to ensure performance remains stable.

investigating | Jun 18, 11:45 AM

We are continuing to investigate this issue.

investigating | Jun 18, 11:25 AM

We’re currently investigating an issue resulting in degraded k6 cloud UI performance and API response time. Our team is actively working to rectify this issue.

major | monitoring | Jun 18, 03:18 PM

Potential Issues Loading Grafana for Users in India

3 updates
monitoring | Jun 19, 02:33 AM

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr We are continuing to work with our CSP on this investigation. Impacted users may receive intermittent error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on the user's physical location, not on the region where the stack is hosted.

monitoring | Jun 18, 04:57 PM

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr Impacted users may receive intermittent error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on the user's physical location, not on the region where the stack is hosted. We continue to work with our CSP on this investigation.

investigating | Jun 18, 03:18 PM

Due to the GCP outage linked below, users located in India may have trouble loading parts of Grafana. https://status.cloud.google.com/incidents/5fGQt4VbkDnr3Yp8PXPr Impacted users may receive error messages such as "Error Loading" or "Failed to load Assets". To be clear, impact depends on the user's physical location, not on the region where the stack is hosted. We are currently investigating this issue from our end, and will provide updates as they are available.

major | resolved | Jun 17, 08:17 PM | Resolved Jun 18, 02:08 PM

Loki data source-managed alert rules not visible in the Grafana Cloud Alerting UI

6 updates
resolved | Jun 18, 02:08 PM

This incident has been resolved.

monitoring | Jun 18, 08:04 AM

A fix has been implemented and we are monitoring the results.

identified | Jun 17, 11:22 PM

We are continuing to deploy the fix and monitor recovery efforts. As part of the rollout, we identified an issue that required adjustments to our deployment plan, which has extended the timeline for mitigation. Work remains actively underway, and we will share additional updates as progress continues.

identified | Jun 17, 09:43 PM

Deployment of the fix is still in progress. We are continuing to monitor the rollout and validate recovery across affected systems. We will share further updates as they become available.

identified | Jun 17, 08:55 PM

Our Engineering Team has implemented a fix which is now being rolled out. We will continue to monitor the situation and update as soon as we have more information.

identified | Jun 17, 08:17 PM

We have identified an issue where alert rules and alerts managed directly in a Loki data source (data source-managed alerting) are not displayed in the Grafana Cloud Alerting UI. Rules created via Prometheus/Mimir data sources and Grafana-managed alert rules are not affected. Impact is limited to visibility and management in the UI. Affected alert rules continue to evaluate and send notifications normally — there is no impact to alert delivery. Workaround: Loki alert rules can still be viewed and managed directly through the Loki ruler API (for example, using cortextool against /loki/api/v1/rules). A fix has been identified and is in progress. We will provide a further update once it has been rolled out.
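The workaround above names the Loki ruler endpoint /loki/api/v1/rules. As a rough sketch only, a direct read of that endpoint might look like the following. The base URL, user ID, and token are hypothetical placeholders, and the use of HTTP Basic authentication is an assumption rather than something stated in the update; check your own stack's connection details before relying on it.

```python
import base64
import urllib.request


def build_rules_request(base_url: str, user: str, token: str) -> urllib.request.Request:
    """Build a GET request for the Loki ruler endpoint named in the update."""
    url = base_url.rstrip("/") + "/loki/api/v1/rules"
    # Assumption: the endpoint accepts HTTP Basic auth (user ID plus access token).
    credentials = base64.b64encode(f"{user}:{token}".encode()).decode()
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Basic {credentials}"},
        method="GET",
    )


def fetch_rules(request: urllib.request.Request, timeout: float = 10.0) -> str:
    """Send the request and return the raw response body as text."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")


if __name__ == "__main__":
    # Hypothetical placeholders: substitute your own Loki URL, user ID, and token.
    req = build_rules_request("https://logs.example.invalid", "123456", "example-token")
    print(req.full_url)
```

Building the request separately from sending it keeps the URL and header logic easy to inspect without touching the network; call fetch_rules(req) only against a real endpoint.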

minor | resolved | Jun 18, 08:48 AM | Resolved Jun 18, 10:13 AM

Frontend Observability - Suspected commit feature not working as expected

2 updates
resolved | Jun 18, 10:13 AM

A fix has been deployed and, after monitoring, we have confirmed that the issue is fixed.

investigating | Jun 18, 08:48 AM

We’re currently investigating an issue affecting the Frontend Observability product. The "Suspected commit" feature is not currently working as expected. Ingestion and querying are unaffected. Our team has identified the cause and is actively working on a fix. Thank you for your patience.

major | resolved | Jun 7, 01:39 AM | Resolved Jun 12, 07:52 PM

Brief Rule Evaluation Failures in prod-eu-west-3

7 updates
resolved | Jun 12, 07:52 PM

This incident has been resolved. Thank you for your patience.

monitoring | Jun 8, 09:36 PM

The incident has been mitigated, and services are operating normally. We continue to monitor the service to ensure full stability.

monitoring | Jun 7, 11:00 AM

The incident has been mitigated, and services are operating normally. We are currently monitoring the service to ensure full stability.

investigating | Jun 7, 06:00 AM

We’re making ongoing progress on the investigation alongside our upstream provider.

investigating | Jun 7, 03:50 AM

We are continuing to investigate this issue.

investigating | Jun 7, 02:46 AM

Intermittent spikes in rule evaluation failures are continuing.

investigating | Jun 7, 01:39 AM

From 00:20:00 to 00:27:00 and again from 00:32:00 to 00:38:00, there were brief spikes in rule evaluation failures. Engineers are investigating.

none | resolved | Jun 12, 06:30 PM | Resolved Jun 12, 06:30 PM

Brief Loki Prod-012-eu-west-2 Disruption

1 update
resolved | Jun 12, 09:54 PM

Our team discovered a read issue between approximately 19:35 and 20:08 UTC. During that window, queries may have returned errors similar to "context deadline exceeded" (DatasourceError response). This has since been resolved and should not have caused any data loss, only a short query disruption.

major | resolved | Jun 10, 10:49 AM | Resolved Jun 11, 05:36 AM

Grafana Dashboards page not displaying when set to ‘View by Folders’

2 updates
resolved | Jun 11, 05:36 AM

This incident has been resolved.

investigating | Jun 10, 10:49 AM

We’re currently investigating an issue affecting the Grafana Dashboards page. When set to ‘View by Folders’, the page shows no dashboards. Our team is working on fixing the problem. In the meantime, switching to ‘View as list’ allows access to dashboards as usual.

major | resolved | Jun 9, 11:11 PM | Resolved Jun 10, 01:11 PM

Investigating Issues with Data Source-Managed Alerting

3 updates
resolved | Jun 10, 01:11 PM

We continue to observe a sustained period of recovery. At this time, we are considering this issue resolved. No further updates.

monitoring | Jun 10, 10:57 AM

Our team has implemented a fix and we are currently monitoring the results of this.

investigating | Jun 9, 11:11 PM

We are currently investigating an issue affecting data source-managed alerting management functionality in Grafana Cloud. Customers may experience problems viewing, creating, updating, or managing alerts through Grafana when using data source-managed alerting. This issue is limited to alert management functionality within Grafana. Alert evaluation and backend alerting services continue to operate normally. Direct alerting APIs for Mimir and Loki remain fully operational and are unaffected. Grafana-managed alerting is not impacted. We identified this issue at approximately 20:45 UTC and are actively working on a resolution. We will provide additional updates as more information becomes available. Workaround: Customers can continue to use the direct Mimir and Loki alerting APIs while we work to restore normal functionality.

minor | resolved | Jun 8, 10:45 AM | Resolved Jun 8, 08:30 PM

IRM Degraded Performance

8 updates
resolved | Jun 8, 08:30 PM

This incident has been resolved.

monitoring | Jun 8, 05:11 PM

We've released a fix to the IRM app that should restore service for customers affected by label-related issues. Thanks for your patience while we investigated. We're continuing to monitor as we confirm the resolution.

identified | Jun 8, 03:21 PM

We are continuing to work on a fix for this. To further clarify, this issue is not about accessing IRM or alert ingestion/notification/delivery, but rather with handling labels.

identified | Jun 8, 01:14 PM

The degraded performance is related to label handling, and we have now seen this degradation in additional regions.

identified | Jun 8, 12:26 PM

We are continuing to work on a fix for this issue.

identified | Jun 8, 11:15 AM

The issue has been identified and a fix is being implemented.

investigating | Jun 8, 10:47 AM

We are continuing to investigate this issue.

investigating | Jun 8, 10:45 AM

We are experiencing access issues in IRM as there are elevated 500 API responses in prod-us-central-0.

critical | resolved | Jun 5, 04:11 PM | Resolved Jun 5, 09:35 PM

Permissions Issues with IRM

8 updates
resolved | Jun 5, 09:35 PM

This incident has been resolved.

monitoring | Jun 5, 08:10 PM

Continuing to monitor progress. Most customers affected should have all services restored, with a few remaining customers receiving updates as the rollout finishes out. Thanks again for your patience.

monitoring | Jun 5, 07:34 PM

A fix has been released to prod and is rolling out across the fleet for IRM, restoring access to affected customers. Thanks for your patience through this work. We're continuing to monitor to confirm we've returned to a steady state.

monitoring | Jun 5, 06:40 PM

We've identified an earlier regression in one of our recent code changes that was interfering with our previous fix. We're deploying this change now and applying a hot fix in the interim to restore access quickly.

monitoring | Jun 5, 06:00 PM

A fix is being deployed now, and we are monitoring the progress.

identified | Jun 5, 05:14 PM

We've identified an issue with RBAC, and are working on a fix to restore permission services for those affected.

investigating | Jun 5, 04:52 PM

Our engineering team is still investigating this issue. We do not have any new information to share at this time, but will continue to provide timely updates.

investigating | Jun 5, 04:11 PM

We are currently investigating an issue impacting permissions for IRM. As a result, users are not currently getting paged. We will provide updates as they become available.

major | resolved | Jun 5, 02:02 PM | Resolved Jun 5, 03:01 PM

Silences not Working as Expected

2 updates
resolved | Jun 5, 03:01 PM

This incident has been resolved.

identified | Jun 5, 02:02 PM

We have identified an issue causing silences to not work as expected in the Cloud (Mimir) Alertmanager. Grafana Alertmanager is working normally; this only affects data source-managed alerts.

major | resolved | Jun 4, 04:44 PM | Resolved Jun 4, 06:28 PM

Grafana Assistant Skills Page Blank

4 updates
resolved | Jun 4, 06:28 PM

This incident has been resolved.

identified | Jun 4, 05:00 PM

The issue has been identified, and we are working on a fix.

investigating | Jun 4, 04:56 PM

We are continuing to investigate this issue.

investigating | Jun 4, 04:44 PM

We are currently investigating an issue affecting the Skills page of Grafana Assistant. Impacted deployments will encounter a blank screen when attempting to access this page. At this time, we have observed partial impact in the us-east-0 and us-central-0 regions, and will provide an update here if the scope of impact expands.

minor | resolved | Jun 3, 08:40 PM | Resolved Jun 4, 05:47 AM

k6 Test Runs Degraded Performance

3 updates
resolved | Jun 4, 05:47 AM

This incident has been resolved.

monitoring | Jun 3, 09:23 PM

We have applied a fix, and are monitoring the results.

investigating | Jun 3, 08:40 PM

We are currently investigating an issue causing k6 test runs to take longer than expected to complete, or to time out within Grafana Cloud.

major | resolved | Jun 3, 11:38 AM | Resolved Jun 3, 06:22 PM

Synthetic Scripted/Browser checks failure

3 updates
resolved | Jun 3, 06:22 PM

This incident has been resolved.

identified | Jun 3, 05:07 PM

We are in the process of deploying a fix for this issue.

investigating | Jun 3, 11:38 AM

We’re currently investigating an issue affecting Synthetic Monitoring where updates for Scripted/Browser checks might fail. Our team is actively working to identify the cause. Thank you for your patience.

minor | resolved | Jun 2, 11:15 PM | Resolved Jun 3, 07:46 AM

tempo prod-25 write-path-down

2 updates
resolved | Jun 3, 07:46 AM

This incident has been resolved.

identified | Jun 2, 11:15 PM

Between 21:20 and 22:40 UTC, writes to tempo-prod-25 failed due to an outage. tempo-prod-24 was also affected during an overlapping window from 22:32 to 22:40 UTC.

minor | resolved | Jun 1, 07:53 PM | Resolved Jun 1, 08:26 PM

Alert manager unavailable in prod-us-central-0

3 updates
resolved | Jun 1, 08:26 PM

This incident has been resolved.

monitoring | Jun 1, 08:02 PM

A fix has been implemented and we are monitoring the results.

identified | Jun 1, 07:53 PM

Starting at 18:30 UTC, we noticed Alertmanager unavailability limited to prod-us-central-0. This affects Grafana-managed and data source-managed alerting, disrupting updates to Alertmanager configuration and causing limited disruption to alert sending. We have identified the cause and are in the process of remediation.

May 2026 (12 incidents)

major | resolved | May 29, 09:03 AM | Resolved May 29, 10:59 AM

Grafana Loki Log Query Issues

3 updates
resolved | May 29, 10:59 AM

This incident has been resolved.

monitoring | May 29, 10:28 AM

We have identified the cause of this incident and a fix has been applied. Normal functions are returning. We are currently monitoring the recovery process.

investigating | May 29, 09:03 AM

We’re currently investigating an issue affecting Loki queries in Grafana. Customers have reported that logs are not loading or that some logs are missing. Our team is actively working to identify the cause. Thank you for your patience.

major | resolved | May 27, 08:22 PM | Resolved May 27, 10:59 PM

Prometheus Datasource Errors/Outage in prod-us-east-0

4 updates
resolved | May 27, 10:59 PM

This incident has been resolved. Thank you for your patience.

investigating | May 27, 09:57 PM

We are seeing recovery across affected Prometheus datasources, and error rates have significantly improved. The service is recovering without any required customer action, and our team continues to monitor stability while we investigate the underlying cause. We’ll provide another update as we learn more.

investigating | May 27, 09:36 PM

We continue to investigate an issue affecting Prometheus datasources causing intermittent timeouts and unexpected errors, primarily impacting alert rule evaluations. Our team is actively working to identify the cause. Thank you for your patience.

investigating | May 27, 08:22 PM

We’re currently investigating an issue affecting Prometheus datasources causing 500 internal or Unexpected errors. Our team is actively working to identify the cause. Thank you for your patience.

minor | resolved | May 18, 08:24 AM | Resolved May 18, 03:42 PM

Grafana k6 metrics processing and test runs degradation

4 updates
resolved | May 18, 03:42 PM

This incident has been resolved.

monitoring | May 18, 02:30 PM

We've stabilized the system and test runs no longer time out. There is a small delay (a few minutes) in processing metrics at the end of the test run, but most users shouldn't be significantly impacted by that. We expect the delay to also resolve within the next 30-60 minutes.

investigating | May 18, 10:27 AM

We have identified that test runs are timing out as a result of the issue. This issue first occurred on May 15, 2026 at 8:00 PM UTC.

investigating | May 18, 08:24 AM

We’re currently investigating an issue causing degraded performance in metrics processing. Test run metrics may take longer than usual to show up. Our team is actively working to identify the cause. Thank you for your patience.

minor | resolved | May 13, 08:50 AM | Resolved May 14, 07:06 AM

Intermittent Errors and High Latency Writing to Cloud Metrics, Cloud Logs, and Cloud Traces

7 updates
resolved | May 14, 07:06 AM

We continue to observe an extended period of recovery and we're marking the incident as resolved at this point in time.

monitoring | May 13, 09:10 PM

We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.

monitoring | May 13, 03:41 PM

We continue to see signs of recovery and improved stability across impacted services. Our teams continue to closely monitor the situation while working with the cloud provider.

monitoring | May 13, 01:37 PM

We are seeing signs of recovery and improved stability across impacted services over the past hour. Our teams continue to closely monitor the situation while working with the cloud provider.

investigating | May 13, 10:25 AM

We have identified expanded impact affecting Grafana Cloud Logs and Grafana Cloud Traces in addition to Cloud Metrics, causing intermittent errors and increased latency when writing data. Our teams continue working on a fix and investigating the issue with the cloud provider’s support team.

investigating | May 13, 10:01 AM

We’re continuing to investigate the issue causing intermittent errors and high latency when writing to Cloud Metrics. We are in contact with the cloud provider’s support team, and they are investigating the issue alongside us.

investigating | May 13, 08:50 AM

We’re currently investigating an issue causing intermittent errors and high latency when writing to Cloud Metrics. Our team is actively working to identify the cause. Thank you for your patience.

major | resolved | May 11, 09:38 PM | Resolved May 12, 04:35 PM

"Failed to Load Dashboard" Errors

5 updates
resolved | May 12, 04:35 PM

This incident has been resolved. Thank you for your patience.

identified | May 12, 02:13 PM

The fix is currently being rolled out to all impacted environments.

identified | May 12, 11:11 AM

Our teams continue working on a fix for this issue. We do not have additional information to share at this time, but we will continue to provide updates as progress is made.

identified | May 12, 08:58 AM

We are continuing to work on a fix for this issue. While we do not have additional updates to share at this time, our teams remain actively engaged and we will provide further updates as soon as they become available.

identified | May 11, 09:38 PM

Customers on Grafana Cloud may see an error on dashboard panels with "Failed to load dashboard ... json unmarshal number ...". We have identified the issue and are working to deploy the fix.

major | resolved | May 11, 08:49 PM | Resolved May 11, 10:40 PM

SSL/TLS Connectivity Issues

2 updates
resolved | May 11, 10:40 PM

This incident has been resolved. Thank you for your patience.

investigating | May 11, 08:49 PM

We are currently investigating reports of service disruption affecting a subset of customers. Customers may experience intermittent connectivity issues, degraded performance, or SSL/TLS certificate validation errors when accessing affected services. Our engineering teams are actively working to identify the scope of impact and restore full functionality as quickly as possible. We will continue to provide updates as more information becomes available.

minor | resolved | May 8, 09:16 PM | Resolved May 8, 10:30 PM

Cloud Metrics - High Write Latency and Errors in prod-us-central-7

2 updates
resolved | May 8, 10:30 PM

We have continued to observe stability. This incident is now being considered as resolved. Thank you for your patience.

monitoring | May 8, 09:16 PM

From approximately 20:40 to 21:00 UTC, we experienced an issue affecting Grafana Cloud Metrics in prod-us-central-7. Affected users may have experienced high latency and/or errors during ingestion and rule evaluation. Our team has identified the cause and mitigated it. We are currently monitoring for long-term stability.

critical | resolved | May 7, 07:18 AM | Resolved May 7, 07:56 AM

Metrics read errors in prod-ap-south-1 region

3 updates
resolved | May 7, 07:56 AM

At this time, we have confirmed that the query errors have stopped, and we are considering this issue resolved.

monitoring | May 7, 07:53 AM

Engineering has released a fix and as of 07:50 UTC, customers should no longer experience errors when querying metrics. We will continue to monitor for recurrence and provide updates accordingly.

investigating | May 7, 07:18 AM

From approximately 06:24 UTC, we were alerted to an issue with read errors in mimir-prod-43. Users with instances hosted in the prod-ap-south-1 region experiencing this issue may encounter an error message when querying metrics. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

major | resolved | May 6, 05:30 AM | Resolved May 6, 05:30 AM

Hardware failure on CSP within prod-us-west-0

1 update
resolved | May 8, 11:04 AM

We observed an underlying hardware failure on our CSP which triggered an automatic live VM migration. This caused a degradation in write performance for Grafana Cloud Metrics on prod-us-west-0 between 05:26 UTC and 05:43 UTC.

minor | resolved | May 5, 04:11 PM | Resolved May 5, 08:13 PM

Elevated Error Rate of Browser Checks in PoP Oregon

4 updates
resolved | May 5, 08:13 PM

This incident has been resolved. Thank you for your patience.

monitoring | May 5, 07:44 PM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

identified | May 5, 06:13 PM

We’ve identified the cause of the issue impacting browser checks. Our team is currently implementing a fix.

investigating | May 5, 04:11 PM

We’re currently investigating an issue affecting browser checks in the PoP Oregon region. Our team is actively working to identify the cause. Thank you for your patience.

major | resolved | May 4, 10:58 PM | Resolved May 5, 02:09 AM

k6 Partial Outage

4 updates
resolved | May 5, 02:09 AM

This incident has been resolved. Thank you for your patience.

monitoring | May 5, 12:04 AM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

investigating | May 4, 11:23 PM

After further investigation, this issue may also be affecting Synthetic Monitoring. We continue to identify the cause and will update as soon as we have more information.

investigating | May 4, 10:58 PM

We’re currently investigating an issue affecting k6. Our team is actively working to identify the cause. Thank you for your patience.

major | resolved | May 1, 09:14 AM | Resolved May 1, 10:27 AM

Ingestion Errors for AWS Cloud Provider Observability Metric Streams in prod-us-central-7

4 updates
resolved | May 1, 10:27 AM

This incident has been resolved.

monitoring | May 1, 09:43 AM

A fix has been implemented and we are monitoring the results.

investigating | May 1, 09:42 AM

We are continuing to investigate this issue.

investigating | May 1, 09:14 AM

We are investigating an issue with ingesting metrics for AWS Cloud Provider Observability with Metric Streams. Users experiencing this issue may encounter ingestion errors, in the "prod-us-central-7" region only, starting from ~06:30 UTC. Engineering is actively engaged and assessing the issue. We will provide updates accordingly.

April 2026 (19 incidents)

minor | resolved | Apr 28, 09:20 AM | Resolved Apr 30, 03:11 PM

Gateway Slowness Detected in Prod (US-East-1)

2 updates
resolved | Apr 30, 03:11 PM

After further review, this was a false alarm and should not have affected any users. This incident has been resolved. Thank you for your patience.

investigating | Apr 28, 09:20 AM

Successful requests have dropped, and users may not be able to access their instances. The issue is under investigation.

minor | resolved | Apr 28, 06:46 PM | Resolved Apr 29, 01:37 PM

Investigating Issues Saving SQL Datasource Credentials

3 updates
resolved | Apr 29, 01:37 PM

This incident has been resolved. Thank you for your patience.

monitoring | Apr 28, 06:59 PM

We’ve identified the cause of the issue impacting SQL datasources. Our team is currently implementing a fix and monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

investigating | Apr 28, 06:46 PM

We are currently investigating reports of issues affecting SQL-based data sources where users are unable to save credentials. This appears to impact a subset of customers and may be occurring across multiple regions. We are actively working to determine the scope and root cause. We will provide updates as more information becomes available.

none | resolved | Apr 29, 12:00 PM | Resolved Apr 29, 12:00 PM

Performance Testing – Degraded Service (Resolved)

1 update
resolved | Apr 29, 01:51 PM

We experienced degraded performance affecting Performance Testing from 13:10 UTC to 13:20 UTC. During this time, users may not have been able to start new test runs. The issue has been resolved, and the service is now operating normally. We apologize for any disruption this may have caused and appreciate your patience.

minor | resolved | Apr 29, 10:30 AM | Resolved Apr 29, 10:30 AM

Elevated write latency for AWS Metrics Streaming integration in us-east-3 region.

1 update
resolved | Apr 29, 12:57 PM

We experienced an incident with the AWS Metrics Streaming integration in the us-east-3 region, manifesting as elevated ingestion latency. The incident started at around 10:45 UTC and was resolved at around 12:30 UTC. Some tenants may have seen elevated write latency, but all requests were processed and we do not expect any data loss from the incident. The incident is now resolved, and we continue to monitor the system's health.

major | resolved | Apr 27, 05:08 PM | Resolved Apr 27, 11:24 PM

InfluxDB Datasource - Intermittent Failures

4 updates
resolved | Apr 27, 11:24 PM

This incident has been resolved. Thank you for your patience.

monitoring | Apr 27, 11:13 PM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

identified | Apr 27, 06:01 PM

We’ve identified the cause of the issue impacting the InfluxDB datasource. Our team is currently implementing a fix.

investigating | Apr 27, 05:08 PM

We’re currently investigating an issue affecting the InfluxDB plugin. Some users may see intermittent failures. Our team is actively working to identify the cause. Thank you for your patience.

minor | resolved | Apr 20, 09:12 PM | Resolved Apr 24, 03:04 PM

Restrictions on Alerts & Reports for Grafana Cloud Free/Trial Users

4 updates
resolved | Apr 24, 03:04 PM

Grafana Labs has taken steps to safeguard the Grafana Cloud platform against the distribution of unauthorized emails. We have implemented the following changes to new Grafana Cloud Free and Trial accounts, effective immediately: Users who open Grafana Cloud Free and Trial accounts can only send email alerts and reports to users within their Grafana instance. Other email recipients will be rejected. Additionally, the Cloud Alertmanager is no longer available for these instances, requiring all alerting to be configured via the native Grafana Alertmanager. All other integrations for alerting remain functional. If users would like to expand their email capabilities, they can upgrade their Grafana Cloud account to scale their use case, as needed. These changes to alerting and reporting will be applied to all new Grafana Cloud Free and Trial accounts. Existing Grafana Cloud Free and Trial accounts opened before April 20 and all other Grafana Cloud account types are unaffected by these changes.

monitoring | Apr 22, 03:03 PM

Grafana Labs is implementing measures to safeguard the Grafana Cloud platform against ongoing unauthorized use while preserving the capabilities relied upon by our community. Effective immediately, we have made the following modifications to the platform:
Alerting: Email alerting has been disabled for new Grafana Cloud Free and Trial accounts; however, all other integrations such as webhooks remain functional. Additionally, Cloud Alertmanager is now disabled for Grafana instances in these accounts, requiring all configuration to occur via the native Grafana Alertmanager. Existing Grafana Cloud accounts remain unaffected by these restrictions and all other Grafana Cloud account types are also unaffected.
Reporting: To prevent the distribution of unauthorized emails, Grafana instances in new Grafana Cloud Free and Trial accounts are limited to sending reports exclusively to users within their Grafana instance. Standard reporting functionality continues unrestricted for existing Grafana Cloud accounts and all other Grafana Cloud account types.
We remain committed to further refining platform security to ensure a safe and open environment for our entire user base.

monitoring | Apr 20, 10:07 PM

We are continuing to monitor for any further issues.

monitoring | Apr 20, 09:12 PM

Grafana Labs is taking steps to safeguard our Grafana Cloud platform against unauthorized use while maintaining the Grafana Cloud Free and Trial tiers of service our users and the community have come to rely on. As of Monday, April 20, alerting and reporting capabilities have been disabled in new Grafana Cloud Free and Trial stacks. We are working towards deploying improvements and restoring that functionality in a way that keeps our platform secure and open for all of our users.

major | resolved | Apr 23, 02:26 PM | Resolved Apr 23, 08:01 PM

CloudWatch Datasource Outage

3 updates
resolved | Apr 23, 08:01 PM

This incident has been resolved. Thank you for your patience.

monitoring | Apr 23, 02:39 PM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

investigating | Apr 23, 02:26 PM

We’re currently investigating an issue affecting CloudWatch datasources. Our team is actively working to identify the cause. Thank you for your patience.

critical | resolved | Apr 20, 02:09 PM | Resolved Apr 20, 02:30 PM

Elevated 429 Errors Impacting Metrics Querying Across Multiple Regions

3 updates
resolved | Apr 20, 02:30 PM

This incident has been resolved. Thank you for your patience.

investigating | Apr 20, 02:21 PM

The issue is now confirmed to be widespread, affecting Prometheus across all regions. Customers may continue to experience elevated 429 (rate limit) errors, particularly when querying metrics, with failures or inconsistent responses possible. Our engineering team remains fully engaged and is actively working on mitigation and resolution efforts with the highest priority.

investigating | Apr 20, 02:09 PM

We are currently experiencing a major incident causing elevated 429 (rate limit) errors across multiple regions, primarily impacting metrics querying. This is a high-priority issue, and our engineering team is actively engaged and working urgently to identify the root cause and restore full service as quickly as possible. Customers may experience widespread failures or delays when querying metrics during this time. We understand the significant impact this may have and will continue to provide updates as more information becomes available.
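For readers scripting against a metrics query API, one common client-side way to cope with 429 (rate limit) responses is to retry with exponential backoff. The sketch below is illustrative only and is not guidance from these incident updates: query_with_backoff and its send callable are hypothetical names, and the delay values are arbitrary assumptions.

```python
import time
from typing import Callable, Tuple


def query_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-429 status or attempts run out.

    send is a hypothetical callable returning (status_code, body).
    """
    status, body = send()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Double the wait after each rate-limited response: 1s, 2s, 4s, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, body = send()
    return status, body
```

Passing sleep as a parameter lets the retry schedule be exercised without real waiting. If every attempt is rate limited, the final 429 response is returned to the caller rather than raised.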

minorresolvedApr 17, 09:23 PM — Resolved Apr 17, 10:58 PM

Query Caching - Degraded Performance

3 updates
resolvedApr 17, 10:58 PM

This incident has been resolved.

monitoringApr 17, 10:09 PM

Currently prod-us-east-0 and prod-eu-west-3 have recovered, and we are continuing to monitor prod-us-central-0 which is in the process of recovery.

investigatingApr 17, 09:23 PM

As of 20:52 UTC, we are currently investigating degraded Query Caching performance in multiple regions. For datasources where query caching is configured, some queries may take longer than usual. Our team is actively working to identify the cause. Thank you for your patience.

minorresolvedApr 16, 12:52 PM — Resolved Apr 16, 02:02 PM

Issues with Stack Creation

3 updates
resolvedApr 16, 02:02 PM

This incident has been resolved.

monitoringApr 16, 01:19 PM

The issue is fixed and we are currently monitoring the service.

identifiedApr 16, 12:52 PM

Since approximately 12:11 UTC today (April 16), we have been seeing issues with stack creation across all our regions. Customers will see an error message when attempting to create a stack. Our engineering team has identified the source of the issue as external to Grafana (a provider) and is tracking its recovery.

minorresolvedApr 15, 04:07 PM — Resolved Apr 15, 04:25 PM

Degraded Ticket Visibility in Support System

2 updates
resolvedApr 15, 04:25 PM

This incident has been resolved and our ticketing system is fully operational. Thank you for your patience.

monitoringApr 15, 04:07 PM

We are currently experiencing an issue with our ticketing system provider that is affecting how tickets appear within our internal support views. We are continuing to receive all new tickets successfully, and no requests are being lost at this time. Our team is actively monitoring the situation and working to ensure all incoming requests are reviewed, including those that may not be immediately visible in standard views. We will provide further updates as we receive more information from our provider. We appreciate your patience.

minorresolvedApr 14, 09:22 AM — Resolved Apr 15, 12:59 PM

K6 Sporadic DNS Issues

4 updates
resolvedApr 15, 12:59 PM

This incident is now resolved. We had intermittent issues with a flaky DNS server that caused random tests to not start properly. Since the DNS server was fixed, we have not seen the issue recur.

monitoringApr 14, 02:29 PM

Our engineering team has deployed a fix and we are currently monitoring the behaviour of the system until full resolution.

monitoringApr 14, 02:29 PM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time.

identifiedApr 14, 09:22 AM

We are having sporadic DNS issues that occasionally affect the start of cloud test runs, causing them to abort. We are currently working to resolve them. The issue has been occurring since April 9.

noneresolvedApr 14, 11:30 AM — Resolved Apr 14, 11:30 AM

k6 Cloud Service Disruption

1 update
resolvedApr 14, 01:44 PM

Between approximately 12:30 UTC and 13:15 UTC, k6 Cloud experienced a service disruption due to issues introduced in a recent API release. During this time, users were unable to access the k6 Cloud application. The issue was mitigated by reverting the release, and service has since been fully restored.

noneresolvedApr 13, 11:30 AM — Resolved Apr 13, 11:30 AM

Loki write instability in prod-eu-west-2.loki-prod-012

1 update
resolvedApr 14, 12:02 PM

There was a period of write instability yesterday, between approximately 13:30 and 17:30 UTC. This was due to scheduled maintenance.

majorresolvedApr 10, 11:53 PM — Resolved Apr 11, 12:36 AM

Grafana Cloud Logs - Write degradation in us-east-3

3 updates
resolvedApr 11, 12:36 AM

This incident has been resolved.

monitoringApr 11, 12:10 AM

A fix has been implemented and we are monitoring the results.

investigatingApr 10, 11:53 PM

We are seeing issues on the write path for Loki in a cluster in us-east-3, and we are actively investigating this issue.

majorresolvedApr 10, 07:42 PM — Resolved Apr 10, 09:02 PM

Tempo Write Outage

3 updates
resolvedApr 10, 09:02 PM

This incident has been resolved. Thank you for your patience.

monitoringApr 10, 07:53 PM

We’ve implemented a fix and are monitoring the results to confirm the issue is fully resolved. Services may start to recover during this time. We’ll update again within an hour.

investigatingApr 10, 07:42 PM

We are currently investigating a write outage affecting prod-us-east-3. The issue began at 18:50 UTC. Users may experience errors, timeouts, or unavailability while we work to identify the cause and restore service.

minorresolvedApr 9, 05:34 PM — Resolved Apr 9, 06:50 PM

K6 Browser Testing/Timeline Not Available

3 updates
resolvedApr 9, 06:50 PM

This incident has been resolved. Thank you for your patience.

identifiedApr 9, 06:39 PM

We’ve identified the cause of the issue impacting k6 browser testing/timeline. Our team is currently implementing a fix. We’ll provide another update in two hours or sooner if the situation changes.

investigatingApr 9, 05:34 PM

We’re currently investigating an issue affecting browser testing. Users running browser tests will not be able to see the browser timeline. Our team is actively working to identify the cause and will share an update within two hours. Thank you for your patience.

minorresolvedApr 8, 05:00 PM — Resolved Apr 8, 05:00 PM

Stability Issues for Some Customers in the prod-gb-south-1 Region.

1 update
resolvedApr 8, 05:00 PM

We had a stability issue for a subset of customers in the prod-gb-south-1 region. The impact occurred between 15:20 and 16:30 UTC and affected roughly 30% of queries and rule evaluations. We've applied mitigations, and queries should be back to normal.

minorresolvedApr 7, 03:17 PM — Resolved Apr 7, 08:17 PM

Unable to Edit Notification Policies

4 updates
resolvedApr 7, 08:17 PM

This incident has been resolved. Thank you for your patience.

identifiedApr 7, 06:03 PM

We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.

identifiedApr 7, 04:52 PM

We’ve identified the cause of the issue impacting notification policies. Our team is currently implementing a fix. We’ll provide another update in 2 hours or sooner if the situation changes.

investigatingApr 7, 03:17 PM

We’re currently investigating an issue affecting notification policies. Our team is actively working to identify the cause and will share an update within 2 hours. Thank you for your patience.

March 2026(2 incidents)

criticalresolvedMar 2, 06:43 AM — Resolved Jun 17, 09:58 AM

Complete outage in prod-me-central-1

15 updates
resolvedJun 17, 09:58 AM

Following our ongoing communications regarding the complete outage in prod-me-central-1, we are now closing this incident. As noted in the latest AWS update, the Middle East (UAE) region (ME-CENTRAL-1) has suffered significant damage and restoration is expected to take several months. We strongly recommend that all affected customers migrate workloads to an alternate Grafana Cloud region as soon as possible. If you have not already done so, please follow the migration steps outlined in our previous updates:

1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region. If you use Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ https://grafana.github.io/grafanactl/

For further details, please refer to the AWS incident communication directly: https://health.aws.amazon.com/health/status Please reach out to our Support team if you need any assistance with the above: https://grafana.com/profile/org#support We will continue to monitor the situation and update the incident if circumstances change.

investigatingMay 27, 05:10 PM

The TLS certificates serving prod-me-central-1 endpoints expire on May 30, 2026. Replacement certificates have been imported, but the ongoing AWS regional incident is preventing them from propagating to all load balancer nodes, so customers may see certificate errors after that date until AWS restores normal operation. We do not have any additional updates to share at this time. Our team is actively monitoring the situation and will provide further information as it becomes available. In the meantime, please continue to refer to the AWS Status Page for the most detailed and up-to-date information.
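
Customers who want to see how close an endpoint's certificate is to expiry can compute this from the certificate's `notAfter` field. The following Python sketch uses only the standard library. The helper names `days_until_expiry` and `fetch_not_after` are hypothetical, and the midnight GMT expiry time in the usage note is an assumption, since the update above gives only the date.

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after, now=None):
    """Return whole days from `now` until a certificate's notAfter time.

    `not_after` uses the format found in ssl.SSLSocket.getpeercert(),
    for example "May 30 00:00:00 2026 GMT". The result is negative once
    the certificate has expired.
    """
    now = now or datetime.now(timezone.utc)
    expires = datetime.fromtimestamp(ssl.cert_time_to_seconds(not_after), tz=timezone.utc)
    return (expires - now).days


def fetch_not_after(host, port=443, timeout=5.0):
    """Return the notAfter string of the certificate served by host:port.

    Uses default verification, so an already-expired certificate makes the
    handshake raise ssl.SSLCertVerificationError instead of returning.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

For example, `days_until_expiry("May 30 00:00:00 2026 GMT", now=datetime(2026, 5, 27, tzinfo=timezone.utc))` evaluates to `3` under the assumed midnight expiry time.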

investigatingMay 21, 11:41 AM

AWS UAE - prod-me-central-1: Public probe checks might be degraded. We recommend migrating checks from the UAE probe to the next nearest probe suitable for your use case.

investigatingMay 13, 09:59 PM

We do not have any additional updates to share at this time. Our team is actively monitoring the situation and will provide further information as it becomes available. In the meantime, please continue to refer to the AWS Status Page for the most detailed and up-to-date information.

investigatingApr 20, 03:11 PM

We are continuing to investigate this issue.

investigatingMar 19, 12:13 PM

We have not received any further updates from AWS at this time. However, we are actively monitoring the outage and will provide additional information as it becomes available. Also, please continue to refer to the AWS status page for more detailed updates. https://health.aws.amazon.com/health/status All the guidance previously included about stack migration is still relevant. Please reach out to our Support team if you have any questions.

investigatingMar 4, 10:22 PM

We are actively monitoring the situation, but at this time there are no new updates to share. The next update will be provided once we have more information to share. Please reach out to our Support team if you have any questions.

investigatingMar 4, 10:28 AM

We are continuing to investigate this issue.

investigatingMar 2, 10:18 PM

Please continue to refer to the AWS status page for more detailed updates specific to AWS: https://health.aws.amazon.com/health/status AWS is recommending that affected customers move workloads to alternate regions, and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:

1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region. If you use Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ https://grafana.github.io/grafanactl/

We are continuing to work with our CSP at this time, and will provide updates as they are available.

investigatingMar 2, 10:31 AM

AWS is recommending that affected customers move workloads to alternate regions (https://health.aws.amazon.com/health/status), and we are recommending the same. Customers who are impacted and who cannot wait for a restoration of service are asked to:

1. Create a Grafana Cloud stack in an alternate region.
2. Update clients to send telemetry to the new region. If you use Grafana Alloy, you can use Fleet Management: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/
3. If your instance remains available and you have not configured your dashboards as code, you may be able to use `grafanactl` to migrate dashboards: https://grafana.com/docs/grafana/latest/as-code/observability-as-code/grafana-cli/grafanacli-workflows/ https://grafana.github.io/grafanactl/

We will provide updates when we have them, but we do not have an expected resolution time at this point.

investigatingMar 2, 10:04 AM

Customers are advised to configure a new blank stack in an alternative Grafana Cloud region and to reconfigure their clients (such as Grafana Alloy) to send telemetry to that region. Fleet Management can be used for this purpose: https://grafana.com/docs/grafana-cloud/send-data/fleet-management/introduction/

investigatingMar 2, 08:36 AM

We are updating this incident to reflect a complete outage in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigatingMar 2, 08:21 AM

We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigatingMar 2, 08:14 AM

We are observing write and read outage errors across all databases (metrics, logs, traces) in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

investigatingMar 2, 06:43 AM

We are seeing elevated write and read path errors in prod-me-central-1, due to an ongoing AWS UAE data center issue. We will provide further updates accordingly.

criticalresolvedMar 25, 02:11 PM — Resolved Apr 23, 08:07 PM

Prometheus writes in prod-eu-west-3 are degraded

10 updates
resolvedApr 23, 08:07 PM

This incident has been resolved. Thank you for your patience.

monitoringApr 20, 03:08 PM

We are continuing to monitor for any further issues.

monitoringApr 14, 08:11 PM

We have deployed a mitigation and have seen improvement in write failures over the past week. We are still seeing intermittent spikes in latency and continue to monitor.

monitoringApr 8, 08:32 PM

We are still seeing intermittent issues and continue to seek a resolution.

monitoringApr 2, 09:38 PM

We are continuing to monitor for any further issues.

monitoringMar 27, 09:05 PM

We are continuing to monitor this through the weekend.

monitoringMar 26, 05:45 PM

We are continuing to monitor the previously impacted environments.

monitoringMar 26, 12:04 PM

A fix has been implemented and we are monitoring the results.

investigatingMar 25, 09:35 PM

We are continuing to investigate this issue.

investigatingMar 25, 02:11 PM

The metric writes issue reported in https://status.grafana.com/incidents/gfshj17lxj5z is still ongoing. Our Engineering team is actively investigating this and we will provide further updates as our investigation progresses.
