Fly.io Outage History

Past incidents and downtime events

Complete history of Fly.io outages, incidents, and service disruptions. Showing 50 most recent incidents.

← Back to Fly.io current status

February 2026(4 incidents)

majorresolvedFeb 3, 03:33 PM — Resolved Feb 3, 03:53 PM

IPv6 Issues in YYZ

3 updates

resolvedFeb 3, 03:53 PM

This incident has been resolved.

monitoringFeb 3, 03:44 PM

A fix has been implemented and we're seeing IPv6 networking return to normal in YYZ. We'll continue to monitor to ensure full recovery.

investigatingFeb 3, 03:33 PM

We are currently investigating degraded IPv6 networking in the YYZ (Toronto) region. Users with machines in this region may see issues connecting to their machines over IPv6. Users with static egress IPs may see issues connecting outbound over IPv6 from this region at this time. IPv4 is not impacted and continues to work normally.

minorresolvedFeb 3, 02:56 AM — Resolved Feb 3, 03:43 AM

Elevated latency and packetloss in North American regions

3 updates

resolvedFeb 3, 03:43 AM

This incident has been resolved.

monitoringFeb 3, 03:26 AM

Network performance issues between North American regions have resolved and we're continuing to monitor.

investigatingFeb 3, 02:56 AM

We are currently investigating intermittent spikes of increased latency and packet loss between North American regions over the past hour. Users may see degraded network performance on traffic in and out of the IAD and SJC regions at this time. We are working with our upstream networking providers to investigate and mitigate these issues.

minorresolvedFeb 1, 08:15 PM — Resolved Feb 1, 09:37 PM

Congestion in CDG and FRA

2 updates

resolvedFeb 1, 09:37 PM

This incident has been resolved.

investigatingFeb 1, 08:15 PM

We are experiencing elevated weekend congestion in CDG (France) and FRA (Germany).

noneresolvedFeb 1, 02:16 AM — Resolved Feb 1, 05:48 AM

Sprites are returning not found or unauthorized when they shouldn't be.

6 updates

resolvedFeb 1, 05:48 AM

This incident has been resolved.

monitoringFeb 1, 05:32 AM

We've been able to restore missing sprites and tokens. We're monitoring for any additional issues.

identifiedFeb 1, 04:52 AM

We're working on a fix to restore missing sprites and tokens.

identifiedFeb 1, 03:02 AM

We identified the source of the problem as an upstream DNS issue Tigris experienced, now resolved. We're currently assessing the impact on Sprites.

investigatingFeb 1, 02:51 AM

We are continuing to investigate this issue.

investigatingFeb 1, 02:16 AM

We're currently investigating this issue.

January 2026(18 incidents)

noneresolvedJan 31, 05:35 PM — Resolved Jan 31, 06:29 PM

Grafana Log Search Display Issue

3 updates

resolvedJan 31, 06:29 PM

This has been resolved. If you are still experiencing any issues, you may need to log out and then back in.

investigatingJan 31, 05:49 PM

No logs are displayed in Grafana Log Search when using the default `*` query. You can try the following workarounds: 1. Replace the default `*`query with `NOT ""` 2. Viewing logs from the “fly app” tab or the “explore” tab, Thank you for your kind understanding as we work through resolving this!

investigatingJan 31, 05:35 PM

No logs are displayed in Grafana Log Search when using the default `*` query. As a temporary workaround, please replace `*` with `NOT ""` query. Thank you for your kind understanding as we work through resolving this!

minorresolvedJan 27, 07:40 PM — Resolved Jan 29, 08:04 PM

Delayed metric reporting in NRT and SIN regions

5 updates

resolvedJan 29, 08:04 PM

This incident has been resolved. All hosts in SIN and NRT are reporting up to date metrics.

identifiedJan 29, 03:15 PM

Currently one host in SIN is still finishing working through it's metrics backlog and is reporting delayed metrics. Other hosts in NRT and SIN are reporting metrics correctly. If needed, users with impacted machines on the remaining host can use `fly machine clone` to create new machines in the region, which should land on a different host.

identifiedJan 28, 03:27 PM

Most hosts in NRT and SIN have completed backfilling their metrics and are up to date in fly-metrics.net. Four hosts are still working through the backlog; machines on those hosts are still reporting delayed metrics at this time.

identifiedJan 28, 02:49 AM

We are continuing to process the metrics backlog in NRT and SIN. Progress is being made, but due to the volume of metrics this may still take some time to fully complete. At this time users with machines on impacted hosts will see metrics beginning to backfill into fly-metrics.net. However many will not be fully caught up yet. This impacts metrics only, the underlying machines continue to work normally.

identifiedJan 27, 07:40 PM

A small number of hosts in the NRT (Tokyo) and SIN (Singapore) are reporting delayed metrics to the hosted Grafana charts at fly-metrics.net. Users with machines on impacted hosts will see delayed or spotty metrics in their Grafana charts. Only metrics for these machines are impacted. The underlying machines continue to receive and serve traffic as usual, and all machine actions(stopping, starting, deploys etc.) continue to work normally. We are processing the backlog of metrics on these hosts, but metrics will be delayed until this is complete.

minorresolvedJan 24, 04:00 PM — Resolved Jan 24, 04:00 PM

Congestion in CDG and FRA

1 update

resolvedJan 24, 11:06 PM

We are experiencing elevated weekend congestion in CDG (France) and FRA (Germany).

minorresolvedJan 21, 03:21 AM — Resolved Jan 21, 05:24 AM

Delays issuing certificates

3 updates

resolvedJan 21, 05:24 AM

This incident has been resolved.

monitoringJan 21, 04:59 AM

We have identified the congestion and released a fix, we'll continue to monitor while the jobs catch up.

investigatingJan 21, 03:21 AM

We are currently investigating possible delays issuing ACME certificates for new hostnames.

minorresolvedJan 20, 10:08 PM — Resolved Jan 20, 10:15 PM

Errors creating new Sprites

3 updates

resolvedJan 20, 10:15 PM

This incident has been resolved.

investigatingJan 20, 10:10 PM

We are continuing to investigate this issue.

investigatingJan 20, 10:08 PM

We're currently investigating an issue that's preventing new Sprites from being created.

noneresolvedJan 19, 02:34 PM — Resolved Jan 19, 08:07 PM

MPG network instability in LAX

3 updates

resolvedJan 19, 08:07 PM

This incident has been resolved.

monitoringJan 19, 03:11 PM

Connections are back to normal. We'll keep monitoring the region.

investigatingJan 19, 02:34 PM

We identified network partitions in the LAX region. We are investigating the problem.

majorresolvedJan 19, 12:17 PM — Resolved Jan 19, 12:29 PM

Machines errors in JNB region

2 updates

resolvedJan 19, 12:29 PM

This incident has been resolved.

identifiedJan 19, 12:17 PM

A bad deploy of an internal service in JNB region may cause Machines API requests for JNB region machines to fail. At this time, it may not be possible to create or update machines in JNB region, but apps continue to run. The deploy is being reverted.

minorresolvedJan 18, 01:08 PM — Resolved Jan 18, 03:12 PM

Delayed app logs

2 updates

resolvedJan 18, 03:12 PM

This incident has been resolved.

monitoringJan 18, 01:08 PM

App logs after 12:00 UTC may be delayed to show up on fly-metrics.net. We are monitoring log insert rate and will update once log insertion is current. Log streaming via `flyctl logs` or NATS is current.

minorresolvedJan 17, 05:12 PM — Resolved Jan 17, 05:46 PM

Network issues in SIN region

2 updates

resolvedJan 17, 05:46 PM

This incident has been resolved.

investigatingJan 17, 05:12 PM

We are investigating network issues in SIN region. Apps running in this region may experience elevated latency or packet loss.

noneresolvedJan 15, 05:52 PM — Resolved Jan 16, 01:35 AM

Elevated Machines errors in ORD

4 updates

resolvedJan 16, 01:35 AM

This incident has been resolved. Machine placement improvements are now deployed in all regions.

monitoringJan 15, 08:19 PM

The fix has been deployed to ORD and we are monitoring results. New Machines will be placed using stricter memory thresholds. Existing Machines on impacted hosts will be migrated to new hosts on start and update. We are also provisioning additional capacity in ORD.

identifiedJan 15, 07:04 PM

We are deploying a fix to enforce stricter memory thresholds for Machine placements. This will steer new and migrated workloads towards hosts with optimal capacity.

investigatingJan 15, 05:52 PM

We are investigating high memory utilization on some hosts in ORD. Customers may notice timeouts or errors in reserving resources when updating or creating new Machines in the region. We are rebalancing workloads in ORD to relieve memory pressure on impacted hosts.

noneresolvedJan 15, 10:51 PM — Resolved Jan 16, 12:43 AM

MPG clusters in SIN experiencing network issues

4 updates

resolvedJan 16, 12:43 AM

Connections normal for all clusters in SIN.

monitoringJan 15, 11:57 PM

Cluster connections are stable. We'll keep monitoring.

monitoringJan 15, 11:50 PM

Network performance in SIN has normalized and MPG clusters are working as expected. We're continuing to monitor to ensure continued stability, but customers should not see an impact on their cluster at this time.

monitoringJan 15, 10:51 PM

Worker machines in SIN are affected by a provider's network outage. Database clusters may present network connectivity issues. We are monitoring the situation.

noneresolvedJan 15, 02:21 AM — Resolved Jan 15, 04:54 PM

MPG instability in LAX

4 updates

resolvedJan 15, 04:54 PM

Clusters are stable.

investigatingJan 15, 03:12 AM

The root cause is a temporary network degradation in LAX; multiple clusters lost contact with the DCS. Connections are being reestablished.

investigatingJan 15, 02:23 AM

We are continuing to investigate this issue.

investigatingJan 15, 02:21 AM

We identified potential connection problems on ~50 clusters in LAX. We are investigating the cause and remediating the problem.

majorresolvedJan 15, 01:55 AM — Resolved Jan 15, 04:18 AM

App logs unavailable

5 updates

resolvedJan 15, 04:18 AM

This incident has been resolved.

monitoringJan 15, 03:47 AM

App log services are functioning normally. We are continuing to monitor.

identifiedJan 15, 02:57 AM

We have identified the issue and applied a mitigation. App logs should now be available again through Grafana. Some logs may be delayed or missing. We are still working to address the root cause.

investigatingJan 15, 02:25 AM

We are continuing to investigate this issue.

investigatingJan 15, 01:55 AM

We are currently investigating an outage with app logs in Grafana. Logs are still available through flyctl and through the dashboard.

noneresolvedJan 14, 01:02 PM — Resolved Jan 14, 01:25 PM

Authentication token issues

2 updates

resolvedJan 14, 01:25 PM

This incident has been resolved.

investigatingJan 14, 01:02 PM

We are investigating intermittent issues with authentication. Apps continue to run, and APIs are accessible with existing tokens, but operations such as creating new tokens may fail.

minorresolvedJan 9, 12:53 PM — Resolved Jan 12, 04:29 AM

Metrics display issue in Mumbai (BOM) region

9 updates

resolvedJan 12, 04:29 AM

This incident has been resolved and all hosts in BOM are accurately reporting metrics.

identifiedJan 11, 06:43 PM

One host in BOM remains reporting delayed metrics as it continues to catch up. All other hosts in BOM are reporting metrics correctly. If needed, users with impacted machines can use `fly machine clone` to create new machines in the region, which should land on a different host.

identifiedJan 11, 05:36 AM

Metrics have returned to normal for most hosts in BOM. Two hosts are still reporting delayed metrics, but are continuing to catch up. Users with impacted machines can use `fly machine clone` to add new machines, which should land on a different host.

identifiedJan 10, 01:28 AM

Metrics have completed backfilling and are up to date on most hosts in the BOM region. Two hosts are still working through the backlog; machines on those two hosts are still reporting delayed metrics at this time.

identifiedJan 9, 08:36 PM

We are continuing to process the metrics backlog in BOM. Progress is being made, but due to the volume of metrics this may still take some time to fully complete. At this time users with machines in BOM should see metrics from the past 12h beginning to backfill into fly-metrics.net. However most will not be fully caught up yet.

identifiedJan 9, 05:59 PM

We are continuing to work through the backlog of metrics in BOM. Metrics will remain unavailable for BOM machines until this is complete.

identifiedJan 9, 03:20 PM

Our metrics cluster is continuing to working through the backlog of metrics in BOM. Metrics for machines running in the BOM region will continue to be unavailable until this is complete.

identifiedJan 9, 02:07 PM

The cause of the issue has been identified and a fix is being implemented. Metrics for machines in the BOM region remain unavailable in fly-metrics.net, however the machines themselves continue to run, start, and stop normally.

investigatingJan 9, 12:53 PM

We are currently investigating issues collecting machine metrics for machines running in the BOM (Mumbai, India) region. Machines in the region continue to run, start, and stop normally, however metrics for these machines are not displaying in fly-metrics.net.

minorresolvedJan 11, 05:12 PM — Resolved Jan 11, 08:44 PM

Elevated GraphQL API Latency and Dashboard Error Rates

6 updates

resolvedJan 11, 08:44 PM

This incident has been resolved. We have seen API latency normalize and remain normal since ~19:00 UTC.

monitoringJan 11, 05:53 PM

API error rates have normalized, however users may still see elevated latency reaching some GraphQL endpoints. Latency continues to trend in the right direction, we continue to monitor for full recovery .

identifiedJan 11, 05:33 PM

We have deployed an initial fix and are seeing improvements. GraphQL error rates and latency remain elevated over the baseline at this time. We are continuing to keep a close eye on recovery.

identifiedJan 11, 05:22 PM

The issue has been identified and a fix is being implemented.

investigatingJan 11, 05:21 PM

We are continuing to investigate elevated latency and error rates on our GraphQL API endpoints. Users may see errors on parts of the platform that use these APIs. This includes Flyctl actions such as deploys, as well as the fly.io dashboard.

investigatingJan 11, 05:12 PM

We are investigating elevated API Latency and Error rates on the platform. Users may see delays or errors creating apps, as well as on some dashboard pages. Machines API actions appear unimpacted at this time

criticalresolvedJan 6, 04:21 AM — Resolved Jan 6, 07:30 PM

Management plane outage

10 updates

resolvedJan 6, 07:30 PM

This incident has been resolved.

monitoringJan 6, 08:59 AM

A fix has been implemented and we are seeing system performance return to normal. Machine API and general platform operations are succeeding again, although users may see slightly elevated error rates as things finish stabilizing. We are continuing to closely monitor the platform to ensure full recovery and stability.

identifiedJan 6, 08:26 AM

We are continuing to work on a fix for this issue.

identifiedJan 6, 08:01 AM

Services are starting to come up. The dashboard should be accessible and deploys and other flyctl based commands should work. Some services may feel sluggish while things heat up.

identifiedJan 6, 07:23 AM

The team is getting closer to a fix. We will provide another update within the next 30 minutes.

identifiedJan 6, 06:48 AM

We are continuing to make progress on a fix for this issue.

identifiedJan 6, 06:01 AM

We are continuing to work on restoring service to the Machines API and other affected platform components..

identifiedJan 6, 05:32 AM

We are continuing to work on deploying a fix. The fly.io dashboard and the machines API continue to be unavailable at this time. Fly Managed Postgres (MPG) clusters continue to run normally, however creating new clusters will fail at this time. Users may also see scheduled backups remain in a running or pending state at this time. These backups will resume as scheduled once the platform level issues are resolved

identifiedJan 6, 04:47 AM

We have identified the cause of the outage and are working on a fix. The fly.io dashboard and the machines API continue to be unavailable at this time. Running machines should continue to stay up and be reachable at this time. However creating/starting/stopping machines, running new deployments, or other operations that rely on the machines API remain unavailable.

investigatingJan 6, 04:21 AM

We are investigating a major outage of our control plane. Apps may continue to run, but it is not currently possible to log in to the dashboard or use the Machines API.

noneresolvedJan 1, 01:09 AM — Resolved Jan 1, 05:52 AM

Network issues impacting EU <-> US traffic

3 updates

resolvedJan 1, 05:52 AM

This incident has been resolved.

monitoringJan 1, 02:01 AM

We're seeing performance on impacted routes return to normal levels. Prior to recovering, we observed intermittent high packet loss for US EU traffic, most acutely from approximately Dec 31 23:35 to 23:45 UTC, and later from Jan 1 00:40 to 01:35.

investigatingJan 1, 01:09 AM

We've detected degraded network performance on some of our upstream network providers, impacting traffic between US and EU regions. We're in contact with these teams as we monitor for recovery.

December 2025(13 incidents)

minorresolvedDec 20, 07:12 PM — Resolved Dec 20, 08:17 PM

Network issues in JNB region

3 updates

resolvedDec 20, 08:17 PM

This incident has been resolved.

monitoringDec 20, 07:58 PM

A fix has been implemented and we are monitoring the results.

investigatingDec 20, 07:12 PM

We are investigating network issues in JNB region. Apps may experience increased latency or packet loss.

minorresolvedDec 17, 01:46 AM — Resolved Dec 17, 02:47 AM

Networking and metrics degradation

4 updates

resolvedDec 17, 02:47 AM

This incident has been resolved.

monitoringDec 17, 02:27 AM

Network performance across Fly.io has returned to normal. This incident primarily impacted machines in the SJC and EWR regions. Metrics are largely caught up, but users may still see a slight delay in reporting as the cluster finishes catching up.

investigatingDec 17, 02:12 AM

A change has been made and metrics on fly-metrics.net are backfilling. Users may still see a slight delay or gaps in new metrics being reported as the backfill completes. We continue to see higher than usual latency and packetloss across the network. We are continuing to investigate.

investigatingDec 17, 01:46 AM

We are currently investigating increased latency and packet loss across multiple regions. Customers may see additional latency on requests at this time. Relatedly, prometheus metrics reported via fly-metrics.net is currently degraded. Users may see delays or gaps in metrics at this time. We are working to address both issues.

majorresolvedDec 15, 03:49 PM — Resolved Dec 15, 04:57 PM

Network issues in SIN

4 updates

resolvedDec 15, 04:57 PM

This incident has been resolved.

monitoringDec 15, 04:13 PM

A fix has been implemented and we are monitoring the results.

identifiedDec 15, 04:02 PM

We have identified an upstream issue and are currently working to restore connectivity. In the mean time, we have temporarily rerouted Anycast traffic away from SIN.

investigatingDec 15, 03:49 PM

We are investigating network issues in SIN. Apps might experience partial outage connecting to some IPs.

majorresolvedDec 11, 09:07 AM — Resolved Dec 12, 06:46 PM

Network connectivity issues in GRU

5 updates

resolvedDec 12, 06:46 PM

This incident has been resolved.

identifiedDec 11, 09:50 PM

We are continuing to see elevated packet loss from a site in our GRU region to other fly regions in the US, via some US ISPs. Impacted apps may see degraded or broken connections. Local GRU traffic is unaffected. We are working with our provider in GRU to improve routing and will update once a fix is in place.

identifiedDec 11, 09:31 PM

identifiedDec 11, 09:34 AM

Some hosts in GRU now have healthy routes to all destinations. We are continuing to work with our upstreams on restoring full network connectivity for all machines. Apps may see degraded or broken connections to some destinations.

identifiedDec 11, 09:07 AM

We're aware of network issues in GRU and are working with our upstream providers to reroute traffic.

minorresolvedDec 11, 07:29 PM — Resolved Dec 11, 09:21 PM

Increased Machines API Errors in YYZ

5 updates

resolvedDec 11, 09:21 PM

This incident has been resolved.

monitoringDec 11, 08:53 PM

A fix has been implemented and we are monitoring the results.

identifiedDec 11, 08:26 PM

We're seeing some recurrence of the issue and are continuing to investigate.

monitoringDec 11, 08:17 PM

A fix has been implemented and we are monitoring the results.

investigatingDec 11, 07:29 PM

We are investigating an increase in Machines API errors in the YYZ region. Machine updates and requests to machines located in this region may take longer than usual.

criticalresolvedDec 8, 09:10 PM — Resolved Dec 8, 09:17 PM

Network outage in SIN region

2 updates

resolvedDec 8, 09:17 PM

This incident has been resolved.

investigatingDec 8, 09:10 PM

We are investigating a network issues in SIN region.

minorresolvedDec 8, 07:59 AM — Resolved Dec 8, 06:27 PM

MPG management may be unavailable in IAD

3 updates

resolvedDec 8, 06:27 PM

This incident has been resolved.

monitoringDec 8, 09:35 AM

The IAD region has stabilized and our team continue to monitor clusters there for issues.

investigatingDec 8, 07:59 AM

Management for some MPG clusters in IAD might be unavailable as a knock-on effect from the IAD network maintenance. Our team is investigating this issue.

noneresolvedDec 8, 12:42 AM — Resolved Dec 8, 07:20 AM

Network instability in IAD (Ashburn, Virginia) region

3 updates

resolvedDec 8, 07:20 AM

This incident has been resolved.

monitoringDec 8, 01:18 AM

A mitigation has been implemented and we are monitoring the results. There may continue to be some packet loss and closed connections until the emergency maintenance in IAD scheduled for 2025-12-08 at 5:00 AM UTC is completed.

investigatingDec 8, 12:42 AM

We are investigating network congestion in the IAD region that is causing dropped packets and closed connections. This impacts both user applications and Managed Postgres clusters, as well as some API operations.

minorresolvedDec 4, 09:14 PM — Resolved Dec 4, 10:55 PM

Network instability in IAD region

3 updates

resolvedDec 4, 10:55 PM

This incident has been resolved.

investigatingDec 4, 09:52 PM

A mitigation has been implemented restoring normal network connectivity. We are still working with our upstream providers to address the root cause.

investigatingDec 4, 09:14 PM

We are observing IPv4 and IPv6 network instability in IAD affecting user Apps, as well as Managed Postgres clusters and some API operations. We are working with our upstream network infrastructure providers to resolve the issue.

noneresolvedDec 4, 05:47 PM — Resolved Dec 4, 07:31 PM

Network instability in IAD region

4 updates

resolvedDec 4, 07:31 PM

This incident has been resolved.

monitoringDec 4, 06:57 PM

A mitigation is in place and IPv4 and IPv6 connectivity has returned to normal. We are continuing to monitor.

investigatingDec 4, 05:52 PM

We are continuing to investigate this issue.

investigatingDec 4, 05:47 PM

We are investigating alerts and reports of unstable networking (dropped packets, closed connections) in our IAD region. This impacts both user applications and Managed Postgres clusters as well as some API operations.

minorresolvedDec 4, 01:17 PM — Resolved Dec 4, 01:45 PM

Issues with App Deploys + Machine API

4 updates

resolvedDec 4, 01:45 PM

This incident has been resolved.

monitoringDec 4, 01:45 PM

We are continuing to monitor for any further issues.

monitoringDec 4, 01:25 PM

A fix has been implemented and we are monitoring the results

investigatingDec 4, 01:17 PM

We are currently investigating an issue with machines api and app deploy

minorresolvedDec 4, 02:47 AM — Resolved Dec 4, 03:32 AM

Network issues in Sydney

3 updates

resolvedDec 4, 03:32 AM

This incident has been resolved.

monitoringDec 4, 02:52 AM

Traffic appears to have resolved. We will continue to monitor network performance and reroute as needed.

investigatingDec 4, 02:47 AM

We are investigating partially degraded routing between Sydney and some regions in Europe and North America.

majorresolvedDec 2, 06:54 PM — Resolved Dec 2, 08:14 PM

Depot docker builders are failing

3 updates

resolvedDec 2, 08:14 PM

This incident has been resolved.

monitoringDec 2, 07:33 PM

A fix for this issue has been deployed by Depot, we are re-enabling Depot builders and monitoring for any problems.

investigatingDec 2, 06:54 PM

Deploys and image builds can fail in the "Waiting for Depot builder" step. We are investigating why Depot builders are not coming up, and have switched all builds to use Fly native builders temporarily. If you're still stuck in "Waiting for Depot builder", stop and restart your deploy, or use `fly deploy --depot=false`.

November 2025(15 incidents)

noneresolvedNov 30, 10:14 AM — Resolved Nov 30, 10:38 AM

Network issues in SYD

3 updates

resolvedNov 30, 10:38 AM

The issue is resolved.

investigatingNov 30, 10:18 AM

We are continuing to investigate this issue.

investigatingNov 30, 10:14 AM

We are investigating netwwork issues in the SYD region.

minorresolvedNov 29, 05:20 PM — Resolved Nov 29, 06:23 PM

General Network Issues with IPv6

5 updates

resolvedNov 29, 06:23 PM

This incident has been resolved. It was caused by a recent `flyctl` update and only apps deployed with flyctl version v0.3.227 were affected. We have deployed a server-side workaround.

monitoringNov 29, 05:52 PM

A fix has been implemented and we are monitoring the results.

identifiedNov 29, 05:44 PM

We will be rolling out a fix shortly

identifiedNov 29, 05:32 PM

We have identified an issue affecting newly-allocated Anycast ingress IPv6 addresses for some apps and are working to deploy a fix. These apps may experience timeouts when connecting over IPv6 (which some clients default to).

investigatingNov 29, 05:20 PM

We are currently investigating issues with some apps on IPv6.

majorresolvedNov 28, 09:16 PM — Resolved Nov 29, 07:05 AM

Increased API Errors

11 updates

resolvedNov 29, 07:05 AM

This incident has been resolved.

monitoringNov 29, 03:41 AM

Our metrics show that machines API has mostly recovered. We're still working through a small number of degraded Managed Postgres clusters.

monitoringNov 29, 02:58 AM

We have identified and implemented a fix for the root cause of this incident and are monitoring the results. Machines API should be seeing recovery. On the Managed Postgres side, New MPG creates should be working. A small number of clusters remain degraded, or are experiencing connectivity issues. We are continuing to restore these as quickly as possible.

identifiedNov 29, 02:30 AM

We are continuing to work on a fix for this issue. Users will still see elevated error rates creating new apps, machines, and MPG clusters, as well as other API operations. Users may also see newly created machines remaining in a `created` state for an extended period before starting.

identifiedNov 29, 01:17 AM

We are continuing to work on a fix for this issue.

identifiedNov 29, 12:33 AM

We are continuing to see elevated errors with creating new apps, new MPG clusters, and setting secrets. A small number of MPG clusters continue to see connectivity errors. We are continuing to work on these issues.

identifiedNov 28, 11:41 PM

We continue to see increased error rates creating new apps and MPG clusters, as well as with setting secrets. We are continuing to work to resolve these issues. A small number of MPG clusters are still experiencing connectivity issues. We are working to restore these to a healthy state as quickly as possible.

identifiedNov 28, 10:17 PM

We have identified some continuing issues with creating new apps and secrets and are working to resolve them.

identifiedNov 28, 09:54 PM

Our monitoring indicates that machines API, flyctl and dashboard access should have recovered. Some Managed Postgres clusters may still be affected -- Rest assured that your cluster's data is intact while we are working to recover them to a working state.

identifiedNov 28, 09:36 PM

We have identified and implemented a fix for the API outage. Some Managed Postgres clusters may be affected as a secondary effect and we are currently working to restore them.

investigatingNov 28, 09:16 PM

We are investigating an increase in errors that affects numerous API endpoints including the Machines API, flyctl and dashboard.

noneresolvedNov 28, 11:02 PM — Resolved Nov 29, 12:07 AM

Network issues in fra

3 updates

resolvedNov 29, 07:11 AM

This incident has been resolved.

monitoringNov 28, 11:07 PM

The issue appears to be resolved. We’re monitoring further to be certain.

investigatingNov 28, 11:02 PM

We’re investigating network issues in the fra region

noneresolvedNov 23, 09:30 PM — Resolved Nov 23, 10:47 PM

Proxy issues in ORD

3 updates

resolvedNov 23, 10:47 PM

This incident has been resolved.

monitoringNov 23, 10:03 PM

A fix has been implemented. We are monitoring it now.

investigatingNov 23, 09:30 PM

Users in ORD may see elevated connection errors to fly services. We are investigating the issue.

noneresolvedNov 23, 12:20 PM — Resolved Nov 23, 12:58 PM

Some /apps/:app_name page are 500

4 updates

resolvedNov 23, 12:58 PM

Removed some old Nomad code 😳 and apps page should be working just fine now

monitoringNov 23, 12:37 PM

Some customer reported that certain apps are yielding 500 errors on /apps/:app_name. A fix was implemented and is currently being deployed to our machines.

identifiedNov 23, 12:29 PM

Some customer reported that certain apps are yielding 500 errors on /apps/:app_name.

investigatingNov 23, 12:20 PM

Some customer reported that certain apps are yielding 500 errors on /apps/:app_name

criticalresolvedNov 19, 06:54 PM — Resolved Nov 20, 02:02 AM

UDP service issues

5 updates

resolvedNov 20, 02:02 AM

This incident has been resolved.

monitoringNov 19, 11:00 PM

A fix has been implemented and we are monitoring the results.

investigatingNov 19, 09:49 PM

We are continuing to investigate this issue.

investigatingNov 19, 07:38 PM

We are continuing to investigate an outage with UDP routing. Apps are currently unable to receive UDP traffic.

investigatingNov 19, 06:54 PM

We're aware of routing issues with UDP services and are currently investigating.

noneresolvedNov 18, 11:55 AM — Resolved Nov 18, 04:54 PM

Support Portal availability

4 updates

resolvedNov 18, 04:54 PM

This incident has been resolved.

monitoringNov 18, 12:30 PM

Our provider's API is responsive and the support portal is operational again. We'll continue to monitor.

identifiedNov 18, 11:59 AM

Customers with a paid support package, free trial support or MPG support are invited to create new support tickets on support@fly.io until this incident is resolved.

identifiedNov 18, 11:55 AM

Due to an ongoing incident with our upstream support platform provider, the support portal in the Fly.io dashboard is currently not operational, which means that you will not be able to view, update or create new tickets.

minorresolvedNov 18, 03:19 AM — Resolved Nov 18, 03:25 AM

Packet loss in GRU

3 updates

resolvedNov 18, 03:25 AM

This incident has been resolved.

monitoringNov 18, 03:20 AM

A fix has been implemented and we are monitoring the results.

investigatingNov 18, 03:19 AM

We are seeing packet loss on some hosts in GRU. We are working with our upstreams to route the traffic around the affected providers.

majorresolvedNov 17, 04:13 PM — Resolved Nov 17, 04:25 PM

Google SSO errors

2 updates

resolvedNov 17, 04:25 PM

This incident has been resolved.

investigatingNov 17, 04:13 PM

We are investigating "SSO error" messages when trying to access organizations with Required SSO through Google.

minorresolvedNov 17, 10:25 AM — Resolved Nov 17, 11:00 AM

Network issues in BOM

2 updates

resolvedNov 17, 11:00 AM

This incident has been resolved.

investigatingNov 17, 10:25 AM

We are investigating network issues in BOM region. Apps may experience higher latency or packet loss.

noneresolvedNov 14, 06:30 PM — Resolved Nov 14, 06:30 PM

Network Issues in SJC

1 update

resolvedNov 14, 11:07 PM

An upstream provider experienced network issues in the SJC region. Apps on affected hosts and API clients near SJC may have experienced degraded connectivity and elevated error rates between 18:38 to 19:03 and 20:02 to 20:26 UTC. The issue is now resolved.

noneresolvedNov 14, 03:52 PM — Resolved Nov 14, 05:45 PM

Wireguard errors in Flyctl v0.3.214

3 updates

resolvedNov 14, 05:45 PM

This incident has been resolved.

monitoringNov 14, 05:00 PM

A new version of Flyctl has been released with a fix for this issue. Users on flyctl version v0.3.214 should upgrade to Flyctl v0.3.216 using `fly version update`.

identifiedNov 14, 03:52 PM

We have identified an issue in the latest Flyctl release (v0.3.214) that impacts commands using wireguard connectivity for some users. Impacted users may see `tunnel unavailable` or `no such organization` errors when running commands like `fly wireguard` , `fly proxy` or `fly mpg proxy` with their default Personal org. We are working on a fix. In the meantime, impacted users can install the previous flyctl version with `curl -L https://fly.io/install.sh | sh -s 0.3.213`. Users that normally install flyctl via package managers (eg. homebrew) should uninstall the package manager version first to avoid conflicts.

criticalresolvedNov 13, 04:59 PM — Resolved Nov 13, 09:57 PM

Static egress IPv6 issues in BOM

2 updates

resolvedNov 13, 09:57 PM

This incident has been resolved.

identifiedNov 13, 04:59 PM

We have identified an upstream issue that prevented static egress IPv6 addresses in BOM from reaching parts of the Internet and are currently working with upstreams for a fix. Machines without a static egress IPv6 address are not affected. For affected machines, consider forcing IPv4 for outbound connectivity or de-allocating the egress IP if not needed via `fly m egress-ip release`.

criticalresolvedNov 13, 07:34 PM — Resolved Nov 13, 09:56 PM

Network maintenance in BOM

3 updates

resolvedNov 13, 09:56 PM

This incident has been resolved.

monitoringNov 13, 07:58 PM

The maintenance has been completed and we are monitoring the results.

identifiedNov 13, 07:34 PM

One of our upstream providers is performing an emergency network maintenance in BOM. Apps in BOM may experience temporary connectivity issues.