Fly.io Outage History
Past incidents and downtime events
Complete history of Fly.io outages, incidents, and service disruptions. Showing the 50 most recent incidents.
February 2026 (4 incidents)
IPv6 Issues in YYZ
3 updates
This incident has been resolved.
A fix has been implemented and we're seeing IPv6 networking return to normal in YYZ. We'll continue to monitor to ensure full recovery.
We are currently investigating degraded IPv6 networking in the YYZ (Toronto) region. Users with machines in this region may see issues connecting to their machines over IPv6. Users with static egress IPs may see issues connecting outbound over IPv6 from this region at this time. IPv4 is not impacted and continues to work normally.
Elevated latency and packet loss in North American regions
3 updates
This incident has been resolved.
Network performance issues between North American regions have resolved and we're continuing to monitor.
We are currently investigating intermittent spikes of increased latency and packet loss between North American regions over the past hour. Users may see degraded network performance on traffic in and out of the IAD and SJC regions at this time. We are working with our upstream networking providers to investigate and mitigate these issues.
Congestion in CDG and FRA
2 updates
This incident has been resolved.
We are experiencing elevated weekend congestion in CDG (Paris) and FRA (Frankfurt).
Sprites are returning "not found" or "unauthorized" errors when they shouldn't be.
6 updates
This incident has been resolved.
We've been able to restore missing sprites and tokens. We're monitoring for any additional issues.
We're working on a fix to restore missing sprites and tokens.
We identified the source of the problem as an upstream DNS issue Tigris experienced, now resolved. We're currently assessing the impact on Sprites.
We are continuing to investigate this issue.
We're currently investigating this issue.
January 2026 (18 incidents)
Grafana Log Search Display Issue
3 updates
This has been resolved. If you are still experiencing any issues, you may need to log out and then back in.
No logs are displayed in Grafana Log Search when using the default `*` query. You can try the following workarounds: 1. Replace the default `*` query with `NOT ""`. 2. View logs from the “fly app” tab or the “explore” tab. Thank you for your patience as we work to resolve this!
No logs are displayed in Grafana Log Search when using the default `*` query. As a temporary workaround, please replace the `*` query with `NOT ""`. Thank you for your patience as we work to resolve this!
Delayed metric reporting in NRT and SIN regions
5 updates
This incident has been resolved. All hosts in SIN and NRT are reporting up to date metrics.
Currently one host in SIN is still working through its metrics backlog and is reporting delayed metrics. Other hosts in NRT and SIN are reporting metrics correctly. If needed, users with impacted machines on the remaining host can use `fly machine clone` to create new machines in the region, which should land on a different host.
Most hosts in NRT and SIN have completed backfilling their metrics and are up to date in fly-metrics.net. Four hosts are still working through the backlog; machines on those hosts are still reporting delayed metrics at this time.
We are continuing to process the metrics backlog in NRT and SIN. Progress is being made, but due to the volume of metrics this may still take some time to fully complete. At this time users with machines on impacted hosts will see metrics beginning to backfill into fly-metrics.net. However many will not be fully caught up yet. This impacts metrics only, the underlying machines continue to work normally.
A small number of hosts in the NRT (Tokyo) and SIN (Singapore) regions are reporting delayed metrics to the hosted Grafana charts at fly-metrics.net. Users with machines on impacted hosts will see delayed or spotty metrics in their Grafana charts. Only metrics for these machines are impacted. The underlying machines continue to receive and serve traffic as usual, and all machine actions (stopping, starting, deploys, etc.) continue to work normally. We are processing the backlog of metrics on these hosts, but metrics will be delayed until this is complete.
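As a rough sketch of the `fly machine clone` workaround mentioned in the updates above (the app name and machine ID below are placeholders, not values from this incident):

```
# List the app's machines to find one running on an impacted host
fly machines list -a my-app

# Clone that machine; the new machine should land on a different host in the same region
fly machine clone 3287114ef9e285 -a my-app
```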
Congestion in CDG and FRA
1 update
We are experiencing elevated weekend congestion in CDG (Paris) and FRA (Frankfurt).
Delays issuing certificates
3 updates
This incident has been resolved.
We have identified the source of the congestion and released a fix; we'll continue to monitor while the jobs catch up.
We are currently investigating possible delays issuing ACME certificates for new hostnames.
Errors creating new Sprites
3 updates
This incident has been resolved.
We are continuing to investigate this issue.
We're currently investigating an issue that's preventing new Sprites from being created.
MPG network instability in LAX
3 updates
This incident has been resolved.
Connections are back to normal. We'll keep monitoring the region.
We identified network partitions in the LAX region. We are investigating the problem.
Machines errors in JNB region
2 updates
This incident has been resolved.
A bad deploy of an internal service in the JNB region may cause Machines API requests for JNB region machines to fail. At this time, it may not be possible to create or update machines in the JNB region, but apps continue to run. The deploy is being reverted.
Delayed app logs
2 updates
This incident has been resolved.
App logs generated after 12:00 UTC may be delayed in showing up on fly-metrics.net. We are monitoring the log insert rate and will update once log insertion is current. Log streaming via `flyctl logs` or NATS is current.
Network issues in SIN region
2 updates
This incident has been resolved.
We are investigating network issues in SIN region. Apps running in this region may experience elevated latency or packet loss.
Elevated Machines errors in ORD
4 updates
This incident has been resolved. Machine placement improvements are now deployed in all regions.
The fix has been deployed to ORD and we are monitoring results. New Machines will be placed using stricter memory thresholds. Existing Machines on impacted hosts will be migrated to new hosts on start and update. We are also provisioning additional capacity in ORD.
We are deploying a fix to enforce stricter memory thresholds for Machine placements. This will steer new and migrated workloads towards hosts with optimal capacity.
We are investigating high memory utilization on some hosts in ORD. Customers may notice timeouts or errors in reserving resources when updating or creating new Machines in the region. We are rebalancing workloads in ORD to relieve memory pressure on impacted hosts.
MPG clusters in SIN experiencing network issues
4 updates
Connections normal for all clusters in SIN.
Cluster connections are stable. We'll keep monitoring.
Network performance in SIN has normalized and MPG clusters are working as expected. We're continuing to monitor to ensure continued stability, but customers should not see an impact on their cluster at this time.
Worker machines in SIN are affected by a provider's network outage. Database clusters may present network connectivity issues. We are monitoring the situation.
MPG instability in LAX
4 updates
Clusters are stable.
The root cause is a temporary network degradation in LAX; multiple clusters lost contact with the DCS. Connections are being reestablished.
We are continuing to investigate this issue.
We identified potential connection problems on ~50 clusters in LAX. We are investigating the cause and remediating the problem.
App logs unavailable
5 updates
This incident has been resolved.
App log services are functioning normally. We are continuing to monitor.
We have identified the issue and applied a mitigation. App logs should now be available again through Grafana. Some logs may be delayed or missing. We are still working to address the root cause.
We are continuing to investigate this issue.
We are currently investigating an outage with app logs in Grafana. Logs are still available through flyctl and through the dashboard.
Authentication token issues
2 updates
This incident has been resolved.
We are investigating intermittent issues with authentication. Apps continue to run, and APIs are accessible with existing tokens, but operations such as creating new tokens may fail.
Metrics display issue in Mumbai (BOM) region
9 updates
This incident has been resolved and all hosts in BOM are accurately reporting metrics.
One host in BOM is still reporting delayed metrics as it continues to catch up. All other hosts in BOM are reporting metrics correctly. If needed, users with impacted machines can use `fly machine clone` to create new machines in the region, which should land on a different host.
Metrics have returned to normal for most hosts in BOM. Two hosts are still reporting delayed metrics, but are continuing to catch up. Users with impacted machines can use `fly machine clone` to add new machines, which should land on a different host.
Metrics have completed backfilling and are up to date on most hosts in the BOM region. Two hosts are still working through the backlog; machines on those two hosts are still reporting delayed metrics at this time.
We are continuing to process the metrics backlog in BOM. Progress is being made, but due to the volume of metrics this may still take some time to fully complete. At this time users with machines in BOM should see metrics from the past 12h beginning to backfill into fly-metrics.net. However most will not be fully caught up yet.
We are continuing to work through the backlog of metrics in BOM. Metrics will remain unavailable for BOM machines until this is complete.
Our metrics cluster is continuing to work through the backlog of metrics in BOM. Metrics for machines running in the BOM region will continue to be unavailable until this is complete.
The cause of the issue has been identified and a fix is being implemented. Metrics for machines in the BOM region remain unavailable in fly-metrics.net, however the machines themselves continue to run, start, and stop normally.
We are currently investigating issues collecting machine metrics for machines running in the BOM (Mumbai, India) region. Machines in the region continue to run, start, and stop normally, however metrics for these machines are not displaying in fly-metrics.net.
Elevated GraphQL API Latency and Dashboard Error Rates
6 updates
This incident has been resolved. We have seen API latency normalize and remain normal since ~19:00 UTC.
API error rates have normalized; however, users may still see elevated latency reaching some GraphQL endpoints. Latency continues to trend in the right direction, and we continue to monitor for full recovery.
We have deployed an initial fix and are seeing improvements. GraphQL error rates and latency remain elevated over the baseline at this time. We are continuing to keep a close eye on recovery.
The issue has been identified and a fix is being implemented.
We are continuing to investigate elevated latency and error rates on our GraphQL API endpoints. Users may see errors on parts of the platform that use these APIs. This includes Flyctl actions such as deploys, as well as the fly.io dashboard.
We are investigating elevated API latency and error rates on the platform. Users may see delays or errors creating apps, as well as on some dashboard pages. Machines API actions appear unimpacted at this time.
Management plane outage
10 updates
This incident has been resolved.
A fix has been implemented and we are seeing system performance return to normal. Machine API and general platform operations are succeeding again, although users may see slightly elevated error rates as things finish stabilizing. We are continuing to closely monitor the platform to ensure full recovery and stability.
We are continuing to work on a fix for this issue.
Services are starting to come up. The dashboard should be accessible, and deploys and other flyctl-based commands should work. Some services may feel sluggish while things warm up.
The team is getting closer to a fix. We will provide another update within the next 30 minutes.
We are continuing to make progress on a fix for this issue.
We are continuing to work on restoring service to the Machines API and other affected platform components.
We are continuing to work on deploying a fix. The fly.io dashboard and the Machines API continue to be unavailable at this time. Fly Managed Postgres (MPG) clusters continue to run normally; however, creating new clusters will fail at this time. Users may also see scheduled backups remain in a running or pending state. These backups will resume as scheduled once the platform-level issues are resolved.
We have identified the cause of the outage and are working on a fix. The fly.io dashboard and the Machines API continue to be unavailable at this time. Running machines should continue to stay up and be reachable. However, creating/starting/stopping machines, running new deployments, and other operations that rely on the Machines API remain unavailable.
We are investigating a major outage of our control plane. Apps may continue to run, but it is not currently possible to log in to the dashboard or use the Machines API.
Network issues impacting EU <-> US traffic
3 updates
This incident has been resolved.
We're seeing performance on impacted routes return to normal levels. Prior to recovering, we observed intermittent high packet loss for US <-> EU traffic, most acutely from approximately Dec 31 23:35 to 23:45 UTC, and again from Jan 1 00:40 to 01:35 UTC.
We've detected degraded network performance on some of our upstream network providers, impacting traffic between US and EU regions. We're in contact with these teams as we monitor for recovery.
December 2025 (13 incidents)
Network issues in JNB region
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are investigating network issues in JNB region. Apps may experience increased latency or packet loss.
Networking and metrics degradation
4 updates
This incident has been resolved.
Network performance across Fly.io has returned to normal. This incident primarily impacted machines in the SJC and EWR regions. Metrics are largely caught up, but users may still see a slight delay in reporting as the cluster finishes catching up.
A change has been made and metrics on fly-metrics.net are backfilling. Users may still see a slight delay or gaps in new metrics being reported as the backfill completes. We continue to see higher-than-usual latency and packet loss across the network. We are continuing to investigate.
We are currently investigating increased latency and packet loss across multiple regions. Customers may see additional latency on requests at this time. Relatedly, Prometheus metrics reporting via fly-metrics.net is currently degraded. Users may see delays or gaps in metrics at this time. We are working to address both issues.
Network issues in SIN
4 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We have identified an upstream issue and are currently working to restore connectivity. In the meantime, we have temporarily rerouted Anycast traffic away from SIN.
We are investigating network issues in SIN. Apps might experience partial outage connecting to some IPs.
Network connectivity issues in GRU
5 updates
This incident has been resolved.
We are continuing to see elevated packet loss from a site in our GRU region to other fly regions in the US, via some US ISPs. Impacted apps may see degraded or broken connections. Local GRU traffic is unaffected. We are working with our provider in GRU to improve routing and will update once a fix is in place.
We are continuing to see elevated packet loss from a site in our GRU region to other fly regions in the US, via some US ISPs. Impacted apps may see degraded or broken connections. Local GRU traffic is unaffected. We are working with our provider in GRU to improve routing and will update once a fix is in place.
Some hosts in GRU now have healthy routes to all destinations. We are continuing to work with our upstreams on restoring full network connectivity for all machines. Apps may see degraded or broken connections to some destinations.
We're aware of network issues in GRU and are working with our upstream providers to reroute traffic.
Increased Machines API Errors in YYZ
5 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We're seeing some recurrence of the issue and are continuing to investigate.
A fix has been implemented and we are monitoring the results.
We are investigating an increase in Machines API errors in the YYZ region. Machine updates and requests to machines located in this region may take longer than usual.
Network outage in SIN region
2 updates
This incident has been resolved.
We are investigating network issues in the SIN region.
MPG management may be unavailable in IAD
3 updates
This incident has been resolved.
The IAD region has stabilized and our team continues to monitor clusters there for issues.
Management for some MPG clusters in IAD might be unavailable as a knock-on effect from the IAD network maintenance. Our team is investigating this issue.
Network instability in IAD (Ashburn, Virginia) region
3 updates
This incident has been resolved.
A mitigation has been implemented and we are monitoring the results. There may continue to be some packet loss and closed connections until the emergency maintenance in IAD scheduled for 2025-12-08 at 5:00 AM UTC is completed.
We are investigating network congestion in the IAD region that is causing dropped packets and closed connections. This impacts both user applications and Managed Postgres clusters, as well as some API operations.
Network instability in IAD region
3 updates
This incident has been resolved.
A mitigation has been implemented restoring normal network connectivity. We are still working with our upstream providers to address the root cause.
We are observing IPv4 and IPv6 network instability in IAD affecting user Apps, as well as Managed Postgres clusters and some API operations. We are working with our upstream network infrastructure providers to resolve the issue.
Network instability in IAD region
4 updates
This incident has been resolved.
A mitigation is in place and IPv4 and IPv6 connectivity has returned to normal. We are continuing to monitor.
We are continuing to investigate this issue.
We are investigating alerts and reports of unstable networking (dropped packets, closed connections) in our IAD region. This impacts both user applications and Managed Postgres clusters as well as some API operations.
Issues with App Deploys + Machine API
4 updates
This incident has been resolved.
We are continuing to monitor for any further issues.
A fix has been implemented and we are monitoring the results.
We are currently investigating an issue with the Machines API and app deploys.
Network issues in Sydney
3 updates
This incident has been resolved.
The routing issues appear to have resolved. We will continue to monitor network performance and reroute as needed.
We are investigating partially degraded routing between Sydney and some regions in Europe and North America.
Depot docker builders are failing
3 updates
This incident has been resolved.
A fix for this issue has been deployed by Depot, we are re-enabling Depot builders and monitoring for any problems.
Deploys and image builds can fail in the "Waiting for Depot builder" step. We are investigating why Depot builders are not coming up, and have switched all builds to use Fly native builders temporarily. If you're still stuck in "Waiting for Depot builder", stop and restart your deploy, or use `fly deploy --depot=false`.
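If a deploy is stuck at that step, a minimal retry that bypasses Depot (using the flag from the update above, and assuming an existing fly.toml in the working directory) looks like:

```
# Cancel the stuck deploy, then retry using Fly's native remote builders
fly deploy --depot=false
```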
November 2025 (15 incidents)
Network issues in SYD
3 updates
The issue is resolved.
We are continuing to investigate this issue.
We are investigating network issues in the SYD region.
General Network Issues with IPv6
5 updates
This incident has been resolved. It was caused by a recent `flyctl` update and only apps deployed with flyctl version v0.3.227 were affected. We have deployed a server-side workaround.
A fix has been implemented and we are monitoring the results.
We will be rolling out a fix shortly.
We have identified an issue affecting newly-allocated Anycast ingress IPv6 addresses for some apps and are working to deploy a fix. These apps may experience timeouts when connecting over IPv6 (which some clients default to).
We are currently investigating issues with some apps on IPv6.
Increased API Errors
11 updates
This incident has been resolved.
Our metrics show that the Machines API has mostly recovered. We're still working through a small number of degraded Managed Postgres clusters.
We have identified and implemented a fix for the root cause of this incident and are monitoring the results. The Machines API should be seeing recovery. On the Managed Postgres side, new MPG creates should be working. A small number of clusters remain degraded, or are experiencing connectivity issues. We are continuing to restore these as quickly as possible.
We are continuing to work on a fix for this issue. Users will still see elevated error rates creating new apps, machines, and MPG clusters, as well as other API operations. Users may also see newly created machines remaining in a `created` state for an extended period before starting.
We are continuing to work on a fix for this issue.
We are continuing to see elevated errors with creating new apps, new MPG clusters, and setting secrets. A small number of MPG clusters continue to see connectivity errors. We are continuing to work on these issues.
We continue to see increased error rates creating new apps and MPG clusters, as well as with setting secrets. We are continuing to work to resolve these issues. A small number of MPG clusters are still experiencing connectivity issues. We are working to restore these to a healthy state as quickly as possible.
We have identified some continuing issues with creating new apps and secrets and are working to resolve them.
Our monitoring indicates that the Machines API, flyctl, and dashboard access should have recovered. Some Managed Postgres clusters may still be affected; rest assured that your cluster's data is intact while we work to recover them to a working state.
We have identified and implemented a fix for the API outage. Some Managed Postgres clusters may be affected as a secondary effect and we are currently working to restore them.
We are investigating an increase in errors that affects numerous API endpoints including the Machines API, flyctl and dashboard.
Network issues in FRA
3 updates
This incident has been resolved.
The issue appears to be resolved. We’re monitoring further to be certain.
We're investigating network issues in the FRA region.
Proxy issues in ORD
3 updates
This incident has been resolved.
A fix has been implemented. We are monitoring it now.
Users in ORD may see elevated connection errors to fly services. We are investigating the issue.
Some /apps/:app_name pages are returning 500s
4 updates
Removed some old Nomad code 😳 and the apps page should be working just fine now.
Some customers reported that certain apps are yielding 500 errors on /apps/:app_name. A fix was implemented and is currently being deployed to our machines.
Some customers reported that certain apps are yielding 500 errors on /apps/:app_name.
Some customers reported that certain apps are yielding 500 errors on /apps/:app_name.
UDP service issues
5 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are continuing to investigate this issue.
We are continuing to investigate an outage with UDP routing. Apps are currently unable to receive UDP traffic.
We're aware of routing issues with UDP services and are currently investigating.
Support Portal availability
4 updates
This incident has been resolved.
Our provider's API is responsive and the support portal is operational again. We'll continue to monitor.
Customers with a paid support package, free trial support, or MPG support are invited to create new support tickets by emailing support@fly.io until this incident is resolved.
Due to an ongoing incident with our upstream support platform provider, the support portal in the Fly.io dashboard is currently not operational, which means that you will not be able to view, update or create new tickets.
Packet loss in GRU
3 updates
This incident has been resolved.
A fix has been implemented and we are monitoring the results.
We are seeing packet loss on some hosts in GRU. We are working with our upstreams to route the traffic around the affected providers.
Google SSO errors
2 updates
This incident has been resolved.
We are investigating "SSO error" messages when trying to access organizations with Required SSO through Google.
Network issues in BOM
2 updates
This incident has been resolved.
We are investigating network issues in BOM region. Apps may experience higher latency or packet loss.
Network Issues in SJC
1 update
An upstream provider experienced network issues in the SJC region. Apps on affected hosts and API clients near SJC may have experienced degraded connectivity and elevated error rates from 18:38 to 19:03 and from 20:02 to 20:26 UTC. The issue is now resolved.
Wireguard errors in Flyctl v0.3.214
3 updates
This incident has been resolved.
A new version of Flyctl has been released with a fix for this issue. Users on flyctl version v0.3.214 should upgrade to Flyctl v0.3.216 using `fly version update`.
We have identified an issue in the latest Flyctl release (v0.3.214) that impacts commands using WireGuard connectivity for some users. Impacted users may see `tunnel unavailable` or `no such organization` errors when running commands like `fly wireguard`, `fly proxy`, or `fly mpg proxy` with their default Personal org. We are working on a fix. In the meantime, impacted users can install the previous flyctl version with `curl -L https://fly.io/install.sh | sh -s 0.3.213`. Users who normally install flyctl via package managers (e.g. Homebrew) should uninstall the package manager version first to avoid conflicts.
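For reference, the remediation commands cited in this incident, collected in one place (version numbers are those named in the updates above):

```
# Temporary workaround while v0.3.214 was the latest release: install the previous version
curl -L https://fly.io/install.sh | sh -s 0.3.213

# After v0.3.216 shipped with the fix: upgrade in place
fly version update
```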
Static egress IPv6 issues in BOM
2 updates
This incident has been resolved.
We have identified an upstream issue that prevented static egress IPv6 addresses in BOM from reaching parts of the Internet and are currently working with upstreams for a fix. Machines without a static egress IPv6 address are not affected. For affected machines, consider forcing IPv4 for outbound connectivity or de-allocating the egress IP if not needed via `fly m egress-ip release`.
Network maintenance in BOM
3 updates
This incident has been resolved.
The maintenance has been completed and we are monitoring the results.
One of our upstream providers is performing emergency network maintenance in BOM. Apps in BOM may experience temporary connectivity issues.