Fly.io Outage History

50 incidents reported. Data sourced from the official Fly.io status page.

Total Incidents: 50
Major/Critical: 18
Minor: 23
Resolved: 49
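
The summary figures leave two numbers implicit: how many incidents carry no severity label and how many are still open. A minimal sketch of that arithmetic, assuming the severity buckets shown here are exhaustive (the variable names are illustrative and not part of the status page):

```python
# Summary figures as reported in this listing.
total_incidents = 50
major_or_critical = 18
minor = 23
resolved = 49

# Assumption: every incident is either major/critical, minor, or
# unlabelled (shown as "none" in the listing).
unlabelled = total_incidents - major_or_critical - minor
still_open = total_incidents - resolved

print(f"unlabelled severity: {unlabelled}")
print(f"not yet resolved: {still_open}")
```

Under that assumption, 9 incidents have severity "none" and 1 incident is not yet resolved.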

March 2026

Machines failing to start in DFW

minor
Mar 20, 07:26 AM (monitoring)
Mar 21, 08:26 AM
monitoring: Machine start success rates in DFW have improved but we are continuing to monitor and make further adjustments. We will provide updates as the situation progresses.
Mar 20, 12:45 PM
monitoring: In addition to freeing up existing capacity, the team has provisioned new capacity in DFW and we are monitoring the results.
Mar 20, 08:08 AM
monitoring: We freed up some capacity on our workers to allow for successful Machine starts.
(1 more update not shown)

Metrics currently experiencing issues

critical
Mar 19, 06:28 AM to Mar 19, 10:37 AM (resolved)
Mar 19, 10:37 AM
resolved: This incident has been resolved. We're unable to recover the lost metrics from that one hour.
Mar 19, 07:12 AM
monitoring: We have implemented a fix. Approximately 1h of metrics were lost, starting at 06:07 UTC. We're monitoring the cluster for further issues.
Mar 19, 06:28 AM
investigating: We are currently investigating an issue with our metrics cluster.
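
Each incident lists a start and an end timestamp, so its duration follows by subtraction. A minimal sketch, assuming both timestamps share one (unstated) time zone, fall in 2026, and are parsed under an English locale; `incident_minutes` is a hypothetical helper, not something the status page provides:

```python
from datetime import datetime

# Assumption: timestamps follow the "Mon D, HH:MM AM/PM" pattern used in
# this listing; the year is appended because the listing omits it.
FORMAT = "%b %d, %I:%M %p %Y"

def incident_minutes(start: str, end: str, year: int = 2026) -> int:
    """Return the whole minutes between two listing timestamps."""
    started = datetime.strptime(f"{start} {year}", FORMAT)
    ended = datetime.strptime(f"{end} {year}", FORMAT)
    return int((ended - started).total_seconds() // 60)

minutes = incident_minutes("Mar 19, 06:28 AM", "Mar 19, 10:37 AM")
print(f"{minutes // 60}h {minutes % 60}m")
```

For the metrics incident above this gives 4h 9m between the first report and resolution.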

Machines failing to start in DFW

major
Mar 18, 09:58 AM to Mar 18, 06:53 PM (resolved)
Mar 18, 06:53 PM
resolved: This incident has been resolved. Machine creates in DFW continue to work normally.
Mar 18, 12:40 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 18, 11:44 AM
identified: The team is currently rolling out additional capacity in DFW which should help ease Machine start failures across the region.
(1 more update not shown)

IPv6 networking issues in SJC region

major
Mar 18, 04:12 PM to Mar 18, 05:02 PM (resolved)
Mar 18, 05:02 PM
resolved: This incident has been resolved.
Mar 18, 04:31 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 18, 04:12 PM
investigating: We are investigating intermittent network issues in SJC region impacting outbound public IPv6 access from Machines. Connecting to IPv6 internet resources from apps hosted in SJC region may be slow or ...

Connection Issues in SJC

minor
Mar 18, 02:07 PM to Mar 18, 02:18 PM (resolved)
Mar 18, 02:18 PM
resolved: This incident has been resolved.
Mar 18, 02:07 PM
monitoring: Between 13:55 and 14:03 UTC machines and MPG clusters hosted in the SJC region saw elevated connection errors. Users may have seen errors connecting to or from most machines in the region, as well as ...

Fly ssh console command failing

minor
Mar 18, 02:12 PM to Mar 18, 02:18 PM (resolved)
Mar 18, 02:18 PM
resolved: This incident has been resolved.
Mar 18, 02:17 PM
monitoring: A fix has been implemented and we are seeing `ssh console` commands succeed as normal.
Mar 18, 02:12 PM
identified: We have identified an issue causing new `fly ssh console` connections to fail with 500 errors. A fix is in progress.

Sprites Operations: 401 errors for certain organizations

none
Mar 14, 04:20 AM to Mar 14, 02:05 PM (resolved)
Mar 14, 02:05 PM
resolved: This incident has been resolved.
Mar 14, 01:55 PM
monitoring: Organizations with names prefixed with numerical digits may experience 401 errors. Affected operations include Sprite creation, listing, etc. A fix has been implemented since 2026-0...

Setting secrets and creating apps is degraded

major
Mar 11, 09:19 AM to Mar 11, 11:37 AM (resolved)
Mar 11, 11:37 AM
resolved: This incident has been resolved.
Mar 11, 11:03 AM
monitoring: While the secret storage service was in a read-only state, app creation requests queued up, due to the retry logic and insufficient request concurrency limits in our GraphQL API. This prevented our Gr...
Mar 11, 10:14 AM
monitoring: A fix has been implemented and we are monitoring the results.
(1 more update not shown)

Private networking issues in SYD region

major
Mar 7, 02:42 PM to Mar 7, 03:56 PM (resolved)
Mar 7, 03:56 PM
resolved: This incident has been resolved.
Mar 7, 03:10 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 7, 02:42 PM
investigating: We are investigating a private networking failure between SYD and other regions. Apps continue to run, and private networking within SYD is unaffected.

Routing issues in NA regions

none
Mar 5, 07:24 PM to Mar 5, 07:50 PM (resolved)
Mar 5, 07:50 PM
resolved: This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during thi...
Mar 5, 07:38 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 5, 07:24 PM
investigating: We're aware of routing issues affecting some customers in North America regions, and we're actively investigating.

Elevated GraphQL API errors

major
Mar 3, 08:18 PM to Mar 3, 09:15 PM (resolved)
Mar 3, 09:15 PM
resolved: This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service. We are still investigating the root cause of the failure. In ...
Mar 3, 08:36 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 3, 08:18 PM
investigating: We're investigating elevated GraphQL errors that affect some API endpoints.

Cost Explorer fails to load

minor
Mar 3, 10:50 AM to Mar 3, 12:10 PM (resolved)
Mar 3, 12:10 PM
resolved: This incident has been resolved.
Mar 3, 10:50 AM
investigating: We are currently investigating this issue. The page currently displays: "We’re having trouble loading the cost breakdown."

Certificates issues affecting API and proxy

none
Mar 3, 12:54 AM to Mar 3, 12:54 AM (resolved)
Mar 3, 02:05 AM
resolved: Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly operations on certificates but also app creates and IP assignm...

Machines failing to boot in EWR

major
Mar 2, 05:42 PM to Mar 2, 10:49 PM (resolved)
Mar 2, 10:49 PM
resolved: This incident has been resolved.
Mar 2, 08:35 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 2, 06:21 PM
identified: The issue has been identified and a fix is being implemented.
(1 more update not shown)

Issues with the Machines API

minor
Mar 2, 09:19 PM to Mar 2, 09:50 PM (resolved)
Mar 2, 09:50 PM
resolved: This incident has been resolved.
Mar 2, 09:47 PM
monitoring: A fix has been implemented and we are monitoring the results.
Mar 2, 09:39 PM
identified: The issue has been identified and a fix is being implemented.
(1 more update not shown)

February 2026

Slow API requests

major
Feb 27, 06:50 PM to Feb 27, 08:21 PM (resolved)
Feb 27, 08:21 PM
resolved: This incident has been resolved. All platform and API operations are working normally.
Feb 27, 08:05 PM
monitoring: API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery. Background jobs are almost fully caught up. Users may still see slightly slower requests...
Feb 27, 07:41 PM
identified: A second fix has been deployed and database load has returned to normal, resulting in API response times beginning to normalize. Most Machines API requests should succeed as normal, and deploys to exi...
(6 more updates not shown)

Capacity issues in iad and dfw

minor
Feb 27, 03:34 PM to Feb 27, 05:54 PM (resolved)
Feb 27, 05:54 PM
resolved: This incident has been resolved.
Feb 27, 05:31 PM
monitoring: We have provisioned additional capacity in dfw and iad and are monitoring to ensure machine and builder starts are succeeding consistently.
Feb 27, 03:34 PM
identified: These regions (Dallas, TX dfw and Ashburn, VA iad) are currently low on capacity. New machine creates in these regions might fail temporarily, and Depot builders may be unavailable, causing deploys to...

Capacity issues in iad and dfw

none
Feb 26, 05:00 PM to Feb 26, 10:28 PM (resolved)
Feb 26, 10:28 PM
resolved: This incident has been resolved.
Feb 26, 08:19 PM
monitoring: We're continuing to monitor after having added more capacity to our DFW and IAD regions. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users shoul...
Feb 26, 06:57 PM
identified: We have added additional capacity in DFW and IAD regions and are monitoring the impact. New machine creates and deploys without volumes are seeing improved success rates. Deploys using depot builde...
(3 more updates not shown)

Sprites API degradation

none
Feb 24, 05:23 PM to Feb 24, 05:51 PM (resolved)
Feb 24, 05:51 PM
resolved: This incident has been resolved.
Feb 24, 05:24 PM
identified: A slow deploy is causing Sprites API degradation. We are implementing a fix.
Feb 24, 05:23 PM
identified: A slow deploy is causing Sprites API degradation. We are implementing a fix.

Metrics are degraded

minor
Feb 24, 04:33 AM to Feb 24, 11:06 AM (resolved)
Feb 24, 11:06 AM
resolved: Metrics processing has caught up, and we don't see any data loss.
Feb 24, 09:35 AM
monitoring: Delayed metrics are still being processed.
Feb 24, 06:46 AM
monitoring: Metrics are coming back online, but it will take a little time to process what's backed up in the queues.
(2 more updates not shown)

Sprite creations failing

minor
Feb 24, 09:39 AM to Feb 24, 10:44 AM (resolved)
Feb 24, 10:44 AM
resolved: This incident has been resolved.
Feb 24, 10:25 AM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 24, 09:39 AM
investigating: We are currently investigating issues creating new Sprites.

Degraded Managed Postgres Control Plane

none
Feb 23, 03:00 PM to Feb 23, 08:30 PM (resolved)
Feb 24, 12:31 AM
resolved: This incident has been resolved as of 20:30 UTC.
Feb 23, 03:00 PM
investigating: We are currently investigating issues with the MPG control plane. Users may experience delays or hanging when creating or deleting databases via the dashboard or CLI.

Deploys hanging at waiting for Depot Builder

minor
Feb 20, 04:14 PM to Feb 20, 08:49 PM (resolved)
Feb 20, 08:49 PM
resolved: This incident has been resolved.
Feb 20, 07:38 PM
monitoring: The fix has been rolled out and we are seeing deploys using depot builder succeeding normally. We continue to monitor to ensure full recovery. Depot builders have been reenabled as the default optio...
Feb 20, 05:59 PM
identified: A fix is being rolled out. Fly builders continue to be the default while this is deployed.
(2 more updates not shown)

Networking issues for users connecting through lhr

minor
Feb 20, 10:52 AM to Feb 20, 11:57 AM (resolved)
Feb 20, 11:57 AM
resolved: Network traffic in LHR has been stable for some time now, and we are not seeing any further issues.
Feb 20, 11:21 AM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 20, 10:52 AM
investigating: We’re currently investigating this issue.

Investigating registry issues affecting deploys

minor
Feb 19, 09:14 PM to Feb 20, 12:05 AM (resolved)
Feb 20, 12:05 AM
resolved: This incident has been resolved.
Feb 19, 10:24 PM
identified: While we have seen some improvement from the previous fix, we are still seeing elevated rates of Registry connection issues. Users may continue to see slower machine creates and deploys due to slow im...
Feb 19, 09:49 PM
monitoring: A fix has been implemented and we are monitoring the results.
(2 more updates not shown)

Control plane state delayed on some hosts possibly causing network or deployment disruption

major
Feb 18, 04:22 PM to Feb 18, 04:44 PM (resolved)
Feb 18, 04:44 PM
resolved: This incident has been resolved.
Feb 18, 04:28 PM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 18, 04:23 PM
identified: We are continuing to work on a fix for this issue.
(1 more update not shown)

flyctl deploy timeouts

major
Feb 17, 01:06 PM to Feb 17, 02:24 PM (resolved)
Feb 17, 02:24 PM
resolved: Earlier today, an issue caused elevated rate limiting and some deployment timeouts. A fix is in place and deployments are back to normal.
Feb 17, 01:42 PM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 17, 01:06 PM
identified: We’re investigating elevated 429 errors from flaps causing deployment timeouts. Affected deploys are failing with: ✖ Failed: error waiting for release_command machine XX to finish running: timeout rea...
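
HTTP 429 means "Too Many Requests", and a common client-side response to rate limiting is to retry with capped exponential backoff. The sketch below illustrates that general technique only; it does not describe how flyctl or the Fly.io API behave, and `backoff_delays` and `should_retry` are hypothetical names introduced for illustration:

```python
def backoff_delays(attempts: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Seconds to wait before each retry: base * 2**n, capped at `cap`."""
    return [min(cap, base * 2 ** n) for n in range(attempts)]

def should_retry(status_code: int) -> bool:
    # Assumption: only rate-limit responses (HTTP 429) are retried here;
    # how to treat other status codes is left to the caller.
    return status_code == 429

delays = backoff_delays(6)
print(delays)
```

Production clients usually add random jitter to each delay so that many clients do not retry in lockstep; it is omitted here to keep the sketch deterministic.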

Degraded Managed Postgres Control Plane in ORD

major
Feb 14, 11:33 AM to Feb 14, 02:27 PM (resolved)
Feb 14, 02:27 PM
resolved: This incident has been resolved.
Feb 14, 02:07 PM
monitoring: A fix has been implemented and we are seeing full recovery of the control plane in ORD. With that recovery we are seeing impacted replicas catching up and clusters returning to normal health. We're co...
Feb 14, 01:47 PM
identified: We are continuing to work on a fix for this issue.
(2 more updates not shown)

Issues with deploying apps using Depot builders for new accounts

minor
Feb 11, 08:44 PM to Feb 11, 09:30 PM (resolved)
Feb 11, 09:30 PM
resolved: This incident has been resolved.
Feb 11, 09:24 PM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 11, 08:57 PM
identified: The issue has been identified and a fix is being implemented.
(1 more update not shown)

Creating new sprites is degraded

minor
Feb 11, 06:07 AM to Feb 11, 07:22 AM (resolved)
Feb 11, 07:22 AM
resolved: This incident has been resolved.
Feb 11, 06:57 AM
monitoring: Sprite creation appears to be back to normal operation now.
Feb 11, 06:52 AM
identified: We've identified the cause of the delay following creates and we're deploying a fix.
(3 more updates not shown)

Degraded MPG clusters in IAD

minor
Feb 10, 07:00 PM to Feb 10, 08:44 PM (resolved)
Feb 10, 08:44 PM
resolved: This incident has been resolved.
Feb 10, 08:00 PM
monitoring: We've rolled out a fix for the remaining impacted clusters, and we're now monitoring the results.
Feb 10, 07:53 PM
identified: We've rolled out a fix for some additional impacted clusters, and we're continuing to work on the remaining clusters.
(2 more updates not shown)

Issue creating new Sprites in IAD

minor
Feb 9, 08:29 PM to Feb 9, 09:38 PM (resolved)
Feb 9, 09:38 PM
resolved: This incident has been resolved.
Feb 9, 09:19 PM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 9, 08:45 PM
identified: The issue has been identified and a fix is being implemented.
(1 more update not shown)

Degraded network in AMS

major
Feb 9, 07:17 AM to Feb 9, 10:55 AM (resolved)
Feb 9, 10:55 AM
resolved: This incident has been resolved.
Feb 9, 09:47 AM
monitoring: A fix has been implemented and we are monitoring the results.
Feb 9, 08:58 AM
identified: We are still working on restoring the MPG clusters. Most of them should be operational already.
(3 more updates not shown)

Machines API issues

major
Feb 7, 04:23 PM to Feb 7, 06:13 PM (resolved)
Feb 7, 06:13 PM
resolved: This incident has been resolved.
Feb 7, 05:17 PM
monitoring: A fix has been implemented and we are seeing Machines API connectivity improve in APAC regions. We continue monitoring for full recovery.
Feb 7, 04:40 PM
identified: The issue has been identified and we are seeing Machines API performance improve in most regions since ~16:20 UTC. Machines API calls in the SYD, NRT, and SIN regions may continue to see 5xx errors or hi...
(1 more update not shown)

Private Networking and Certificate Resolution Issues in SYD

major
Feb 7, 03:19 PM to Feb 7, 06:12 PM (resolved)
Feb 7, 06:12 PM
resolved: This incident has been resolved.
Feb 7, 04:24 PM
monitoring: A fix has been implemented and we are seeing private networking / certificates in SYD improving. We are continuing to monitor for full recovery.
Feb 7, 03:46 PM
investigating: Private Networking (6PN) is degraded in SYD region. Communication between Machines in SYD region and Machines in other regions may fail at this time. Newly created Machines in SYD may fail to sync to ...
(2 more updates not shown)

Network issues on newly-created machines

minor
Feb 5, 09:52 PM to Feb 6, 07:07 AM (resolved)
Feb 6, 07:07 AM
resolved: This issue is now resolved.
Feb 6, 02:49 AM
monitoring: We have successfully run a fix to re-sync our global and regional state stores in order to bring machines back to a healthy state, and we're monitoring the situation to confirm that there are no more ...
Feb 5, 09:52 PM
identified: Machines created after the delayed machine registration incident (https://status.flyio.net/incidents/3npj6935byt4) may have incomplete networking configurations and could be unable to receive traffic....

MPG Degraded clusters in AMS, IAD and SIN regions

major
Feb 5, 05:03 PM to Feb 6, 04:58 AM (resolved)
Feb 6, 04:58 AM
resolved: All MPG clusters are back to full, normal operations.
Feb 6, 04:22 AM
monitoring: All MPG clusters are reachable.
Feb 6, 03:36 AM
identified: We are still continuing cleanup on some clusters.
(5 more updates not shown)

Delayed Machine Registration + Token Errors

major
Feb 5, 04:43 PM to Feb 5, 08:53 PM (resolved)
Feb 5, 08:53 PM
resolved: This incident has been resolved.
Feb 5, 08:11 PM
monitoring: A fix has been deployed across all impacted hosts. We are seeing a sharp reduction in Token errors since 20:00 UTC and other metrics are recovering as well. We are continuing to monitor closely.
Feb 5, 07:03 PM
identified: We saw some improvement from the previous fix, however errors remained elevated on some hosts. We have identified the root cause of the remaining errors as a communication issue between the hosts and...
(4 more updates not shown)

Network maintenance in YYZ

minor
Feb 5, 05:46 AM to Feb 5, 09:22 AM (resolved)
Feb 5, 09:22 AM
resolved: Network maintenance has concluded.
Feb 5, 09:01 AM
monitoring: Managed Postgres clusters in YYZ should be operating normally.
Feb 5, 05:46 AM
identified: An upstream network provider is performing emergency network maintenance in the YYZ region. Machines in YYZ may see some packet loss. Managed Postgres clusters in YYZ are experiencing management p...

IPv6 Issues in YYZ

major
Feb 3, 03:33 PM to Feb 3, 03:53 PM (resolved)
Feb 3, 03:53 PM
resolved: This incident has been resolved.
Feb 3, 03:44 PM
monitoring: A fix has been implemented and we're seeing IPv6 networking return to normal in YYZ. We'll continue to monitor to ensure full recovery.
Feb 3, 03:33 PM
investigating: We are currently investigating degraded IPv6 networking in the YYZ (Toronto) region. Users with machines in this region may see issues connecting to their machines over IPv6. Users with static egre...

Elevated latency and packet loss in North American regions

minor
Feb 3, 02:56 AM to Feb 3, 03:43 AM (resolved)
Feb 3, 03:43 AM
resolved: This incident has been resolved.
Feb 3, 03:26 AM
monitoring: Network performance issues between North American regions have resolved and we're continuing to monitor.
Feb 3, 02:56 AM
investigating: We are currently investigating intermittent spikes of increased latency and packet loss between North American regions over the past hour. Users may see degraded network performance on traffic in and ...

Congestion in CDG and FRA

minor
Feb 1, 08:15 PM to Feb 1, 09:37 PM (resolved)
Feb 1, 09:37 PM
resolved: This incident has been resolved.
Feb 1, 08:15 PM
investigating: We are experiencing elevated weekend congestion in CDG (France) and FRA (Germany).

Sprites are returning not found or unauthorized when they shouldn't be.

none
Feb 1, 02:16 AM to Feb 1, 05:48 AM (resolved)
Feb 1, 05:48 AM
resolved: This incident has been resolved.
Feb 1, 05:32 AM
monitoring: We've been able to restore missing sprites and tokens. We're monitoring for any additional issues.
Feb 1, 04:52 AM
identified: We're working on a fix to restore missing sprites and tokens.
(3 more updates not shown)

January 2026

Grafana Log Search Display Issue

none
Jan 31, 05:35 PM to Jan 31, 06:29 PM (resolved)
Jan 31, 06:29 PM
resolved: This has been resolved. If you are still experiencing any issues, you may need to log out and then back in.
Jan 31, 05:49 PM
investigating: No logs are displayed in Grafana Log Search when using the default `*` query. You can try the following workarounds: 1. Replace the default `*` query with `NOT ""` 2. Viewing logs from the “fly app” ...
Jan 31, 05:35 PM
investigating: No logs are displayed in Grafana Log Search when using the default `*` query. As a temporary workaround, please replace the `*` query with `NOT ""`. Thank you for your kind understanding as we work throu...

Delayed metric reporting in NRT and SIN regions

minor
Jan 27, 07:40 PM to Jan 29, 08:04 PM (resolved)
Jan 29, 08:04 PM
resolved: This incident has been resolved. All hosts in SIN and NRT are reporting up-to-date metrics.
Jan 29, 03:15 PM
identified: Currently one host in SIN is still working through its metrics backlog and is reporting delayed metrics. Other hosts in NRT and SIN are reporting metrics correctly. If needed, users with i...
Jan 28, 03:27 PM
identified: Most hosts in NRT and SIN have completed backfilling their metrics and are up to date in fly-metrics.net. Four hosts are still working through the backlog; machines on those hosts are still reporting...
(2 more updates not shown)

Congestion in CDG and FRA

minor
Jan 24, 04:00 PM to Jan 24, 04:00 PM (resolved)
Jan 24, 11:06 PM
resolved: We are experiencing elevated weekend congestion in CDG (France) and FRA (Germany).

Delays issuing certificates

minor
Jan 21, 03:21 AM to Jan 21, 05:24 AM (resolved)
Jan 21, 05:24 AM
resolved: This incident has been resolved.
Jan 21, 04:59 AM
monitoring: We have identified the congestion and released a fix. We'll continue to monitor while the jobs catch up.
Jan 21, 03:21 AM
investigating: We are currently investigating possible delays issuing ACME certificates for new hostnames.

Errors creating new Sprites

minor
Jan 20, 10:08 PM to Jan 20, 10:15 PM (resolved)
Jan 20, 10:15 PM
resolved: This incident has been resolved.
Jan 20, 10:10 PM
investigating: We are continuing to investigate this issue.
Jan 20, 10:08 PM
investigating: We're currently investigating an issue that's preventing new Sprites from being created.

MPG network instability in LAX

none
Jan 19, 02:34 PM to Jan 19, 08:07 PM (resolved)
Jan 19, 08:07 PM
resolved: This incident has been resolved.
Jan 19, 03:11 PM
monitoring: Connections are back to normal. We'll keep monitoring the region.
Jan 19, 02:34 PM
investigating: We identified network partitions in the LAX region. We are investigating the problem.

Machines errors in JNB region

major
Jan 19, 12:17 PM to Jan 19, 12:29 PM (resolved)
Jan 19, 12:29 PM
resolved: This incident has been resolved.
Jan 19, 12:17 PM
identified: A bad deploy of an internal service in JNB region may cause Machines API requests for JNB region machines to fail. At this time, it may not be possible to create or update machines in JNB region, but ...
