How often does Fly.io have outages?

Fly.io has had 50 reported incidents recently. Visit apistatuscheck.com/incidents/flyio for the complete outage history with timelines and resolution details.

What was the last Fly.io outage?

The most recent Fly.io incident was "Errors Setting/Updating Secrets" on May 5, 2026. It was classified as major impact and is currently resolved.

Fly.io Outage History

50 incidents reported. Data sourced from the official Fly.io status page.

Total Incidents

Major/Critical

Minor

Resolved

May 2026

Errors Setting/Updating Secrets

major

May 5, 01:18 PM→May 5, 01:53 PMresolved

May 5, 01:53 PM

resolved — This incident has been resolved.

May 5, 01:22 PM

investigating — Creation of new apps or changing secrets on existing apps fails

May 5, 01:22 PM

investigating — Creation of new apps or changing secrets on existing apps fails

+2 more updates

Log search unavailable

minor

May 4, 06:57 PM→May 4, 08:42 PMresolved

May 4, 08:42 PM

resolved — This incident has been resolved.

May 4, 08:00 PM

monitoring — We have a mitigation in place and are monitoring results.

May 4, 06:57 PM

investigating — Log search in Grafana is currently unavailable. You may see `failed to make http request: 502` errors when accessing logs from fly-metrics.net at this time. App logs continue to be available using the...

April 2026

flyctl deploy creating new app instances

minor

Apr 28, 11:50 PM→Apr 29, 12:40 AMresolved

Apr 29, 12:40 AM

resolved — This incident has been resolved.

Apr 29, 12:31 AM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 29, 12:07 AM

identified — The issue has been identified and a fix is being implemented.

+1 more updates

Slow machines operations in IAD region

minor

Apr 24, 10:45 PM→Apr 24, 11:31 PMresolved

Apr 24, 11:31 PM

resolved — This incident has been resolved.

Apr 24, 11:19 PM

monitoring — Network packet loss has returned to normal levels. We are monitoring the Machines API for stability.

Apr 24, 11:18 PM

investigating — We are continuing to investigate this issue.

+2 more updates

Errors when adding or editing Github integrations for deployments

minor

Apr 23, 03:05 PM→Apr 23, 04:26 PMresolved

Apr 23, 04:26 PM

resolved — This incident has been resolved.

Apr 23, 03:39 PM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 23, 03:22 PM

identified — We are continuing to work on a fix for this issue.

+2 more updates

Errors (5xx, timeouts) in Fly.io dashboard

major

Apr 23, 11:17 AM→Apr 23, 11:50 AMresolved

Apr 23, 11:50 AM

resolved — This incident has been resolved.

Apr 23, 11:45 AM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 23, 11:35 AM

identified — The issue has been identified and a fix is being implemented.

+1 more updates

Increased latency in SIN

minor

Apr 20, 02:29 PM→Apr 20, 05:38 PMresolved

Apr 20, 05:38 PM

resolved — This incident has been resolved.

Apr 20, 03:29 PM

identified — We are currently working on resolving increased latencies in our Singapore region.

TLS certificate issues

major

Apr 17, 01:06 PM→Apr 18, 08:42 PMresolved

Apr 18, 08:42 PM

resolved — This incident has been resolved.

Apr 17, 03:34 PM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 17, 01:06 PM

investigating — We are investigating an issue with the Vault server that stores TLS certificates. Provisioning new TLS certificates may fail, and connecting to domains whose existing certificate has not yet been cach...

Network issues in SYD

none

Apr 15, 11:08 AM→Apr 16, 10:59 AMresolved

Apr 16, 10:59 AM

resolved — This incident has been resolved.

Apr 15, 11:40 AM

monitoring — We've identified the issue and applied a fix. All services should be working as normal.

Apr 15, 11:08 AM

investigating — We're currently investigating some networking issues in SYD. This is affecting a number of our central services.

Heightened latency in ORD

none

Apr 12, 06:50 PM→Apr 12, 11:03 PMresolved

Apr 12, 11:03 PM

resolved — This incident has been resolved.

Apr 12, 07:26 PM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 12, 06:50 PM

investigating — We are currently investigating heightened network latency in ORD.

Managed Postgres control plane instability in NRT (Tokyo)

minor

Apr 10, 06:42 PM→Apr 10, 09:48 PMresolved

Apr 10, 09:48 PM

resolved — This incident has been resolved.

Apr 10, 08:32 PM

monitoring — A fix has been implemented and we are seeing MPG performance in NRT normalize. We are continuing to monitor to ensure a stable recovery

Apr 10, 08:13 PM

identified — The issue has been identified and a fix is being implemented. Users with clusters in NRT may continue to see instability at this time

+1 more updates

Unavailable hosts in ORD region

major

Apr 9, 07:29 PM→Apr 9, 08:14 PMresolved

Apr 9, 08:14 PM

resolved — This incident has been resolved.

Apr 9, 07:29 PM

investigating — Some hosts in our Chicago (ORD) region are currently inaccessible. We are working with our provider to resolve this issue. To see if you are affected, please visit the personalized status page: https:...

Managed Postgres Control Plane Issues in SYD

major

Apr 9, 03:50 AM→Apr 9, 05:30 AMresolved

Apr 9, 05:30 AM

resolved — This incident has been resolved.

Apr 9, 05:20 AM

monitoring — Control plane operations in SYD have returned to normal and all clusters are healthy at this time. We're continuing to monitor to ensure stable recovery.

Apr 9, 04:12 AM

identified — We are seeing an improvement in control plane performance in the SYD region. Some clusters in the region currently are showing degraded standby nodes and we are working to bring those back to full hea...

+1 more updates

Metrics currently experiencing issues

major

Apr 8, 08:34 AM→Apr 8, 12:23 PMresolved

Apr 8, 12:23 PM

resolved — This incident has been resolved.

Apr 8, 11:02 AM

monitoring — We are continuing to monitor for any further issues.

Apr 8, 11:00 AM

monitoring — We have implemented a fix. We're monitoring the cluster for further issues.

+1 more updates

GraphQL API / Dashboard Issues

critical

Apr 7, 03:08 PM→Apr 7, 06:17 PMresolved

Apr 7, 06:17 PM

resolved — This incident has been resolved.

Apr 7, 03:39 PM

monitoring — A fix has been implemented and we are monitoring the results.

Apr 7, 03:17 PM

identified — We have restored GraphQL and dashboard availability, but some actions (e.g. app state updates) may still be delayed.

+1 more updates

March 2026

Low Capacity in SIN and AMS regions

none

Mar 29, 03:00 PM→Mar 29, 04:01 PMresolved

Mar 29, 04:01 PM

resolved — This incident has been resolved.

Mar 29, 03:35 PM

monitoring — We've freed up additional room in the SIN and AMS regions and are monitoring capacity.

Mar 29, 03:33 PM

monitoring — We've freed up additional room in the SIN and AMS regions and are monitoring capacity.

+3 more updates

Low capacity in IAD

minor

Mar 27, 06:08 PM→Mar 27, 09:51 PMresolved

Mar 27, 09:51 PM

resolved — This incident has been resolved.

Mar 27, 09:09 PM

monitoring — With the additional capacity we've brought online, machine start failure rates in IAD have now recovered. We'll continue to monitor IAD capacity.

Mar 27, 07:21 PM

identified — We've brought some additional capacity online in IAD and are seeing improvements, and we're continuing to work on adding more and freeing up additional room.

+2 more updates

Machine Creates Failing in ORD Region

major

Mar 26, 03:21 PM→Mar 26, 05:54 PMresolved

Mar 26, 05:54 PM

resolved — This incident has been resolved.

Mar 26, 05:28 PM

monitoring — We've implemented a fix and have seen error rates for machine creates in ORD drop off. We're continuing to monitor the results.

Mar 26, 04:50 PM

identified — We've identified the cause of this increased failure rate and a fix is in progress. We are seeing most creates in ORD succeed at this time, though failure rate is still above baseline.

+2 more updates

Network issues in FRA region

critical

Mar 26, 12:37 PM→Mar 26, 02:19 PMresolved

Mar 26, 02:19 PM

resolved — This incident has been resolved.

Mar 26, 01:16 PM

identified — Some Managed Postgres clusters in FRA region are still unreachable, we are investigating this issue.

Mar 26, 01:14 PM

monitoring — Apps and Managed Postgres clusters in FRA region should be back online at this time. We are monitoring for any further issues.

+1 more updates

Backend errors when trying to use Grafana to view logs

none

Mar 23, 03:18 PM→Mar 23, 04:27 PMresolved

Mar 23, 04:27 PM

resolved — This incident is resolved, Grafana logs are now working properly.

Mar 23, 03:55 PM

monitoring — We've deployed a fix and are monitoring the results. Logs are now be visible on Grafana.

Mar 23, 03:41 PM

identified — Using the Logs panel in Grafana at https://fly-metrics.net/ will show a 502 error from the backend and won't show any logs. You can use `fly logs` or the live log viewer directly on https://fly.io/das...

+1 more updates

Machines failing to start in DFW

minor

Mar 20, 07:26 AM→Mar 23, 01:19 PMresolved

Mar 23, 01:19 PM

resolved — This incident has been resolved.

Mar 21, 08:26 AM

monitoring — Machine start success rates in DFW have improved but we are continuing to monitor and make further adjustments. We will provide updates as the situation progresses.

Mar 20, 12:45 PM

monitoring — In addition to freeing up existing capacity, the team has provisioned new capacity in DFW and we are monitoring the results.

+2 more updates

Metrics currently experiencing issues

critical

Mar 19, 06:28 AM→Mar 19, 10:37 AMresolved

Mar 19, 10:37 AM

resolved — This incident has been resolved. We're unable to recover the lost metrics from that one hour.

Mar 19, 07:12 AM

monitoring — We have implemented a fix. There has been approximately 1h of lost metrics from 06:07UTC. We're monitoring the cluster for further issues

Mar 19, 06:28 AM

investigating — We are currently investigating an issue with our metrics cluster.

Machines failing to start in DFW

major

Mar 18, 09:58 AM→Mar 18, 06:53 PMresolved

Mar 18, 06:53 PM

resolved — This incident has been resolved. Machine creates in DFW continue to work normally.

Mar 18, 12:40 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 18, 11:44 AM

identified — The team is currently rolling out additional capacity in DFW which should help ease Machine start failures across the region.

+1 more updates

IPv6 networking issues in SJC region

major

Mar 18, 04:12 PM→Mar 18, 05:02 PMresolved

Mar 18, 05:02 PM

resolved — This incident has been resolved.

Mar 18, 04:31 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 18, 04:12 PM

investigating — We are investigating intermittent network issues in SJC region impacting outbound public IPv6 access from Machines. Connecting to IPv6 internet resources from apps hosted in SJC region may be slow or ...

Connection Issues in SJC

minor

Mar 18, 02:07 PM→Mar 18, 02:18 PMresolved

Mar 18, 02:18 PM

resolved — This incident has been resolved.

Mar 18, 02:07 PM

monitoring — Between 13:55 and 14:03 UTC machines and MPG clusters hosted in the SJC region saw elevated connection errors. Users may have seen errors connecting to or from most machines in the region, as well as ...

Fly ssh console command failing

minor

Mar 18, 02:12 PM→Mar 18, 02:18 PMresolved

Mar 18, 02:18 PM

resolved — This incident has been resolved.

Mar 18, 02:17 PM

monitoring — A fix has been implemented and we are seeing `ssh console` commands succeed as normal.

Mar 18, 02:12 PM

identified — We have identified an issue causing new `fly ssh console` connections to fail with 500 errors. A fix is in progress.

Sprites Operations: 401 errors for certain organizations

none

Mar 14, 04:20 AM→Mar 14, 02:05 PMresolved

Mar 14, 02:05 PM

resolved — This incident has been resolved.

Mar 14, 01:55 PM

monitoring — Organizations with names prefixed with numerical digits may experience 401 errors. Affected operations include actions such as Sprite creation, listing, etc... A fix has been implemented since 2026-0...

Setting secrets and creating apps is degraded

major

Mar 11, 09:19 AM→Mar 11, 11:37 AMresolved

Mar 11, 11:37 AM

resolved — This incident has been resolved.

Mar 11, 11:03 AM

monitoring — While the secret storage service was in a read-only state, app creation requests queued up, due to the retry logic and insufficient request concurrency limits in our GraphQL API. This prevented our Gr...

Mar 11, 10:14 AM

monitoring — A fix has been implemented and we are monitoring the results.

+1 more updates

Private networking issues in SYD region

major

Mar 7, 02:42 PM→Mar 7, 03:56 PMresolved

Mar 7, 03:56 PM

resolved — This incident has been resolved.

Mar 7, 03:10 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 7, 02:42 PM

investigating — We are investigating a private networking failure between SYD and other regions. Apps continue to run, and private networking within SYD is unaffected.

Routing issues in NA regions

none

Mar 5, 07:24 PM→Mar 5, 07:50 PMresolved

Mar 5, 07:50 PM

resolved — This incident has been resolved. Due to a BGP issue, we saw some North American traffic routed to edges in Singapore (sin). Users in North America would have seen additional request latency during thi...

Mar 5, 07:38 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 5, 07:24 PM

investigating — We're aware of routing issues affecting some customers in North America regions, and we're actively investigating.

Elevated GraphQL API errors

major

Mar 3, 08:18 PM→Mar 3, 09:15 PMresolved

Mar 3, 09:15 PM

resolved — This incident was caused by a failed Redis node that powers our GraphQL API. We were able to recreate the Redis node and restore service. We are still investigating the root cause of the failure. In ...

Mar 3, 08:36 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 3, 08:18 PM

investigating — We're investigating elevated GraphQL errors that affect some API endpoints.

Cost Explorer fails to load

minor

Mar 3, 10:50 AM→Mar 3, 12:10 PMresolved

Mar 3, 12:10 PM

resolved — This incident has been resolved.

Mar 3, 10:50 AM

investigating — We are currently investigating this issue. The page currently displays: "We’re having trouble loading the cost breakdown."

Certificates issues affecting API and proxy

none

Mar 3, 12:54 AM→Mar 3, 12:54 AMresolved

Mar 3, 02:05 AM

resolved — Between 19:54 and 20:06 UTC, our Vault cluster serving app certificates was unavailable. This caused various API requests to fail, mainly operations on certificates but also app creates and IP assignm...

Machines failing to boot in EWR

major

Mar 2, 05:42 PM→Mar 2, 10:49 PMresolved

Mar 2, 10:49 PM

resolved — This incident has been resolved.

Mar 2, 08:35 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 2, 06:21 PM

identified — The issue has been identified and a fix is being implemented.

+1 more updates

Issues with the Machines API

minor

Mar 2, 09:19 PM→Mar 2, 09:50 PMresolved

Mar 2, 09:50 PM

resolved — This incident has been resolved.

Mar 2, 09:47 PM

monitoring — A fix has been implemented and we are monitoring the results.

Mar 2, 09:39 PM

identified — The issue has been identified and a fix is being implemented.

+1 more updates

February 2026

Slow API requests

major

Feb 27, 06:50 PM→Feb 27, 08:21 PMresolved

Feb 27, 08:21 PM

resolved — This incident has been resolved. All platform and API operations are working normally.

Feb 27, 08:05 PM

monitoring — API and platform operations have normalized. We are continuing to monitor to ensure full and stable recovery. Background jobs are almost fully caught up. Users may still see slightly slower requests...

Feb 27, 07:41 PM

identified — A second fix has been deployed and database load has returned to normal, resulting in API response times beginning to normalize. Most Machines API requests should succeed as normal, and deploys to exi...

+6 more updates

Capacity issues in iad and dfw

minor

Feb 27, 03:34 PM→Feb 27, 05:54 PMresolved

Feb 27, 05:54 PM

resolved — This incident has been resolved.

Feb 27, 05:31 PM

monitoring — We have provisioned additional capacity in dfw and iad and are monitoring to ensure machine and builder starts are succeeding consistently.

Feb 27, 03:34 PM

identified — These regions (Dallas, TX dfw and Ashburn, VA iad) are currently low on capacity. New machine creates in these regions might fail temporarily, and Depot builders may be unavailable, causing deploys to...

Capacity isssues in iad and dfw

none

Feb 26, 05:00 PM→Feb 26, 10:28 PMresolved

Feb 26, 10:28 PM

resolved — This incident has been resolved.

Feb 26, 08:19 PM

monitoring — We're continuing to monitor after having added more capacity to our DFW and IAD regions. Deploys or machine starts using existing volumes in these regions may still hit a capacity issue. Users shoul...

Feb 26, 06:57 PM

identified — We have added additional capacity in DFW and IAD regions and are monitoring the impact. New machine creates and deploys without volumes are seeing improved success rates. Deploys using depot builde...

+3 more updates

Sprites API degradation

none

Feb 24, 05:23 PM→Feb 24, 05:51 PMresolved

Feb 24, 05:51 PM

resolved — This incident has been resolved.

Feb 24, 05:24 PM

identified — A slow deploy is causing Sprites API degradation. We are implementing a fix.

Feb 24, 05:23 PM

identified — A slow deploy is causing Sprites API degradation. We are implementing a fix.

Metrics are degraded

minor

Feb 24, 04:33 AM→Feb 24, 11:06 AMresolved

Feb 24, 11:06 AM

resolved — Metrics processing has caught up, and we don't see any data loss.

Feb 24, 09:35 AM

monitoring — Delayed metrics are still being processed.

Feb 24, 06:46 AM

monitoring — Metrics are coming back online, but it will take a little time to process what's backed up in the queues.

+2 more updates

Sprite creations failing

minor

Feb 24, 09:39 AM→Feb 24, 10:44 AMresolved

Feb 24, 10:44 AM

resolved — This incident has been resolved.

Feb 24, 10:25 AM

monitoring — A fix has been implemented and we are monitoring the results.

Feb 24, 09:39 AM

investigating — We are currently investigating issues creating new Sprites.

Degraded Managed Postgres Control Plane

none

Feb 23, 03:00 PM→Feb 23, 08:30 PMresolved

Feb 24, 12:31 AM

resolved — This incident has been resolved as of 20:30 UTC.

Feb 23, 03:00 PM

investigating — We are currently investigating issues with the MPG control plane. Users may experience delays or hanging when creating or deleting databases via the dashboard or CLI.

Deploys hanging at waiting for Depot Builder

minor

Feb 20, 04:14 PM→Feb 20, 08:49 PMresolved

Feb 20, 08:49 PM

resolved — This incident has been resolved.

Feb 20, 07:38 PM

monitoring — The fix has been rolled out and we are seeing deploys using depot builder succeeding normally. We continue to monitor to ensure full recovery. Depot builders have been reenabled as the default optio...

Feb 20, 05:59 PM

identified — A fix is being rolled out. Fly builders continue to be the default while this is deployed

+2 more updates

Networking issues for users connecting through lhr

minor

Feb 20, 10:52 AM→Feb 20, 11:57 AMresolved

Feb 20, 11:57 AM

resolved — Network traffic in LHR has been stable for some time now, we are not seeing any further issues.

Feb 20, 11:21 AM

monitoring — A fix has been implemented and we are monitoring the results.

Feb 20, 10:52 AM

investigating — We’re currently investigating this issue.

Investigating registry issues affecting deploys

minor

Feb 19, 09:14 PM→Feb 20, 12:05 AMresolved

Feb 20, 12:05 AM

resolved — This incident has been resolved.

Feb 19, 10:24 PM

identified — While we have seen some improvement from the previous fix, we are still seeing elevated rates of Registry connection issues. Users may continue to see slower machine creates and deploys due to slow im...

Feb 19, 09:49 PM

monitoring — A fix has been implemented and we are monitoring the results.

+2 more updates

Control plane state delayed on some hosts possibly causing network or deployment disruption

major

Feb 18, 04:22 PM→Feb 18, 04:44 PMresolved

Feb 18, 04:44 PM

resolved — This incident has been resolved.

Feb 18, 04:28 PM

monitoring — A fix has been implemented and we are monitoring the results.

Feb 18, 04:23 PM

identified — We are continuing to work on a fix for this issue.

+1 more updates

flyctl deploy timeouts

major

Feb 17, 01:06 PM→Feb 17, 02:24 PMresolved

Feb 17, 02:24 PM

resolved — Earlier today, an issue caused elevated rate limiting and some deployment timeouts. A fix is in place and deployments are back to normal.

Feb 17, 01:42 PM

monitoring — A fix has been implemented and we are monitoring the results.

Feb 17, 01:06 PM

identified — We’re investigating elevated 429 errors from flaps causing deployment timeouts. Affected deploys are failing with: ✖ Failed: error waiting for release_command machine XX to finish running: timeout rea...

Degraded Managed Postgres Control Plane in ORD

major

Feb 14, 11:33 AM→Feb 14, 02:27 PMresolved

Feb 14, 02:27 PM

resolved — This incident has been resolved.

Feb 14, 02:07 PM

monitoring — A fix has been implemented and we are seeing full recovery of the control plane in ORD. With that recovery we are seeing impacted replicas catching up and clusters returning to normal health. We're co...

Feb 14, 01:47 PM

identified — We are continuing to work on a fix for this issue.

+2 more updates