GitHub Outage History
Past incidents and downtime events
Complete history of GitHub outages, incidents, and service disruptions. Showing 50 most recent incidents.
April 2026 (9 incidents)
Problems with third-party Claude and Codex Agent sessions not being listed in the agents tab dashboard
3 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
We are investigating third party Claude and Codex Cloud Agent sessions not being listed in the agents tab dashboard.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
7 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
We continue to investigate periodic delays in Copilot Cloud Agent job processing
We are continuing to investigate Copilot Cloud Agent job delays
Copilot Cloud Agent jobs are being processed and we are monitoring recovery
We are investigating delays processing Copilot Cloud Agent jobs
We are experiencing an issue where Copilot coding agent jobs are delayed in starting.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
4 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
The degradation has been mitigated. We are monitoring to ensure stability.
We are investigating an issue affecting GitHub Copilot coding agent. Users may experience significant delays when starting new agent sessions, with jobs remaining queued longer than expected. Our team has identified increased load as a contributing factor and is actively working to restore normal performance.
We are investigating reports of impacted performance for some GitHub services.
Disruption with GitHub notifications
3 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
The degradation has been mitigated. We are monitoring to ensure stability.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
7 updates
Between 15:20 and 20:18 UTC on Thursday April 2, Copilot Cloud Agent entered a period of reduced performance. Due to an internal feature being developed for Copilot Code Review, the Copilot Cloud Agent infrastructure started to receive an increased number of jobs. This load eventually caused us to hit an internal rate limit, causing all work to be suspended for an hour. During this hour, some new jobs would time out, while others would resume once rate limiting ended. Roughly 40% of jobs in this period were affected. Once the cause of this rate limiting was identified, we were able to disable the new CCR feature via a feature flag. After the jobs already in the queue cleared, we saw no further instances of rate limiting.
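The summary above describes shutting off the extra job load with a feature flag rather than a rollback. A minimal sketch of that kind of flag-guarded producer is below; the flag name, flag client, and queue shape are hypothetical and not GitHub's actual code.

```python
# Hypothetical sketch of a feature-flag guard on a job producer; the flag name,
# client, and queue are illustrative assumptions only.
from dataclasses import dataclass


@dataclass
class FeatureFlags:
    """Minimal in-memory stand-in for a dynamic feature-flag service."""
    enabled: dict

    def is_enabled(self, name: str) -> bool:
        return self.enabled.get(name, False)


def submit_review_jobs(flags: FeatureFlags, queue: list, pull_requests: list) -> int:
    """Enqueue agent jobs for the (hypothetical) code-review feature.

    Guarding the producer behind a flag lets operators stop the extra load
    with a configuration change instead of waiting for a rollback deploy.
    """
    if not flags.is_enabled("copilot_code_review_agent_jobs"):
        return 0  # feature disabled: generate no additional agent load
    for pr in pull_requests:
        queue.append({"type": "ccr_review", "pull_request": pr})
    return len(pull_requests)


if __name__ == "__main__":
    flags = FeatureFlags(enabled={"copilot_code_review_agent_jobs": False})
    queue = []
    print(submit_review_jobs(flags, queue, ["octo/repo#1", "octo/repo#2"]))  # -> 0
```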
The degradation has been mitigated. We are monitoring to ensure stability.
Although we are observing recovery once again, we expect continued periods of degradation. Work that is queued during times of degradation does eventually get processed. We continue to investigate and work toward a mitigation, and will update again within 2 hours.
This issue has recurred. Customers will once again experience false job starts when assigning tasks to Copilot Cloud Agent. We are still investigating and trying to understand the pattern of degradation.
We are once again seeing recovery with Copilot Cloud Agent job starts. We are keeping this open while we verify this won't recur.
When assigning tasks to Copilot Cloud Agent, the task will appear to be working, but may not actually be running. We are investigating.
We are investigating reports of impacted performance for some GitHub services.
Copilot Coding Agent failing to start some jobs
3 updates
Between 15:20 and 20:18 UTC on Thursday April 2, Copilot Cloud Agent entered a period of reduced performance. Due to an internal feature being developed for Copilot Code Review, the Copilot Cloud Agent infrastructure started to receive an increased number of jobs. This load eventually caused us to hit an internal rate limit, causing all work to be suspended for an hour. During this hour, some new jobs would time out, while others would resume once rate limiting ended. Roughly 40% of jobs in this period were affected. Once the cause of this rate limiting was identified, we were able to disable the new CCR feature via a feature flag. After the jobs already in the queue cleared, we saw no further instances of rate limiting. This was the same incident declared in https://www.githubstatus.com/incidents/d96l71t3h63k
When assigning tasks to Copilot Cloud Agent, the task will appear to be working, but may not actually be running. We are investigating.
We are investigating reports of impacted performance for some GitHub services.
Disruption with GitHub's code search
7 updates
On April 1st, 2026 between 14:40 and 17:00 UTC the GitHub code search service had an outage which resulted in users being unable to perform searches. The issue was initially caused by an upgrade to the code search Kafka cluster ZooKeeper instances which caused a loss of quorum. This resulted in application-level data inconsistencies which required the index to be reset to a point in time before the loss of quorum occurred. Meanwhile, an accidental deploy resulted in query services losing their shard-to-host mappings, which are typically propagated by Kafka. We remediated the problem by performing rolling restarts in the Kafka cluster, allowing quorum to be reestablished. From there we were able to reset our index to a point in time before the inconsistencies occurred. The team is working on ways to improve our time to respond and mitigate issues relating to Kafka in the future.
Code search has recovered and is serving production traffic.
We have stabilized Code Search infrastructure, and are in the final stages of validation before slowly reintroducing production traffic.
We are still working on recovering back to a serviceable state and expect to have a more substantial update within another two hours.
We are observing some recovery for Code Search queries, but customers should be aware that the data being served may be stale, especially for changes that took place after 07:00 UTC today (1 April 2026). We are still working on recovering our ingestion pipeline, and synchronizing the indexed data. We will update again within 2 hours.
We identified an issue in our ingestion pipeline that degraded the freshness of Code Search results. While fixing the issue with the ingestion pipeline, a deployment caused a loss of dynamic configuration which is causing most requests for Code Search results to fail. We are working to restore the service and to re-ingest the misaligned data.
We are investigating reports of impacted performance for some GitHub services.
GitHub audit logs are unavailable
3 updates
On April 1, 2026, between 15:34 UTC and 16:02 UTC, our audit log service lost connectivity to its backing data store due to a failed credential rotation. During this 28-minute window, audit log history was unavailable via both the API and web UI. This resulted in 5xx errors for 4,297 API actors and 127 github.com users. Additionally, events created during this window were delayed by up to 29 minutes in github.com and event streaming. No audit log events were lost; all audit log events were ultimately written and streamed successfully. Customers using GitHub Enterprise Cloud with data residency were not impacted by this incident. We were alerted to the infrastructure failure at 15:40 UTC — six minutes after onset — and resolved the issue by recycling the affected environment, restoring full service by 16:02 UTC. We are conducting a thorough review of our credential rotation process to strengthen its resiliency and prevent recurrence. In parallel, we are strengthening our monitoring capabilities to ensure faster detection and earlier visibility into similar issues going forward.
A routine credential rotation has failed for our audit logs service; we have re-deployed our service and are waiting for recovery.
We are investigating reports of impacted performance for some GitHub services.
Incident with Copilot
9 updates
On April 1, 2026, between 07:29 and 12:41 UTC, some customers experienced elevated 5xx errors and increased latency when using GitHub Copilot features that rely on `/agents/sessions` endpoints (including creating or viewing agent sessions). The issue was caused by resource exhaustion in one of the Copilot backend services handling these requests, which in turn caused timeouts and failed requests. We mitigated the incident by increasing the service’s available compute resources and tuning its runtime concurrency settings. Service health returned to normal and the incident was fully resolved by 12:41 UTC.
The success rate and latency for creating and viewing agent sessions have stabilized at baseline levels; we are continuing to monitor recovery.
The degradation has been mitigated. We are monitoring to ensure stability.
The success rate for creating and viewing agent sessions has stabilized, and we're continuing to monitor latency, which is trending toward baseline levels.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Copilot has been mitigated. We are monitoring to ensure stability.
Users may see increased latency and intermittent errors when viewing or creating agent sessions. We are working on mitigations to return to baseline performance and success rate.
We are investigating reports of issues with service(s): Copilot Dotcom Agents. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded performance for Copilot
March 2026 (32 incidents)
Incident with Pull Requests: High percentage of 500s
11 updates
On Monday March 31st, 2026, between 13:53 UTC and 21:23 UTC the Pull Requests service experienced elevated latency and failures. On average, the error rate was 0.15% and peaked at 0.28% of requests to the service. This was due to a change in garbage collection (GC) settings for a Go-based internal service that provides access to Git repository data. The changes caused more frequent GC activity and elevated CPU consumption on a subset of storage nodes, increasing latency and failure rates for some internal API operations. We mitigated the incident by reverting the GC changes. To prevent future incidents and improve time to detection and mitigation, we are instrumenting additional metrics and alerting for GC-related behavior, improving our visibility into other signals that could cause this type of degradation, and updating our best practices and standards for garbage collection in Go-based services.
The degradation affecting Pull Requests has been mitigated. We are monitoring to ensure stability.
We continue to see a small subset of repositories experiencing timeouts and elevated latency in Pull Requests, affecting under 1% of requests.
Error rates remain elevated across multiple pull request endpoints. We are pursuing multiple potential mitigations.
We continue to experience elevated error rates affecting Pull Requests. An earlier fix resolved one component of the issue, but some users may still encounter intermittent timeouts when viewing or interacting with pull requests. Our teams are actively investigating the remaining causes.
We identified an issue causing increased errors when accessing Pull Requests. The mitigation is being applied across our infrastructure and we will continue to provide updates as the mitigation rolls out.
We are seeing recovery in latency and timeouts of requests related to pull requests, even though 500s are still elevated. While we are continuing to investigate, we are applying a mitigation and expect further recovery after it is applied.
We are continuing to investigate increased 500 errors affecting GitHub services. You may experience intermittent failures when using Pull Requests and other features. We are actively working to identify and resolve the underlying cause.
We are investigating increased 500 errors affecting GitHub services. You may experience intermittent failures when using Pull Requests and other features. We are actively working to identify and resolve the underlying cause.
We are seeing a higher than average number of 500s due to timeouts across GitHub services. We have a potential mitigation in flight and are continuing to investigate.
We are investigating reports of degraded performance for Pull Requests
Issues with metered billing report generation
7 updates
On March 31, 2026, between 06:15 UTC and 15:30 UTC, the GitHub billing usage reports feature was degraded due to reduced server capacity. Customers requesting billing usage reports and loading the top usage by organization and repository on the billing overview and usage pages were impacted. The average error rate for usage report requests was 15%, peaking at 98% over an eight-minute window. For the billing pages, an average of 56% of requests failed to load the top usage cards. The root cause was an increase in billing usage report requests with large datasets, which exhausted the capacity of the nodes responsible for reporting data. There was no impact on billing charges. We mitigated the incident by adjusting our auto-scaling thresholds to better meet our capacity needs. We are working to improve our metrics to reduce time to detection and mitigation for similar issues in the future.
The degradation has been mitigated. We are monitoring to ensure stability.
We have applied mitigations to a data store related to billing reports, and are seeing partial recovery to billing report generation. We continue to monitor for full recovery.
We are seeing a high number of 500s due to timeouts across GitHub services. We are redeploying some of our core services and we expect that this will allow us to recover.
We're continuing to see high failure rates on billing report generation, and are working on mitigations for a data store related to billing reports.
We're seeing issues related to metered billing reports, intermittently affecting metered usage graphs and reports on the billing page. We have identified an issue with a data store, and are working on mitigations.
We are investigating reports of impacted performance for some GitHub services.
Elevated delays in Actions workflow runs and Pull Request status updates
4 updates
On March 30, 2026, between 10:11 UTC and 13:25 UTC, GitHub Actions experienced degraded performance. During this time, approximately 2.65% of workflow jobs triggered by pull request events experienced start delays exceeding 5 minutes. The issue was caused by replication lag on an internal database cluster used by Actions, which triggered write throttling in our database protection layer and slowed job queue processing. The replication lag originated from planned maintenance to scale the internal database. Newly added database hosts triggered guardrails in the throttling layer, restricting write throughput. The incident was mitigated by excluding the new hosts from replication delay calculations. To prevent recurrence, we have updated our maintenance procedures to ensure new hosts are excluded from throttling assessments during scaling operations. Additionally, we are investing in automation to streamline this type of maintenance activity.
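The fix described above was to exclude newly added database hosts from the replication-delay calculation that drives write throttling. A small sketch of that guardrail logic follows; the host names, delay threshold, and "provisioning" marker are assumptions for illustration, not GitHub's implementation.

```python
# Illustrative sketch only: hosts, thresholds, and the provisioning flag are
# hypothetical, not the actual Actions throttling layer.
from dataclasses import dataclass


@dataclass
class Replica:
    host: str
    replication_delay_s: float
    provisioning: bool = False  # newly added host still catching up


def should_throttle_writes(replicas, max_delay_s=2.0):
    """Throttle writes only when an in-service replica is lagging.

    Hosts that are still being provisioned are excluded, so scaling the
    cluster up does not trip the write-protection guardrail by itself.
    """
    in_service = [r for r in replicas if not r.provisioning]
    if not in_service:
        return False
    return max(r.replication_delay_s for r in in_service) > max_delay_s


if __name__ == "__main__":
    fleet = [
        Replica("db-actions-01", 0.4),
        Replica("db-actions-02", 0.6),
        Replica("db-actions-09", 180.0, provisioning=True),  # new host, still syncing
    ]
    print(should_throttle_writes(fleet))  # False: lag on the new host is ignored
```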
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Actions and Pull Requests has been mitigated. We are monitoring to ensure stability.
We are investigating reports of degraded performance for Actions and Pull Requests
Incident with Copilot
1 update
On March 27, 2026, from 02:30 to 04:56 UTC, a misconfiguration in our rate limiting system caused users on Copilot Free, Student, Pro, and Pro+ plans to experience unexpected rate limit errors. The configuration that was incorrectly applied was intended solely for internal staff testing of rate-limiting experiences. Copilot Business and Copilot Enterprise accounts were not affected. During this period, affected users received error messages instructing them to retry after a certain time. Approximately 32% of active Free users, 35% of active Student users, 46% of active Pro users, and 66% of active Pro+ users were affected. After identifying the root cause, we reverted the change and restored the expected rate limits. We are reviewing our deployment and validation processes to help ensure configurations used for internal testing cannot be inadvertently applied to production environments.
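The follow-up above is about keeping internal-testing configurations out of production. A hedged sketch of the kind of pre-deploy validation that could catch this is below; the config fields and audience label are hypothetical, not GitHub's deployment tooling.

```python
# Hypothetical validation sketch: reject a rate-limit config marked as
# internal-testing-only before it ships to production. Field names are
# assumptions for illustration.
def validate_rate_limit_config(config, environment):
    """Return a list of validation errors for the given target environment."""
    errors = []
    if environment == "production" and config.get("audience") == "internal-staff-testing":
        errors.append("internal testing rate limits must not target production")
    if config.get("requests_per_hour", 0) <= 0:
        errors.append("requests_per_hour must be positive")
    return errors


if __name__ == "__main__":
    candidate = {"audience": "internal-staff-testing", "requests_per_hour": 5}
    print(validate_rate_limit_config(candidate, environment="production"))
    # -> ['internal testing rate limits must not target production']
```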
Disruption with some GitHub services
6 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
We are investigating elevated error rates affecting multiple GitHub services including Actions, Issues, Pull Requests, Webhooks, Codespaces, and login functionality. Some users may have experienced errors when accessing these features. Most services are now showing signs of recovery. We'll post another update by 21:00 UTC.
Issues is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Actions
Teams Github Notifications App is down
5 updates
On March 24, 2026, between 15:57 UTC and 19:51 UTC, the Microsoft Teams Integration and Teams Copilot Integration services were degraded and unable to deliver GitHub event notifications to Microsoft Teams. On average, the error rate was 37.4% and peaked at 90.1% of requests to the service; approximately 19% of all integration installs failed to receive GitHub-to-Teams notifications in this time period. This was due to an outage at one of our upstream dependencies, which caused HTTP 500 errors and connection resets for our Teams integration. We coordinated with the relevant service teams, and the issue was resolved at 19:51 UTC when the upstream incident was mitigated. We are working to update observability and runbooks to reduce time to mitigation for issues like this in the future.
We are experiencing degraded availability from Azure Teams APIs, which is impacting notifications from GitHub to Microsoft Teams. We are awaiting resolution from Azure.
We are experiencing degraded availability from Azure APIs, which is impacting notifications from GitHub to Microsoft Teams. We are working with Azure to resolve the issue.
We found an issue impacting notifications from GitHub to Microsoft Teams. We are working on mitigation and will keep users updated on progress towards mitigation.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
3 updates
On March 22, 2026, between 09:05 UTC and 10:02 UTC, users may have experienced intermittent errors and increased latency when performing Git http read operations. On average, the error rate was 3.84% and peaked at 15.55% of requests to the service. The issue was caused by elevated latency in an internal authentication service within one of our regional clusters. We mitigated the issue by redirecting traffic away from the affected cluster at 09:39 UTC, after which error rates returned to normal. The incident was fully resolved at 10:02 UTC. We are working to scale the authentication service and reduce our time to detection and mitigation of issues like this one in the future.
We are investigating intermittently high latency and errors from Git operations.
We are investigating reports of impacted performance for some GitHub services.
Disruption with Copilot Coding Agent Sessions
4 updates
On March 19, 2026, between 01:05 UTC and 02:52 UTC, and again on March 20, 2026, between 00:42 UTC and 01:58 UTC, the Copilot Coding Agent service was degraded and users were unable to start new Copilot Agent sessions or view existing ones. During the first incident, the average error rate was ~53% and peaked at ~93% of requests to the service. During the second incident, the average error rate was ~99% and peaked at ~100% of requests with significant retry amplification. Both incidents were caused by the same underlying system authentication issue that prevented the service from connecting to its backing datastore. We mitigated each incident by rotating the affected credentials, which restored connectivity and returned error rates to normal. The mitigation time was 01:24. The second occurrence was due to an incomplete remediation of the first. We are implementing automated monitoring for credential lifecycle events and improving operational processes to reduce our time to detection and mitigation of issues like this one in the future.
We are rolling out our mitigation and are seeing recovery.
We are seeing widespread issues starting and viewing Copilot Agent sessions. We understand the cause and are working on remediation.
We are investigating reports of impacted performance for some GitHub services.
Git operations for users in the west coast are experiencing an increase in latency
9 updates
On March 19, 2026 between 16:10 UTC and 00:05 UTC (March 20), Git operations (clone, fetch, push) from the US west coast experienced elevated latency and degraded throughput. Users reported clone speeds dropping from typical speeds to under 1 MiB/s in extreme cases. The root cause was network transport link saturation at our Seattle edge site, where a fiber cut affecting our backbone transport resulted in saturation and packet loss. We had a planned scale-up in progress for the site that was accelerated to resolve the backbone capacity pressure. We also brought online additional edge capacity in a cloud region and redirected some users there. Current scale with the upgraded network capacity is sufficient to prevent reoccurrence, as we upgraded from 800Gbps to 3.2Tbps total capacity on this path. We will continue to monitor network health and respond to any further issues.
We have reached stability with git operations through our changes deployed today.
We are seeing early signs of improvement. We are working on one more small change to further improve traffic routing on the west coast.
We have completed the rollout of our new network path and are monitoring its impact.
We are beginning the rollout of our new network path. During this change, users will continue to see higher latency from the west coast. We will provide another update when the rollout is complete.
We are working to enable a new network path in the west coast to reduce load and will monitor the impact on latency for Git Operations
We are still seeing elevated latency for Git operations in the west coast and are continuing to investigate
We are redirecting traffic back to our Seattle region and customers should see a decrease in latency for Git operations
We are investigating reports of degraded performance for Git Operations
Issues with Copilot Coding Agent
5 updates
On March 19, 2026, between 01:05 UTC and 02:52 UTC, and again on March 20, 2026, between 00:42 UTC and 01:58 UTC, the Copilot Coding Agent service was degraded and users were unable to start new Copilot Agent sessions or view existing ones. During the first incident, the average error rate was ~53% and peaked at ~93% of requests to the service. During the second incident, the average error rate was ~99% and peaked at ~100% of requests with significant retry amplification. Both incidents were caused by the same underlying system authentication issue that prevented the service from connecting to its backing datastore. We mitigated each incident by rotating the affected credentials, which restored connectivity and returned error rates to normal. The mitigation time was 01:24. The second occurrence was due to an incomplete remediation of the first. We are implementing automated monitoring for credential lifecycle events and improving operational processes to reduce our time to detection and mitigation of issues like this one in the future.
Copilot is operating normally.
We are investigating reports that Copilot Coding Agent session logs are not available in the UI.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Disruption with Copilot Coding Agent sessions
4 updates
On March 19, 2026, between 01:05 UTC and 02:52 UTC, and again on March 20, 2026, between 00:42 UTC and 01:58 UTC, the Copilot Coding Agent service was degraded and users were unable to start new Copilot Agent sessions or view existing ones. During the first incident, the average error rate was ~53% and peaked at ~93% of requests to the service. During the second incident, the average error rate was ~99% and peaked at ~100% of requests with significant retry amplification. Both incidents were caused by the same underlying system authentication issue that prevented the service from connecting to its backing datastore. We mitigated each incident by rotating the affected credentials, which restored connectivity and returned error rates to normal. The mitigation time was 01:24. The second occurrence was due to an incomplete remediation of the first. We are implementing automated monitoring for credential lifecycle events and improving operational processes to reduce our time to detection and mitigation of issues like this one in the future.
We have rolled out our mitigation and are seeing recovery for Copilot Coding Agent sessions
We are seeing widespread issues starting and viewing Copilot Agent sessions. We have a hypothesis for the cause and are working on remediation.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
8 updates
On March 19, 2026 between 16:10 UTC and 00:05 UTC (March 20), Git operations (clone, fetch, push) from the US west coast experienced elevated latency and degraded throughput. Users reported clone speeds dropping from typical speeds to under 1 MiB/s in extreme cases. The root cause was network transport link saturation at our Seattle edge site, where a fiber cut affecting our backbone transport resulted in saturation and packet loss. We had a planned scale-up in progress for the site that was accelerated to resolve the backbone capacity pressure. We also brought online additional edge capacity in a cloud region and redirected some users there. Current scale with the upgraded network capacity is sufficient to prevent reoccurrence, as we upgraded from 800Gbps to 3.2Tbps total capacity on this path. We will continue to monitor network health and respond to any further issues. This was the same incident declared in https://www.githubstatus.com/incidents/xs6xtcv196g7
We are seeing recovery in git operations for customers on the West Coast of the US.
We continue to investigate the slow performance of Git Operations affecting the US West Coast.
We continue to investigate degraded performance for git operations from the US West Coast.
We are continuing to investigate degraded performance for git operations from the US West Coast.
We are experiencing increased latency when performing git operations, especially large pushes and pulls from customers on the west coast of the US. We are not seeing an increase in failures. We are continuing to investigate.
Git Operations is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Webhook delivery is delayed
3 updates
On March 18, 2026, between 18:18 UTC and 19:46 UTC all webhook deliveries experienced elevated latency. During this time, average delivery latency increased from a baseline of approximately 5 seconds to a peak of approximately 160 seconds. This was due to resource constraints in the webhook delivery pipeline, which caused queue backlog growth and increased delivery latency. We mitigated the incident by shifting traffic and adding capacity, after which webhook delivery latency returned to normal. We are working to improve capacity management and detection in the webhook delivery pipeline to help prevent similar issues in the future.
We are seeing recovery and are continuing to monitor the latency for webhook deliveries
We are investigating reports of degraded performance for Webhooks
Errors starting and connecting to Codespaces
4 updates
On 16 March 2026, between 14:16 UTC and 15:18 UTC, Codespaces users encountered a download failure error message when starting newly created or resumed codespaces. At peak, 96% of the created or resumed codespaces were impacted. Active codespaces with a running VSCode environment were not affected. The error was a result of an API deployment issue with our VS Code remote experience dependency and was resolved by rolling back that deployment. We are working with our partners to reduce our incident engagement time, improve early detection before they impact our customers, and ensure safe rollout of similar changes in the future.
Errors starting or resuming Codespaces have resolved.
We are investigating reports of users experiencing errors when starting or connecting to Codespaces. Some users may be unable to access their development environments during this time. We are working to identify the root cause and will implement a fix as soon as possible.
We are investigating reports of impacted performance for some GitHub services.
Degraded performance for various services
6 updates
On March 13, 2026, between 13:35 UTC and 16:02 UTC, a configuration change to an internal authorization service reduced its processing capacity below what was needed during peak traffic. This caused intermittent timeouts when other GitHub services checked user permissions, resulting in four to five waves of errors over roughly two hours and forty minutes. In total, 0.4% of users were denied access to actions they were authorized to perform. The root cause was a resource right-sizing change deployed to the authorization service the previous day. It reduced CPU allocation below what was required at peak, causing the service's network gateway to throttle under load. Because the change was deployed after peak traffic on March 12, the reduced capacity wasn't surfaced until the next day's peak. The incident was mitigated by manually scaling up the authorization service and reverting the configuration change. To prevent recurrence, we are adding further resource utilization monitors across our entire stack to detect throttling and improving error handling so transient infrastructure timeouts are distinguished from authorization failures, enabling quicker detection of the root issue.
We have deployed mitigations and are actively monitoring for recovery. We'll post another update by 17:00 UTC.
We are investigating intermittent performance degradation affecting Actions, Feeds, Issues, Package Registry, Profiles, Registry Metadata, Star, and User Dashboard. Users may experience elevated error rates and slower response times when accessing these services. We have identified a potential cause and are implementing mitigations to restore normal service. We'll post another update by 16:15 UTC.
Packages is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of issues with service(s): Actions, Feeds, Issues, Profiles, Registry Metadata, Star, User Dashboard. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded performance for Actions and Issues
Degraded Codespaces experience
9 updates
On March 12, 2026, between 01:00 UTC and 18:53 UTC, users saw failures downloading extensions within created or resumed codespaces. Users would see an error when attempting to use an extension within VS Code. Active codespaces with extensions already downloaded were not impacted. The extension download failures were the result of a change introduced in our extension dependency and were resolved by updating the configuration of how those changes affect requests from Codespaces. We are enhancing observability and alerting of critical issues within regular codespace operations to better detect and mitigate similar issues in the future.
Codespaces IPs are no longer being blocked from Visual Studio Marketplace operations and we are monitoring for full recovery
We're seeing intermittent failures downloading from the extension marketplace from codespaces, caused by IP blocks for some codespaces. We're working to remove those blocks.
We're seeing intermittent failures downloading from the extension marketplace from codespaces and are investigating.
We're seeing partial recovery for the issue affecting extension installation in newly created Codespaces. Some users may still experience degraded functionality where extensions hit errors. The team continues to investigate the root cause while monitoring the recovery.
We have deployed a fix for the issue affecting extension installation in newly created Codespaces. New Codespaces are now being created with working extensions. We'll post another update by 15:30 UTC.
We are continuing to investigate an issue where extensions fail to install in newly created Codespaces. Users can create and access Codespaces, but extensions will not be operational, resulting in a degraded experience. The team is working on a fix. All newly created Codespaces are affected. We'll post another update by 15:00 UTC.
We're investigating an issue where extensions fail to install in newly created Codespaces. Users can still create and access Codespaces, but extensions will not be operational, resulting in a degraded development experience. Our team is actively working to identify and resolve the root cause. We'll post another update by 14:00 UTC.
We are investigating reports of degraded performance for Codespaces
Actions failures to download (401 Unauthorized)
4 updates
On March 12, 2026 between 02:30 and 06:02 UTC some GitHub Apps were unable to mint server to server tokens, resulting in 401 Unauthorized errors. During the outage window, ~1.3% of requests incorrectly resulted in 401 errors. This manifested in GitHub Actions jobs failing to download tarballs, as well as failing to mint fine-grained tokens. During this period, approximately 5% of Actions jobs were impacted. The root cause was a failure in the authentication service’s token cache layer, a newly created secondary cache layer backed by Redis, caused by Kubernetes control plane instability. This led to an inability to read certain tokens, which resulted in 401 errors. The mitigation was to fall back reads to the primary cache layer backed by MySQL. As permanent mitigations, we have made changes to how we deploy Redis to not rely on the Kubernetes control plane and to maintain service availability during similar failure modes. We also improved alerting to reduce overall impact time from similar failures.
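The mitigation above was to fall reads back from the unhealthy secondary cache layer to the primary one. A hedged sketch of that read path is below; the class names, stores, and keys are illustrative stand-ins, not the actual GitHub Apps token service.

```python
# Hedged sketch of a token read path with a secondary cache and a primary
# fallback; everything here is a hypothetical stand-in for illustration.
class SecondaryCacheUnavailable(Exception):
    """Raised when the Redis-backed secondary layer cannot be reached."""


class TokenCache:
    def __init__(self, primary, secondary=None):
        self.primary = primary      # stand-in for the MySQL-backed cache layer
        self.secondary = secondary  # stand-in for the Redis-backed cache layer

    def _read_secondary(self, key):
        if self.secondary is None:
            raise SecondaryCacheUnavailable("secondary cache unreachable")
        return self.secondary[key]

    def get_token(self, key):
        """Prefer the secondary layer, but fall back instead of failing.

        Serving the token from the primary layer keeps callers receiving
        valid tokens (and 200s) while the newer cache layer is unhealthy.
        """
        try:
            return self._read_secondary(key)
        except (SecondaryCacheUnavailable, KeyError):
            return self.primary[key]


if __name__ == "__main__":
    cache = TokenCache(primary={"app-42": "s2s-token-from-primary"})
    print(cache.get_token("app-42"))  # falls back to the primary layer
```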
Actions is operating normally.
We are continuing investigation of reports of degraded performance for Actions and GitHub Apps
We are investigating reports of degraded performance for Actions
Disruption with some GitHub services
4 updates
Between 01:36 and 08:11 UTC on Thursday March 12, GitHub.com experienced elevated error rates across Git operations, web requests, and related services. During a planned infrastructure upgrade, a configuration issue caused newly provisioned Kubernetes nodes to run an incompatible version of etcd, which disrupted cluster consensus across several production clusters. This led to intermittent 5XX errors on git push, git clone, and page loads. Deployments were paused for the duration of the incident. Once the incompatible nodes were identified, they were removed and cluster consensus was restored. A validation deploy confirmed all systems were healthy before normal operations resumed. To prevent recurrence, we are adding programmatic enforcement of version compatibility during node replacements, implementing monitoring to detect split-brain conditions earlier, and updating our recovery tooling to reduce restoration time.
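One follow-up named above is programmatic enforcement of version compatibility during node replacements. A minimal sketch of such a pre-join check follows; the version values and node names are assumptions, and real enforcement would live in provisioning tooling rather than a standalone script.

```python
# Minimal sketch of a version-compatibility gate for replacement nodes;
# versions and node names are hypothetical examples.
def nodes_safe_to_join(cluster_version, candidate_versions):
    """Return only candidates whose etcd major.minor matches the cluster.

    Enforcing this programmatically prevents a replacement node running an
    incompatible etcd build from ever participating in consensus.
    """
    def major_minor(version):
        major, minor, *_ = version.split(".")
        return (int(major), int(minor))

    allowed = major_minor(cluster_version)
    return [node for node, v in candidate_versions.items() if major_minor(v) == allowed]


if __name__ == "__main__":
    print(nodes_safe_to_join("3.5.12", {"node-a": "3.5.13", "node-b": "3.4.27"}))
    # -> ['node-a']; node-b would be rejected before the replacement proceeds
```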
We've identified the root cause and are working on resolving the underlying issue. Some users may have encountered intermittent failures and errors. We're continuing to see reduced error rates.
We are investigating elevated error rates. Error rates are now decreasing and we're continuing to monitor the situation.
We are investigating reports of impacted performance for some GitHub services.
Degraded experience with Copilot Code Review
5 updates
On March 11, 2026, between 13:00 UTC and 15:23 UTC the Copilot Code Review service was degraded and experienced longer than average review times. On average, Copilot Code Review requests took 4 minutes and peaked at just under 8 minutes. This was due to hitting worker capacity limits and CPU throttling. We mitigated the incident by increasing partitions, and we are improving our resource monitoring to identify potential issues sooner.
Copilot Code Review queue processing has returned to normal levels.
We experienced degraded performance with Copilot Code Review starting at 14:01 UTC. Customers experienced extended review times and occasional failures. Some extended processing times may continue briefly. We are monitoring for full recovery. We'll post another update by 16:30 UTC.
We are investigating degraded performance with Copilot Code Review. Customers may experience extended review times or occasional failures. We are seeing signs of improvement as our team works to restore normal service. We'll post another update by 15:30 UTC.
We are investigating reports of impacted performance for some GitHub services.
Incident with API Requests
3 updates
On March 11, 2026, between 14:25 UTC and 14:34 UTC, the REST API platform was degraded, resulting in increased error rates and request timeouts. REST API 5xx error rates peaked at ~5% during the incident window with two distinct spikes: the first impacting REST services broadly, and the second driven by sustained timeouts on a subset of endpoints. The incident was caused by a performance degradation in our data layer, which resulted in increased query latency across dependent services. Most services recovered quickly after the initial spike, but resource contention caused sustained 5xx errors due to how certain endpoints responded to the degraded state. A fix addressing the behavior that prolonged impact has already been shipped. We are continuing to work to resolve the primary contributing factor of the degradation and to implement safeguards against issues causing cascading impact in the future.
We are investigating elevated timeouts that affected GitHub API requests. The incident began at 14:37 UTC. Some users experienced slower response times and request failures. System metrics have returned to normal levels, and we are now investigating the root cause to prevent recurrence.
We are investigating reports of degraded performance for API Requests
Incident With Webhooks
1 update
On March 10, 2026, between 23:00 UTC and 23:40 UTC, the Webhooks service was degraded and ~6% of users experienced intermittent errors when accessing webhook delivery history, retrying webhook deliveries, and listing webhooks via the UI and API. Approximately 0.37% of requests resulted in errors, while at peak 0.5% of requests resulted in errors. This was due to unhealthy infrastructure. We mitigated the incident by redeploying affected services, after which service health returned to normal. We are working to improve detection of unhealthy infrastructure and strengthen service safeguards to reduce time to detect and mitigate similar issues in the future.
Incident with Webhooks
4 updates
On March 9, 2026, between 15:03 and 20:52 UTC, the Webhooks API was degraded, resulting in higher average latency on requests and, in certain cases, error responses. Approximately 0.6% of total requests exceeded the normal latency threshold of 3s, while 0.4% of requests resulted in 500 errors. At peak, 2.0% experienced latency greater than 3 seconds and 2.8% of requests returned 500 errors. The issue was caused by a noisy actor that led to resource contention on the Webhooks API service. We mitigated the issue initially by increasing CPU resources for the Webhooks API service, and ultimately applied lower rate limiting thresholds to the noisy actor to prevent further impact to other users. We are working to improve monitoring to more quickly ascertain noisy traffic and will continue to improve our rate-limiting mechanisms to help prevent similar issues in the future.
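The mitigation above was per-actor rate limiting so one noisy client cannot starve everyone else. A small token-bucket sketch of that idea is below; the rates, burst size, and actor identifiers are hypothetical, not the thresholds GitHub applied.

```python
# Illustrative per-actor token bucket; limits and actor IDs are assumptions.
import time
from collections import defaultdict


class PerActorRateLimiter:
    def __init__(self, rate_per_s, burst):
        self.rate = rate_per_s
        self.burst = burst
        self.tokens = defaultdict(lambda: burst)
        self.last = defaultdict(time.monotonic)

    def allow(self, actor):
        """Refill the actor's bucket, then spend one token if available.

        A noisy actor exhausts only its own bucket, so other users keep
        their normal throughput.
        """
        now = time.monotonic()
        elapsed = now - self.last[actor]
        self.tokens[actor] = min(self.burst, self.tokens[actor] + elapsed * self.rate)
        self.last[actor] = now
        if self.tokens[actor] >= 1:
            self.tokens[actor] -= 1
            return True
        return False


if __name__ == "__main__":
    limiter = PerActorRateLimiter(rate_per_s=5, burst=10)
    print(sum(limiter.allow("noisy-app") for _ in range(100)))  # roughly the burst size
    print(limiter.allow("quiet-user"))  # True: unaffected by the noisy actor
```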
Webhooks is operating normally.
We are experiencing latency on the API and UI endpoints. We are working to resolve the issue.
We are investigating reports of degraded performance for Webhooks
Incident with Codespaces
5 updates
On March 9, 2026, between 01:23 UTC and 03:25 UTC, users attempting to create or resume codespaces in the Australia East region experienced elevated failures, peaking at a 100% failure rate for this region. Codespaces in other regions were not affected. The create and resume failures were caused by degraded network connectivity between our control plane services and the VMs hosting the codespaces. This was resolved by redirecting traffic to an alternate site within the region. While we are addressing the core network infrastructure issue, we have also improved our observability of components in this area to improve detection. This will also enable our existing automated failovers to cover this failure mode. These changes will prevent or significantly reduce the time any similar incident causes user impact.
This incident has been resolved. New Codespace creation requests are now completing successfully.
We are seeing recovery, with the failure rate for new Codespace creation requests dropping from 5% to about 3%.
We are seeing about 5% of new Codespace creation requests failing. We are investigating the root cause and identifying the impacted regions.
We are investigating reports of degraded performance for Codespaces
Incident with Webhooks
14 updates
On March 6, 2026, between 16:16 UTC and 23:28 UTC the Webhooks service was degraded and some users experienced intermittent errors when accessing webhook delivery histories, retrying webhook deliveries, and listing webhooks via the UI and API. On average, the error rate was 0.57% and peaked at approximately 2.73% of requests to the service. This was due to unhealthy infrastructure affecting a portion of webhook API traffic. We mitigated the incident by redeploying affected services, after which service health returned to normal. We are working to improve detection of unhealthy infrastructure and strengthen service safeguards to reduce time to detection and mitigation of issues like this one in the future.
Webhooks is operating normally.
We have deployed a fix and are observing a full recovery. The affected endpoint was the webhook deliveries API (https://docs.github.com/en/rest/repos/webhooks?apiVersion=2022-11-28#list-deliveries-for-a-repository-webhook) and its organization and integration variants. We will continue monitoring to confirm stability.
We are preparing a new mitigation for the issue affecting the webhook deliveries API (https://docs.github.com/en/rest/repos/webhooks?apiVersion=2022-11-28#list-deliveries-for-a-repository-webhook) and its organization and integration variants. Overall impact remains low, with under 1% of requests failing for a subset of customers.
The previous mitigation did not resolve the issue. We are investigating further. The affected endpoint is the webhook deliveries API (https://docs.github.com/en/rest/repos/webhooks?apiVersion=2022-11-28#list-deliveries-for-a-repository-webhook) and its organization and integration variants. Overall impact remains low, with under 1% of requests failing for a subset of customers.
We have deployed a fix for the issue causing some users to experience intermittent failures when accessing the Webhooks API and configuration pages. We are monitoring to confirm full recovery.
We continue working on mitigations to restore service.
We continue working on mitigations to restore service.
We continue working on mitigations to restore service.
We continue working on mitigations to restore full service.
Our engineers have identified the root cause and are actively implementing mitigations to restore full service.
This problem is impacting less than 1% of UI and webhook API calls.
We are investigating an issue affecting a subset of customers experiencing errors when viewing webhook delivery histories and retrying webhook deliveries. UI and webhook API is impacted. Engineers have identified the cause and are actively working on mitigation.
We are investigating reports of degraded performance for Webhooks
Actions is experiencing degraded availability
7 updates
On March 5, between 22:39 and 23:55 UTC, Actions was degraded due to a repeat of an incident a few hours prior. In this case, a Redis cluster topology change made as a follow-up to the earlier incident caused a repeat of the earlier degradation of Actions jobs. Details of both incidents and the follow-ups are shared at https://www.githubstatus.com/incidents/g5gnt5l5hf56.
We are close to full recovery. Actions and dependent services should be functioning normally now.
Actions is experiencing degraded performance. We are continuing to investigate.
Actions and dependent services, including Pages, are recovering.
We applied a mitigation and we should see a recovery soon.
Actions is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of degraded performance for Actions
Multiple services are affected, service degradation
11 updates
On Mar 5, 2026, between 16:24 UTC and 19:30 UTC, Actions was degraded. During this time, 95% of workflow runs failed to start within 5 minutes, with an average delay of 30 minutes, and 10% of workflow runs failed with an infrastructure error. This was due to Redis infrastructure updates that were being rolled out to production to improve our resiliency. These changes introduced an incorrect configuration change into our Redis load balancer, causing internal traffic to be routed to an incorrect host and leading to two incidents. We mitigated this incident by correcting the misconfigured load balancer. Actions jobs were running successfully starting at 17:24 UTC. The remaining time until we closed the incident was spent burning through the queue of jobs. We immediately rolled back the updates that were a contributing factor and have frozen all changes in this area until we have completed follow-up work from this. We are working to improve our automation to ensure incorrect configuration changes are not able to propagate through our infrastructure. We are also working on improved alerting to catch misconfigured load balancers before they become an incident. Additionally, we are updating the Redis client configuration in Actions to improve resiliency to brief cache interruptions.
Webhooks is operating normally.
Actions is operating normally.
Actions is now fully recovered.
The queue of requested Actions jobs continues to make progress. Job delays are now approximately 6 minutes and continuing to decrease.
We are back to queueing Actions workflow runs at nominal rates and we are monitoring the clearing of queued runs during the incident.
We have applied mitigations for connection failures across backend resources and we are observing a recovery in queueing Actions workflow runs.
We are observing delays in queuing Actions workflow runs. We’re still investigating the causes of these delays.
Webhooks is experiencing degraded availability. We are continuing to investigate.
Actions is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of degraded performance for Actions
Disruption with some GitHub services
4 updates
On March 5, 2026, between 12:53 UTC and 13:35 UTC, the Copilot mission control service was degraded. This resulted in empty responses returned for users' agent session lists across GitHub web surfaces. Impacted users were unable to see their lists of current and previous agent sessions in GitHub web surfaces. This was caused by an incorrect database query that falsely excluded records that have an absent field. We mitigated the incident by rolling back the database query change. There were no data alterations or deletions during the incident. To prevent similar issues in the future, we're improving our monitoring depth to more easily detect degradation before changes are fully rolled out.
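The bug class described above, a filter that silently drops records missing a field, is easy to reproduce. The toy example below illustrates it in plain Python with a hypothetical "archived" field; it is not the actual query or data model.

```python
# Toy illustration of filtering against a field that may be absent; the field
# name and records are hypothetical.
sessions = [
    {"id": 1, "archived": False},
    {"id": 2, "archived": True},
    {"id": 3},  # older record created before the "archived" field existed
]

# Buggy filter: for session 3, s.get("archived") is None, and None == False is
# False, so the record disappears from the listing even though it is not archived.
buggy = [s for s in sessions if s.get("archived") == False]

# Corrected filter: treat an absent field the same as "not archived".
fixed = [s for s in sessions if not s.get("archived", False)]

print([s["id"] for s in buggy])  # [1]     -- record 3 falsely excluded
print([s["id"] for s in fixed])  # [1, 3]  -- absent field handled explicitly
```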
Copilot coding agent mission control is fully restored. Tasks are now listed as expected.
Users were temporarily unable to see tasks listed in mission control surfaces. The ability to submit new tasks, view existing tasks via direct link, or manage tasks was unaffected throughout. A revert is currently being deployed and we are seeing recovery.
We are investigating reports of impacted performance for some GitHub services.
Some OpenAI models degraded in Copilot
4 updates
On March 5th, 2026, between approximately 00:26 and 00:44 UTC, the Copilot service experienced a degradation of the GPT-5.3 Codex model due to an issue with our upstream provider. Users encountered elevated error rates when using GPT-5.3 Codex, impacting approximately 30% of requests. No other models were impacted. The issue was resolved by a mitigation put in place by our provider.
The issues with our upstream model provider have been resolved, and gpt-5.3-codex is once again available in Copilot Chat and across IDE integrations. We will continue monitoring to ensure stability, but mitigation is complete.
We are experiencing degraded availability for the gpt-5.3-codex model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue.
We are investigating reports of degraded performance for Copilot
Claude Opus 4.6 Fast not appearing for some Copilot users
3 updates
On March 3, 2026, between 19:44 UTC and 21:05 UTC, some GitHub Copilot users reported that the Claude Opus 4.6 Fast model was no longer available in their IDE model selection. After investigation, we confirmed that this was caused by enterprise administrators adjusting their organization's model policies, which correctly removed the model for users in those organizations. No users outside the affected organizations lost access. We confirmed that the Copilot settings were functioning as designed, and all expected users retained access to the model. The incident was resolved once we verified that the change was intentional and no platform regression had occurred.
We believe that all expected users still have access to Claude Opus 4.6. We confirm that no users have lost access.
We are investigating reports of degraded performance for Copilot
Incident with all GitHub services
25 updates
On March 3, 2026, between 18:46 UTC and 20:09 UTC, GitHub experienced a period of degraded availability impacting GitHub.com, the GitHub API, GitHub Actions, Git operations, GitHub Copilot, and other dependent services. At the peak of the incident, GitHub.com request failures reached approximately 40%. During the same period, approximately 43% of GitHub API requests failed. Git operations over HTTP had an error rate of approximately 6%, while SSH was not impacted. GitHub Copilot requests had an error rate of approximately 21%. GitHub Actions experienced less than 1% impact. This incident shared the same underlying cause as an incident in early February where we saw a large volume of writes to the user settings caching mechanism. While deploying a change to reduce the burden of these writes, a bug caused every user’s cache to expire, get recalculated, and get rewritten. The increased load caused replication delays that cascaded down to all affected services. We mitigated this issue by immediately rolling back the faulty deployment. We understand these incidents disrupted the workflows of developers. While we have made substantial, long-term investments in how GitHub is built and operated to improve resilience, we acknowledge we have more work to do. Getting there requires deep architectural work that is already underway, as well as urgent, targeted improvements. We are taking the following immediate steps:
- We have added a killswitch and improved monitoring to the caching mechanism to ensure we are notified before there is user impact and can respond swiftly.
- We are moving the cache mechanism to a dedicated host, ensuring that any future issues will solely affect services that rely on it.
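The first follow-up above is a killswitch on the caching mechanism. A minimal sketch of a killswitch-guarded write path is below; the class, flag, and cache interface are hypothetical illustrations of the technique, not GitHub's code.

```python
# Hypothetical sketch of a killswitch guard on a cache-write path; the flag
# name and cache shape are assumptions for illustration.
class SettingsCache:
    def __init__(self, killswitch_enabled=False):
        self.killswitch_enabled = killswitch_enabled
        self.store = {}
        self.writes_skipped = 0

    def write(self, user_id, settings):
        """Write through to the cache unless the killswitch is thrown.

        With the killswitch on, the write volume that drives replication
        delay stops immediately, and readers fall back to the source of truth.
        """
        if self.killswitch_enabled:
            self.writes_skipped += 1
            return False
        self.store[user_id] = settings
        return True


if __name__ == "__main__":
    cache = SettingsCache(killswitch_enabled=True)
    cache.write(1001, {"theme": "dark"})
    print(cache.writes_skipped)  # 1: load is shed instead of amplifying writes
```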
We're seeing recovery across all services. We're continuing to monitor for full recovery.
Actions is operating normally.
Git Operations is operating normally.
Git Operations is experiencing degraded availability. We are continuing to investigate.
We are seeing recovery across multiple services. Impact is mostly isolated to git operations at this point, we continue to investigate
Copilot is operating normally.
Pull Requests is operating normally.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Issues is operating normally.
Webhooks is operating normally.
Codespaces is operating normally.
Webhooks is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
We've identified the issue and have applied a mitigation. We're seeing recovery of services. We continue to monitor for full recovery.
API Requests is operating normally.
API Requests is experiencing degraded performance. We are continuing to investigate.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded availability. We are continuing to investigate.
Webhooks is experiencing degraded availability. We are continuing to investigate.
We're seeing some service degradation across GitHub services. We're currently investigating impact.
Webhooks is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
API Requests is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of degraded availability for Actions, Copilot and Issues
Delayed visibility of newly added issues on project boards
13 updates
Between March 2, 21:42 UTC and March 3, 05:54 UTC, project board updates, including adding new issues, PRs, and draft items to boards, were delayed from 30 minutes to over 2 hours, as a large backlog of messages accumulated in the Projects data denormalization pipeline. The incident was caused by an anomalously large event that required longer processing time than expected. Processing this message exceeded the Kafka consumer heartbeat timeout, triggering repeated consumer group rebalances. As a result, the consumer group was unable to make forward progress, creating head-of-line blocking that delayed processing of subsequent project board updates. We mitigated the issue by deploying a targeted fix that safely bypassed the offending message and allowed normal message consumption to resume. Consumer group stability recovered at 04:10 UTC, after which the backlog began draining. All queued messages were fully processed by 05:53 UTC, returning project board updates to normal processing latency. We have identified several follow-up improvements to reduce the likelihood and impact of similar incidents in the future, including improved monitoring and alerting, as well as introducing limits for unusually large project events.
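The summary above describes a slow message stalling a Kafka consumer group and the fix of bypassing it plus limiting oversized events. The sketch below shows one way to express both ideas with confluent-kafka-python; the topic, group id, size guardrail, poll-interval setting, and handler are assumptions, not the actual Projects pipeline.

```python
# Illustrative consumer loop using confluent-kafka-python; all names and
# limits here are hypothetical.
from confluent_kafka import Consumer

MAX_EVENT_BYTES = 1_000_000  # hypothetical limit for "unusually large" events


def handle(payload):
    """Stand-in for the per-event denormalization work."""
    _ = payload


consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "projects-denormalizer",
    "enable.auto.commit": False,
    # Give slow handlers headroom before the client leaves the group and
    # triggers a consumer group rebalance.
    "max.poll.interval.ms": 600_000,
})
consumer.subscribe(["project-item-events"])

while True:
    msg = consumer.poll(1.0)
    if msg is None or msg.error():
        continue
    if len(msg.value()) > MAX_EVENT_BYTES:
        # Commit past the oversized event so it cannot cause head-of-line
        # blocking for every later update on the partition.
        consumer.commit(message=msg)
        continue
    handle(msg.value())
    consumer.commit(message=msg)
```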
This incident has been resolved. Project board updates are now processing in near-real-time.
The backlog of delayed updates is expected to fully clear within approximately 1 hour, after which project board updates will return to near-real-time.
The fix has been deployed and processing speeds have returned to normal. There is a backlog of delayed updates that will continue to be worked through — we're estimating how long that will take and will provide an update in the next 60 minutes.
The fix is still building and is expected to deploy within 60 minutes. The current delay for GitHub Projects updates has increased to up to 5 hours.
We're deploying a fix targeting the increased delay in GitHub Projects updates. The rollout should complete within 60 minutes. If successful, the current delay of up to 4 hours should begin to decrease.
The delay for project board updates has increased to up to 3 hours. We've identified a potential cause and are working on remediation.
Project board updates — including adding issues, pull requests, and changing fields such as "Status" — are currently delayed by 1–2 hours. Normal behavior is near-real-time. We're actively investigating the root cause.
The impact extends beyond adding issues to project boards. Adding pull requests and updating fields such as "Status" may also be affected. We're continuing to investigate the root cause.
Newly added issues are taking 30–60 minutes to appear on project boards, compared to the normal near-real-time behavior. We're investigating the root cause and possible mitigations.
Newly added issues can take up to 30 minutes to appear on project boards. We're investigating the cause of this delay.
Issues is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
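The root cause above is a classic Kafka head-of-line blocking pattern: one message takes longer to process than the consumer's poll interval allows, the broker evicts the consumer from its group, and the partition stalls behind the same message on every retry. The sketch below is illustrative only, not GitHub's fix; it uses the confluent-kafka client, and the topic, group id, handler, and size threshold are hypothetical. It shows the general shape of the mitigation described above: commit past an oversized message so the partition keeps draining.

```python
# Illustrative sketch only -- not GitHub's code. Topic, group id, handler,
# and size threshold are hypothetical.
from confluent_kafka import Consumer

MAX_SAFE_BYTES = 1_000_000  # assumed threshold for an "anomalously large" event

def process(payload: bytes) -> None:
    """Hypothetical denormalization handler."""
    pass

consumer = Consumer({
    "bootstrap.servers": "localhost:9092",
    "group.id": "projects-denormalizer",   # hypothetical consumer group
    "enable.auto.commit": False,
    "max.poll.interval.ms": 300_000,       # processing must finish within this window
})
consumer.subscribe(["project-board-events"])  # hypothetical topic

try:
    while True:
        msg = consumer.poll(1.0)
        if msg is None or msg.error():
            continue
        if len(msg.value() or b"") > MAX_SAFE_BYTES:
            # Bypass the offending message: commit its offset without processing
            # so it stops blocking the partition (head-of-line blocking).
            consumer.commit(message=msg, asynchronous=False)
            continue
        process(msg.value())
        consumer.commit(message=msg, asynchronous=False)
finally:
    consumer.close()
```

In a real pipeline, a skipped message would typically also be routed to a dead-letter topic for later inspection rather than dropped outright.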
Incident with Pull Requests /pulls
6 updates
On March 2nd, 2026, between 7:10 UTC and 22:04 UTC, the pull requests service was degraded. Users navigating between tabs on the pull requests dashboard were met with 404 errors or blank pages. This was due to a configuration change deployed on February 27th at 23:03 UTC. We mitigated the incident by reverting the change. We're working to improve monitoring for the page to automatically detect and alert us to routing failures.
The issue on https://github.com/pulls is now fully resolved. All tabs are working again.
We're deploying a fix for pull request filtering. Full rollout across all regions is expected within 60 minutes.
We are experiencing issues with the Pull Requests dashboard that prevent users from filtering their pull requests. We have identified a mitigation and are deploying a fix. We'll post another update by 21:00 UTC.
We are seeing a degraded experience when attempting to filter the /pulls dashboard. We are working on a mitigation.
We are investigating reports of degraded performance for Pull Requests
February 2026(9 incidents)
Incident with Copilot agent sessions
5 updates
On February 27, 2026, between 22:53 UTC and 23:46 UTC, the Copilot coding agent service experienced elevated errors and degraded functionality for agent sessions. Approximately 87% of attempts to start or interact with agent sessions encountered errors during this period. This was due to an expired authentication credential for an internal service component, which prevented Copilot agent session operations from completing successfully. We mitigated the incident by rotating the expired credential and deploying the updated configuration to production. Services began recovering within minutes of the fix being deployed. We are working to improve automated credential rotation coverage across all Copilot service components, add proactive alerting for credentials approaching expiration, and validate configuration consistency to reduce our time to detection and mitigation of issues like this one in the future. (A sketch of the expiry-alerting idea follows this incident's updates.)
We have identified the cause of the elevated errors and are rolling out a fix to production. We are observing initial recovery in Copilot agent sessions.
We are investigating networking issues with some requests to our models.
We are investigating a spike in errors in Copilot agent sessions
We are investigating reports of degraded performance for Copilot
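One of the follow-ups mentioned above is proactive alerting for credentials approaching expiration. Below is a minimal sketch of that idea, assuming a simple inventory that maps credential names to known expiry timestamps; the names, dates, and warning window are made up, not GitHub's tooling.

```python
# Hypothetical sketch: warn when credentials in an inventory are nearing expiry.
from datetime import datetime, timedelta, timezone

# Hypothetical inventory: credential name -> expiry timestamp (UTC).
CREDENTIALS = {
    "copilot-agent-service-token": datetime(2026, 3, 15, tzinfo=timezone.utc),
    "internal-api-client-cert": datetime(2026, 6, 1, tzinfo=timezone.utc),
}

WARN_WINDOW = timedelta(days=14)  # assumed lead time for rotation

def credentials_needing_rotation(now=None):
    now = now or datetime.now(timezone.utc)
    return [
        (name, expires_at)
        for name, expires_at in CREDENTIALS.items()
        if expires_at - now <= WARN_WINDOW
    ]

for name, expires_at in credentials_needing_rotation():
    print(f"ALERT: {name} expires at {expires_at.isoformat()} - rotate soon")
```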
Code view fails to load when content contains some non-ASCII characters
6 updates
From February 26, 2026 at 22:10 UTC through February 27 at 05:50 UTC, the repository browsing UI was degraded and users were unable to load pages for files and directories with non-ASCII characters (including Japanese, Chinese, and other non-Latin scripts). On average, the error rate was 0.014% and peaked at 0.06% of requests to the service. Affected users saw 404 errors when navigating to repository directories and files with non-ASCII names. This was due to a code change that altered how file and directory names were processed, which caused incorrectly formatted data to be stored in an application cache. We mitigated the incident by deploying a fix that invalidated the affected cache entries and progressively rolling it out across all production environments. We are working to improve our pre-production testing to cover non-ASCII character handling, establish better cache invalidation mechanisms, and enhance our monitoring to detect this type of failure mode earlier, to reduce our time to detection and mitigation of issues like this one in the future. (A sketch of one cache-invalidation approach follows this incident's updates.)
We have cleared all caches and everything is operating normally.
We have mitigated the issue but are working on invalidating caches in order to fix the issue for all impacted repos.
We have performed a mitigation but some repositories may still see issues. We are working on a full mitigation.
We are looking into recent code changes to mitigate the error loading some code view pages.
We are investigating reports of impacted performance for some GitHub services.
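The mitigation above relied on invalidating cache entries written in a bad format. One common way to make that cheap is to version the cache key by the serializer, so incompatible entries simply stop being read after a deploy. The sketch below is hypothetical and not GitHub's cache code; the key scheme and in-memory cache are stand-ins.

```python
# Hypothetical sketch: version the cache key with the serializer version so a
# change in how values are encoded misses old, incompatible entries instead of
# serving them.
SERIALIZER_VERSION = 2            # bump when the stored format changes

_cache: dict[str, bytes] = {}     # stand-in for a real application cache

def cache_key(path: str) -> str:
    # Encode non-ASCII names consistently (UTF-8) and include the version.
    return f"tree-entry:v{SERIALIZER_VERSION}:{path.encode('utf-8').hex()}"

def get_or_compute(path: str, compute) -> bytes:
    key = cache_key(path)
    if key not in _cache:
        _cache[key] = compute(path)
    return _cache[key]

# Usage: entries written under v1 keys are never read once v2 ships.
print(get_or_compute("ドキュメント/読み方.md", lambda p: p.encode("utf-8")))
```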
High latency on webhook API requests
3 updates
Between February 26 and February 27, 2026 (UTC), customers hitting the webhooks delivery API may have experienced higher latency or failed requests. During the impact window, 0.82% of requests took longer than 3s and 0.004% resulted in a 500 error response. Our monitors caught the impact on the individual backing data source, and we were able to attribute the degradation to a noisy neighbor effect: requests to a specific webhook were generating excessive load on the API. The incident was mitigated once traffic from the specific hook decreased. We have since added a rate limiter for this webhooks API to prevent similar spikes in usage from impacting others, and will further refine the rate limits for other webhook API routes to help prevent similar occurrences in the future. (A sketch of a per-hook rate limiter follows this incident's updates.)
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
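The fix described above scopes a rate limit to a single webhook so one hook cannot monopolize the API. Below is a minimal token-bucket sketch of that idea; the rate, burst size, and hook identifier are assumptions, not GitHub's actual limits or implementation.

```python
# Hypothetical sketch: a per-webhook token bucket so one noisy hook cannot
# degrade the deliveries API for everyone else.
import time
from collections import defaultdict

RATE = 10.0    # tokens added per second, per hook (assumed)
BURST = 20.0   # maximum bucket size (assumed)

_buckets = defaultdict(lambda: {"tokens": BURST, "last": time.monotonic()})

def allow_request(hook_id: str) -> bool:
    bucket = _buckets[hook_id]
    now = time.monotonic()
    bucket["tokens"] = min(BURST, bucket["tokens"] + (now - bucket["last"]) * RATE)
    bucket["last"] = now
    if bucket["tokens"] >= 1.0:
        bucket["tokens"] -= 1.0
        return True
    return False   # caller would respond with HTTP 429 here

# Usage: gate each deliveries-API request by hook id.
if not allow_request("hook-12345"):
    print("429 Too Many Requests")
```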
Incident with Copilot
3 updates
On February 26, 2026, between 09:27 UTC and 10:36 UTC, the GitHub Copilot service was degraded and users experienced errors when using Copilot features including Copilot Chat, Copilot Coding Agent and Copilot Code Review. During this time, 5-15% of affected requests to the service returned errors. The incident was resolved by infrastructure rebalancing. We are improving observability to detect capacity imbalances earlier and enhancing our infrastructure to better handle traffic spikes.
Copilot is operating normally.
We are investigating reports of degraded performance for Copilot
Incident with Copilot Agent Sessions impacting CCA/CCR
2 updates
On February 25, 2026, between 15:05 UTC and 16:34 UTC, the Copilot coding agent service was degraded, resulting in errors for 5% of all requests and impacting users starting or interacting with agent sessions. This was due to an internal service dependency running out of allocated resources (memory and CPU). We mitigated the incident by adjusting the resource allocation for the affected service, which restored normal operations for the coding agent service. We are working to implement proactive monitoring for resource exhaustion across our services, review and update resource allocations, and improve our alerting capabilities to reduce our time to detection and mitigation of similar issues in the future.
We are investigating reports of degraded performance for Copilot
Code search experiencing degraded performance
7 updates
Between 2026-02-23 19:10 and 2026-02-24 00:46 UTC, all lexical code search queries on GitHub.com and in the code search API were significantly slowed, and during this incident, between 5 and 10% of search queries timed out. This was caused by a single customer who had created a network of hundreds of orchestrated accounts that searched with a uniquely expensive search query. This query concentrated load on a single hot shard within the search index, slowing down all queries. After we identified the source of the load and stopped the traffic, latency returned to normal. To avoid this situation occurring again in the future, we are making a number of improvements to our systems, including: improved rate limiting that accounts for highly skewed load on hot shards, improved system resilience for when a small number of shards time out, improved tooling to recognize abusive actors, and capabilities that will allow us to shed load on a single shard in emergencies. (A sketch of per-shard load shedding follows this incident's updates.)
We have identified a cause for the latency and timeouts and have implemented a fix. We are observing initial recovery now.
Customers using code search continue to see increased latency and timeout errors. We are working to mitigate issues on the affected shard.
Elevated latency and timeouts for code search are isolated to a single shard experiencing elevated CPU. We are taking steps to isolate and mitigate the affected shard.
Elevated latency and timeouts for code search are isolated to a single shard experiencing elevated CPU. We are continuing to investigate the cause and steps to mitigate.
We are continuing to investigate elevated latency and timeouts for code search.
We are investigating reports of impacted performance for some GitHub services.
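One of the planned improvements above is the ability to shed load on a single hot shard in an emergency. Below is a minimal sketch of that idea, capping concurrent queries per shard so one expensive query pattern fails fast instead of slowing the whole index; the cap and the shard/query interface are hypothetical.

```python
# Hypothetical sketch: cap concurrent queries per shard and shed the excess.
import threading
from collections import defaultdict

MAX_INFLIGHT_PER_SHARD = 50   # assumed cap

_lock = threading.Lock()
_inflight = defaultdict(int)

def try_acquire(shard_id: str) -> bool:
    with _lock:
        if _inflight[shard_id] >= MAX_INFLIGHT_PER_SHARD:
            return False          # shed: fail fast instead of queueing behind the hot shard
        _inflight[shard_id] += 1
        return True

def release(shard_id: str) -> None:
    with _lock:
        _inflight[shard_id] -= 1

def run_query(shard_id: str, query: str) -> str:
    if not try_acquire(shard_id):
        return "503: shard overloaded, please retry"
    try:
        return f"results for {query!r} from {shard_id}"   # placeholder for the real search call
    finally:
        release(shard_id)

print(run_query("shard-07", "repo:example path:src lang:go http.Client"))
```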
Incident with Issues and Pull Requests Search
3 updates
On February 23, 2026, between 21:01 UTC and 21:30 UTC, the Search service experienced degraded performance, resulting in an average of 3.5% of search requests for Issues and Pull Requests being rejected. During this period, updates to Issues and Pull Requests may not have been immediately reflected in search results. During a routine migration, we observed a spike in internal traffic due to a configuration change in our search index. We were alerted to the increase in traffic as well as the increase in error rates and rolled back to the previous stable index. We are working to enable more controlled traffic shifting when promoting a new index, to allow us to detect potential limitations earlier and ensure these operations succeed in a more controlled manner. (A sketch of gradual traffic shifting follows this incident's updates.)
Some customers are seeing timeout errors when searching for issues or pull requests. The team is currently investigating a fix.
We are investigating reports of degraded performance for Issues and Pull Requests
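The follow-up above is about promoting a new search index behind controlled traffic shifting rather than a hard cutover. Below is a minimal sketch of percentage-based routing between a stable and a new index; the step schedule and index names are hypothetical, not GitHub's rollout tooling.

```python
# Hypothetical sketch: shift search traffic to a newly promoted index in small
# percentage steps so problems surface before a full cutover.
import random

ROLLOUT_STEPS = [1, 5, 25, 50, 100]   # percent of traffic on the new index (assumed)

def pick_index(step: int) -> str:
    percent_new = ROLLOUT_STEPS[step]
    return "index-new" if random.random() * 100 < percent_new else "index-stable"

# Usage: hold at each step until error rates stay flat, then advance; route
# everything back to "index-stable" if errors spike.
print(pick_index(step=1))
```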
Incident with Actions
2 updates
On February 23, 2026, between 15:00 UTC and 17:00 UTC, GitHub Actions experienced degraded performance. During this time, 1.8% of Actions workflow runs experienced delayed starts, with an average delay of 15 minutes. The issue was caused by a connection rebalancing event in our internal load balancing layer, which temporarily created uneven traffic distribution across sites and led to request throttling. To prevent recurrence, we are tuning connection rebalancing behavior to spread client reconnections more gradually during load balancer reloads. We are also evaluating improvements to site-level traffic affinity to eliminate the uneven distribution at its source. We have overprovisioned critical paths to prevent any impact if a similar event occurs before those workstreams finish. Finally, we are enhancing our monitoring to detect capacity imbalances proactively. (A sketch of jittered reconnection follows this incident's updates.)
We are investigating reports of degraded performance for Actions
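Spreading client reconnections more gradually, as described above, is usually done with jittered backoff so clients do not all reconnect to the same site at the same moment after a load balancer reload. Below is a minimal "full jitter" sketch; the delay bounds, retry count, and connect callable are assumptions.

```python
# Hypothetical sketch: full-jitter reconnect backoff to avoid a thundering herd.
import random
import time

BASE_DELAY = 1.0   # seconds (assumed)
MAX_DELAY = 60.0   # seconds (assumed)

def reconnect_delay(attempt: int) -> float:
    # Pick uniformly within a capped exponential window.
    return random.uniform(0, min(MAX_DELAY, BASE_DELAY * (2 ** attempt)))

def connect_with_jitter(connect) -> None:
    for attempt in range(10):
        try:
            connect()
            return
        except OSError:
            time.sleep(reconnect_delay(attempt))
    raise RuntimeError("could not reconnect")

# Usage (connect callable is hypothetical):
# connect_with_jitter(lambda: open_connection_to_site())
```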
Incident with Copilot
6 updates
On February 23, 2026, between 14:45 UTC and 16:19 UTC, the Copilot service was degraded for the Claude Haiku 4.5 model. On average, 6% of requests to this model failed due to an issue with an upstream provider. During this period, automated model degradation notifications directed affected users to alternative models. No other models were impacted. The upstream provider identified and resolved the issue on their end. We are working to improve automatic model failover mechanisms to reduce our time to mitigation of issues like this one in the future. (A sketch of a failover wrapper follows this incident's updates.)
Copilot is operating normally.
The issues with our upstream model provider have been resolved, and Haiku 4.5 is once again available in Copilot Chat and across IDE integrations. We will continue monitoring to ensure stability, but mitigation is complete.
Our provider has recovered and we are not seeing errors, but we are awaiting a signal from them that the issue will not regress before we go green.
We are experiencing degraded availability for the Haiku 4.5 model in Copilot Chat, VS Code and other Copilot products. This is due to an issue with an upstream model provider. We are working with them to resolve the issue. Other models are available and working as expected.
We are investigating reports of degraded performance for Copilot
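The follow-up above is automatic failover to alternative models when an upstream provider degrades, rather than relying on users to switch manually. Below is a minimal sketch of a failover wrapper; the model identifiers and the provider call are placeholders, not Copilot's actual API.

```python
# Hypothetical sketch: try the primary model, then fall back to alternatives
# when the upstream provider errors.
PRIMARY_MODEL = "claude-haiku-4.5"                 # hypothetical identifiers
FALLBACK_MODELS = ["fallback-model-a", "fallback-model-b"]

class UpstreamError(Exception):
    pass

def call_model(model: str, prompt: str) -> str:
    # Placeholder provider call; the primary is simulated as failing here.
    if model == PRIMARY_MODEL:
        raise UpstreamError(f"{model} unavailable")
    return f"[{model}] completion for: {prompt}"

def complete_with_failover(prompt: str) -> str:
    for model in [PRIMARY_MODEL, *FALLBACK_MODELS]:
        try:
            return call_model(model, prompt)
        except UpstreamError:
            continue   # try the next model instead of surfacing the error
    raise UpstreamError("all models unavailable")

print(complete_with_failover("hello"))
```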