GitHub Outage History
Past incidents and downtime events
Complete history of GitHub outages, incidents, and service disruptions. Showing 50 most recent incidents.
May 2026 (16 incidents)
Disruption with some GitHub services
3 updates
The degradation affecting Copilot has been mitigated. We are monitoring to ensure stability.
Copilot is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Incident with Actions and Pages
8 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Actions and Pages has been mitigated. We are monitoring to ensure stability.
We have identified the cause of the authentication issues affecting GitHub Actions and are actively working on mitigation
Actions is experiencing degraded performance. We are continuing to investigate.
We are investigating authentication issues leading to failures in starting Actions runs and downloading actions. At this time the majority of Actions runs are impacted.
Actions is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of degraded performance for Actions and Pages
Intermittent errors with app installation token authentication
8 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
This is fully mitigated. We will continue to monitor to ensure it does not recur.
We have identified and are applying additional mitigation and will continue to monitor for complete mitigation.
We see significant signs of mitigation and are monitoring for full mitigation.
We are seeing signs of mitigation and are continuing to monitor for complete mitigation. Next update in one hour.
We are continuing to investigate an elevated error rate of authentication failures for app installation tokens. Next update in one hour.
We are seeing an increased rate of authentication failures for app installation tokens, affecting approximately 1% of tokens. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Incident with Actions
6 updates
On May 20, 2026, between 16:00 UTC and 17:45 UTC, GitHub Actions customers experienced run start delays exceeding 5 minutes. Approximately 4.5% of all runs were delayed during the impact window, with scale set jobs disproportionately affected. 30% of scale set jobs were delayed and 4% failed to start entirely. The incident was caused by a misconfigured health check on an internal service that assigns jobs to runners. A brief latency spike in an upstream dependency triggered health check failures across several pods, removing them from service and concentrating load on the remaining capacity. The added load drove memory pressure that escalated into a cascading failure in one regional cluster, leaving it unable to self-recover. Responders mitigated the incident by scaling capacity in the healthy regional clusters and draining traffic away from the impaired one, after which run start latency recovered. To prevent recurrence, we are strengthening our health check configuration to avoid cascading failure scenarios and evaluating automated mitigations to rebalance traffic when a region is degraded.
Customer impact has fully subsided. We are maintaining yellow status while we deploy a permanent fix to prevent recurrence.
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
A subset of runners are taking longer than expected to connect, which may delay some jobs from beginning execution. We are actively working to mitigate the issue.
We are investigating reports of degraded performance for Actions
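The cascade described in the May 20 summary above (health check failures removing pods and concentrating load on the remaining capacity) can be illustrated with a little arithmetic. This is a minimal sketch, not GitHub's implementation: the function names, the 70% floor, and the idea of capping ejections are assumptions chosen for illustration.

```python
def load_per_pod(total_load: float, healthy_pods: int) -> float:
    """Load each remaining pod carries once unhealthy pods are removed."""
    if healthy_pods <= 0:
        raise ValueError("no healthy pods left to serve traffic")
    return total_load / healthy_pods


def pods_to_eject(
    failing: list[str],
    total_pods: int,
    min_healthy_percent: int = 70,  # hypothetical floor, not from the report
) -> list[str]:
    """Eject failing pods only while a minimum share of pods stays in service.

    Integer ceiling division avoids floating-point rounding surprises.
    """
    required_healthy = (total_pods * min_healthy_percent + 99) // 100
    max_ejections = max(0, total_pods - required_healthy)
    return failing[:max_ejections]
```

With 10 pods sharing 1000 requests per second, each pod serves 100; if health checks remove 6 pods, the remaining 4 each serve 250. Capping ejections keeps a brief upstream latency spike from removing most of the fleet at once, at the cost of leaving some possibly unhealthy pods in rotation.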
Actions is experiencing degraded availability
7 updates
On May 15, 2026, from approximately 07:43 UTC to 08:48 UTC, GitHub Actions experienced a degradation that caused workflow runs to fail or experience delayed starts for a subset of customers. The incident was triggered by a planned failover of supporting infrastructure used by GitHub Actions. During that operation, an automated service discovery update did not propagate correctly, which caused traffic to be routed incorrectly and increased request timeouts in a core dependency for workflow orchestration. At peak impact, 42% of Actions runs failed. Downstream services that depend on Actions workflow execution were also impacted, including GitHub Pages and Copilot cloud services. At 08:12 UTC, responders manually corrected the service discovery routing issue. Timeout and failure rates recovered shortly after, and we continued monitoring until full stabilization was confirmed across all affected services. The incident was marked resolved at 08:48 UTC. To prevent recurrence, we are implementing failover guardrails that validate service discovery state before completing failover operations, strengthening pre-flight and post-flight verification checks, and improving dependency resilience to reduce timeout cascades during infrastructure events.
The degradation has been mitigated. We are monitoring to ensure stability.
We are monitoring an issue that was affecting GitHub Actions and causing downstream issues in GitHub Coding Agent and GitHub Code Review Agent. The issue has resolved now but we are closely monitoring our systems for full recovery.
The degradation affecting Pages has been mitigated. We are monitoring to ensure stability.
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
Pages is experiencing degraded availability. We are continuing to investigate.
We are investigating reports of degraded availability for Actions
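The May 15 summary above mentions failover guardrails that validate service discovery state before a failover completes. One way such a check could look is sketched below, assuming discovery state can be represented as a mapping from service name to a set of endpoints. The function names, service name, and addresses are hypothetical.

```python
def discovery_mismatches(
    expected: dict[str, set[str]],
    observed: dict[str, set[str]],
) -> dict[str, set[str]]:
    """Return, per service, endpoints that are missing or stale in discovery.

    The symmetric difference catches both endpoints that failed to
    propagate and old endpoints that were never removed.
    """
    mismatches: dict[str, set[str]] = {}
    for service in expected.keys() | observed.keys():
        diff = expected.get(service, set()) ^ observed.get(service, set())
        if diff:
            mismatches[service] = diff
    return mismatches


def safe_to_complete_failover(
    expected: dict[str, set[str]],
    observed: dict[str, set[str]],
) -> bool:
    """Allow the failover to finish only when discovery matches expectations."""
    return not discovery_mismatches(expected, observed)
```

A post-flight check could call `safe_to_complete_failover` again after traffic has shifted, so a late propagation failure is caught before timeouts build up in dependent services.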
[Retroactive] Incident with GitHub.com
1 update
Beginning at 02:49 UTC on May 15 2026 and lasting until 03:04 UTC, GitHub.com was unavailable for a subset of customers. This impact has been mitigated and normal service resumed. The issue was rooted in a sudden spike in traffic, with intermittent impact. We've identified the source of the traffic and prevented further disruption.
Incident with CodeQL
6 updates
On May 13, 2026, between 14:31 and 16:03 UTC, the Code Scanning service experienced processing delays and 12% of check runs took over 15 minutes to complete. The delays were caused by replication lag due to an internal database migration, resulting in insufficient worker capacity for our high rate of job enqueues. We mitigated the impact by scaling our processing workers by 34%. Code Scanning results returned to normal processing times after the mitigation was applied. The capacity increases are permanent, and we are looking into more ways to decrease the load on our workers to help prevent this in the future.
CodeQL impact has been mitigated. We are continuing to monitor for durable recovery.
The degradation has been mitigated. We are monitoring to ensure stability.
We have applied a mitigation to increase processing capacity. We are continuing to monitor to confirm full recovery. We will provide another update by 15:30 UTC.
We are investigating delays affecting CodeQL, the code analysis engine used by Code Scanning. Some users may experience delayed or incomplete code scanning results. Our engineering team is investigating. We will provide another update by 15:15 UTC.
We are investigating reports of impacted performance for some GitHub services.
Incident with CodeQL, Webhooks, Notifications, and Slack Integration
10 updates
On May 12, 2026, between 13:41 and 17:43 UTC, some services experienced delays in processing. For the Code Scanning service, 53% of check runs took over 15 minutes to complete. Additionally, notifications took an average of 22 minutes to be delivered and Slack integration webhooks took an average of 20 minutes to be delivered. The delays were caused by replication lag due to an internal database migration, resulting in insufficient worker capacity for our high rate of job enqueues. We mitigated the impact by scaling our processing workers to handle the increased load. All services returned to normal processing times after the mitigation was applied. We are working to create dedicated worker pools for some of our high usage shared queues to help prevent this in the future.
All services have fully recovered.
CodeQL has fully recovered. We're continuing to work on recovery for the remaining impacted services.
Webhooks have fully recovered. Continuing to work on recovery for the other services.
Webhooks is operating normally.
We've established that most delays are related to a queuing service and are working to scale out. Early signals from the scale-out are showing signs of recovery for some services. We'll provide an update when services are fully recovered.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We're continuing to investigate issues with CodeQL actions workflows. We're additionally seeing delays for notifications, webhooks, and the Slack integration.
CodeQL actions are currently experiencing delays, which may result in those actions being stuck in a pending state or having failed due to a timeout.
We are investigating reports of degraded performance for CodeQL
Incident with high errors on Git Operations
2 updates
On May 11th, 2026, between 14:00 UTC and 14:33 UTC, HTTP-based Git read operations were degraded. On average, the error rate was 2.8% and peaked at 7.5% of requests to the service. This was due to resource exhaustion in a networking gateway between GitHub.com’s frontend service for Git operations and a dependency service that performs authentication and authorization. Following the initial spike, the frontend service became stuck in a degraded state in one of our data centers, increasing time to mitigation. We mitigated the incident by scaling the networking gateway and re-deploying the frontend service. To reduce our time to detection and mitigation in the future, we are adding auto-scaling to the networking gateway, and resolving a bug which caused the frontend service to remain degraded.
We are investigating reports of degraded performance for Git Operations
CCR and CCA failing to start for PR comments
4 updates
On May 7, 2026, between 04:12 UTC and 06:13 UTC, Copilot Cloud Agent and Copilot Code Review Agent sessions for pull requests were delayed or failed to start. The issue was caused by follow-up recovery work from a separate Pull Requests incident (https://www.githubstatus.com/incidents/f5pb5d5mr9yh). As part of that recovery, we ran a large database migration, which caused replication delays on several replica hosts. Although those replicas were not serving user traffic, our safeguards correctly treated the elevated replication lag as a signal to slow down writes to the affected database cluster. As a result, some pull request background processing was temporarily delayed. That processing is responsible for sending the internal events that Copilot agents use to begin work, so affected agents did not start until the database replicas caught up. The system recovered once replication lag returned to normal and pull request processing resumed. We are reviewing how this safeguard interacts with recovery migrations so we can reduce the chance of similar secondary impact during future incident recovery work.
Copilot code review and cloud agents are starting again for pull requests. We are monitoring for full recovery.
The degradation has been mitigated. We are monitoring to ensure stability.
We are investigating reports of impacted performance for some GitHub services.
Incident with Pull Requests
8 updates
On May 6, 2026, between 15:12 and 19:02 UTC, creation of new pull request review threads on GitHub.com failed. This included new line comments and file comments on pull requests. Existing PRs and previously created comments were unaffected. This incident was caused by a 32-bit integer key reaching its maximum value in a Vitess lookup table used during PR thread creation. The primary table had been migrated to a 64-bit integer key but the Vitess lookup table remained 32-bit. Once the values in the primary table passed the available 32-bit ID space in the lookup table, attempts to create new review threads began failing, resulting in a near 100% failure rate for new thread creation requests. We mitigated the issue by updating the impacted lookup table definitions across all shards to use 64-bit integer column types, increasing the available ID range and restoring normal operation. Service was fully restored once the schema changes completed globally. To help prevent similar incidents, we are expanding existing monitoring of database columns to include Vitess lookup tables to notify in advance of any table that is approaching a column size limit. This work is intended to provide earlier detection of columns approaching size limits before customer impact occurs.
Mitigations have been fully applied and we are seeing full recovery of functionality on Pull Request threads. We are continuing to monitor to ensure sustained recovery.
Creation of new Pull Request threads (including line and file comments) continues to be affected, although we are seeing partial recovery. A mitigation is being applied to continue to accelerate recovery, with complete recovery expected by 8:00pm UTC. Top-level comments on pull requests still function and should remain usable during recovery. Opening and merging pull requests, actions, and other pull request operations remain functional.
Creation of new Pull Request threads (including line and file comments) continues to be affected. Top-level comments on pull requests still function and should remain usable during recovery. Opening and merging pull requests, actions, and other pull request operations remain functional. A mitigation is being applied. Recovery is expected to be gradual, with complete recovery expected by 8:00pm UTC.
Pull Requests is experiencing degraded availability. We are continuing to investigate.
Creation of new Pull Request threads (including line and file comments) continues to be affected. We have identified the cause of the issue and have started taking steps to mitigate this issue.
We are investigating failures for new thread creation on Pull Requests. Responses to existing pull request threads are unaffected.
We are investigating reports of degraded performance for Pull Requests
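The May 6 summary above describes a 32-bit key running out of ID space and a plan to alert before columns approach their size limit. A minimal sketch of such a headroom check follows. It assumes a signed 32-bit column (the report does not say whether the key was signed or unsigned), and the table names and 10% threshold are hypothetical.

```python
INT32_MAX = 2**31 - 1  # 2,147,483,647: largest signed 32-bit value (assumed signed)
INT64_MAX = 2**63 - 1  # largest signed 64-bit value, for comparison


def headroom_fraction(current_max_id: int, column_max: int = INT32_MAX) -> float:
    """Fraction of the ID space still unused; 0.0 means the column is exhausted."""
    if current_max_id < 0:
        raise ValueError("IDs are expected to be non-negative")
    return max(0.0, 1.0 - current_max_id / column_max)


def columns_near_limit(
    max_ids: dict[str, int],
    threshold: float = 0.10,  # hypothetical alert threshold
) -> list[str]:
    """Names of columns whose remaining headroom is at or below the threshold."""
    return sorted(
        name for name, max_id in max_ids.items()
        if headroom_fraction(max_id) <= threshold
    )
```

For example, `columns_near_limit({"review_threads_lookup.id": 2_000_000_000, "small_table.id": 1_000})` flags only the first column, which has used about 93% of the signed 32-bit range. The same check against `INT64_MAX` shows why widening the column removes the practical limit.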
Disruption with some GitHub services
4 updates
On May 6, 2026 between 11:02 UTC and 11:13 UTC, users were unable to start or view Copilot Cloud Agent or remote sessions. During this time, requests to the session API returned errors, preventing users from creating new sessions or viewing existing ones. The issue was caused by a configuration change to the service's network routing that inadvertently removed the ingress path for the service. The team reverted the change at 11:13 UTC which restored service. The incident remained open until 11:59 UTC while the team verified full recovery. We are taking steps to improve our deployment validation process to prevent similar configuration changes from impacting production traffic in the future.
We have applied a mitigation and Copilot services have recovered.
We are investigating issues with the ability to start Copilot Cloud Agent sessions and view them.
We are investigating reports of impacted performance for some GitHub services.
Incident with Actions, we are investigating reports of degraded availability
6 updates
On May 6, 2026, from approximately 06:45 UTC to 09:15 UTC, GitHub Actions Standard Ubuntu hosted runners were degraded. 17.1% of jobs requesting a standard runner failed. This was caused by an unexpected data shape in the allocation configuration data for standard runners. That data was introduced as part of post-incident remediation work for an incident the previous day and caused new allocations to be blocked as load ramped up for the day. Removing that data at 08:51 UTC allowed allocations to proceed and hosted runner pools to scale up and recover. We are updating the filter logic for this allocation data to be resilient to abnormal data shapes and improving monitoring to alert when allocations are blocked, allowing the team to respond before customer impact starts.
Actions wait times have fully recovered.
The degradation affecting Actions has been mitigated. We are monitoring to ensure stability.
We've applied a mitigation to fix the issues with queuing and running Actions jobs. We are seeing improvements in telemetry and are monitoring for full recovery.
Actions is experiencing issues with ubuntu standard hosted runners leading to high wait times. We are actively investigating the issue
We are investigating reports of degraded availability for Actions
Increased Latency and Failures for SSH Git Operations
7 updates
Between approximately 14:00 and 16:10 UTC on May 5, 2026, SSH-based Git operations experienced elevated latency and intermittent failures. On average, the error rate was 0.46% and peaked at 0.6% of SSH write requests. HTTP-based Git operations, including web UI and HTTPS clones, were not affected. The impact was caused by reduced SSH capacity at one of our data center sites. During a period of high traffic, the remaining hosts became overloaded, leading to connection exhaustion and some failures for SSH-based operations. Additional capacity was provisioned to expand SSH capacity and resolve the incident. The expanded capacity was fully online by 18:18 UTC. To reduce the likelihood of similar incidents, we will implement faster scaling solutions for SSH infrastructure and improved alerting for host availability and capacity thresholds.
We've completed our mitigation to prevent further impact. At this time the incident is considered resolved.
The degradation affecting Git Operations has been mitigated. We are monitoring to ensure stability.
We're continuing to work on preventing further impact from the earlier issue. No SSH-based impact is expected at this time. We'll post new updates if impact recurs or once our mitigation is in place.
Git Operations is experiencing degraded performance. We are continuing to investigate.
Between approximately 14:00 and 16:10 UTC, customers using SSH-based Git operations may have experienced elevated latency and failures. HTTP-based operations were not impacted. We've identified a suspected root cause and are working to implement a mitigation to prevent further impact.
We are investigating reports of impacted performance for some GitHub services.
Incident with Actions
9 updates
On May 5, 2026, from approximately 13:22 UTC to 17:05 UTC, GitHub Actions hosted runners in the East US region were degraded. 13.5% of jobs requesting a standard runner failed and ~16% of requested Larger Runners with private networking pinned to East US failed or were delayed by more than 5 minutes. Copilot Code Review requests were also impacted. Approximately 8,500 code review requests timed out during this window. Affected users saw an error comment on their pull requests and were able to retry by re-requesting a review. Most runner requests were picked up by other regions automatically, but a portion of requests still routing to East US were impacted. This was triggered by a scale-up operation for hosted runner VMs in the East US region. This is a regular operation, but the VM create load hit an internal rate limit when VM creates pull images from storage. Existing backoff logic was not triggered because of the response code returned in this case. The rate limiting and VM creation failures were mitigated by reducing load to allow for recovery and allowing queued work to be processed. By 15:34 UTC, queued and failed job assignments were mostly mitigated, with less than 0.5% of runner assignments impacted between 15:34 and full recovery at 17:05. We are improving our system’s throttling behavior when limits occur, improving our controls to more quickly mitigate similar situations in the future, and reviewing all limits end-to-end for similar operations. We also immediately paused all scale and similar operations until these changes are in place and validated.
Actions is experiencing degraded performance. We are continuing to investigate.
Standard hosted runners have now reached full recovery. Hosted Runners with Private Networking in the East US region remain degraded as we continue working with our compute provider to restore capacity. Hosted Runners with private networking can fail over to a different Region to mitigate the issue.
We've seen signs of recovery for Standard Hosted Runners and are continuing to monitor for full recovery. Hosted Runners with Private Networking in the East US region remain affected as we continue working with our compute provider to restore capacity.
We've applied a mitigation for long queue times and failures on Standard Hosted Runners and are monitoring for full recovery. Hosted Runners with Private Networking in the East US region remain affected as we continue working with our compute provider to restore capacity.
We are working with our compute provider to alleviate elevated queue times and failures for Actions Jobs running on Hosted Runners in the East US region affecting 10% of runs. Hosted Runners with private networking can fail over to a different Region to mitigate the issue.
We are investigating elevated queue times and failures on Actions Jobs running on Hosted Runners in East US affecting 8% of runs. Hosted Runners with private networking can fail over to a different Azure region to mitigate the issue.
We are investigating elevated queue times on Actions Jobs running on Standard Hosted Runners in East US affecting 10% of runs
We are investigating reports of degraded availability for Actions
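The May 5 summary above notes that existing backoff logic was not triggered because of the response code returned when the rate limit was hit. The sketch below shows one way to make that classification explicit. The report does not say which code was returned, so the set of throttling statuses here (429 and 503) is an assumption, as are the function names and defaults.

```python
import random
import time
from collections.abc import Callable

# Assumption: statuses treated as "slow down and retry". The status code
# actually returned during the incident is not stated in the report.
THROTTLE_STATUSES = frozenset({429, 503})


def backoff_delay(
    attempt: int,
    base: float = 1.0,
    cap: float = 60.0,
    rng: Callable[[], float] = random.random,
) -> float:
    """Full-jitter exponential backoff: a random delay up to min(cap, base * 2**attempt)."""
    return rng() * min(cap, base * 2**attempt)


def create_with_backoff(
    create_vm: Callable[[], int],  # hypothetical: performs one create, returns a status code
    max_attempts: int = 5,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Retry a VM create while the response indicates throttling."""
    for attempt in range(max_attempts):
        status = create_vm()
        if status < 400:
            return status
        if status not in THROTTLE_STATUSES:
            raise RuntimeError(f"non-retryable status {status}")
        if attempt < max_attempts - 1:
            sleep(backoff_delay(attempt))
    raise RuntimeError("still throttled after all attempts")
```

Keeping the throttling statuses in one named set makes it easier to review whether every rate-limit response a dependency can return is actually covered by the backoff path.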
Incident with Issues and Webhooks
19 updates
On 2026-05-04 at 3:37:17 PM UTC we detected increased latency on issues resulting in timeouts, and elevated 500 errors on webhooks. A scheduled workload drove high utilization on the primary host of a critical datastore, saturating the connection pool. We paused the job to mitigate the problem at 4:40:05 PM UTC and have implemented measures to prevent recurrence.
The degradation has been mitigated. We are monitoring to ensure stability.
Webhooks is operating normally.
The degradation affecting Codespaces has been mitigated. We are monitoring to ensure stability.
The degradation affecting Issues has been mitigated. We are monitoring to ensure stability.
Pull Requests is operating normally.
Pages is operating normally.
Latency across services has normalized. We are continuing to investigate the root cause and prevent reoccurrence.
Actions and Packages are operating normally.
Git Operations is operating normally.
Pages is experiencing degraded performance. We are continuing to investigate.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Actions is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded availability. We are continuing to investigate.
Packages is experiencing degraded performance. We are continuing to investigate.
Git Operations is experiencing degraded performance. We are continuing to investigate.
We are investigating increased latency and timeouts across multiple GitHub services.
We are investigating reports of degraded performance for Issues and Webhooks
April 2026 (27 incidents)
Incomplete pull request results in repositories
10 updates
On April 28, 2026, at approximately 14:07 UTC, GitHub received reports that pull requests were missing from search results across global and repository /pulls pages. The issue was caused by a manually invoked repair job intended for a single repository, which was executed without the required safety flags. During execution of the repair job, the database query remained correctly scoped to the repo’s PR IDs. However, the Elasticsearch reconciliation logic did not apply the same scope. It interpreted the min and max PR IDs as a continuous range, causing unrelated PR documents across other repos to be marked for deletion. This resulted in the removal of 1,789,756,838 PR documents from the search index, approximately 49% of indexed PR documents. Customer impact was limited to PR search and list discoverability. Primary storage was unaffected, and there was no impact to opening, updating, or merging PRs. The issue was identified ~10 minutes after initial customer reports. Because it affected search index completeness rather than service availability, it was not caught by existing monitoring. The root cause was a flaw in the search document repair framework: it allowed a scoped reconciliation to run without enforcing a matching Elasticsearch query scope. This created a destructive mismatch between the source-of-truth and the index. The issue was compounded by the ability to trigger the job from the production console without safety defaults. Prior testing focused only on safe backfill scenarios and did not cover this reconciliation path. Additionally, there was no automated detection for large-volume deletions in Elasticsearch. 
We mitigated the incident through three parallel actions: (1) deployed a MySQL-backed search fallback for the most active repos by traffic to restore PR visibility for highly impacted users; (2) initiated a snapshot restore and reindex process to repopulate missing pull request documents in Elasticsearch; (3) added a degradation notice on PR pages to inform users of incomplete search results while recovery was in progress. The incident was resolved on May 1, 2026 at 04:15 UTC, following completion and validation of the reindex process. To prevent recurrence, we are prioritizing improvements to the repair framework and safeguards. These include enforcing scoped query alignment between primary storage and Elasticsearch, preventing destructive operations without explicit opt-in, strengthening guardrails for manual repair jobs, and evaluating restrictions on production console access. In parallel, we are expanding automated test coverage for reconciliation safety invariants and introducing detection for anomalous deletion patterns in Elasticsearch so similar issues can be identified or blocked earlier. We are committed to improving the safety and reliability of our repair systems and ensuring that operational workflows are resilient to both software defects and manual invocation risks.
This incident has been resolved. Search and indexing functionality for pull requests are now fully restored. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
We have repaired the missing search records for affected Pull Requests and are working to identify and repair records left in a stale state after the recovery.
We have restored search/indexing functionality for over 99% of impacted pull requests. We are continuing to address the remaining affected pull requests and are reviewing outstanding gaps as part of the restoration process.
Mitigation is in progress, with full recovery of impacted pull request listings expected within approximately 24 hours.
We have made an interim mitigation to improve availability for some impacted repositories while reindexing continues, and we are actively monitoring the indexing progress.
Elasticsearch reindexing of pull requests is continuing. All data is preserved, but may not be available on pages relying on Elasticsearch until the reindex is complete. Pages and APIs that do not rely on Elasticsearch, including the GitHub CLI (gh pr list) and API (/repos/{owner}/{repo}/pulls), are not impacted and can be used to retrieve pull request data in the interim.
We are actively reindexing the remaining Elasticsearch indexes. Our priority is ensuring correctness and avoiding further impact. We are taking a measured approach to safely backfill data and will share additional updates as progress continues.
After yesterday’s incident, we are investigating cases where /pulls and /repo/pulls pages are not showing all indexed pull requests. This is because our Elasticsearch cluster does not currently contain all indexed documents. No pull request data has been lost. As pull requests are updated, they will be reindexed. We are also working on accelerating a full reindex so these pages return complete results again.
We are investigating reports of degraded performance for Pull Requests
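The scope mismatch described in the April 28 summary above (a correctly scoped database query paired with index reconciliation that treated the min and max PR IDs as a continuous range) can be shown with a toy example. This is an illustrative sketch only: the index is modeled as a plain dictionary from document ID to repository name, and the function names are hypothetical.

```python
def stale_docs_by_range(repo_pr_ids: set[int], index: dict[int, str]) -> set[int]:
    """Flawed reconciliation: every indexed ID between min and max that is not
    in the repo's ID set is marked for deletion, whichever repo owns it."""
    lo, hi = min(repo_pr_ids), max(repo_pr_ids)
    return {
        doc_id for doc_id in index
        if lo <= doc_id <= hi and doc_id not in repo_pr_ids
    }


def stale_docs_scoped(repo_pr_ids: set[int], index: dict[int, str], repo: str) -> set[int]:
    """Scoped reconciliation: only documents owned by the target repo are
    candidates for deletion."""
    return {
        doc_id for doc_id, owner in index.items()
        if owner == repo and doc_id not in repo_pr_ids
    }


# Toy index: document ID -> owning repo. IDs from different repos interleave.
index = {10: "a", 11: "b", 12: "b", 13: "a", 14: "a"}
repo_a_ids = {10, 13}  # source of truth for repo "a"; doc 14 is genuinely stale

print(stale_docs_by_range(repo_a_ids, index))     # marks repo "b" documents
print(stale_docs_scoped(repo_a_ids, index, "a"))  # marks only repo "a" documents
```

Because PR IDs from many repositories interleave, the range-based version marks documents 11 and 12 (owned by repo "b") for deletion, while the scoped version marks only the stale document 14 from repo "a".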
Disruption with some GitHub services
9 updates
On April 28, 2026, from approximately 12:41 UTC to 17:09 UTC, GitHub Actions jobs using Standard Ubuntu 22 and Ubuntu 24 hosted runners experienced run start delays. Approximately 8% of hosted runner jobs using Ubuntu 22 and Ubuntu 24 experienced delays greater than 5 minutes or failures. Larger and self-hosted runners were not impacted. This was caused by a performance regression introduced in the VM reimage process. That reimage delay lowered the overall capacity of runners available to pick up new jobs. This was mitigated with a rollback to a known good image version. We are addressing the core issue with reimage performance and improving the granularity of reimage telemetry across our services and our compute provider to more quickly diagnose similar issues in the future. Finally, we are evaluating other rollout changes to automatically detect similar regressions.
Actions is operating normally.
Less than 1% of hosted ubuntu-latest runs are delayed. We’re working through remaining steps to restore runner capacity.
Currently less than 2% of hosted ubuntu-latest and ubuntu-24.04 runs are delayed or failing. We are continuing to monitor for full recovery.
We've applied a mitigation to unblock running Actions. We're continuing to monitor.
We're still investigating the root cause for run start delays and failures for Actions hosted Ubuntu jobs, around 5% of jobs are impacted as of now.
Actions is experiencing degraded performance. We are continuing to investigate.
Actions is experiencing capacity constraints with hosted ubuntu-latest and ubuntu-24.04, leading to high wait times. Other hosted labels and self-hosted runners are not impacted.
We are investigating reports of impacted performance for some GitHub services.
GitHub search is degraded
15 updates
On April 27, 2026, between 16:15 UTC and 22:46 UTC, GitHub search services experienced degraded connectivity due to saturation of the load balancing tier deployed in front of our search infrastructure. This resulted in intermittent failures for services relying on our search data including Issues, Pull Requests, Projects, Repositories, Actions, Package Registry and Dependabot Alerts. Impact varied by search target, with services seeing up to 65% of searches timing out or returning an error between 16:15 UTC and 18:00 UTC. We detected the drop in search results through our ongoing monitoring and declared an incident at 16:21 UTC when we determined the issues would not self-heal. We tracked the incident as mitigated as of 21:33 UTC and monitored the systems until 22:46 UTC when we declared the incident resolved. Our existing monitoring did not classify the increased scraping as a risk and this dimension of the incident was only discovered while working to mitigate. The saturation was caused by a large influx of anonymous distributed scraping traffic that was crafted to avoid our public API rate limits. This scraping traffic made up 30% of the day’s total search traffic, but it was concentrated within a four-hour period. The traffic originated from over 600,000 unique IP addresses, with matching actor information across the board. To mitigate, we immediately focused on relieving pressure from the load balancers while simultaneously working on scaling the load balancing tier, blocking the anomalous traffic and applying tuning to the balancers to fully resolve the incident. Looking ahead, we’ve not only scaled the load balancer tier, but also applied optimizations to improve our connection handling and re-use to reduce the possibility that a saturation event like this can recur. We’ve also added new monitors and controls within the platform to allow us to restrict anonymous traffic to mitigate the impact to our registered users.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Actions, Issues, Packages and Pull Requests has been mitigated. We are monitoring to ensure stability.
We've applied a mitigation and are continuing to monitor.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
We have identified the source of the additional load causing stress on our Elasticsearch clusters. We have disabled the source of that load and are seeing signs of recovery.
Pull Requests is experiencing degraded availability. We are continuing to investigate.
We're continuing to see connectivity issues reaching Elasticsearch. Impact on downstream services will be intermittent as we find the root cause.
Users are experiencing intermittent failures to view issues, pull requests, projects and Actions workflow runs. We are still investigating and attempting mitigations. We will provide further updates.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Packages is experiencing degraded performance. We are continuing to investigate.
Issues is experiencing degraded performance. We are continuing to investigate.
Customers across GitHub are experiencing failures with searches. Examples include: workflow run failures, projects failing to load, and timed out search requests. This is due to an ongoing infrastructure issue that we have been investigating.
We are investigating reports of degraded performance for Actions
Disruption with some GitHub services
5 updates
On April 22, 2026, from 18:49 to 19:32 UTC, the Copilot Cloud Agent service began failing during session execution for users running the Agent HQ Codex agent. Codex agent sessions failed to start for all entry points (issue assignment, @copilot comment mentions). 0.5% of total Copilot Cloud Agent jobs were impacted (~2,000 failed jobs). Copilot and other agent sessions were unaffected. This was caused by a model resolution mismatch in Codex agent sessions, resulting in an incompatible model being used at runtime. A mitigation was deployed to select a stable default model for Codex agent sessions. We are working to harden the underlying model-resolution path so it correctly scopes to the requesting agent's supported models to prevent a similar failure mode in the future.
The degradation has been mitigated. We are monitoring to ensure stability.
We've found the issue and are working on deploying a solution to get Codex agent runs working again.
Copilot Cloud Agent (CCA) jobs using the Codex agent are failing after starting. To avoid this issue, please choose a different agent. We are investigating the cause and working towards remediation.
We are investigating reports of impacted performance for some GitHub services.
Delays with Actions Jobs for Larger Runners using VNet Injection in the East US region
4 updates
On April 24, 2026, from approximately 11:39 UTC to April 25, 2026 at 00:15 UTC, GitHub Actions experienced delays and timeouts for Larger Hosted Runner jobs using VNet injection in the East US region without a failover region configured. Standard and Self-hosted runners were not impacted. This was caused by backend failures in our compute provider’s provisioning, scaling, and update operations for VMs in the East US region and mitigated by a rollback across all affected Availability Zones. More detail is available at https://azure.status.microsoft/en-us/status/history/?trackingId=5GP8-W0G. We are working to improve the reliability of our annotations for jobs impacted by regional issues and are adding system log notifications as an additional customer communication channel alongside annotations. VNet Failover is also now in public preview, allowing customers to evacuate Larger Hosted Runners using VNet injection in cases like this.
This is related to the public impact, "Multiservice impact for Azure Workloads in East US" shared at https://azure.status.microsoft/
We are investigating reports of degraded performance for Larger Runners with vnet injection in East US and we are working with our service provider on mitigation.
We are investigating reports of impacted performance for some GitHub services.
Incident with Pull Requests
5 updates
On April 23, 2026, between 16:05 UTC and 20:43 UTC, the Pull Requests service experienced a regression affecting merge queue operations. PRs merged via merge queue using the squash merge method produced incorrect merge commits when the merge group contained more than one PR. In affected cases, changes from previously merged PRs and prior commits were inadvertently reverted by subsequent merges. During the impact window 2,092 pull requests were affected. The issue did not affect pull requests merged outside of merge queue, nor merge queue groups using the merge or rebase methods. It took approximately 3 hours and 33 minutes to identify the issue. The change completed deployment at approximately 16:05 UTC, and we became aware at 19:38 UTC following an increase in customer support inquiries. Because the issue affected merge commit correctness rather than availability, it was not detected by existing automated monitoring and was identified through customer reports. The regression was introduced by a new code path that adjusted merge base computation for merge queue ref updates. This code path was intended to be gated behind a feature flag for an unreleased feature, but the gating was incomplete. As a result, the new behavior was inadvertently applied to squash merge groups, producing an incorrect three-way merge. This caused subsequent squash merges to revert changes from earlier pull requests and, in some cases, changes between their starting points. We mitigated the incident by reverting the code change and force-deploying the fix across all environments. After resolution, we identified affected repositories and sent targeted remediation instructions to repository administrators with step-by-step recovery guidance. The regression was not identified during internal validation. Existing test coverage primarily exercised single-PR merge queue groups, which did not exhibit the faulty base-reference calculation.
Because automated checks did not validate merge correctness for multi-PR squash groups, the defect surfaced only in production. To prevent recurrence, GitHub is expanding test coverage for merge correctness validation. We are broadening automated coverage for merge queue operations, including regression checks that validate resulting Git contents across supported configurations, so issues affecting merge correctness are caught before reaching production. We are committed to ensuring the correctness and reliability of merge queue operations. These actions will reduce the risk of similar regressions and improve confidence in future changes to the Pull Requests service.
We have resolved a regression present when using merge queue with either squash merges or rebases. If you use merge queue in this configuration, some pull requests may have been merged incorrectly between 2026-04-23 16:05-20:43 UTC. This behavior is still present in GitHub Enterprise Cloud with Data Residency, and we are rolling out the same fix.
Pull Requests is operating normally.
We have identified a regression in merge queue behavior present when squash merging or rebasing. We have identified the root cause and are in the process of reverting the change.
We are investigating reports of degraded performance for Pull Requests
Disruption with users unable to start Claude and Codex agent task from the web
3 updates
Between 18:45 and 19:42 UTC on April 23, users were unable to start new agent tasks using either Claude or Codex agent on github.com. This was caused by a code change to how Copilot mission control routes task creation requests. Ongoing agent tasks and other Copilot agent features were not affected. We mitigated the impact by reverting the breaking change. We are adding extra monitoring and integration test coverage for the task creation path to prevent future recurrence.
We have identified the root cause of the issue and are working on mitigation.
We are investigating reports of impacted performance for some GitHub services.
Incident with multiple GitHub services
8 updates
On April 23, 2026, between 16:03 UTC and 17:27 UTC, multiple GitHub services experienced elevated error rates and degraded performance due to DNS resolution failures originating from our DNS infrastructure in our VA3 datacenter. Approximately 5–7% of overall traffic was affected during the impact window:
- Webhooks: ~0.35% of API requests returned 5xx (peak ~0.39%). ~0.88% of requests exceeded 3s latency; at peak, >3s responses represented ~10% of Webhooks API traffic.
- Copilot Metrics: ~9% of Copilot Insights dashboard requests returned 5xx.
- Copilot cloud agents: ~10% of cloud agent sessions were affected and failing.
- Octoshift: 0.88% of active repo migrations failed and 79% saw elevated durations (avg. 5.2 min) during this period.
- Git Operations: averaged 1.25% errors over the duration of the incident, with a peak of 2.07% errors.
- Actions: Workflow run status updates experienced delays of up to ~8s over the duration of the incident window.
Our DNS infrastructure in VA3 entered a degraded state and began intermittently returning NXDOMAIN responses and timing out on lookups for both internal service discovery and external endpoints. This caused a cascading impact across the dependent services listed above. We identified a specific load pattern under which our DNS resolvers began failing. The evidence points to a recently introduced traffic-balancing mechanism, rolled out progressively to support our growth, as the root cause. We have since reverted this change. We are immediately prioritizing investments in a more controlled rollout and validation process, including a dedicated environment to safely shadow production DNS traffic and detect these failure modes before they can affect production.
Webhooks is operating normally.
Many services are mitigated and we are validating the remaining services.
The degradation affecting Actions and Copilot has been mitigated. We are monitoring to ensure stability.
We have identified the root problem and are working on mitigation.
Actions is experiencing degraded performance. We are continuing to investigate.
We are investigating multiple unavailable services.
We are investigating reports of degraded availability for Copilot and Webhooks
Investigating errors on GitHub
8 updates
On April 23, 2026 between 14:30 UTC and 15:18 UTC multiple services were degraded on github.com. During this time approximately 1.5% of all web requests resulted in a 5xx status and unicorn pages for github.com users. We also saw elevated error rates across Actions workflow runs, Copilot, Codespaces and Packages, leading to degraded experiences during this timeframe. Codespaces impact peaked at 45% failures for create requests and 65% failures for resume requests. Packages impact was mainly Maven related with 50% failure rates in downloads and 70% failure rates in uploads. Actions experienced a peak of 8% of failed jobs and up to 85% of jobs impacted by run start delays of more than 5 minutes. This was due to a configuration change to an internal billing service that led to a cache being overwhelmed and causing requests to time out. These timeouts cascaded across multiple services and eventually caused requests to queue up and exhaust web request workers. This configuration change was reverted at 14:42 UTC and following this, all services began to see recovery immediately. To prevent this situation in the future, we are taking steps to ensure that failures and timeouts in the billing service don’t cascade to other services causing impact. This includes implementing more aggressive timeouts on callers of these billing services, adding circuit breaker configurations for cache timeouts and using more resilient cache options. We have also decreased max request timeouts within the billing service that caused impact and added more capacity to our cache to prevent traffic spikes from having the same impact.
The degradation affecting Actions, Codespaces, Copilot and Packages has been mitigated. We are monitoring to ensure stability.
A mitigation was applied and services have recovered. Actions is working through queued work before fully recovering.
Users are experiencing errors loading various web pages on github.com. Actions and Copilot Cloud Agent runs will be delayed.
Copilot is experiencing degraded performance. We are continuing to investigate.
Codespaces is experiencing degraded performance. We are continuing to investigate.
Packages is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Actions
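The circuit breaker remediation named in the postmortem for this incident can be illustrated with a minimal sketch. This is not GitHub's implementation: the class name `CircuitBreaker`, its parameters, and the fallback behavior are assumptions chosen for illustration. The idea is that after a few consecutive failures the caller stops waiting on the degraded dependency and returns a fallback right away, so requests do not pile up and exhaust workers.

```python
import time


class CircuitBreaker:
    """Hypothetical sketch: fail fast after repeated dependency failures."""

    def __init__(self, max_failures=3, reset_after=30.0, clock=time.monotonic):
        self.max_failures = max_failures  # consecutive failures before opening
        self.reset_after = reset_after    # cool-down in seconds while open
        self.clock = clock                # injectable clock, useful for tests
        self.failures = 0
        self.opened_at = None             # None means the circuit is closed

    def call(self, fn, fallback):
        # While open, skip the dependency entirely until the cool-down passes.
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_after:
                return fallback()
            # Half-open: allow one trial call; a single failure reopens.
            self.opened_at = None
            self.failures = self.max_failures - 1
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = self.clock()
            return fallback()
        self.failures = 0
        return result
```

A caller would wrap each call to the slow dependency, for example `breaker.call(fetch_entitlement, lambda: None)`, where `fetch_entitlement` is a placeholder for whatever function applies its own short timeout.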
Disruption with some GitHub services
6 updates
On April 22, 2026, between 09:00 UTC and 22:05 UTC, the Copilot coding agent and pull request comment event processing were degraded. During this period, approximately 0.5% of total pull request and issue comments (~23,000 invocations) mentioned @copilot and explicitly requested work from the Copilot coding agent but were not acted upon. Creating, viewing, and replying to pull request comments was unaffected, and other Copilot functionality continued to operate normally. The impact was limited to @copilot mentions on pull request comments not triggering Copilot coding agent runs, and to some downstream systems not receiving new pull request comment events during the impact window. The cause was a serialization error that prevented pull request comment events from being published to downstream consumers, including the Copilot coding agent. This was related to the same class of issue as incident #4295 on April 20, affecting another event type. We mitigated the incident by deploying a fix that restored event publishing, after which the Copilot coding agent and other downstream consumers resumed processing pull request comment events normally. We are working to complete our audit of related event schemas, migrate remaining consumers to use the updated identifier fields, and improve monitoring to detect drops in publishing on critical event topics, to reduce our time to detection and mitigation of issues like this one in the future.
We have identified the root cause of the disruption affecting Copilot Coding Agent and Issues. A fix is being deployed.
We have identified the root cause of the disruption affecting Copilot Coding Agent and Issues. Copilot @-mentions on pull requests are not being processed, and some issue-related functionality may be degraded. A fix has been developed and is being applied.
Copilot @-mentions on pull requests are currently not being processed by Copilot Cloud Agent. We have found the issue and are investigating remediations.
Issues is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Disruption with Copilot chat and Copilot Coding Agent
8 updates
On April 22, 2026, between 15:16 UTC and 19:18 UTC, users experienced errors when interacting with Copilot Chat on github.com and Copilot Cloud Agent. During this time, users were unable to use Copilot Chat or Copilot Cloud Agent. Copilot Memory (in preview) was not available to Copilot agent sessions during this time. The issue was caused by an infrastructure configuration change that resulted in connectivity issues with our databases. The team identified the cause and restored connectivity to the database. Copilot Chat and Cloud Agent for github.com were restored by 18:16 UTC. Remaining regional deployments were restored incrementally, with full resolution at 19:18 UTC. We have taken steps to prevent similar infrastructure changes from causing these kinds of database operations in the future.
Copilot cloud agent and chat are mitigated for github.com.
We are now seeing recovery for Copilot cloud agent.
Mitigation is progressing for Copilot chat and cloud agent recovery.
Mitigation is progressing for Copilot chat and cloud agent.
We continue to work on mitigation for Copilot chat and cloud agent.
We are aware of users seeing errors interacting with Copilot chat on github.com and Copilot cloud agent. We have identified the cause and are investigating remediations.
We are investigating reports of impacted performance for some GitHub services.
Disruption with projects service
11 updates
On April 21, 2026, between 13:35 UTC and 01:24 UTC the following day, the projects service was degraded. During this time period, projects may have been out of sync and users may have experienced delays in changes to projects and their items. Delays in reflected changes peaked at approximately 45 minutes. The delays were caused by serialization errors that failed events and triggered a flood of resyncs, overloading our event processing layers. We mitigated the incident by speeding up processing time for incoming changes and otherwise waiting for all changes to be processed. We are working to increase our capacity for processing updates to projects to reduce our time to mitigation of issues like this one in the future.
The issue remains mitigated. Users may still experience small delays in changes to projects while we process the backlog of events. We expect a full recovery in approximately two hours.
The issue remains mitigated. Users may still experience delays in changes to projects while we process the backlog of events. We expect a full recovery in approximately three hours.
The degradation has been mitigated. We are monitoring to ensure stability.
Recovery from the delays affecting GitHub Projects continues to progress. We have deployed additional mitigations that are accelerating processing of the backlog. Users may still experience delays where changes to projects are not reflected immediately. We expect full recovery within approximately six hours.
The queues are continuing to decrease and we are working to accelerate the rate of processing through the queues.
The mitigation is deployed and we are seeing recovery in the queues. We will provide an update as to when full recovery will be realized.
We are deploying a fix to relieve the queue of delayed data. Some users may still experience delays with GitHub Projects where changes are not reflected immediately as remaining backlogs are processed.
We continue to investigate delays with GitHub Projects where changes may not be reflected immediately. Our team has identified the cause and applied mitigations to address the issue. We are seeing initial signs of recovery, though some delays may persist as the system works through a backlog of pending updates.
We are investigating reports of delays with GitHub Projects. Users may notice that changes made to projects are not reflected immediately. Our team has identified the source of the delays and is actively working to resolve the issue.
We are investigating reports of impacted performance for some GitHub services.
Partial degradation for code scanning default setup and for code quality
15 updates
On April 20, 2026 between 10:28 UTC and 15:04 UTC GitHub experienced degraded service for code scanning default setup, code quality, and project boards. Repair of affected project boards additionally lasted until April 21, 05:04 UTC. During this time, code scanning default setup and code quality analyses were not triggered on newly opened pull requests. Additionally, newly created issues were not appearing on project boards. The cause was a serialization error that prevented proper triggering of code scanning, code quality analyses, and project board updates. We mitigated the issue by deploying a fix, restoring event publishing for code scanning and code quality. For project boards, an additional code change was deployed to update event consumers, followed by a reindex of affected project items. We are working to prevent recurrence by strengthening our schema validations and improving monitoring for drops in publishing on critical hydro topics.
The degradation has been mitigated. We are monitoring to ensure stability.
The issue remains mitigated. Issues that were linked to projects during the incident may take approximately three more hours to render correctly while we complete a re-index.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation has been mitigated. We are monitoring to ensure stability.
The issue has been mitigated. Newly created issues linked to projects should now function as expected. Issues that were linked to projects during the incident may take approximately five hours to render correctly while we complete a re-index.
A deployment to fix this issue of new issues not showing up in projects is underway.
We continue to work on mitigation regarding new issues not showing on project boards.
We continue to work on mitigation regarding new issues not showing on project boards.
Code scanning default setup and Code Quality triggers are back up and running. PRs not processed before or during this incident will require a new push to trigger code scanning or code quality analysis. We are seeing problems with new issues not showing on project boards and are working on mitigation.
We are continuing to work on a mitigation to unblock code scanning default setup and code quality features on pull requests.
We are currently deploying mitigations that should unblock code scanning default setup and code quality features on pull requests.
We are actively working to mitigate an issue affecting code scanning default setup and code quality features on pull requests. Users may experience pull request code scanning and code quality analyses not being triggered on new pull requests. Our engineering team has identified the root cause and is working on mitigating the issue.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
6 updates
On April 17, 2026, between 14:46 UTC and 15:12 UTC, users experienced a degraded web experience on GitHub.com. During this time, approximately 1.5% of web requests resulted in errors, with some users encountering slow page loads or failed requests. The issue was caused by capacity saturation of a caching component in one of our data center regions. We mitigated the issue by redirecting traffic to an unaffected region and rolling back a recent deployment. The incident was fully resolved at 15:18 UTC. We are taking steps to provide appropriate capacity for this caching path to prevent recurrence.
The degradation affecting Issues has been mitigated. We are monitoring to ensure stability.
We have isolated a problematic component in our infrastructure and are working to mitigate. We will continue to post updates as we work toward resolution.
We are experiencing an issue that impacts approximately 10% of traffic to the web, resulting in slow and failed calls. We are investigating and will continue to post updates as we work toward mitigation.
Issues is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of impacted performance for some GitHub services.
Incident with Codespaces
7 updates
On April 16, 2026 between 09:30 UTC and 17:15 UTC, users experienced failures when attempting to connect to GitHub Codespaces via the VS Code editor. During this time, approximately 40% of codespace start operations failed. Users connecting via SSH were not impacted. The issue was caused by a failure in an upstream download service that prevented the VS Code Server from being retrieved during codespace startup. The impact was mitigated by implementing a workaround to use an alternative download path when the primary endpoint is degraded. We are working with the upstream dependency to address the root cause of the download service failure, and we are improving our fallback mechanisms to reduce the impact of similar upstream failures in the future.
The degradation affecting Codespaces has been mitigated. We are monitoring to ensure stability.
Our provider is implementing a mitigation and we are seeing signs of recovery.
We found an issue that impacts 70% of Codespaces. We are engaged with the provider and working towards mitigation.
Codespaces is experiencing degraded availability. We are continuing to investigate.
We are experiencing degraded performance in Codespaces related to creating a new Codespace or starting an existing Codespace from the VS Code editor. SSH connections to Codespaces are not impacted. We are working toward mitigation and will continue to keep you updated on progress.
We are investigating reports of degraded performance for Codespaces
Disruption with some GitHub services
8 updates
On April 14, between 00:58 UTC and 06:08 UTC, GitHub Enterprise Cloud customers experienced 500 errors when attempting to access Copilot Insights pages, caused by an authentication failure in our metrics pipeline. We fully mitigated the issue and validated the fix in production. Approximately 709 users were impacted. The total impact duration was approximately 5 hours and 10 minutes. Our investigation determined the incident was caused by a change in a tenant credential, which led to authentication errors when retrieving the data required by our Copilot Insights pages. We understand this disruption impacted customers' ability to access the Copilot Insights page. To prevent similar issues and reduce resolution time in the future, we are investing in improved diagnostics tooling to quickly identify the root cause of failures, enhanced monitoring, and alerting to detect issues at a more granular level. GitHub is critical infrastructure for your work, your teams, and your businesses. We are focused on these remediations and continued reliability improvements for Copilot Insights and related metrics experiences.
This incident has been resolved. We will continue to monitor to ensure stability. Thank you for your patience and understanding as we addressed this issue.
The degradation has been mitigated. We are monitoring to ensure stability.
We identified an issue that impacts the Copilot Dashboard on the Insights tab and are working on mitigation. We will continue to keep you updated on progress.
The team continues to investigate issues with accessing the Copilot Dashboard on the Insights tab. We will continue providing updates on the progress towards mitigation.
The Copilot Dashboard on the Insights tab is not accessible and we are continuing to investigate.
Degradation of Service - Insights Page
We are investigating reports of impacted performance for some GitHub services.
Incident with Pages
5 updates
On Sunday April 13th, 2026, between 18:53 UTC and 20:30 UTC, the GitHub Pages service experienced elevated error rates. On average, the error rate was 10.58% and peaked at 12.77% of requests to the service, resulting in approximately 17.5 million failed requests returning HTTP 500 errors. This was due to an automated DNS management tool (octodns) erroneously deleting a DNS record for a Pages backend storage host after its upstream data source intermittently failed to return the record, causing the tool to treat it as stale and remove it. We mitigated the incident by re-creating the deleted DNS record. To prevent future incidents, we are implementing availability-zone-tolerant routing in the Pages frontend so that an unresolvable backend host triggers failover to healthy hosts rather than returning errors, adding safeguards to prevent automated deletion of DNS records owned by other systems, and improving logging and alerting for DNS resolution failures in the Pages serving path.
We have mitigated the issue with Pages.
The degradation affecting Pages has been mitigated. We are monitoring to ensure stability.
We are investigating reports of issues with Pages. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded availability for Pages
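The failure mode in the Pages postmortem, a record deleted because its source intermittently failed to return it, suggests one possible kind of safeguard, sketched below. This is an illustrative assumption rather than octodns behavior or GitHub's actual fix: `DeletionGuard`, `required_misses`, and the example hostnames are all hypothetical. The guard only proposes a deletion once a record has been absent from several consecutive source snapshots, and it only ever considers records the tool itself manages.

```python
from collections import defaultdict


class DeletionGuard:
    """Hypothetical sketch: delete a record only after repeated absence."""

    def __init__(self, required_misses=3):
        self.required_misses = required_misses
        self.misses = defaultdict(int)  # record name -> consecutive misses

    def plan_deletions(self, managed, source_snapshot):
        """Return the managed records considered safe to delete this run.

        managed: names this tool owns (records owned by other systems
                 are never passed in, so they can never be deleted here).
        source_snapshot: names the upstream source returned this run.
        """
        to_delete = set()
        for name in managed:
            if name in source_snapshot:
                # Seen again: a single flaky response is forgiven.
                self.misses.pop(name, None)
                continue
            self.misses[name] += 1
            if self.misses[name] >= self.required_misses:
                to_delete.add(name)
        return to_delete
```

With `required_misses=3`, a record that disappears from one snapshot and reappears in the next is never proposed for deletion. The trade-off is that genuinely stale records linger for a few extra runs.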
Disruption with some GitHub services
3 updates
On April 13, 2026, between 14:41 UTC and 17:29 UTC, the Copilot service experienced degraded performance. All Copilot users were impacted by increased latency, and approximately 20% experienced request failures when interacting with Copilot Cloud Agent (CCA). On average, request latency increased to approximately 950ms. The GitHub User Dashboard also displayed intermittent errors loading Copilot quota information. CCA and the User Dashboard were impacted for approximately 2 hours and 56 minutes. This was due to an infrastructure change that reduced the available compute capacity for a backend service responsible for Copilot rate limiting and quota management. The reduced capacity caused resource exhaustion under normal traffic load, leading to cascading failures in downstream request processing. We mitigated the incident by increasing compute resources allocated to the affected service and scaling out the number of service instances to distribute load more effectively. We are working to improve proactive capacity monitoring to detect resource degradation before it impacts users, reviewing retry and timeout configurations across dependent services to reduce amplification during degraded states, and evaluating connection management strategies to improve resilience under constrained resources.
We have identified the root cause and are rolling out a fix for Copilot. The services should now be in recovery, with expected full recovery in 5 to 10 minutes.
We are investigating reports of impacted performance for some GitHub services.
Problems with third-party Claude and Codex Agent sessions not being listed in the agents tab dashboard
3 updates
On April 9, 2026, between 22:59 UTC and April 10, 2026, 13:24 UTC, the Copilot Mission Control service was degraded and did not display Claude and Codex Cloud Agent sessions in the agents tab dashboard. Customers were unable to see, list, or manage their third party agent sessions during this period. The underlying agent sessions continued to function normally. This was a visibility and management issue only, and no HTTP errors were generated. The API returned successful responses with incomplete results, with an average error rate of 0% and a maximum error rate of 0%. This was due to a code change that introduced a filter which inadvertently excluded third party agent sessions. We mitigated the incident by reverting the problematic code change and deploying the fix to production. We are working to add automated monitoring for dashboard content visibility and improve integration test coverage for third party agent session listing to reduce our time to detection and mitigation of issues like this one in the future.
We are investigating third party Claude and Codex Cloud Agent sessions not being listed in the agents tab dashboard.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
7 updates
On April 9, 2026, between 16:05 UTC and 20:36 UTC, the Copilot cloud agent service was degraded, causing new agent sessions to be delayed or fail to start. Users who attempted to start Copilot cloud agent sessions during this period experienced jobs getting stuck in the queue, with wait times peaking at 54 minutes compared to the normal 15–40 seconds. On average, approximately 84% of requests to start agent sessions failed, peaking at 97.5% during the worst period. This was due to an internal service exceeding API rate limits, compounded by a caching bug that persisted the rate-limited state beyond the actual rate limit window, causing recurring outage waves rather than a single recovery. We mitigated the incident by deploying a configuration change to bypass the affected cache and shifting API traffic to an alternative authentication path that reduced rate limit exposure. We have since added automated monitoring and alerting for this failure mode, deployed per-endpoint rate limit controls, and added caching for high-traffic API calls to reduce overall load. We are also working on longer-term improvements to rate limit isolation and traffic management to prevent similar issues in the future. This incident shared the same underlying root causes with an incident declared in the same time frame: https://www.githubstatus.com/incidents/zn1t56bfxdzg
We continue to investigate periodic delays in Copilot Cloud Agent job processing
We are continuing to investigate Copilot Cloud Agent job delays
Copilot Cloud Agent jobs are being processed and we are monitoring recovery
We are investigating delays processing Copilot Cloud Agent jobs
We are experiencing issues where Copilot coding agent jobs are delayed in starting.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
4 updates
On April 9, 2026, between 09:05 UTC and 19:05 UTC, the Copilot coding agent service was degraded and users experienced significant delays starting new agent sessions. Approximately 84% of new agent session requests were delayed across four separate outage waves, with queue wait times peaking at 54 minutes compared to a normal baseline of 15–40 seconds. On average, the error rate was 83.9% and peaked at 97.5% of requests to the service. Approximately 22,700 workflow creations were delayed or failed during the incident. This was due to a bug in our rate limiting logic that incorrectly applied a rate limit globally across all users, rather than scoping it to the individual installation that triggered the limit. A contributing factor was a surge in API traffic from a client update that increased requests to an internal endpoint by 3–4x, which accelerated rate limit exhaustion. We mitigated the incident by disabling the faulty rate limit caching mechanism via feature flag and updating our service to use per-installation credentials for API calls, ensuring rate limits are correctly scoped to individual installations. We have since added automated monitoring and alerting to detect this failure mode proactively, deployed fixes to reduce unnecessary API traffic through caching improvements, and are continuing work to further isolate rate limit scoping across client types to prevent similar issues in the future. This incident shared the same underlying root causes with an incident declared in the same time frame: https://www.githubstatus.com/incidents/2rqwxl8y7m0j
The degradation has been mitigated. We are monitoring to ensure stability.
We are investigating an issue affecting GitHub Copilot coding agent. Users may experience significant delays when starting new agent sessions, with jobs remaining queued longer than expected. Our team has identified increased load as a contributing factor and is actively working to restore normal performance.
We are investigating reports of impacted performance for some GitHub services.
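The two April 9 Copilot agent write-ups above describe a rate-limited state that was cached beyond the actual rate limit window and applied globally rather than to the installation that triggered it. The following is a minimal sketch of the intended behavior, not GitHub's implementation: the `RateLimitStateCache` class, its method names, and the installation IDs are all hypothetical. It keys the cached state per installation and expires it at the limit's reset time.

```python
import time
from typing import Callable, Dict


class RateLimitStateCache:
    """Hypothetical cache of 'rate-limited until' timestamps, keyed per installation."""

    def __init__(self, clock: Callable[[], float] = time.time) -> None:
        self._clock = clock
        self._limited_until: Dict[str, float] = {}

    def mark_limited(self, installation_id: str, reset_at: float) -> None:
        # Record the limit only for the installation that hit it, and only
        # until the reset time reported for that limit.
        self._limited_until[installation_id] = reset_at

    def is_limited(self, installation_id: str) -> bool:
        reset_at = self._limited_until.get(installation_id)
        if reset_at is None:
            return False
        if self._clock() >= reset_at:
            # The window has passed: drop the entry so the cached state
            # cannot outlive the actual rate limit.
            del self._limited_until[installation_id]
            return False
        return True


# Usage with a fake clock so the behavior is deterministic.
now = [1000.0]
cache = RateLimitStateCache(clock=lambda: now[0])
cache.mark_limited("installation-a", reset_at=1060.0)

print(cache.is_limited("installation-a"))  # still inside the window
print(cache.is_limited("installation-b"))  # other installations are unaffected
now[0] = 1061.0
print(cache.is_limited("installation-a"))  # entry expired with the window
```

Keying by installation keeps one caller's limit from blocking everyone else, and checking the reset time on every read avoids the recurring outage waves that come from a stale "limited" flag.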
Disruption with GitHub notifications
3 updates
On April 9, 2026, between 03:22 UTC and 04:49 UTC, GitHub Notifications experienced degraded availability. During this time, approximately 45% of requests to the notifications service returned errors, with a peak error rate of approximately 54%, preventing affected users from successfully viewing or interacting with their notifications. The issue was identified and resolved, restoring the service to full availability. We are working to improve our metrics to reduce time to detection and mitigation for similar issues in the future.
The degradation has been mitigated. We are monitoring to ensure stability.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
7 updates
Between 15:20 and 20:18 UTC on Thursday, April 2, Copilot Cloud Agent entered a period of reduced performance. Due to an internal feature being developed for Copilot Code Review, the Copilot Cloud Agent infrastructure started to receive an increased number of jobs. This load eventually caused us to hit an internal rate limit, causing all work to suspend for an hour. During this hour, some new jobs would time out, while others would resume once rate limiting ended. Roughly 40% of jobs in this period were affected. Once the cause of this rate limiting was identified, we were able to disable the new CCR feature via a feature flag. Once the jobs that were already in the queue were able to clear, we didn't see additional instances of rate limiting afterwards.
The degradation has been mitigated. We are monitoring to ensure stability.
Although we are observing recovery once again, we expect continued periods of degradation. Work that is queued during times of degradation does eventually get processed. We continue to investigate and find a mitigation, and will update again within 2 hours.
This issue has recurred. Customers will once again experience false job starts when assigning tasks to Copilot Cloud Agent. We are still investigating and trying to understand the pattern of degradation.
We are once again seeing recovery with Copilot Cloud Agent job starts. We are keeping this open while we verify this won't recur.
When assigning tasks to Copilot Cloud Agent, the task will appear to be working, but may not actually be running. We are investigating.
We are investigating reports of impacted performance for some GitHub services.
Copilot Coding Agent failing to start some jobs
3 updates
Between 15:20 and 20:18 UTC on Thursday, April 2, Copilot Cloud Agent entered a period of reduced performance. Due to an internal feature being developed for Copilot Code Review, the Copilot Cloud Agent infrastructure started to receive an increased number of jobs. This load eventually caused us to hit an internal rate limit, causing all work to suspend for an hour. During this hour, some new jobs would time out, while others would resume once rate limiting ended. Roughly 40% of jobs in this period were affected. Once the cause of this rate limiting was identified, we were able to disable the new CCR feature via a feature flag. Once the jobs that were already in the queue were able to clear, we didn't see additional instances of rate limiting afterwards. This was the same incident declared in https://www.githubstatus.com/incidents/d96l71t3h63k
When assigning tasks to Copilot Cloud Agent, the task will appear to be working, but may not actually be running. We are investigating.
We are investigating reports of impacted performance for some GitHub services.
Disruption with GitHub's code search
7 updates
On April 1, 2026, between 14:40 and 17:00 UTC, the GitHub code search service had an outage which resulted in users being unable to perform searches. The issue was initially caused by an upgrade to the code search Kafka cluster ZooKeeper instances which caused a loss of quorum. This resulted in application-level data inconsistencies which required the index to be reset to a point in time before the loss of quorum occurred. Meanwhile, an accidental deploy resulted in query services losing their shard-to-host mappings, which are typically propagated by Kafka. We remediated the problem by performing rolling restarts in the Kafka cluster, allowing quorum to be reestablished. From there we were able to reset our index to a point in time before the inconsistencies occurred. The team is working on ways to improve our time to respond and mitigate issues relating to Kafka in the future.
Code search has recovered and is serving production traffic.
We have stabilized Code Search infrastructure, and are in the final stages of validation before slowly reintroducing production traffic.
We are still working on recovering back to a serviceable state and expect to have a more substantial update within another two hours.
We are observing some recovery for Code Search queries, but customers should be aware that the data being served may be stale, especially for changes that took place after 07:00 UTC today (1 April 2026). We are still working on recovering our ingestion pipeline, and synchronizing the indexed data. We will update again within 2 hours.
We identified an issue in our ingestion pipeline that degraded the freshness of Code Search results. While fixing the issue with the ingestion pipeline, a deployment caused a loss of dynamic configuration which is causing most requests for Code Search results to fail. We are working to restore the service and to re-ingest the misaligned data.
We are investigating reports of impacted performance for some GitHub services.
GitHub audit logs are unavailable
3 updates
On April 1, 2026, between 15:34 UTC and 16:02 UTC, our audit log service lost connectivity to its backing data store due to a failed credential rotation. During this 28-minute window, audit log history was unavailable via both the API and web UI. This resulted in 5xx errors for 4,297 API actors and 127 github.com users. Additionally, events created during this window were delayed by up to 29 minutes in github.com and event streaming. No audit log events were lost; all audit log events were ultimately written and streamed successfully. Customers using GitHub Enterprise Cloud with data residency were not impacted by this incident. We were alerted to the infrastructure failure at 15:40 UTC — six minutes after onset — and resolved the issue by recycling the affected environment, restoring full service by 16:02 UTC. We are conducting a thorough review of our credential rotation process to strengthen its resiliency and prevent recurrence. In parallel, we are strengthening our monitoring capabilities to ensure faster detection and earlier visibility into similar issues going forward.
A routine credential rotation has failed for our audit logs service; we have re-deployed our service and are waiting for recovery.
We are investigating reports of impacted performance for some GitHub services.
Incident with Copilot
9 updates
On April 1, 2026, between 07:29 and 12:41 UTC, some customers experienced elevated 5xx errors and increased latency when using GitHub Copilot features that rely on `/agents/sessions` endpoints (including creating or viewing agent sessions). The issue was caused by resource exhaustion in one of the Copilot backend services handling these requests, which in turn caused timeouts and failed requests. We mitigated the incident by increasing the service’s available compute resources and tuning its runtime concurrency settings. Service health returned to normal and the incident was fully resolved by 12:41 UTC.
The success rate and latency for creating and viewing agent sessions have stabilized at baseline levels; we are continuing to monitor recovery.
The degradation has been mitigated. We are monitoring to ensure stability.
The success rate for creating and viewing agent sessions has stabilized, and we're continuing to monitor latency, which is trending toward baseline levels.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Copilot has been mitigated. We are monitoring to ensure stability.
Users may see increased latency and intermittent errors when viewing or creating agent sessions. We are working on mitigations to return to baseline performance and success rate.
We are investigating reports of issues with service(s): Copilot Dotcom Agents. We will continue to keep users updated on progress towards mitigation.
We are investigating reports of degraded performance for Copilot
March 2026(7 incidents)
Incident with Pull Requests: High percentage of 500s
11 updates
On Tuesday, March 31, 2026, between 13:53 UTC and 21:23 UTC, the Pull Requests service experienced elevated latency and failures. On average, the error rate was 0.15% and peaked at 0.28% of requests to the service. This was due to a change in garbage collection (GC) settings for a Go-based internal service that provides access to Git repository data. The changes caused more frequent GC activity and elevated CPU consumption on a subset of storage nodes, increasing latency and failure rates for some internal API operations. We mitigated the incident by reverting the GC changes. To prevent future incidents and improve time to detection and mitigation, we are instrumenting additional metrics and alerting for GC-related behavior, improving our visibility into other signals that could cause degraded impact of this type, and updating our best practices and standards for garbage collection in Go-based services.
The degradation affecting Pull Requests has been mitigated. We are monitoring to ensure stability.
We continue to see a small subset of repositories experiencing timeouts and elevated latency in Pull Requests, affecting under 1% of requests.
Error rates remain elevated across multiple pull request endpoints. We are pursuing multiple potential mitigations.
We continue to experience elevated error rates affecting Pull Requests. An earlier fix resolved one component of the issue, but some users may still encounter intermittent timeouts when viewing or interacting with pull requests. Our teams are actively investigating the remaining causes.
We identified an issue causing increased errors when accessing Pull Requests. The mitigation is being applied across our infrastructure and we will continue to provide updates as the mitigation rolls out.
We are seeing recovery in latency and timeouts of requests related to pull requests, even though 500s are still elevated. While we are continuing to investigate, we are applying a mitigation and expect further recovery after it is applied.
We are continuing to investigate increased 500 errors affecting GitHub services. You may experience intermittent failures when using Pull Requests and other features. We are actively working to identify and resolve the underlying cause.
We are investigating increased 500 errors affecting GitHub services. You may experience intermittent failures when using Pull Requests and other features. We are actively working to identify and resolve the underlying cause.
We are seeing a higher than average number of 500s due to timeouts across GitHub services. We have a potential mitigation in flight and are continuing to investigate.
We are investigating reports of degraded performance for Pull Requests
Issues with metered billing report generation
7 updates
On March 31, 2026, between 06:15 UTC and 15:30 UTC, the GitHub billing usage reports feature was degraded due to reduced server capacity. Customers requesting billing usage reports and loading the top usage by organization and repository on the billing overview and usage pages were impacted. The average error rate for usage report requests was 15%, peaking at 98% over an eight-minute window. For the billing pages, an average of 56% of requests failed to load the top usage cards. The root cause was an increase in billing usage report requests with large datasets, which exhausted the capacity of the nodes responsible for reporting data. There was no impact on billing charges. We mitigated the incident by adjusting our auto-scaling thresholds to better meet our capacity needs. We are working to improve our metrics to reduce time to detection and mitigation for similar issues in the future.
The degradation has been mitigated. We are monitoring to ensure stability.
We have applied mitigations to a data store related to billing reports, and are seeing partial recovery to billing report generation. We continue to monitor for full recovery.
We are seeing a high number of 500s due to timeouts across GitHub services. We are redeploying some of our core services and we expect this to allow us to recover.
We're continuing to see high failure rates on billing report generation, and are working on mitigations for a data store related to billing reports.
We're seeing issues related to metered billing reports, intermittently affecting metered usage graphs and reports on the billing page. We have identified an issue with a data store, and are working on mitigations.
We are investigating reports of impacted performance for some GitHub services.
Elevated delays in Actions workflow runs and Pull Request status updates
4 updates
On March 30, 2026, between 10:11 UTC and 13:25 UTC, GitHub Actions experienced degraded performance. During this time, approximately 2.65% of workflow jobs triggered by pull request events experienced start delays exceeding 5 minutes. The issue was caused by replication lag on an internal database cluster used by Actions, which triggered write throttling in our database protection layer and slowed job queue processing. The replication lag originated from planned maintenance to scale the internal database. Newly added database hosts triggered guardrails in the throttling layer, restricting write throughput. The incident was mitigated by excluding the new hosts from replication delay calculations. To prevent recurrence, we have updated our maintenance procedures to ensure new hosts are excluded from throttling assessments during scaling operations. Additionally, we are investing in automation to streamline this type of maintenance activity.
The degradation has been mitigated. We are monitoring to ensure stability.
The degradation affecting Actions and Pull Requests has been mitigated. We are monitoring to ensure stability.
We are investigating reports of degraded performance for Actions and Pull Requests
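The March 30 write-up above attributes the write throttling to replication lag on newly added database hosts, and the mitigation to excluding those hosts from the replication delay calculation. A rough sketch of that idea follows. The `Replica` type, the `provisioning` flag, the host names, and the 5-second threshold are assumptions made for illustration and do not come from the incident report.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Replica:
    name: str
    lag_seconds: float
    provisioning: bool = False  # hypothetical flag for hosts still catching up


def effective_lag(replicas: Iterable[Replica]) -> float:
    """Worst lag among the replicas that count toward throttling decisions."""
    lags = [r.lag_seconds for r in replicas if not r.provisioning]
    return max(lags, default=0.0)


def should_throttle_writes(replicas: Iterable[Replica], max_lag_seconds: float = 5.0) -> bool:
    # Throttle only when an established replica has fallen behind.
    return effective_lag(replicas) > max_lag_seconds


replicas = [
    Replica("db-1", lag_seconds=0.4),
    Replica("db-2", lag_seconds=0.9),
    Replica("db-new", lag_seconds=1800.0, provisioning=True),
]
print(effective_lag(replicas))
print(should_throttle_writes(replicas))
```

Without the `provisioning` filter, the newly added host's large lag would dominate the maximum and trip the throttle even though the established replicas are healthy.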
Incident with Copilot
1 update
On March 27, 2026, from 02:30 to 04:56 UTC, a misconfiguration in our rate limiting system caused users on Copilot Free, Student, Pro, and Pro+ plans to experience unexpected rate limit errors. The configuration that was incorrectly applied was intended solely for internal staff testing of rate-limiting experiences. Copilot Business and Copilot Enterprise accounts were not affected. During this period, affected users received error messages instructing them to retry after a certain time. Approximately 32% of active Free users, 35% of active Student users, 46% of active Pro users, and 66% of active Pro+ users were affected. After identifying the root cause, we reverted the change and restored the expected rate limits. We are reviewing our deployment and validation processes to help ensure configurations used for internal testing cannot be inadvertently applied to production environments.
Disruption with some GitHub services
6 updates
This incident has been resolved. Thank you for your patience and understanding as we addressed this issue. A detailed root cause analysis will be shared as soon as it is available.
We are investigating elevated error rates affecting multiple GitHub services including Actions, Issues, Pull Requests, Webhooks, Codespaces, and login functionality. Some users may have experienced errors when accessing these features. Most services are now showing signs of recovery. We'll post another update by 21:00 UTC.
Issues is experiencing degraded performance. We are continuing to investigate.
Pull Requests is experiencing degraded performance. We are continuing to investigate.
Webhooks is experiencing degraded performance. We are continuing to investigate.
We are investigating reports of degraded performance for Actions
Teams GitHub Notifications App is down
5 updates
On March 24, 2026, between 15:57 UTC and 19:51 UTC, the Microsoft Teams Integration and Teams Copilot Integration services were degraded and unable to deliver GitHub event notifications to Microsoft Teams. On average, the error rate was 37.4% and peaked at 90.1% of requests to the service; approximately 19% of all integration installs failed to receive GitHub-to-Teams notifications in this time period. This was due to an outage at one of our upstream dependencies, which caused HTTP 500 errors and connection resets for our Teams integration. We coordinated with the relevant service teams, and the issue was resolved at 19:51 UTC when the upstream incident was mitigated. We are working to update observability and runbooks to reduce time to mitigation for issues like this in the future.
We are experiencing degraded availability from Azure Teams APIs, which is impacting notifications from GitHub to Microsoft Teams. We are awaiting resolution from Azure.
We are experiencing degraded availability from Azure APIs, which is impacting notifications from GitHub to Microsoft Teams. We are working with Azure to resolve the issue.
We found an issue impacting notifications from GitHub to Microsoft Teams. We are working on mitigation and will keep users updated on progress towards mitigation.
We are investigating reports of impacted performance for some GitHub services.
Disruption with some GitHub services
3 updates
On March 22, 2026, between 09:05 UTC and 10:02 UTC, users may have experienced intermittent errors and increased latency when performing Git HTTP read operations. On average, the error rate was 3.84% and peaked at 15.55% of requests to the service. The issue was caused by elevated latency in an internal authentication service within one of our regional clusters. We mitigated the issue by redirecting traffic away from the affected cluster at 09:39 UTC, after which error rates returned to normal. The incident was fully resolved at 10:02 UTC. We are working to scale the authentication service and reduce our time to detection and mitigation of issues like this one in the future.
We are investigating intermittently high latency and errors from Git operations.
We are investigating reports of impacted performance for some GitHub services.