GitHub Actions Monitoring Guide 2026
How to track GitHub Actions workflow failures, build times, costs, and reliability — with real alerting so you know when CI breaks before your team does.
TL;DR
- →GitHub Actions platform status lives at githubstatus.com, and it goes down more than you'd expect
- →Three things to monitor: workflow failures, build time trends, and minute consumption
- →Alert on consecutive main branch failures, P95 build time spikes, and 80%+ budget usage
- →Use the workflow_run event + Slack notification step for free team-wide failure alerts
Why GitHub Actions Monitoring Matters
GitHub Actions is the default CI/CD platform for millions of repositories — handling everything from test suites to production deployments. When it breaks, the damage ripples: deployments halt, PRs can't merge, and teams scramble to diagnose whether it's their code or GitHub's infrastructure that's the problem.
GitHub Actions appeared in 47% of GitHub's major incidents in 2025 — more than any other GitHub service. Meanwhile, most teams have zero monitoring beyond GitHub's default email notifications (which go to the person who triggered the workflow, not the on-call engineer).
Without proper monitoring:
- ✗A broken deployment workflow silently fails 3× while the team is in sprint planning
- ✗Build times creep from 4 minutes to 18 minutes over 6 months — nobody notices until it's painful
- ✗GitHub Actions minutes spike 300% when a misconfigured matrix job runs thousands of times
- ✗A GitHub platform incident breaks all your workflows for 45 minutes before you know about it
Monitoring GitHub Actions Platform Status
Before debugging your workflow code, always check if GitHub itself is the problem. GitHub Actions is a cloud service that experiences its own outages and degradations.
GitHub Status Page
githubstatus.com
Covers: All GitHub services including Actions, Packages, API, Pages
GitHub Status API
githubstatus.com/api/v2/status.json
Covers: Programmatic access to current GitHub status
API Status Check
apistatuscheck.com/api/github
Covers: GitHub Actions uptime + incident history + instant alerts
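As a rough illustration of polling the Status API listed above, the sketch below classifies a status.json payload. It assumes the public Statuspage v2 response shape (a `status` object with `indicator` and `description`), and the helper names `classifyStatus` and `checkGitHubStatus` are hypothetical. Note that this endpoint summarizes all GitHub services, not Actions alone.

```javascript
// Sketch: classify a Statuspage-style status.json payload.
// Assumption: the payload has status.indicator ("none", "minor", "major",
// "critical") and status.description; verify against a live response.
function classifyStatus(payload) {
  const indicator = payload?.status?.indicator ?? 'unknown';
  const description = payload?.status?.description ?? 'no description';
  // "none" means no open incidents; anything else (including an
  // unrecognized payload) is treated as worth surfacing.
  const degraded = indicator !== 'none';
  return { indicator, description, degraded };
}

// Hypothetical helper: fetch the live status (Node 18+ ships a global fetch).
async function checkGitHubStatus(
  url = 'https://www.githubstatus.com/api/v2/status.json'
) {
  const res = await fetch(url);
  if (!res.ok) throw new Error(`Status API returned HTTP ${res.status}`);
  return classifyStatus(await res.json());
}
```

Calling `checkGitHubStatus()` from a scheduled job and posting to your alert channel when `degraded` is true is one way to tell "GitHub is down" apart from "our code broke".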
6 GitHub Actions Metrics to Monitor
These are the metrics that matter for CI/CD reliability, with concrete alert thresholds:
Workflow Success Rate
≥ 95% (main branch)
Percentage of workflow runs that complete successfully. The core reliability metric.
⚠ Alert on 2+ consecutive main branch failures
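The consecutive-failure rule can be sketched as a small pure function. This assumes you pass in main-branch runs ordered newest first, each carrying a `conclusion` string as workflow run objects from the REST API do; the function names are hypothetical.

```javascript
// Count how many of the most recent runs failed in a row.
// Assumption: `runs` is ordered newest first.
function countConsecutiveFailures(runs) {
  let count = 0;
  for (const run of runs) {
    if (run.conclusion === 'failure') {
      count += 1;
    } else if (run.conclusion === 'success') {
      break; // a success ends the streak
    }
    // cancelled, skipped, or still-running (null) runs neither
    // extend nor end the streak in this sketch
  }
  return count;
}

// True when the streak reaches the alert threshold (2 by default).
function shouldAlertOnFailures(runs, threshold = 2) {
  return countConsecutiveFailures(runs) >= threshold;
}
```

Ignoring cancelled runs is a design choice: a superseded run says nothing about whether main is broken.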
Build Time (P50/P95)
P50 ≤ your baseline; P95 ≤ 2× P50
Median (P50) and 95th percentile (P95) build duration. P95 spikes predict failures and infra issues.
⚠ Alert when P95 exceeds 2× your 7-day baseline
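One way to evaluate the P95 rule is a nearest-rank percentile over two windows of build durations. This is a minimal sketch that assumes "baseline" means the P95 of the previous 7 days; `percentile` and `p95Regressed` are illustrative names.

```javascript
// Nearest-rank percentile: the smallest value with at least p% of
// the samples at or below it. Returns NaN for an empty sample.
function percentile(values, p) {
  if (values.length === 0) return NaN;
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100); // 1-based rank
  return sorted[Math.max(rank, 1) - 1];
}

// True when the recent P95 exceeds `factor` times the baseline P95.
function p95Regressed(recentMinutes, baselineMinutes, factor = 2) {
  return percentile(recentMinutes, 95) > factor * percentile(baselineMinutes, 95);
}
```

With small samples the nearest-rank P95 is often just the maximum, so it helps to collect at least a few dozen runs before trusting the alert.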
Queue Wait Time
< 30 seconds
Time from workflow trigger to runner pickup. Long queues mean runner capacity issues.
⚠ Alert when queue wait exceeds 2 minutes
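Queue wait can be approximated per job as the gap between when the job was created and when a runner started it. The sketch below assumes job objects with `name`, `created_at`, and `started_at` ISO timestamps, as returned by the REST API's list-jobs-for-a-run endpoint; treat those field names as an assumption to verify, and the function names as hypothetical.

```javascript
// Seconds a job waited before a runner picked it up.
function queueWaitSeconds(job) {
  return (Date.parse(job.started_at) - Date.parse(job.created_at)) / 1000;
}

// Jobs whose queue wait exceeded the alert threshold (2 minutes by default).
function slowQueuedJobs(jobs, maxSeconds = 120) {
  return jobs
    .filter(job => job.created_at && job.started_at) // skip jobs not yet started
    .map(job => ({ name: job.name, waitSeconds: queueWaitSeconds(job) }))
    .filter(job => job.waitSeconds > maxSeconds);
}
```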
Minute Consumption (billing)
< 80% of included minutes
GitHub Actions minutes used vs. your plan's included minutes for private repos.
⚠ Alert at 80% of monthly budget to avoid overage charges
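The 80% rule reduces to a ratio check. This sketch assumes a billing object with `total_minutes_used` and `included_minutes`, the two fields shown in the billing response later in this guide; `minuteBudgetStatus` is a hypothetical name.

```javascript
// Compare minutes used against the plan's included minutes.
function minuteBudgetStatus(billing, alertRatio = 0.8) {
  const used = billing.total_minutes_used;
  const included = billing.included_minutes;
  // Without a positive allowance there is no meaningful ratio.
  if (!(included > 0)) return { ratio: null, alert: false };
  const ratio = used / included;
  return { ratio, alert: ratio >= alertRatio };
}
```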
Flaky Test Rate
< 2% of test runs
Percentage of test failures that pass on retry. Flaky tests mask real failures.
⚠ Track per-test flakiness and quarantine tests exceeding 5% flaky rate
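Per-test flakiness can be computed from retry outcomes. The record shape used here (`name` plus an ordered `attempts` array of 'pass' or 'fail') is an assumption for illustration; adapt it to whatever your test reporter emits.

```javascript
// Per-test flaky rate: share of runs where the first attempt failed
// but a later attempt passed.
function flakyRates(results) {
  const stats = new Map();
  for (const { name, attempts } of results) {
    const s = stats.get(name) ?? { runs: 0, flaky: 0 };
    s.runs += 1;
    if (attempts[0] === 'fail' && attempts.includes('pass')) s.flaky += 1;
    stats.set(name, s);
  }
  return [...stats].map(([name, s]) => ({
    name,
    runs: s.runs,
    flakyRate: s.flaky / s.runs,
  }));
}

// Names of tests whose flaky rate exceeds the quarantine threshold (5%).
function testsToQuarantine(results, maxRate = 0.05) {
  return flakyRates(results)
    .filter(t => t.flakyRate > maxRate)
    .map(t => t.name);
}
```

A test that fails on every attempt is counted as a real failure here, not as flaky.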
GitHub Actions Uptime
99.9% SLA target
GitHub Actions platform availability from external monitoring. GitHub targets but doesn't guarantee SLAs.
⚠ Alert when GitHub Actions API returns non-200 for > 2 minutes
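The "non-200 for more than 2 minutes" rule needs a little state so that a single failed probe does not page anyone. A minimal sketch, with the hypothetical `createOutageTracker` taking the probe's status code and a timestamp in milliseconds:

```javascript
// Returns a recorder that reports true once non-200 responses have
// persisted for longer than thresholdMs (2 minutes by default).
function createOutageTracker(thresholdMs = 2 * 60 * 1000) {
  let failingSince = null; // timestamp of the first failure in the current streak
  return function record(statusCode, nowMs) {
    if (statusCode === 200) {
      failingSince = null; // recovery resets the streak
      return false;
    }
    if (failingSince === null) failingSince = nowMs;
    return nowMs - failingSince > thresholdMs;
  };
}
```

In practice you would call the recorder after each probe, for example `record(res.status, Date.now())`.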
Setting Up Workflow Failure Alerts
GitHub's default failure notifications go to the person who triggered the workflow — not to your team's alert channel. Here's how to fix that.
Method 1: Slack notification on failure (free)
# Add to any workflow that needs failure alerts
# Set SLACK_WEBHOOK_URL in repo Secrets
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test
      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            {
              "text": "❌ GitHub Actions failed",
              "attachments": [{
                "color": "danger",
                "fields": [
                  { "title": "Repo", "value": "${{ github.repository }}", "short": true },
                  { "title": "Branch", "value": "${{ github.ref_name }}", "short": true },
                  { "title": "Workflow", "value": "${{ github.workflow }}", "short": true },
                  { "title": "Run", "value": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}", "short": false }
                ]
              }]
            }

Method 2: Reusable notification workflow
For teams with many repositories, create a centralized notification workflow that other workflows can call via workflow_call:
# .github/workflows/notify-failure.yml (in your shared org repo)
name: Notify on Failure
on:
  workflow_call:
    inputs:
      workflow_name:
        required: true
        type: string
    secrets:
      SLACK_WEBHOOK_URL:
        required: true
jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            { "text": "❌ ${{ inputs.workflow_name }} failed in ${{ github.repository }}" }

# --- In your workflow that needs alerts ---
jobs:
  on-failure:
    if: failure()
    needs: [build, test, deploy] # list your jobs here
    uses: your-org/shared-workflows/.github/workflows/notify-failure.yml@main
    with:
      workflow_name: "Production Deploy"
    secrets: inherit

Method 3: PagerDuty alert for production deployments
- name: Trigger PagerDuty on deploy failure
  if: failure() && github.ref == 'refs/heads/main'
  run: |
    curl -X POST https://events.pagerduty.com/v2/enqueue \
      -H "Content-Type: application/json" \
      -d '{
        "routing_key": "${{ secrets.PAGERDUTY_INTEGRATION_KEY }}",
        "event_action": "trigger",
        "payload": {
          "summary": "Production deploy failed: ${{ github.workflow }}",
          "severity": "critical",
          "source": "github-actions",
          "custom_details": {
            "repository": "${{ github.repository }}",
            "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
          }
        }
      }'

Build Time Monitoring with the GitHub API
Build time creep is one of the most common CI performance issues — it's usually caused by growing test suites, larger Docker images, or uncached dependencies. Track it before it becomes critical.
#!/bin/bash
# Summarize build times for the last 20 successful runs of a workflow
OWNER="your-org"
REPO="your-repo"
WORKFLOW="ci.yml"
# Export GH_TOKEN in your environment rather than hardcoding a token here
: "${GH_TOKEN:?Set GH_TOKEN to a token that can read Actions data}"
curl -s \
  -H "Authorization: Bearer $GH_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$OWNER/$REPO/actions/workflows/$WORKFLOW/runs?per_page=20&status=success" \
  | jq '[.workflow_runs[] |
      (.updated_at | fromdateiso8601) - (.created_at | fromdateiso8601)
    ] | sort |
    {
      avg_minutes: (add / length / 60 | round),
      # Nearest-rank P95 of the sorted durations
      p95_minutes: (.[((length * 0.95) | ceil) - 1] / 60 | round),
      count: length
    }'

// Run this as a scheduled job (GitHub Actions cron: daily)
const { Octokit } = require('@octokit/rest');
const octokit = new Octokit({ auth: process.env.GH_TOKEN });

async function trackBuildTimes() {
  const { data } = await octokit.rest.actions.listWorkflowRuns({
    owner: process.env.OWNER,
    repo: process.env.REPO,
    workflow_id: 'ci.yml',
    per_page: 50,
    status: 'completed',
  });
  const durations = data.workflow_runs
    .filter(r => r.conclusion === 'success')
    .map(r => (new Date(r.updated_at) - new Date(r.created_at)) / 1000 / 60) // minutes
    .sort((a, b) => a - b);
  // Nothing to report if the sample contains no successful runs
  if (durations.length === 0) return;
  const avg = durations.reduce((a, b) => a + b, 0) / durations.length;
  // Nearest-rank P95 of the sorted durations
  const p95 = durations[Math.ceil(durations.length * 0.95) - 1];
  // Post to Better Stack Logs or your observability stack
  await fetch(process.env.BETTERSTACK_SOURCE_TOKEN_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      dt: new Date().toISOString(),
      workflow: 'ci.yml',
      avg_build_minutes: Math.round(avg * 100) / 100,
      p95_build_minutes: Math.round(p95 * 100) / 100,
      sample_count: durations.length,
    }),
  });
}

// Fail the scheduled job visibly if the API call or the POST throws
trackBuildTimes().catch(err => {
  console.error(err);
  process.exit(1);
});
Monitoring GitHub Actions Costs
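GitHub Actions charges by the minute for private repos. The rates quoted here for standard GitHub-hosted runners are Linux: $0.008/min, Windows: $0.016/min, macOS: $0.08/min; per-minute rates change over time, so confirm them against GitHub's current pricing page. Each plan includes free minutes (Free: 2,000/mo, Pro: 3,000/mo, Team: 3,000/mo). Overages bill automatically.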
Check billing usage via GitHub API
# Get GitHub Actions billing usage for a user/org
# Note: these endpoints may not be available for accounts on GitHub's newer
# billing platform; check the REST billing docs if you get a 404 or 410.
# User account:
curl -H "Authorization: Bearer $GH_TOKEN" \
  "https://api.github.com/users/{username}/settings/billing/actions"
# Organization:
curl -H "Authorization: Bearer $GH_TOKEN" \
  "https://api.github.com/orgs/{org}/settings/billing/actions"
# Response:
# {
#   "total_minutes_used": 305,
#   "total_paid_minutes_used": 0,
#   "included_minutes": 3000,
#   "minutes_used_breakdown": {
#     "UBUNTU": 205, "MACOS": 10, "WINDOWS": 90
#   }
# }

Matrix explosion
A matrix with 5 OS × 4 Node versions × 3 environments = 60 parallel jobs. At 10 min/run, that's 600 minutes per trigger. Gate matrix runs on main/release branches only.
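The arithmetic above generalizes to any matrix: the job count is the product of the dimension sizes. A small sketch, where `matrixCost` and the example dimension names are illustrative:

```javascript
// Jobs and runner minutes for one trigger of a build matrix.
// `dimensions` maps each matrix key to its list of values.
function matrixCost(dimensions, minutesPerJob) {
  const jobs = Object.values(dimensions).reduce(
    (count, values) => count * values.length,
    1
  );
  return { jobs, minutes: jobs * minutesPerJob };
}

const example = matrixCost(
  {
    os: ['ubuntu', 'windows', 'macos', 'ubuntu-arm', 'windows-arm'],
    node: [18, 20, 22, 24],
    env: ['dev', 'staging', 'prod'],
  },
  10
);
console.log(example); // { jobs: 60, minutes: 600 }
```

This counts raw runner minutes only; it does not model per-OS pricing differences.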
Uncached dependencies
npm install or pip install on every run adds 2–5 minutes per job. Use actions/cache with lock file hash. This alone can cut build times by 30–50%.
Large Docker images
Pulling a 2GB Docker image on every job run is expensive in time and sometimes in egress costs. Use a minimal base image or cache layers with docker/build-push-action.
Always-on scheduled workflows
Cron-scheduled workflows (schedule:) run even when there are no new commits. Add a step that checks if HEAD changed since the last run and exits early if not.
GitHub Actions Monitoring Best Practices
Separate prod and dev workflows
Deploy workflows that touch production should always alert on failure, route to PagerDuty, and require an approval gate. Dev/feature branch failures can use lighter-weight notifications.
Use concurrency groups to prevent pileups
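Add a concurrency block with group: ${{ github.ref }} and cancel-in-progress: true to your workflows. A new push then cancels the in-progress run in the same group, preventing queue backlog and wasted minutes. Without cancel-in-progress: true, the running job finishes and only a pending run in the group is replaced. For production deploys, consider leaving cancel-in-progress off so a deployment is never interrupted midway.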
Monitor the GitHub platform separately
Your CI failures may not be your fault. Have external monitoring for GitHub itself (API Status Check, githubstatus.com RSS feed) so you can instantly distinguish "our code broke" from "GitHub is down".
Set workflow timeouts
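Add timeout-minutes: 30 to each job, and a shorter timeout-minutes to individual steps that are prone to hanging. There is no workflow-level timeout setting, so it has to be set per job or per step. Without timeouts, hung jobs can run for the default limit of 6 hours, burning minutes and blocking PRs.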
Track flaky tests separately
Re-run only the failed jobs (the "Re-run failed jobs" button, or gh run rerun <run-id> --failed from the GitHub CLI) and record which tests pass the second time. Tests that fail once then pass on rerun are flaky, so quarantine them before they erode trust in CI.
Archive test results as artifacts
Upload JUnit XML reports with actions/upload-artifact. Tools like Datadog Test Visibility or GitHub's native test annotations give you per-test failure trends and flakiness dashboards.
Frequently Asked Questions
How do I get notified when a GitHub Actions workflow fails?
GitHub sends email notifications by default when a workflow you triggered fails. For team-wide Slack/Teams alerts, add a notification step to your workflow using the slackapi/slack-github-action or configure the GitHub app for Slack. For webhook-based alerts, use workflow_run events or a dedicated monitoring service like Better Stack that integrates with GitHub Actions via the API.
How do I track GitHub Actions costs?
GitHub Actions charges by the minute for private repos; standard GitHub-hosted runners are free for public repos, and each plan includes a monthly allowance of minutes. Monitor costs in Settings → Billing → GitHub Actions. To track minute consumption programmatically, use the billing endpoints shown earlier in this guide (for example GET /orgs/{org}/settings/billing/actions). Set a spending limit or budget in your account's billing settings so you are notified before overages.
What is a good GitHub Actions failure rate?
A healthy CI pipeline should have a failure rate below 5% on the main branch, which is the same target as a 95%+ success rate. Feature branch failures are expected, since developers often push work-in-progress code to see what CI reports. Focus on main/release branch reliability. Alert immediately on consecutive main branch failures (2+ in a row usually indicates a broken merge). Track MTTR (mean time to recovery) and aim for under 30 minutes.
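MTTR can be estimated from the same run history. The sketch below assumes main-branch runs ordered oldest first, each with a `conclusion` and an `updated_at` ISO timestamp, and measures each incident from the first failure to the next success; `meanTimeToRecoveryMinutes` is a hypothetical name.

```javascript
// Mean minutes from the first failure of an incident to the next success.
// Returns null when no completed incident is found.
function meanTimeToRecoveryMinutes(runs) {
  const recoveries = [];
  let brokenSince = null; // timestamp of the first failure in the open incident
  for (const run of runs) {
    const t = Date.parse(run.updated_at);
    if (run.conclusion === 'failure' && brokenSince === null) {
      brokenSince = t;
    } else if (run.conclusion === 'success' && brokenSince !== null) {
      recoveries.push((t - brokenSince) / 60000);
      brokenSince = null;
    }
  }
  if (recoveries.length === 0) return null;
  return recoveries.reduce((a, b) => a + b, 0) / recoveries.length;
}
```

An incident that is still open at the end of the list is left out, so the figure only reflects recoveries that actually happened.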
How do I monitor GitHub Actions build times?
Use the GitHub Actions API (GET /repos/{owner}/{repo}/actions/runs) to pull run duration data. For real-time dashboards, GitHub's built-in insights tab shows workflow duration trends. Third-party tools like Datadog's GitHub Actions integration, Grafana's GitHub plugin, or Better Stack can create automated dashboards. Alert when build time exceeds your baseline by 50% — slow builds often precede failures.
Is GitHub Actions down right now?
Check githubstatus.com for real-time GitHub Actions service status. GitHub Actions is one of the most frequently affected services during GitHub incidents — it appears in roughly 40% of GitHub outage reports. API Status Check tracks GitHub uptime at apistatuscheck.com/api/github and sends instant alerts when GitHub Actions is degraded or down.