CI/CD Monitoring · Updated May 2026

GitHub Actions Monitoring Guide 2026

How to track GitHub Actions workflow failures, build times, costs, and reliability — with real alerting so you know when CI breaks before your team does.

TL;DR

  • GitHub Actions platform status lives at githubstatus.com — it goes down more than you'd expect
  • Three things to monitor: workflow failures, build time trends, and minute consumption
  • Alert on consecutive main branch failures, P95 build time spikes, and 80%+ budget usage
  • Use the workflow_run event + Slack notification step for free team-wide failure alerts

Why GitHub Actions Monitoring Matters

GitHub Actions is the default CI/CD platform for millions of repositories — handling everything from test suites to production deployments. When it breaks, the damage ripples: deployments halt, PRs can't merge, and teams scramble to diagnose whether it's their code or GitHub's infrastructure that's the problem.

GitHub Actions appeared in 47% of GitHub's major incidents in 2025 — more than any other GitHub service. Meanwhile, most teams have zero monitoring beyond GitHub's default email notifications (which go to the person who triggered the workflow, not the on-call engineer).

Without proper monitoring:

  • A broken deployment workflow silently fails 3× while the team is in sprint planning
  • Build times creep from 4 minutes to 18 minutes over 6 months — nobody notices until it's painful
  • GitHub Actions minutes spike 300% when a misconfigured matrix job runs thousands of times
  • A GitHub platform incident breaks all your workflows for 45 minutes before you know about it

Monitoring GitHub Actions Platform Status

Before debugging your workflow code, always check if GitHub itself is the problem. GitHub Actions is a cloud service that experiences its own outages and degradations.

GitHub Status Page

githubstatus.com

Covers: All GitHub services including Actions, Packages, API, Pages

Pro: Official source where GitHub posts incidents and maintenance windows

Con: Often lags the actual incident by 10–20 minutes; doesn't have granular component data

GitHub Status API

githubstatus.com/api/v2/status.json

Covers: Programmatic access to current GitHub status

Pro: Machine-readable JSON that can be integrated into dashboards or alerting

Con: Only shows top-level status; GitHub is slow to update component-level data
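A top-level status can hide a single degraded component, so a per-component check is often more useful for alerting. The following is a minimal sketch, assuming the status page follows the standard Statuspage v2 layout (a sibling components.json endpoint whose entries carry name and status fields) and that the relevant component is named "Actions". The function names are illustrative, not part of any GitHub SDK.

```javascript
// Sketch: find one component's status in a Statuspage-style components.json payload.
// Assumption: the payload has the shape { components: [{ name, status }, ...] }.
function componentStatus(payload, componentName) {
  const components = (payload && payload.components) || [];
  const match = components.find((c) => c.name === componentName);
  return match ? match.status : 'unknown';
}

// Anything other than "operational" (including "unknown") is treated as alert-worthy.
function isDegraded(status) {
  return status !== 'operational';
}

// Hypothetical poller (needs Node 18+ for global fetch). Defined but not called here.
async function checkActionsStatus() {
  const res = await fetch('https://www.githubstatus.com/api/v2/components.json');
  if (!res.ok) throw new Error(`Status API returned HTTP ${res.status}`);
  const status = componentStatus(await res.json(), 'Actions');
  return { status, degraded: isDegraded(status) };
}

// Offline example with a hand-written payload.
const sample = { components: [{ name: 'Actions', status: 'partial_outage' }] };
console.log(componentStatus(sample, 'Actions')); // partial_outage
```

Treating a missing component as degraded is a deliberate choice here: if the payload shape changes, the check fails loudly instead of silently reporting green.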

API Status Check

apistatuscheck.com/api/github

Covers: GitHub Actions uptime + incident history + instant alerts

Pro: Third-party 60-second monitoring, email/Slack alerts, and 90-day history

Con: Third-party service, synthesized from GitHub's public signals

6 GitHub Actions Metrics to Monitor

These are the metrics that matter for CI/CD reliability, with concrete alert thresholds:

Workflow Success Rate

≥ 95% (main branch)

Percentage of workflow runs that complete successfully. The core reliability metric.

Alert on 2+ consecutive main branch failures
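As a rough illustration of both numbers, the sketch below computes a success rate and the current failure streak from a list of runs. It assumes the runs are ordered newest first and carry a conclusion field, as in the GitHub "list workflow runs" response; the function names are made up for this example.

```javascript
// Sketch: success rate and trailing failure streak for a list of workflow runs.
// Assumption: `runs` is ordered newest first and each item has a `conclusion`.
function successRate(runs) {
  // Only count runs that actually passed or failed (ignore cancelled, skipped, etc.)
  const finished = runs.filter(
    (r) => r.conclusion === 'success' || r.conclusion === 'failure'
  );
  if (finished.length === 0) return null;
  const passed = finished.filter((r) => r.conclusion === 'success').length;
  return passed / finished.length;
}

// Counts failures from the newest run backwards until the first non-failure.
function consecutiveFailures(runs) {
  let streak = 0;
  for (const run of runs) {
    if (run.conclusion !== 'failure') break;
    streak += 1;
  }
  return streak;
}

const recentRuns = [
  { conclusion: 'failure' },
  { conclusion: 'failure' },
  { conclusion: 'success' },
  { conclusion: 'success' },
];
console.log(successRate(recentRuns)); // 0.5
console.log(consecutiveFailures(recentRuns) >= 2); // true
```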

Mean Build Time (P50/P95)

P50 ≤ your baseline; P95 ≤ 2× P50

Median and 95th percentile build duration. P95 spikes predict failures and infra issues.

Alert when P95 exceeds 2× your 7-day baseline
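One way to express this threshold in code is sketched below. It uses the nearest-rank definition of a percentile and assumes "baseline" means the P95 of the previous 7 days of durations; both choices are assumptions for illustration, and the function names are hypothetical.

```javascript
// Nearest-rank percentile: the value at 1-based position ceil(p/100 * n)
// in the sorted list. Returns null for an empty list.
function percentile(values, p) {
  if (values.length === 0) return null;
  const sorted = [...values].sort((a, b) => a - b); // copy, do not mutate input
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[Math.max(rank, 1) - 1];
}

// True when the recent P95 exceeds `factor` times the baseline P95.
function p95Alert(recentMinutes, baselineMinutes, factor = 2) {
  const recent = percentile(recentMinutes, 95);
  const baseline = percentile(baselineMinutes, 95);
  if (recent === null || baseline === null) return false;
  return recent > factor * baseline;
}

const baseline = [4, 4, 5, 5, 6]; // last 7 days, in minutes (sample data)
const today = [5, 6, 13];         // most recent runs (sample data)
console.log(p95Alert(today, baseline)); // true
```

With small samples the nearest-rank P95 is simply the largest value, so this check is only meaningful once there are a few dozen runs in each window.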

Queue Wait Time

< 30 seconds

Time from workflow trigger to runner pickup. Long queues mean runner capacity issues.

Alert when queue wait exceeds 2 minutes
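Queue wait can be approximated from job timestamps. The sketch below assumes each job object has ISO 8601 created_at and started_at fields, as returned by GET /repos/{owner}/{repo}/actions/runs/{run_id}/jobs; the helper names and the sample jobs are hypothetical.

```javascript
// Seconds between a job being created and a runner picking it up.
// Returns null while the job is still queued (no started_at yet).
function queueWaitSeconds(job) {
  if (!job.started_at) return null;
  return (Date.parse(job.started_at) - Date.parse(job.created_at)) / 1000;
}

// Jobs whose pickup time exceeded the threshold (default: 2 minutes).
function slowPickups(jobs, thresholdSeconds = 120) {
  return jobs
    .map((job) => ({ name: job.name, waitSeconds: queueWaitSeconds(job) }))
    .filter((j) => j.waitSeconds !== null && j.waitSeconds > thresholdSeconds);
}

const sampleJobs = [
  { name: 'build', created_at: '2026-05-01T12:00:00Z', started_at: '2026-05-01T12:00:12Z' },
  { name: 'test', created_at: '2026-05-01T12:00:00Z', started_at: '2026-05-01T12:03:30Z' },
];
console.log(slowPickups(sampleJobs)); // [ { name: 'test', waitSeconds: 210 } ]
```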

Minute Consumption (billing)

< 80% of included minutes

GitHub Actions minutes used vs. your plan's included minutes for private repos.

Alert at 80% of monthly budget to avoid overage charges

Flaky Test Rate

< 2% of test runs

Percentage of test failures that pass on retry. Flaky tests mask real failures.

Track per-test flakiness and quarantine tests exceeding 5% flaky rate
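A per-test flaky rate can be computed from whatever test history you already collect. The sketch below assumes a hypothetical record format where each test execution is labeled pass, fail, or flaky (failed, then passed on retry); how you produce those labels depends on your test runner.

```javascript
// Sketch: per-test flaky rate from a list of { test, outcome } records.
// Assumption: outcome is 'pass', 'fail', or 'flaky' (failed, then passed on retry).
function flakyRates(records) {
  const stats = new Map();
  for (const { test, outcome } of records) {
    const s = stats.get(test) || { runs: 0, flaky: 0 };
    s.runs += 1;
    if (outcome === 'flaky') s.flaky += 1;
    stats.set(test, s);
  }
  return [...stats].map(([test, s]) => ({ test, runs: s.runs, rate: s.flaky / s.runs }));
}

// Names of tests whose flaky rate exceeds the threshold (default: 5%).
function quarantineCandidates(records, threshold = 0.05) {
  return flakyRates(records)
    .filter((t) => t.rate > threshold)
    .map((t) => t.test);
}

// Sample data: "login" is flaky in 2 of 20 runs, "checkout" never is.
const history = [
  ...Array.from({ length: 18 }, () => ({ test: 'login', outcome: 'pass' })),
  { test: 'login', outcome: 'flaky' },
  { test: 'login', outcome: 'flaky' },
  ...Array.from({ length: 20 }, () => ({ test: 'checkout', outcome: 'pass' })),
];
console.log(quarantineCandidates(history)); // [ 'login' ]
```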

GitHub Actions Uptime

99.9% SLA target

GitHub Actions platform availability from external monitoring. GitHub targets but doesn't guarantee SLAs.

Alert when GitHub Actions API returns non-200 for > 2 minutes
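The "non-200 for more than 2 minutes" rule needs a little state so that a single failed probe does not page anyone. Below is a minimal sketch of that logic; the tracker is a hypothetical helper that you would feed from your own probe loop, and timestamps are passed in explicitly so the behavior is easy to test.

```javascript
// Tracks how long probes have been failing and returns true once the
// outage has lasted longer than the threshold (default: 2 minutes).
function createOutageTracker(thresholdMs = 2 * 60 * 1000) {
  let failingSince = null; // timestamp of the first failed probe in the current streak
  return function record(statusCode, nowMs = Date.now()) {
    if (statusCode === 200) {
      failingSince = null; // recovered: reset the streak
      return false;
    }
    if (failingSince === null) failingSince = nowMs;
    return nowMs - failingSince > thresholdMs;
  };
}

const record = createOutageTracker();
console.log(record(500, 0));       // false (first failure)
console.log(record(503, 60000));   // false (failing for 1 minute)
console.log(record(500, 130000));  // true  (failing for more than 2 minutes)
console.log(record(200, 140000));  // false (recovered)
```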

Setting Up Workflow Failure Alerts

GitHub's default failure notifications go to the person who triggered the workflow — not to your team's alert channel. Here's how to fix that.

Method 1: Slack notification on failure (free)

GitHub Actions YAML
# Add to any workflow that needs failure alerts
# Set SLACK_WEBHOOK_URL in repo Secrets
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run tests
        run: npm test

      - name: Notify Slack on failure
        if: failure()
        uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            {
              "text": "❌ GitHub Actions failed",
              "attachments": [{
                "color": "danger",
                "fields": [
                  { "title": "Repo", "value": "${{ github.repository }}", "short": true },
                  { "title": "Branch", "value": "${{ github.ref_name }}", "short": true },
                  { "title": "Workflow", "value": "${{ github.workflow }}", "short": true },
                  { "title": "Run", "value": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}", "short": false }
                ]
              }]
            }

Method 2: Reusable notification workflow

For teams with many repositories, create a centralized notification workflow that other workflows can call via workflow_call:

# .github/workflows/notify-failure.yml (in your shared org repo)
name: Notify on Failure
on:
  workflow_call:
    inputs:
      workflow_name:
        required: true
        type: string
    secrets:
      SLACK_WEBHOOK_URL:
        required: true

jobs:
  notify:
    runs-on: ubuntu-latest
    steps:
      - uses: slackapi/slack-github-action@v2
        with:
          webhook: ${{ secrets.SLACK_WEBHOOK_URL }}
          webhook-type: incoming-webhook
          payload: |
            { "text": "❌ ${{ inputs.workflow_name }} failed in ${{ github.repository }}" }

# --- In your workflow that needs alerts ---
jobs:
  on-failure:
    if: failure()
    needs: [build, test, deploy]  # list your jobs here
    uses: your-org/shared-workflows/.github/workflows/notify-failure.yml@main
    with:
      workflow_name: "Production Deploy"
    secrets: inherit

Method 3: PagerDuty alert for production deployments

- name: Trigger PagerDuty on deploy failure
  if: failure() && github.ref == 'refs/heads/main'
  run: |
    curl -X POST https://events.pagerduty.com/v2/enqueue \
      -H "Content-Type: application/json" \
      -d '{
        "routing_key": "${{ secrets.PAGERDUTY_INTEGRATION_KEY }}",
        "event_action": "trigger",
        "payload": {
          "summary": "Production deploy failed: ${{ github.workflow }}",
          "severity": "critical",
          "source": "github-actions",
          "custom_details": {
            "repository": "${{ github.repository }}",
            "run_url": "${{ github.server_url }}/${{ github.repository }}/actions/runs/${{ github.run_id }}"
          }
        }
      }'

Build Time Monitoring with the GitHub API

Build time creep is one of the most common CI performance issues — it's usually caused by growing test suites, larger Docker images, or uncached dependencies. Track it before it becomes critical.

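Bash: fetch recent build times via GitHub API
#!/bin/bash
# Average and P95 build time for the last 20 successful runs of a workflow.
# Requires curl and jq. Durations run from created_at to updated_at, so they include queue time.
# P95 uses the nearest-rank method: the value at position ceil(0.95 * n) in sorted order.
set -euo pipefail

OWNER="your-org"
REPO="your-repo"
WORKFLOW="ci.yml"
# Read the token from the environment instead of hardcoding it in the script
GH_TOKEN="${GH_TOKEN:?Set GH_TOKEN to a token with read access to Actions}"

curl -s \
  -H "Authorization: Bearer $GH_TOKEN" \
  -H "Accept: application/vnd.github+json" \
  "https://api.github.com/repos/$OWNER/$REPO/actions/workflows/$WORKFLOW/runs?per_page=20&status=success" \
  | jq '[.workflow_runs[] |
      (.updated_at | fromdateiso8601) - (.created_at | fromdateiso8601)
    ] | sort as $d |
    if ($d | length) == 0 then "no successful runs found" else
    {
      avg_minutes: (($d | add) / ($d | length) / 60 | round),
      p95_minutes: ($d[(($d | length) * 0.95 | ceil) - 1] / 60 | round),
      count: ($d | length)
    } end'

Node.js: post build time to Better Stack Logs
// Run this as a scheduled job (GitHub Actions cron: daily). Needs Node 18+ for global fetch.
const { Octokit } = require('@octokit/rest');
const octokit = new Octokit({ auth: process.env.GH_TOKEN });

async function trackBuildTimes() {
  const { data } = await octokit.rest.actions.listWorkflowRuns({
    owner: process.env.OWNER,
    repo: process.env.REPO,
    workflow_id: 'ci.yml',
    per_page: 50,
    status: 'completed',
  });

  // Durations in minutes for successful runs, sorted ascending for the percentile lookup
  const durations = data.workflow_runs
    .filter(r => r.conclusion === 'success')
    .map(r => (new Date(r.updated_at) - new Date(r.created_at)) / 1000 / 60)
    .sort((a, b) => a - b);

  if (durations.length === 0) {
    console.warn('No successful runs found; nothing to report');
    return;
  }

  const avg = durations.reduce((a, b) => a + b, 0) / durations.length;
  // Nearest-rank P95: the value at position ceil(0.95 * n) in sorted order
  const p95 = durations[Math.ceil(durations.length * 0.95) - 1];

  // Post to Better Stack Logs or your observability stack
  const res = await fetch(process.env.BETTERSTACK_SOURCE_TOKEN_URL, {
    method: 'POST',
    headers: { 'Content-Type': 'application/json' },
    body: JSON.stringify({
      dt: new Date().toISOString(),
      workflow: 'ci.yml',
      avg_build_minutes: Math.round(avg * 100) / 100,
      p95_build_minutes: Math.round(p95 * 100) / 100,
      sample_count: durations.length,
    }),
  });
  if (!res.ok) throw new Error(`Log ingestion failed with HTTP ${res.status}`);
}

// Exit non-zero on failure so the scheduled job itself shows up as failed
trackBuildTimes().catch(err => {
  console.error(err);
  process.exit(1);
});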

Monitoring GitHub Actions Costs

GitHub Actions charges by the minute for private repos (Linux: $0.008/min, Windows: $0.016/min, macOS: $0.08/min). Each plan includes free minutes (Free: 2,000/mo, Pro: 3,000/mo, Team: 3,000/mo). Overages bill automatically.

Check billing usage via GitHub API

# Get GitHub Actions billing usage for a user/org
# User account:
curl -H "Authorization: Bearer $GH_TOKEN" \
  "https://api.github.com/users/{username}/settings/billing/actions"

# Organization:
curl -H "Authorization: Bearer $GH_TOKEN" \
  "https://api.github.com/orgs/{org}/settings/billing/actions"

# Response:
# {
#   "total_minutes_used": 305,
#   "total_paid_minutes_used": 0,
#   "included_minutes": 3000,
#   "minutes_used_breakdown": {
#     "UBUNTU": 205, "MACOS": 10, "WINDOWS": 90
#   }
# }
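
To turn that response into the 80% budget alert described earlier, something like the following sketch could run on a schedule. It relies only on the total_minutes_used, total_paid_minutes_used, and included_minutes fields shown in the sample response; the function name and return shape are illustrative.

```javascript
// Sketch: decide whether minute consumption has crossed the alert threshold.
// `usage` is the parsed JSON body of the billing response shown above.
function budgetStatus(usage, alertRatio = 0.8) {
  const used = usage.total_minutes_used;
  const included = usage.included_minutes;
  // Guard against plans that report zero included minutes
  const ratio = included > 0 ? used / included : null;
  return {
    ratio,
    overBudgetAlert: ratio !== null && ratio >= alertRatio,
    paidMinutes: usage.total_paid_minutes_used,
  };
}

const sampleUsage = {
  total_minutes_used: 305,
  total_paid_minutes_used: 0,
  included_minutes: 3000,
};
console.log(budgetStatus(sampleUsage).overBudgetAlert); // false
```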

Matrix explosion (high cost risk)

A matrix with 5 OS × 4 Node versions × 3 environments = 60 parallel jobs. At 10 min/run, that's 600 minutes per trigger. Gate matrix runs on main/release branches only.
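The arithmetic is easy to check before merging a matrix change. The sketch below multiplies out the matrix dimensions and prices the result at a single per-minute rate; using the Linux rate for every job is a simplifying assumption, since a real multi-OS matrix would mix the higher Windows and macOS rates.

```javascript
// Sketch: jobs, minutes, and approximate cost for one trigger of a matrix build.
// Assumption: every job takes `minutesPerJob` and bills at one flat rate.
function matrixCost(dimensions, minutesPerJob, ratePerMinute) {
  const jobs = dimensions.reduce((total, size) => total * size, 1);
  const minutes = jobs * minutesPerJob;
  // Round to cents to avoid floating-point noise in the result
  const cost = Math.round(minutes * ratePerMinute * 100) / 100;
  return { jobs, minutes, cost };
}

// 5 OS x 4 Node versions x 3 environments, 10 minutes each, at $0.008/min
console.log(matrixCost([5, 4, 3], 10, 0.008)); // { jobs: 60, minutes: 600, cost: 4.8 }
```

At 20 pushes a day, that single matrix would consume 12,000 minutes daily, which is why gating it to main and release branches matters.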

Uncached dependencies (medium cost risk)

npm install or pip install on every run adds 2–5 minutes per job. Use actions/cache with lock file hash. This alone can cut build times by 30–50%.

Large Docker images (medium cost risk)

Pulling a 2GB Docker image on every job run is expensive in time and sometimes in egress costs. Use a minimal base image or cache layers with docker/build-push-action.

Always-on scheduled workflows (low cost risk)

Cron-scheduled workflows (schedule:) run even when there are no new commits. Add a step that checks if HEAD changed since the last run and exits early if not.

GitHub Actions Monitoring Best Practices

Separate prod and dev workflows

Deploy workflows that touch production should always alert on failure, route to PagerDuty, and require an approval gate. Dev/feature branch failures can use lighter-weight notifications.

Use concurrency groups to prevent pileups

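Add a concurrency block to your deploy workflows with group: ${{ github.workflow }}-${{ github.ref }}. By default, a new run in the same group waits as pending and replaces any older pending run, so deploys never overlap and the queue never grows past one waiting run. Setting cancel-in-progress: true also cancels the run that is already executing when a new push arrives. That saves minutes, but it can interrupt a deploy midway, so it suits CI and preview workflows better than production deploys.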

Monitor the GitHub platform separately

Your CI failures may not be your fault. Have external monitoring for GitHub itself (API Status Check, githubstatus.com RSS feed) so you can instantly distinguish "our code broke" from "GitHub is down".

Set workflow timeouts

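Add timeout-minutes: 30 to every job, and a shorter timeout-minutes to individual steps that tend to hang. GitHub Actions has no workflow-level timeout; the setting exists only on jobs and steps. Without it, a hung job runs until the default 360-minute (6-hour) limit, burning minutes and blocking PRs.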

Track flaky tests separately

Re-run only the failed jobs (the Re-run failed jobs button in the Actions UI, or gh run rerun <run-id> --failed) and record which tests pass the second time. Tests that fail once then pass on rerun are flaky; quarantine them before they erode trust in CI.

Archive test results as artifacts

Upload JUnit XML reports with actions/upload-artifact. Tools like Datadog Test Visibility or GitHub's native test annotations give you per-test failure trends and flakiness dashboards.

Frequently Asked Questions

How do I get notified when a GitHub Actions workflow fails?

GitHub sends email notifications by default when a workflow you triggered fails. For team-wide Slack/Teams alerts, add a notification step to your workflow using the slackapi/slack-github-action or configure the GitHub app for Slack. For webhook-based alerts, use workflow_run events or a dedicated monitoring service like Better Stack that integrates with GitHub Actions via the API.
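For the webhook-based route, the receiving service has to decide which workflow_run deliveries deserve an alert. The sketch below shows one way to do that filtering, assuming the standard webhook payload fields (action, workflow_run.conclusion, workflow_run.head_branch, workflow_run.name, workflow_run.html_url, repository.full_name). The function name and the Slack-style message shape are illustrative.

```javascript
// Sketch: turn a GitHub `workflow_run` webhook payload into a Slack-style message.
// Returns null for deliveries that should not alert.
function failureMessage(payload, branch = 'main') {
  const run = payload.workflow_run;
  // Only completed runs have a final conclusion
  if (payload.action !== 'completed' || !run) return null;
  if (run.conclusion !== 'failure' || run.head_branch !== branch) return null;
  return {
    text: `❌ ${run.name} failed on ${run.head_branch} in ${payload.repository.full_name}: ${run.html_url}`,
  };
}

// Hand-written sample payload with only the fields the function reads.
const samplePayload = {
  action: 'completed',
  workflow_run: {
    name: 'CI',
    conclusion: 'failure',
    head_branch: 'main',
    html_url: 'https://github.com/your-org/your-repo/actions/runs/1',
  },
  repository: { full_name: 'your-org/your-repo' },
};
console.log(failureMessage(samplePayload).text);
```

The returned object can be posted as the JSON body to a Slack incoming webhook URL.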

How do I track GitHub Actions costs?

GitHub Actions charges by the minute for private repos; standard GitHub-hosted runners are free for public repos, and each plan includes a monthly allotment of minutes. Monitor costs in Settings → Billing → GitHub Actions. To track minute consumption programmatically, use the billing endpoints shown earlier in this guide: GET /orgs/{org}/settings/billing/actions for an organization, or GET /users/{username}/settings/billing/actions for a user account. Configure a spending limit or budget alert in your billing settings to get notified before overages.

What is a good GitHub Actions failure rate?

A healthy CI pipeline should have a failure rate below 5% on the main branch, which matches the 95%+ success rate target. Feature branch failures are expected, since developers often push work-in-progress code to see what CI reports. Focus on main/release branch reliability. Alert immediately on consecutive main branch failures (2+ in a row usually indicates a broken merge). Track MTTR (mean time to recovery) and aim for under 30 minutes.
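MTTR can be derived from the same run history used for the success rate. The sketch below measures the time from the first failure in a streak to the next success and averages those gaps. It assumes runs are ordered oldest first and uses a hypothetical completedAt field; map it to whichever timestamp your data source provides.

```javascript
// Sketch: mean time to recovery, in minutes, for main-branch runs.
// Assumption: `runs` is ordered oldest first; each has `conclusion` and an
// ISO 8601 `completedAt` (a hypothetical field name for this example).
function meanTimeToRecoveryMinutes(runs) {
  const recoveries = [];
  let brokenSince = null; // time of the first failure in the current streak
  for (const run of runs) {
    const t = Date.parse(run.completedAt);
    if (run.conclusion === 'failure' && brokenSince === null) {
      brokenSince = t;
    } else if (run.conclusion === 'success' && brokenSince !== null) {
      recoveries.push((t - brokenSince) / 60000);
      brokenSince = null;
    }
  }
  if (recoveries.length === 0) return null;
  return recoveries.reduce((a, b) => a + b, 0) / recoveries.length;
}

const mainRuns = [
  { conclusion: 'failure', completedAt: '2026-05-01T10:00:00Z' },
  { conclusion: 'failure', completedAt: '2026-05-01T10:10:00Z' },
  { conclusion: 'success', completedAt: '2026-05-01T10:20:00Z' }, // recovered after 20 min
  { conclusion: 'failure', completedAt: '2026-05-01T12:00:00Z' },
  { conclusion: 'success', completedAt: '2026-05-01T12:40:00Z' }, // recovered after 40 min
];
console.log(meanTimeToRecoveryMinutes(mainRuns)); // 30
```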

How do I monitor GitHub Actions build times?

Use the GitHub Actions API (GET /repos/{owner}/{repo}/actions/runs) to pull run timestamps (created_at and updated_at) and compute durations from them. For dashboards, GitHub's built-in insights tab shows workflow duration trends. Third-party tools like Datadog's GitHub Actions integration, Grafana's GitHub plugin, or Better Stack can create automated dashboards. Alert when P95 build time exceeds 2× your 7-day baseline; slow builds often precede failures.

Is GitHub Actions down right now?

Check githubstatus.com for real-time GitHub Actions service status. GitHub Actions is one of the most frequently affected services during GitHub incidents: it appeared in 47% of GitHub's major incidents in 2025. API Status Check tracks GitHub uptime at apistatuscheck.com/api/github and sends instant alerts when GitHub Actions is degraded or down.
