LLM Monitoring · Updated May 2026

OpenAI API Monitoring Guide 2026

How to monitor the OpenAI API in production — status tracking, rate limit tiers, error decoding, and automated alerts for ChatGPT, GPT-4o, and o-series model outages.

TL;DR

  • Monitor api.openai.com/v1/models every 60s — cheapest availability check that exercises auth
  • OpenAI's status page (status.openai.com) lags incidents by 15–30 min — proactive monitoring is essential
  • 429 errors = rate limit exceeded — implement exponential backoff with jitter; check tier limits
  • 500/503 errors = OpenAI-side issue — retry with backoff or fail over to Claude/Gemini
  • Upgrade tiers by spending — Tier 1 requires $5 paid, Tier 5 requires $1,000+ spend

Why OpenAI API Monitoring Matters

The OpenAI API powers more production applications than any other AI platform — from customer-facing chatbots to internal code generation tools to document processing pipelines. GPT-4o, o3, and the embeddings API are critical infrastructure for thousands of businesses in 2026.

But OpenAI experiences real outages. In 2025 and early 2026, the API had several major incidents with degraded performance lasting 30–90 minutes. OpenAI's status page is reactive — incidents are often declared 15–30 minutes after users first notice problems. Without proactive monitoring:

  • Timeouts and 500 errors pile up in logs with no alert firing
  • Users see blank responses or infinite loading states
  • Your application queues thousands of retries, amplifying the incident's impact
  • Rate limit degradation (slower responses, not hard errors) is invisible without latency tracking
  • You find out about outages from user complaints, not your monitoring system

Proper monitoring gives you a 60-second detection window — enough time to route traffic to a fallback model (Claude, Gemini) and alert your on-call engineer before the incident cascades.

Where to Check OpenAI API Status

Unlike some providers, OpenAI publishes a dedicated status page. Here are all the places to check:

OpenAI Status Page

★★★★☆

status.openai.com

Official status for the OpenAI API, ChatGPT, and Labs. Built on Statuspage.io — subscribe for email/SMS/webhook notifications. Covers ChatGPT, API, Playground, and the developer dashboard.

15–30 min lag on incident declaration

API Status Check

★★★★★

apistatuscheck.com/api/openai

Real-time third-party monitoring — pings the OpenAI API every 60 seconds independently. Detects incidents before OpenAI declares them. View uptime history and current response times.

60-second detection window

OpenAI Community Forum

★★★☆☆

community.openai.com

Users report issues in real time — often the first place outages are noticed. Search "API down" or "500 error" for current discussions.

Near-real-time community reports

OpenAI Developer Discord

★★★☆☆

discord.com/invite/openai

#api-general channel fills with reports during incidents. Useful for confirmation and workaround discussion.

Real-time

⚠ The Status Page Lag Problem

OpenAI's status page is manually updated by their operations team. During the major GPT-4o degradation incident in March 2026, users reported errors for 22 minutes before the status page was updated. For production applications, rely on automated synthetic monitoring — not manual status page checks.

OpenAI API Rate Limits by Tier (2026)

OpenAI uses a spend-based tier system. You advance tiers automatically after reaching spending thresholds. Rate limits shown below are for GPT-4o — other models have different limits. Check platform.openai.com/account/limits for your current tier and model-specific limits.

Tier   | Requirement           | RPM (GPT-4o) | TPM (GPT-4o) | RPD
-------|-----------------------|--------------|--------------|----------
Free   | No spend required     | 3            | 40K          | 200
Tier 1 | $5 paid               | 500          | 200K         | 10K
Tier 2 | $50 paid + 7 days     | 5,000        | 500K         | Unlimited
Tier 3 | $100 paid + 7 days    | 5,000        | 1M           | Unlimited
Tier 4 | $250 paid + 14 days   | 10,000       | 2M           | Unlimited
Tier 5 | $1,000 paid + 30 days | 10,000       | 30M          | Unlimited

Monitoring Rate Limit Proximity

429 errors are preventable with proactive monitoring. Track these metrics:

  • x-ratelimit-remaining-requests and x-ratelimit-remaining-tokens headers in every response
  • x-ratelimit-reset-requests — timestamp when RPM window resets
  • Rolling 1-minute request/token counters in your application
  • Alert when remaining requests drop below 20% of your tier limit
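
The headers above can be turned into a simple alert rule. A minimal sketch, assuming you read the `x-ratelimit-*` headers off each raw HTTP response (the openai SDK exposes them through its raw-response interface):

```python
def rate_limit_alerts(headers, threshold=0.20):
    """Return warnings for each limit with less than `threshold` capacity left."""
    alerts = []
    for kind in ("requests", "tokens"):
        limit = int(headers.get(f"x-ratelimit-limit-{kind}", 0))
        remaining = int(headers.get(f"x-ratelimit-remaining-{kind}", 0))
        if limit and remaining / limit < threshold:
            alerts.append(f"{kind}: {remaining}/{limit} remaining")
    return alerts

# Example: a Tier 1 account close to its RPM ceiling
headers = {
    "x-ratelimit-limit-requests": "500",
    "x-ratelimit-remaining-requests": "40",    # 8% left -> alert
    "x-ratelimit-limit-tokens": "200000",
    "x-ratelimit-remaining-tokens": "150000",  # 75% left -> fine
}
print(rate_limit_alerts(headers))  # → ['requests: 40/500 remaining']
```

Feed the returned warnings into whatever alerting channel you use for P3-level events.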

OpenAI API Error Codes Decoded

OpenAI errors return a JSON body with an error.type field that classifies the error category alongside the HTTP status code. Here's how to handle each:

400 Bad Request (invalid_request_error)

Meaning: Malformed request — invalid parameters, missing required fields, or content policy violation

Action: Check your request body against the OpenAI API docs. Common causes: missing model field, context length exceeded, invalid JSON. If the error code is content_filter, your input triggered OpenAI's safety system.

401 Unauthorized (authentication_error)

Meaning: Missing or invalid API key

Action: Verify your API key starts with "sk-" and is active in platform.openai.com/api-keys. Check for trailing spaces, rotation issues, or org mismatch if using organization IDs.

403 Forbidden (permission_error)

Meaning: Your API key doesn't have access to this model or feature

Action: Ensure your account has access to the requested model. GPT-4o and o-series models require paid tier access. Some fine-tuned models are org-specific.

404 Not Found (invalid_request_error)

Meaning: Model or endpoint not found

Action: Verify the model name exactly matches the API docs (e.g., "gpt-4o" not "gpt4o", "o3" not "o-3"). Check endpoint path is correct (/v1/chat/completions, /v1/embeddings, etc.).

429 Too Many Requests (rate_limit_error)

Meaning: Rate limit exceeded — RPM, TPM, or RPD for your tier

Action: Implement exponential backoff: 1s → 2s → 4s → 8s… up to 60s with jitter. Check Retry-After header. Consider upgrading tier, caching responses, or batching requests. Use token counting to stay under TPM limits.

500 Internal Server Error (api_error)

Meaning: OpenAI server error — not caused by your request

Action: Retry with exponential backoff. If persistent (>5 min), check status.openai.com. These correlate with OpenAI infrastructure incidents. Implement fallback to another model or provider for critical paths.

503 Service Unavailable (api_error)

Meaning: OpenAI servers are temporarily overloaded or under maintenance

Action: Retry with backoff. Often occurs during high-demand periods or during maintenance windows. Consider fallback to Claude or Gemini for business-critical requests.

What to Monitor: Endpoints & Intervals

Not all OpenAI endpoints are equal. Here's a prioritized monitoring strategy for production applications:

GET api.openai.com/v1/models

List available models — fast availability check, no token consumption

Interval: 60s · Alert: Non-200 or latency >2s

POST api.openai.com/v1/chat/completions

Synthetic chat completion — tests full inference pipeline

Interval: 5 min · Alert: Non-200, latency >5s, or error in response

POST api.openai.com/v1/embeddings

Embedding generation check — critical for RAG pipelines

Interval: 5 min · Alert: Non-200 or latency >3s

GET status.openai.com/api/v2/status.json

Official OpenAI status — detect declared incidents

Interval: 60s · Alert: Status indicator != "none"
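
A single synthetic check needs nothing beyond the standard library; the 60-second scheduling loop and alert delivery are left to your monitoring tool. A sketch, with the 2-second latency threshold from the table above:

```python
import time
import urllib.error
import urllib.request

def check_models_endpoint(api_key, timeout=5.0):
    """One synthetic availability check against /v1/models.

    Returns (status_code, latency_seconds)."""
    req = urllib.request.Request(
        "https://api.openai.com/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )
    start = time.monotonic()
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status, time.monotonic() - start
    except urllib.error.HTTPError as e:
        return e.code, time.monotonic() - start

def needs_alert(status, latency, max_latency=2.0):
    """Alert rule from the table: non-200 or latency above the threshold."""
    return status != 200 or latency > max_latency

# The decision logic on sample observations (no network needed):
print(needs_alert(200, 0.4))   # healthy -> False
print(needs_alert(503, 0.1))   # OpenAI-side error -> True
print(needs_alert(200, 3.2))   # degraded latency -> True
```

A connection timeout or DNS failure will raise out of check_models_endpoint; treat that as a failed check too.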

Implementing Retry Logic for OpenAI API

OpenAI recommends exponential backoff with jitter for all retry logic. This prevents thundering herd problems where all your instances retry at the same moment:

import time
import random

from openai import OpenAI, RateLimitError, APIStatusError

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def chat_with_retry(messages, max_retries=5):
    for attempt in range(max_retries):
        try:
            return client.chat.completions.create(
                model="gpt-4o",
                messages=messages,
            )
        except RateLimitError:
            if attempt == max_retries - 1:
                raise
            # Exponential backoff with jitter
            wait = (2 ** attempt) + random.uniform(0, 1)
            print(f"Rate limited. Waiting {wait:.1f}s...")
            time.sleep(wait)
        except APIStatusError as e:
            # Retry only OpenAI-side errors (500/503); re-raise 4xx immediately
            if e.status_code in (500, 503) and attempt < max_retries - 1:
                wait = (2 ** attempt) + random.uniform(0, 1)
                time.sleep(wait)
            else:
                raise

Key retry principles

  • Retry on: 429 (rate limit), 500 (server error), 503 (unavailable)
  • Do NOT retry on: 400 (bad request), 401 (auth), 403 (permission), 404 (not found)
  • Check Retry-After header on 429 — OpenAI sets this to the exact wait time
  • Cap total retry time at 60–120 seconds for user-facing requests
  • Use circuit breakers for non-interactive workloads — stop retrying after 5+ consecutive failures
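
The circuit breaker from the last bullet can be sketched in a few lines. This is a deliberately minimal version using the 5-failure threshold suggested above; production implementations usually add a half-open probe limit:

```python
import time

class CircuitBreaker:
    """Open the circuit after `max_failures` consecutive failures;
    allow a probe request again only after `cooldown` seconds."""

    def __init__(self, max_failures=5, cooldown=60.0):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.failures = 0
        self.opened_at = None

    def allow_request(self):
        if self.opened_at is None:
            return True
        # Half-open: let a probe through once the cooldown has elapsed
        return time.monotonic() - self.opened_at >= self.cooldown

    def record_success(self):
        self.failures = 0
        self.opened_at = None

    def record_failure(self):
        self.failures += 1
        if self.failures >= self.max_failures:
            self.opened_at = time.monotonic()

breaker = CircuitBreaker(max_failures=5)
for _ in range(5):
    breaker.record_failure()
print(breaker.allow_request())  # circuit is open -> False
```

Call allow_request() before each non-interactive job; when it returns False, skip the OpenAI call entirely instead of queuing another retry.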

Setting Up Alerts for OpenAI API Outages

Alert thresholds should match your application's SLA, not arbitrary numbers. Here's a practical tiered alerting strategy:

P1 — Critical
  • OpenAI API returns non-200 for 3 consecutive synthetic checks
  • Error rate >10% over any 5-minute window
  • P95 latency exceeds 15 seconds for 5 consecutive minutes

Response: Page on-call immediately. Consider activating fallback to Claude or Gemini.

P2 — Warning
  • Error rate 2–10% over a 5-minute window
  • P95 latency 5–15 seconds sustained
  • 429 rate limit errors >1% of total requests

Response: Alert team in Slack. Monitor closely. Prepare to activate fallback.

P3 — Info
  • OpenAI's status page changes from 'operational'
  • Rate limit remaining drops below 20% of tier limit
  • Response latency spike >50% above 7-day baseline

Response: Log to monitoring channel. No immediate action needed.
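
The error-rate tiers above reduce to a rolling-window calculation. A sketch with the 2% and 10% cut-offs hard-coded; timestamps are injected explicitly so the logic is easy to test:

```python
import time
from collections import deque

class ErrorRateWindow:
    """Rolling error-rate tracker for tiered alerting (default 5-min window)."""

    def __init__(self, window_seconds=300):
        self.window = window_seconds
        self.samples = deque()  # (timestamp, is_error) pairs

    def record(self, is_error, now=None):
        now = time.monotonic() if now is None else now
        self.samples.append((now, is_error))
        # Evict samples that fell out of the window
        while self.samples and now - self.samples[0][0] > self.window:
            self.samples.popleft()

    def severity(self):
        if not self.samples:
            return "OK"
        rate = sum(err for _, err in self.samples) / len(self.samples)
        if rate > 0.10:
            return "P1"  # page on-call
        if rate >= 0.02:
            return "P2"  # Slack alert
        return "OK"

w = ErrorRateWindow()
for i in range(100):
    w.record(is_error=(i < 15), now=float(i))  # 15 errors in 100 requests
print(w.severity())  # → P1
```

Record one sample per OpenAI API call (synthetic or real) and re-evaluate severity() on every check cycle.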

Multi-Provider Fallback Strategy

Production applications serving users should never depend on a single LLM provider. When the OpenAI API goes down, you need an automatic fallback:

Claude (Anthropic)

api.anthropic.com/v1

claude-sonnet-4-5

Best OpenAI fallback — comparable quality, different infrastructure

SLA: 99.9%

Gemini (Google)

generativelanguage.googleapis.com

gemini-2.5-flash

Fast and cheap — good for high-volume fallback scenarios

SLA: 99.8%

Azure OpenAI

your-resource.openai.azure.com

gpt-4o (Azure deployment)

Same models, different infrastructure — best fallback for OpenAI-specific features

SLA: 99.95%

Implement provider fallback at the SDK level using a wrapper that catches 500/503 errors and retries against the fallback provider. Tools like LiteLLM provide unified interfaces across OpenAI, Claude, and Gemini with built-in fallback support.
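
The wrapper pattern can be sketched without any SDK. The provider callables below are stand-ins; in production each would wrap the OpenAI, Anthropic, or Google SDK call and translate 500/503 responses into ProviderError:

```python
class ProviderError(Exception):
    """Raised by a provider adapter on a 5xx or connection failure."""

def complete_with_fallback(prompt, providers):
    """Try each (name, call) pair in order; return the first success."""
    errors = {}
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderError as e:
            errors[name] = str(e)
    raise RuntimeError(f"All providers failed: {errors}")

# Simulated outage: OpenAI returns 503, Claude answers
def openai_call(prompt):
    raise ProviderError("503 Service Unavailable")

def claude_call(prompt):
    return f"claude says: {prompt}"

name, reply = complete_with_fallback(
    "ping", [("openai", openai_call), ("claude", claude_call)]
)
print(name)  # → claude
```

Combine this with the circuit breaker above so a hard OpenAI outage skips straight to the fallback instead of paying a timeout on every request.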

Frequently Asked Questions

How do I check if the OpenAI API is down?

Check status.openai.com for the official OpenAI status page, which covers the API, ChatGPT, and Labs. For real-time monitoring with instant alerts, use API Status Check or Better Stack to synthetically ping api.openai.com/v1/models on a 60-second interval — OpenAI's status page can lag behind actual incidents by 15–30 minutes.

What are the OpenAI API rate limits?

OpenAI rate limits vary by usage tier. Free tier: 3 RPM, 200 RPD, 40K TPM. Tier 1 ($5 spent): 500 RPM, 10K RPD, 200K TPM for GPT-4o. Tier 2 ($50 spent): 5,000 RPM, 500K TPM. Tier 3 ($100 spent): 5,000 RPM, 1M TPM. Tier 4 ($250 spent): 10,000 RPM, 2M TPM. Tier 5 ($1,000 spent): 10,000 RPM, 30M TPM. Limits vary by model — check platform.openai.com/account/limits for your current tier.

What does an OpenAI API 429 error mean?

A 429 error means you have exceeded your rate limit — either requests per minute (RPM), requests per day (RPD), or tokens per minute (TPM). The response body includes a "type": "rate_limit_error" message specifying which limit was hit. Implement exponential backoff starting at 1 second, doubling up to 60 seconds with random jitter. Check the Retry-After header when present.
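
That schedule (1-second base, doubling, 60-second cap, random jitter) is a few lines of code. backoff_delay here is an illustrative helper, not an SDK function:

```python
import random

def backoff_delay(attempt, base=1.0, cap=60.0):
    """Capped exponential backoff with up to 1s of random jitter."""
    return min(cap, base * (2 ** attempt)) + random.uniform(0, 1)

# The deterministic part of the schedule for attempts 0..7:
print([round(min(60.0, 2 ** a)) for a in range(8)])  # → [1, 2, 4, 8, 16, 32, 60, 60]
```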

What does an OpenAI API 500 error mean?

A 500 (Internal Server Error) from the OpenAI API indicates a server-side issue at OpenAI — not a problem with your request. These typically resolve within minutes. Retry with exponential backoff. If 500 errors persist beyond 5 minutes, check status.openai.com for an active incident. 500 errors during OpenAI incidents can last 30–90 minutes.

What is OpenAI's status page URL?

OpenAI's official status page is at status.openai.com. It covers the OpenAI API, ChatGPT, OpenAI Labs, and the developer dashboard. OpenAI uses Statuspage.io for this — you can subscribe to email/SMS/webhook notifications for incidents. Note: OpenAI often updates the status page 15–30 minutes after an incident starts, so proactive monitoring is recommended.