Why GraphQL Monitoring is Different from REST
REST APIs use HTTP status codes to communicate success and failure: 200 OK, 404 Not Found, 500 Server Error. Monitoring a REST API is straightforward: check the status code and response time.
GraphQL breaks this model. A GraphQL API is typically served from a single endpoint (usually /graphql), and most servers return HTTP 200 even when the query fails. Errors are reported in the response body:
// HTTP 200, but the query FAILED
{
  "data": null,
  "errors": [
    {
      "message": "Cannot query field \"user\" on type \"Query\"",
      "locations": [{ "line": 2, "column": 3 }]
    }
  ]
}
A monitoring tool that only checks for HTTP 200 would report this as healthy. That's why GraphQL requires a fundamentally different monitoring approach.
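To make the difference concrete, here is a minimal Python sketch of a check that looks past the status code. The function name `classify_graphql_response` and the three labels are assumptions made for illustration; they are not part of any monitoring product.

```python
import json


def classify_graphql_response(http_status, body):
    """Return 'ok', 'partial', or 'failed' for one GraphQL HTTP response.

    Hypothetical helper: the labels are illustrative, not a standard.
    """
    if http_status != 200:
        return "failed"
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return "failed"  # HTTP 200 with a non-JSON body is still a failure
    if not isinstance(payload, dict):
        return "failed"

    has_errors = bool(payload.get("errors"))
    has_data = payload.get("data") is not None
    if has_errors and not has_data:
        return "failed"   # the case shown above: 200 OK, data is null
    if has_errors:
        return "partial"  # some fields resolved, others raised errors
    return "ok" if has_data else "failed"
```

A status-code-only check would treat all of these inputs as healthy whenever the status is 200; this version separates total failures from partial ones.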
Layer 1: GraphQL Uptime Monitoring
The most basic layer of GraphQL monitoring answers: is the GraphQL server accepting requests?
Method 1: Introspection Query Health Check
The __typename meta-field is part of the introspection system built into every spec-compliant GraphQL server. You can query it as a minimal health check:
curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ __typename }"}' | jq '.data.__typename'
# Returns "Query" if healthy
# Returns null or errors if the server is broken
The { __typename } query is the lightest possible GraphQL query: it doesn't touch your resolvers or database. If it returns "Query", the GraphQL layer is alive.
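The same probe can be sketched in Python using only the standard library. The function names (`build_probe_request`, `is_alive`, `probe`) and the 5-second timeout are assumptions chosen for this example.

```python
import json
import urllib.request

TYPENAME_QUERY = "{ __typename }"


def build_probe_request(url):
    """Build the POST request that carries the { __typename } query."""
    body = json.dumps({"query": TYPENAME_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def is_alive(payload):
    """True if the parsed body has a string __typename and no errors."""
    if not isinstance(payload, dict) or payload.get("errors"):
        return False
    data = payload.get("data")
    return isinstance(data, dict) and isinstance(data.get("__typename"), str)


def probe(url, timeout=5.0):
    """Send the probe; any network, HTTP, or JSON failure counts as down."""
    try:
        with urllib.request.urlopen(build_probe_request(url), timeout=timeout) as resp:
            return is_alive(json.loads(resp.read()))
    except (OSError, ValueError):
        return False
```

Calling `probe("https://api.example.com/graphql")` would return a single boolean that a scheduler or cron job can act on. Checking for any string rather than the literal "Query" keeps the probe valid for schemas whose root type has a different name.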
Method 2: Dedicated Health Query
For more meaningful health checks, add a dedicated health query to your schema:
# Schema definition
type Query {
  health: HealthStatus!
}
type HealthStatus {
  status: String!   # "ok" or "degraded"
  version: String!
  db: Boolean!      # Database connectivity
  cache: Boolean!   # Cache connectivity
}
# Monitoring query
query HealthCheck {
  health {
    status
    db
    cache
  }
}
This lets your monitoring tool verify not just the GraphQL layer, but also downstream dependencies like databases and caches, all in a single lightweight query.
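On the monitoring side, the HealthCheck result can be turned into a list of failing parts. This is a sketch that assumes the schema shown above; `failing_dependencies` is a hypothetical helper name.

```python
def failing_dependencies(payload):
    """List what is unhealthy in a parsed HealthCheck response.

    Returns [] when everything is fine, ["graphql"] when the query
    itself failed, otherwise the names of the failing checks.
    """
    health = (payload.get("data") or {}).get("health")
    if payload.get("errors") or not isinstance(health, dict):
        return ["graphql"]
    # A dependency counts as healthy only if the server reported true.
    problems = [name for name in ("db", "cache") if health.get(name) is not True]
    if health.get("status") != "ok":
        problems.append("status")
    return problems
```

Returning names instead of a single boolean lets an alert say which dependency is down, not just that something is wrong.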
Layer 2: Error Rate Monitoring
Even when your GraphQL server is responding with HTTP 200, it may be returning high rates of application errors. Track these metrics:
Key Error Rate Metrics
| Metric | Alert Threshold | Indicates |
|---|---|---|
| Response has errors field | > 1% of requests | Schema/resolver issues |
| data field is null | > 0.1% of requests | Critical resolver failures |
| Partial data (null fields) | Baseline + 5% | Individual field resolver errors |
| HTTP 4xx/5xx responses | > 0.5% of requests | Auth, rate limit, or infra failures |
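As a rough sketch, three of the thresholds in the table can be evaluated over a window of recorded responses as follows. The `Sample` record, the metric keys, and the function names are assumptions for this example; the partial-data row is left out because it depends on a baseline you have to measure yourself.

```python
from dataclasses import dataclass
from typing import Optional

# Alert thresholds from the table above, as fractions of all requests.
THRESHOLDS = {"errors_field": 0.01, "null_data": 0.001, "http_error": 0.005}


@dataclass
class Sample:
    http_status: int
    payload: Optional[dict]  # parsed JSON body, or None if it could not be parsed


def error_rates(samples):
    """Compute the rate of each error class over a window of samples."""
    total = len(samples)
    counts = {name: 0 for name in THRESHOLDS}
    for sample in samples:
        body = sample.payload or {}
        if sample.http_status >= 400:
            counts["http_error"] += 1
        if body.get("errors"):
            counts["errors_field"] += 1
        # Count null data only on transport-level successes, so an
        # outage is not double-counted under two metrics.
        if sample.http_status < 400 and body.get("data") is None:
            counts["null_data"] += 1
    return {name: (counts[name] / total if total else 0.0) for name in counts}


def breached(rates):
    """Names of the metrics whose rate exceeds its alert threshold."""
    return sorted(name for name, rate in rates.items() if rate > THRESHOLDS[name])
```

In practice the samples would come from access logs or an APM export; the sketch only shows how the thresholds map onto fields of the response.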
Parsing GraphQL Errors in Your Monitoring
When setting up synthetic monitoring, configure your monitoring tool to check both HTTP status AND response body:
# Better Stack monitor assertion example
# Check 1: HTTP status is 200
# Check 2: Response body contains "data"
# Check 3: Response body does NOT contain '"errors"'
# Check 4: Response time < 2000ms
# Shell script approach
RESPONSE=$(curl -s -X POST https://api.example.com/graphql \
  -H "Content-Type: application/json" \
  -d '{"query": "{ health { status } }"}')
if echo "$RESPONSE" | jq -e '.errors' > /dev/null 2>&1; then
  echo "ALERT: GraphQL returned errors"
  exit 1
fi
STATUS=$(echo "$RESPONSE" | jq -r '.data.health.status')
if [ "$STATUS" != "ok" ]; then
  echo "ALERT: Health status is $STATUS"
  exit 1
fi
Layer 3: Query Performance Monitoring
GraphQL performance monitoring is more complex than REST because a single endpoint serves many different query patterns with wildly different complexity.
Key Performance Metrics to Track
- P50/P95/P99 response times by operation name: Aggregate by the GraphQL operation name (e.g., GetUserProfile), not just the endpoint.
- Resolver execution time per field: Identify slow resolvers that drag down query performance. N+1 query problems surface here.
- Query complexity score: Track how complex incoming queries are. Alert on queries that approach your complexity limit.
- DataLoader batch efficiency: If you use DataLoader for batching, monitor batch sizes and cache hit rates.
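The first metric in the list, percentiles grouped by operation name, can be sketched as below. It assumes you already log `(operation_name, duration_ms)` pairs; the function names and the nearest-rank percentile method are choices made for this example, and APM tools may interpolate differently.

```python
import math
from collections import defaultdict


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an ascending list (pct from 0 to 100)."""
    if not sorted_values:
        raise ValueError("no samples")
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def latency_by_operation(records):
    """Group (operation_name, duration_ms) pairs and summarize each group."""
    grouped = defaultdict(list)
    for name, duration_ms in records:
        # Requests sent without an operation name land in one shared bucket.
        grouped[name or "<anonymous>"].append(duration_ms)
    summary = {}
    for name, durations in grouped.items():
        durations.sort()
        summary[name] = {f"p{p}": percentile(durations, p) for p in (50, 95, 99)}
    return summary
```

A large "<anonymous>" bucket is itself a useful signal: it means clients are sending unnamed operations that cannot be tracked individually.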
The N+1 Query Problem
The most common GraphQL performance anti-pattern is the N+1 query problem: fetching a list of N items with one query and then making one additional database query per item for its related data. In monitoring terms, this shows up as:
- Response time scaling linearly with result set size
- Database query count per request that scales with N
- High database CPU during seemingly normal list queries
Track database query count per GraphQL request. Alert if a single request generates more than 20-50 database queries; that usually points to a DataLoader misconfiguration or a missing batch resolver.
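One way to count database queries per request is a request-scoped counter that the database layer increments. The sketch below uses Python's `contextvars`; `QueryCounter`, `record_db_query`, and the limit of 50 (the upper end of the range above) are assumptions, and wiring `record_db_query` into a real driver or ORM hook is left out.

```python
import contextvars

QUERY_LIMIT = 50  # assumed alert threshold; tune it for your schema

_current_counter = contextvars.ContextVar("current_counter", default=None)


class QueryCounter:
    """Context manager that counts DB queries issued while it is active."""

    def __enter__(self):
        self.count = 0
        self._token = _current_counter.set(self)
        return self

    def __exit__(self, *exc_info):
        _current_counter.reset(self._token)
        return False  # never swallow exceptions from the request

    @property
    def over_limit(self):
        return self.count > QUERY_LIMIT


def record_db_query():
    """Call from the database layer each time a query is executed."""
    counter = _current_counter.get()
    if counter is not None:
        counter.count += 1
```

Wrapping each GraphQL request in `with QueryCounter() as counter:` and logging `counter.count` next to the operation name makes N+1 patterns visible as counts that grow with result size.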
Layer 4: Schema Change Monitoring
GraphQL schema changes can silently break clients. Monitoring schema health means:
- Breaking change detection: Use tools like GraphQL Inspector or Apollo Studio to detect when schema changes would break existing queries.
- Deprecated field usage: Track usage of deprecated fields before removing them. Alert if deprecated fields receive significant traffic.
- Schema registry: Maintain a schema registry (Apollo Studio, Hasura, or self-hosted) that versions your schema and tracks changes over time.
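To illustrate what breaking change detection means, here is a deliberately simplified sketch. It is not how GraphQL Inspector or Apollo Studio work internally; it assumes schemas have already been reduced to `{type_name: {field_name: type_string}}` dictionaries, and it conservatively treats every field type change as breaking.

```python
def breaking_changes(old_schema, new_schema):
    """Report removed types, removed fields, and changed field types.

    Added types and fields are ignored because existing queries
    keep working when the schema only grows.
    """
    changes = []
    for type_name, old_fields in old_schema.items():
        new_fields = new_schema.get(type_name)
        if new_fields is None:
            changes.append(f"type removed: {type_name}")
            continue
        for field, old_type in old_fields.items():
            if field not in new_fields:
                changes.append(f"field removed: {type_name}.{field}")
            elif new_fields[field] != old_type:
                changes.append(
                    f"type changed: {type_name}.{field} {old_type} -> {new_fields[field]}"
                )
    return changes
```

A CI job could fail the build whenever this list is non-empty. Dedicated tools apply finer rules, for example around arguments, enums, and nullability.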
Best Tools for GraphQL Monitoring in 2026
For Uptime Monitoring
- Better Stack: Supports POST requests with custom bodies and response assertion. Use it for synthetic GraphQL health checks with response body validation.
- API Status Check (Alert Pro): Monitors GraphQL endpoints with custom query assertions and instant alerts.
- Checkly: Powerful for complex GraphQL monitoring scenarios with JavaScript assertions.
For Performance and Observability
- Apollo Studio: The gold standard for Apollo GraphQL monitoring. Provides per-operation metrics, field-level latency, error tracking, and schema change management.
- Datadog with GraphQL plugin: APM tracing for GraphQL resolvers. Integrates with the major GraphQL frameworks.
- New Relic: GraphQL-aware APM that breaks down performance by operation name and resolver.
- OpenTelemetry: Vendor-neutral tracing that can be added to most GraphQL servers. On Node.js, use @opentelemetry/instrumentation-graphql for automatic instrumentation.
For Schema Monitoring
- GraphQL Inspector: Open source. Detects breaking changes, deprecated field usage, and schema similarities.
- Apollo Studio Schema Registry: Managed schema versioning with breaking change detection built in.
- Hive (GraphQL Hive): Open source alternative to Apollo Studio with schema registry, analytics, and integrations.
GraphQL Monitoring Checklist
- Synthetic uptime monitor on /graphql with body assertion (checks for "data" field, not just HTTP 200)
- Dedicated health query in your schema touching real dependencies (DB, cache)
- Error rate alert: > 1% of requests returning errors field
- Response time P95 alert: threshold varies by query complexity
- Per-operation latency tracking in APM (Datadog, New Relic, or Apollo Studio)
- N+1 query detection: database query count per request
- Schema change CI check: no breaking changes without explicit version bump
- Deprecated field usage monitoring before removals
- Rate limiting alerts: track 429 responses and approaching rate limits
- Subscription connection monitoring if using GraphQL subscriptions
Frequently Asked Questions
Should I disable GraphQL introspection in production?
Yes. For security, disable introspection in production, because it exposes your full schema to anyone. For monitoring, add a separate health query to your schema that is authentication-exempt and lightweight. This is more secure and more useful than relying on introspection for health checks.
How do I monitor GraphQL subscriptions?
GraphQL subscriptions are stateful WebSocket connections, not simple HTTP requests. Monitor them by: 1) Tracking WebSocket connection success rate, 2) Monitoring subscription message delivery latency, 3) Checking for WebSocket disconnections and reconnection rates. Standard HTTP uptime monitors don't apply to subscriptions.
What's the best way to set query complexity limits?
Use a library like graphql-query-complexity to assign cost values to fields based on their resolver complexity. Set a maximum complexity per query (typically 100-500 depending on your schema). This prevents expensive queries from overloading your resolvers while allowing normal usage.
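The underlying idea can be sketched in a few lines. This is not the graphql-query-complexity API; it is a simplified model in which a query is a nested dictionary of selected fields, `field_costs` is a hypothetical cost table, and list-size multipliers are ignored.

```python
MAX_COMPLEXITY = 500  # assumed limit, the upper end of the range above


def query_complexity(selection, field_costs, default_cost=1):
    """Sum the cost of every selected field, recursing into sub-selections.

    selection maps field names to a nested selection dict, or None for leaves.
    """
    total = 0
    for field, children in selection.items():
        total += field_costs.get(field, default_cost)
        if children:
            total += query_complexity(children, field_costs, default_cost)
    return total


def is_allowed(selection, field_costs):
    """Reject queries whose total cost exceeds the configured maximum."""
    return query_complexity(selection, field_costs) <= MAX_COMPLEXITY
```

For example, with `posts` and `comments` costed at 10 each, a query selecting `user { name posts { title comments { body } } }` scores 24. Logging the score per operation shows how close real traffic gets to the limit.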
How do I monitor a federated GraphQL architecture (Apollo Federation)?
Monitor both the gateway and each subgraph independently. The gateway has its own health check. Monitor each subgraph's introspection endpoint separately. Also monitor schema composition: if subgraph schemas become incompatible, the federated schema rebuild can fail silently.