API Performance Optimization: Complete Guide for Production Systems
Slow APIs kill user experience, inflate infrastructure costs, and limit scalability. A 100ms delay can reduce conversions by roughly 7%, and one widely cited estimate puts the cost of a 1-second delay to Amazon at $1.6 billion in annual sales.
This guide covers production-ready performance optimization strategies used by Stripe, Shopify, and AWS to serve millions of requests per day with sub-100ms response times.
Why API Performance Matters
Impact on Business Metrics
User Experience & Conversions
- 100ms delay → up to 7% drop in conversions (retail performance studies)
- 1-second delay → 11% fewer page views, 7% loss in conversions
- 3-second load time → 53% of mobile users abandon
Infrastructure Costs
- Slow APIs = more concurrent requests = higher server costs
- Example: Reducing response time from 500ms to 100ms → 5x fewer concurrent connections
- Example: a $10,000/month server bill can drop to ~$2,000/month with proper optimization
Scalability
- Faster responses = more requests per server
- Example: 100ms responses = 10 req/sec per connection vs 1 req/sec at 1000ms
- 10x throughput with same infrastructure
Real-World Impact
Shopify Black Friday 2023
- 1.8M requests/second peak
- Sub-100ms P95 response times
- Result: Zero downtime, $9.3B in sales
Stripe API
- 99.99% uptime
- P99 latency <200ms globally
- Handles $640B annually
Netflix API
- 1 billion API calls/day
- P99 latency <100ms
- Powers 230M subscribers globally
Key Performance Metrics
Essential Metrics to Track
Response Time (Latency)
P50 (median): 50% of requests faster than this
P95: 95% of requests faster than this
P99: 99% of requests faster than this (catches outliers)
Why P99 matters more than average:
- Average: 100ms (looks good!)
- P99: 5000ms (1% of users waiting 5 seconds = terrible UX)
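To see why, here's a small sketch (a hypothetical helper, not from any monitoring library) that computes nearest-rank percentiles from raw latency samples. A handful of slow outliers barely move the average but dominate P99:

```typescript
// Nearest-rank percentile over a sample of request latencies (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(rank, sorted.length - 1))];
}

const latencies = [12, 13, 14, 15, 16, 17, 18, 22, 95, 5000];
const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(avg);                        // 522.2 — dominated by one outlier
console.log(percentile(latencies, 50));  // 16 — the typical request
console.log(percentile(latencies, 99));  // 5000 — what the slowest 1% see
```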
Throughput
- Requests per second (RPS) your API can handle
- Example targets:
- Small API: 100-1,000 RPS
- Medium: 1,000-10,000 RPS
- Large: 10,000+ RPS
Error Rate
- 4xx errors: Client mistakes (usually not your fault)
- 5xx errors: Server failures (your fault)
- Target: <0.1% error rate under normal load
Saturation
- CPU usage (target: <70% average, <90% peak)
- Memory usage (target: <80%)
- Database connections (target: <80% of pool)
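A minimal Node.js sketch for sampling the first two of these signals in-process (the function name is illustrative; the ratios map onto the targets above):

```typescript
import * as os from 'os';

// Snapshot process saturation signals. loadavg/cores approximates CPU
// utilization on Linux/macOS (os.loadavg() returns zeros on Windows).
function saturationSnapshot() {
  const cores = os.cpus().length;
  const heap = process.memoryUsage();
  return {
    cpuUtilization: os.loadavg()[0] / cores,       // target: <0.7 average
    heapUsedRatio: heap.heapUsed / heap.heapTotal, // target: <0.8
  };
}
```

In practice you'd ship these numbers to your metrics system on an interval rather than log them ad hoc.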
Database Optimization
Database queries are the #1 performance bottleneck in most APIs.
Indexing Strategy
Before Indexing (Sequential Scan)
-- Query: Find user by email
SELECT * FROM users WHERE email = 'user@example.com';
-- Execution: 2,500ms (scans 1 million rows)
After Indexing
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Same query now: 8ms (index lookup)
-- 312x faster!
Composite Indexes for Multi-Column Queries
-- Query: Active orders for user in date range
SELECT * FROM orders
WHERE user_id = 123
AND status = 'active'
AND created_at > '2026-01-01';
-- Index order matters!
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);
-- user_id first (most selective filter)
-- status second (additional filter)
-- created_at last (for sorting)
Check Index Usage
-- PostgreSQL: Explain query plan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
-- Look for:
-- ✅ "Index Scan" or "Index Only Scan"
-- ❌ "Seq Scan" (sequential scan = missing index)
Query Optimization
N+1 Query Problem (Most Common Mistake)
// ❌ BAD: N+1 queries (1 + 100 = 101 database roundtrips)
async function getOrdersWithUsers() {
const orders = await db.order.findMany(); // 1 query
for (const order of orders) {
// 100 separate queries!
order.user = await db.user.findUnique({
where: { id: order.userId }
});
}
return orders;
}
// Response time: 2,500ms
// ✅ GOOD: Single query with join
async function getOrdersWithUsers() {
const orders = await db.order.findMany({
include: {
user: true // Prisma automatically joins
}
});
return orders;
}
// Response time: 45ms (55x faster!)
Select Only What You Need
// ❌ BAD: Fetching unnecessary data
const users = await db.user.findMany();
// Returns: id, email, password_hash, created_at, updated_at, profile, settings...
// Payload: 50KB per user × 100 users = 5MB
// ✅ GOOD: Select specific fields
const users = await db.user.findMany({
select: {
id: true,
email: true,
name: true
}
});
// Payload: 2KB per user × 100 users = 200KB (25x smaller!)
Pagination for Large Datasets
// ❌ BAD: Loading all records
const allOrders = await db.order.findMany(); // 1 million rows = 500MB = 30 seconds
// OOM crash on large datasets
// ✅ GOOD: Cursor-based pagination
const orders = await db.order.findMany({
take: 100,
cursor: lastOrderId ? { id: lastOrderId } : undefined,
orderBy: { created_at: 'desc' }
});
// Returns 100 rows in 12ms
Connection Pooling
Problem: Creating new database connections is slow
- New connection: 50-200ms
- Pooled connection: 0.5ms
- 400x faster with connection pooling
import { PrismaClient } from '@prisma/client';
// ✅ Connection pool configuration
// Prisma sizes its pool through the connection URL, not a constructor option:
// DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=30"
const prisma = new PrismaClient({
datasources: {
db: {
url: process.env.DATABASE_URL
}
}
});
// Connection lifecycle:
// 1. Request arrives → get connection from pool (0.5ms)
// 2. Execute query
// 3. Return connection to pool (reused by next request)
Pool Sizing Formula
Optimal pool size = (core_count × 2) + effective_spindle_count
Example for typical web server:
- 4 CPU cores
- SSD storage (spindle = 1)
- Pool size = (4 × 2) + 1 = 9 connections
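As a sanity check, the formula translates directly into code (a trivial sketch; the function name is made up):

```typescript
// Pool-sizing heuristic from the PostgreSQL wiki: connections beyond this
// point mostly add contention rather than throughput.
function optimalPoolSize(coreCount: number, effectiveSpindles: number): number {
  return coreCount * 2 + effectiveSpindles;
}

console.log(optimalPoolSize(4, 1)); // 9 — the example above
```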
Database Monitoring
Slow Query Logging
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient({
log: [
{
emit: 'event',
level: 'query'
}
]
});
prisma.$on('query', (e) => {
if (e.duration > 1000) { // Queries slower than 1 second
console.warn('Slow query detected:', {
query: e.query,
duration: `${e.duration}ms`,
timestamp: e.timestamp
});
}
});
Key Metrics to Track
- Query execution time (P50, P95, P99)
- Connection pool utilization
- Slow query count
- Lock wait time
- Cache hit ratio
Caching Strategies
Caching = storing computed results to avoid redundant work.
Cache Hierarchy (Fastest to Slowest)
- In-Memory Cache (0.1ms) - Node.js Map/LRU
- Redis Cache (1-2ms) - Shared across servers
- CDN Cache (10-50ms) - Global edge locations
- Database (20-200ms) - No cache
- External API (100-2000ms) - Third-party service
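A read path can walk this hierarchy from fastest tier to slowest. The sketch below assumes an in-process `Map`, an ioredis-style client, and a database loader, all passed in as parameters — none of these names come from the guide itself:

```typescript
// Walk the cache hierarchy: in-memory (~0.1ms) → Redis (~1-2ms) → database.
async function getTiered<T>(
  key: string,
  memoryCache: Map<string, T>,
  redis: { get(k: string): Promise<string | null>; setex(k: string, ttl: number, v: string): Promise<unknown> },
  loadFromDb: () => Promise<T>
): Promise<T> {
  // Tier 1: process-local memory (fastest, not shared across servers)
  const local = memoryCache.get(key);
  if (local !== undefined) return local;

  // Tier 2: Redis (shared across servers)
  const remote = await redis.get(key);
  if (remote !== null) {
    const value: T = JSON.parse(remote);
    memoryCache.set(key, value); // promote to the faster tier
    return value;
  }

  // Tier 3: the database; populate both cache tiers on the way out
  const value = await loadFromDb();
  memoryCache.set(key, value);
  await redis.setex(key, 300, JSON.stringify(value));
  return value;
}
```

The promotion step matters: a value warmed in Redis by one server becomes a 0.1ms in-memory hit on every server that reads it afterward.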
In-Memory Caching
import NodeCache from 'node-cache';
// Create cache with 5-minute TTL
const cache = new NodeCache({ stdTTL: 300 });
async function getUser(userId: string) {
// Check cache first
const cached = cache.get<User>(`user:${userId}`);
if (cached) {
console.log('Cache HIT');
return cached; // 0.1ms response time
}
console.log('Cache MISS');
// Fetch from database
const user = await db.user.findUnique({ where: { id: userId } });
// Store in cache
cache.set(`user:${userId}`, user);
return user; // 45ms first time, 0.1ms after
}
Use Cases for In-Memory Cache
- User sessions
- Configuration data
- Frequently accessed lookup tables
- API responses that rarely change
Limitations
- Not shared across servers (each server has own cache)
- Lost on server restart
- Limited by RAM (max ~1-2GB typically)
Redis Caching (Production Standard)
import Redis from 'ioredis';
const redis = new Redis({
host: 'localhost',
port: 6379,
maxRetriesPerRequest: 3
});
async function getProductWithCache(productId: string) {
const cacheKey = `product:${productId}`;
// Try cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Fetch from database
const product = await db.product.findUnique({
where: { id: productId },
include: { reviews: true }
});
// Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(product));
return product;
}
Cache Invalidation Strategies
Time-Based (TTL)
// Cache for 5 minutes
await redis.setex('key', 300, value);
// Pros: Simple, prevents stale data
// Cons: May serve outdated data for up to 5 minutes
Event-Based Invalidation
// Invalidate when data changes
async function updateProduct(id: string, data: ProductUpdate) {
// Update database
const product = await db.product.update({
where: { id },
data
});
// Invalidate cache immediately
await redis.del(`product:${id}`);
return product;
}
// Pros: Always fresh data
// Cons: Requires invalidation logic everywhere
Cache-Aside Pattern (Lazy Loading)
// 1. Check cache
// 2. If miss, fetch from DB
// 3. Store in cache
// 4. Return data
// Most common pattern for read-heavy workloads
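These four steps can be wrapped in one reusable helper so each endpoint doesn't repeat them. A sketch — `redis` stands in for the ioredis client created earlier, and the helper name is made up:

```typescript
// Generic cache-aside wrapper: check cache, fall back to loader, populate.
async function cacheAside<T>(
  redis: { get(k: string): Promise<string | null>; setex(k: string, ttl: number, v: string): Promise<unknown> },
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>
): Promise<T> {
  const cached = await redis.get(key);            // 1. Check cache
  if (cached !== null) return JSON.parse(cached); // cache hit
  const value = await loader();                   // 2. On miss, fetch from DB
  await redis.setex(key, ttlSeconds, JSON.stringify(value)); // 3. Store
  return value;                                   // 4. Return data
}

// Usage (assuming the product example from above):
// const product = await cacheAside(redis, `product:${id}`, 3600,
//   () => db.product.findUnique({ where: { id } }));
```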
HTTP Caching Headers
import express from 'express';
app.get('/api/products/:id', async (req, res) => {
const product = await getProduct(req.params.id);
// ✅ Cache in browser for 5 minutes
res.set('Cache-Control', 'public, max-age=300');
// ✅ ETag for conditional requests
const etag = generateETag(product);
res.set('ETag', etag);
// If client's ETag matches, return 304 Not Modified
if (req.headers['if-none-match'] === etag) {
return res.status(304).end(); // No data transfer!
}
res.json(product);
});
Cache-Control Directives
public - Can be cached by browsers + CDNs
private - Only cached by browser (sensitive data)
no-cache - Must revalidate with server before using
no-store - Never cache (credit card data, etc.)
max-age=300 - Cache for 300 seconds (5 minutes)
s-maxage=3600 - CDN can cache for 1 hour
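The `generateETag` helper used in the ETag example above isn't defined in this guide. A minimal sketch using Node's crypto module — hash the serialized body and emit a weak validator:

```typescript
import { createHash } from 'crypto';

// Weak ETag: a hash of the serialized response body. Weak (W/) validators
// are fine for JSON APIs, where byte-identical responses aren't required.
function generateETag(body: unknown): string {
  const hash = createHash('sha1').update(JSON.stringify(body)).digest('hex');
  return `W/"${hash}"`;
}
```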
CDN Caching for Global Performance
// Example: Cloudflare caching
app.get('/api/public/products', async (req, res) => {
const products = await getProducts();
// ✅ Cache at CDN edge for 1 hour
res.set('Cache-Control', 'public, s-maxage=3600');
// ✅ Cloudflare-specific header
res.set('CDN-Cache-Control', 'max-age=3600');
res.json(products);
});
Performance Impact of CDN
- Without CDN: 200ms (US East → Singapore)
- With CDN: 15ms (cached at Singapore edge)
- 13x faster for global users
Best CDN Providers for APIs
- Cloudflare - Free tier, 250+ locations
- Fastly - Real-time purge, low latency
- AWS CloudFront - Tight AWS integration
- Akamai - Enterprise-grade, massive scale
Code-Level Optimization
Async/Await vs Synchronous Operations
// ❌ BAD: Sequential operations (waterfall)
async function getUserData(userId: string) {
const user = await db.user.findUnique({ where: { id: userId } }); // 50ms
const orders = await db.order.findMany({ where: { userId } }); // 80ms
const reviews = await db.review.findMany({ where: { userId } }); // 60ms
return { user, orders, reviews };
}
// Total time: 50 + 80 + 60 = 190ms
// ✅ GOOD: Parallel operations
async function getUserData(userId: string) {
const [user, orders, reviews] = await Promise.all([
db.user.findUnique({ where: { id: userId } }),
db.order.findMany({ where: { userId } }),
db.review.findMany({ where: { userId } })
]);
return { user, orders, reviews };
}
// Total time: max(50, 80, 60) = 80ms (2.4x faster!)
Request Batching (DataLoader Pattern)
import DataLoader from 'dataloader';
// ❌ PROBLEM: N+1 queries in GraphQL/nested requests
// Fetching 100 orders → 100 separate user queries
// ✅ SOLUTION: Batch requests into single query
const userLoader = new DataLoader(async (userIds: string[]) => {
// Batch: Load all users in single query
const users = await db.user.findMany({
where: { id: { in: userIds } }
});
// Return in same order as requested IDs
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
// Usage
async function getOrders() {
const orders = await db.order.findMany();
// Kick off all loads in the same tick so DataLoader batches them into
// one query (awaiting inside a loop would dispatch one query per user)
const users = await Promise.all(orders.map(o => userLoader.load(o.userId)));
orders.forEach((order, i) => { order.user = users[i]; });
return orders;
}
// Result:
// - 100 orders fetched: 1 query
// - 100 users fetched: 1 batched query (not 100!)
// - Total: 2 queries instead of 101 (50x fewer DB calls)
JSON Payload Optimization
// ❌ BAD: Returning entire objects with unnecessary fields
app.get('/api/users', async (req, res) => {
const users = await db.user.findMany({
include: {
profile: true,
settings: true,
orders: {
include: {
items: true,
shipping: true
}
}
}
});
res.json(users);
});
// Payload: 500KB per user × 100 users = 50MB response!
// ✅ GOOD: Return only what's needed
app.get('/api/users', async (req, res) => {
const users = await db.user.findMany({
select: {
id: true,
name: true,
email: true,
avatar: true
}
});
res.json(users);
});
// Payload: 500 bytes per user × 100 users = 50KB (1000x smaller!)
Avoid Synchronous Operations in Request Path
// ❌ BAD: Blocking operations during request
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
// ❌ Blocks response while sending email (500ms)
await sendOrderConfirmationEmail(order);
// ❌ Blocks response while updating analytics (200ms)
await updateAnalytics(order);
res.json(order);
});
// Response time: 700ms + database time
// ✅ GOOD: Queue non-critical work
import Queue from 'bull';
const emailQueue = new Queue('emails', redisConfig);
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
// ✅ Queue email (non-blocking, 1ms)
emailQueue.add({ orderId: order.id });
// ✅ Fire and forget analytics
updateAnalytics(order).catch(console.error);
res.json(order); // Returns immediately!
});
// Response time: database time only (95% faster)
Network Optimization
Response Compression
import compression from 'compression';
import express from 'express';
const app = express();
// ✅ Enable gzip compression
app.use(compression({
level: 6, // Compression level (1-9, 6 is balanced)
threshold: 1024, // Only compress responses > 1KB
filter: (req, res) => {
// Don't compress images/videos (already compressed)
if (req.headers['x-no-compression']) {
return false;
}
return compression.filter(req, res);
}
}));
// Impact:
// - JSON response: 100KB → 15KB (85% smaller)
// - Transfer time: 200ms → 30ms over 3G
// - 6.6x faster download
HTTP/2 Server Push (Node.js)
import { createSecureServer } from 'http2';
import { readFileSync } from 'fs';
const server = createSecureServer({
key: readFileSync('server.key'),
cert: readFileSync('server.cert')
});
server.on('stream', (stream, headers) => {
if (headers[':path'] === '/api/dashboard') {
// ✅ Push critical resources before client requests them
stream.pushStream({ ':path': '/api/user' }, (err, pushStream) => {
pushStream.respond({ ':status': 200 });
pushStream.end(JSON.stringify(userData));
});
stream.respond({ ':status': 200 });
stream.end(JSON.stringify(dashboardData));
}
});
// Impact:
// - HTTP/1.1: Request dashboard (100ms) → Request user (100ms) = 200ms
// - HTTP/2: Request dashboard + pushed user data = 100ms (2x faster)
// Caveat: major browsers have removed HTTP/2 server push support, so for
// browser clients prefer Link: rel=preload or 103 Early Hints; push mainly
// benefits server-to-server HTTP/2 traffic today.
Minimize Payload Size
Use Field Selection (GraphQL-style)
// Allow clients to specify which fields they want
app.get('/api/users', async (req, res) => {
const fields = String(req.query.fields ?? 'id,name,email').split(',');
// In production, validate requested fields against an allow-list
const select = Object.fromEntries(fields.map(f => [f, true])) as any;
const users = await db.user.findMany({ select });
res.json(users);
});
// Usage: /api/users?fields=id,name,email
// Returns only requested fields (smaller payload)
Remove Null Values
function removeNulls(obj: any): any {
if (Array.isArray(obj)) {
return obj.map(removeNulls).filter(v => v != null);
}
if (obj !== null && typeof obj === 'object') {
return Object.entries(obj)
.filter(([_, v]) => v != null)
.reduce((acc, [k, v]) => ({ ...acc, [k]: removeNulls(v) }), {});
}
return obj;
}
// Before: { id: 1, name: "Alice", bio: null, avatar: null } = 50 bytes
// After: { id: 1, name: "Alice" } = 24 bytes (52% smaller)
Infrastructure Optimization
Load Balancing
// Simple round-robin load balancer with health checks
import express from 'express';
import axios from 'axios';
const servers = [
'http://server1:3000',
'http://server2:3000',
'http://server3:3000'
];
let currentIndex = 0;
const healthStatus = new Map<string, boolean>();
// Health check every 30 seconds
setInterval(async () => {
for (const server of servers) {
try {
await axios.get(`${server}/health`, { timeout: 1000 });
healthStatus.set(server, true);
} catch {
healthStatus.set(server, false);
console.warn(`Server ${server} is DOWN`);
}
}
}, 30000);
// Proxy requests to healthy servers
app.use(async (req, res) => {
const healthyServers = servers.filter(s => healthStatus.get(s) !== false);
if (healthyServers.length === 0) {
return res.status(503).json({ error: 'No healthy servers' });
}
// Round-robin selection
const targetServer = healthyServers[currentIndex % healthyServers.length];
currentIndex++;
try {
const response = await axios({
method: req.method,
url: `${targetServer}${req.path}`,
data: req.body,
headers: req.headers,
timeout: 5000
});
res.status(response.status).json(response.data);
} catch (error) {
res.status(500).json({ error: 'Server error' });
}
});
Production Load Balancers
- NGINX - Industry standard, 100K+ req/sec
- HAProxy - Layer 4/7 balancing, health checks
- AWS ALB - Managed, auto-scaling
- Cloudflare Load Balancing - Global, DDoS protection
Auto-Scaling
# Example: Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-autoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2 # Always at least 2 pods
maxReplicas: 10 # Scale up to 10 during traffic spikes
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale when CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale when memory > 80%
Auto-Scaling Decisions
- Scale UP: CPU > 70% for 2 minutes
- Scale DOWN: CPU < 30% for 5 minutes
- Result: Right-sized infrastructure = optimal cost
Database Read Replicas
import { PrismaClient } from '@prisma/client';
// Primary database (writes)
const prismaWrite = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_PRIMARY_URL }
}
});
// Read replica (reads only)
const prismaRead = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_REPLICA_URL }
}
});
// Write operations → primary
async function createOrder(data: OrderInput) {
return prismaWrite.order.create({ data });
}
// Read operations → replica (reduces primary load)
async function getOrders(userId: string) {
return prismaRead.order.findMany({
where: { userId }
});
}
// Impact:
// - Primary handles 100% writes + 0% reads
// - Replica handles 100% reads
// - 3 replicas = 75% load reduction on primary
Monitoring & Profiling
Application Performance Monitoring (APM)
import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';
Sentry.init({
dsn: process.env.SENTRY_DSN,
integrations: [
new ProfilingIntegration()
],
tracesSampleRate: 0.1, // Sample 10% of requests
profilesSampleRate: 0.1
});
// Automatic performance tracking
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());
// Custom performance tracking
app.get('/api/expensive-operation', async (req, res) => {
const transaction = Sentry.startTransaction({
name: 'Expensive Operation',
op: 'http.request'
});
const span1 = transaction.startChild({ op: 'database.query', description: 'Fetch users' });
const users = await db.user.findMany();
span1.finish();
const span2 = transaction.startChild({ op: 'processing', description: 'Transform data' });
const transformed = processUsers(users);
span2.finish();
transaction.finish();
res.json(transformed);
});
// View in Sentry dashboard:
// - Total request time: 450ms
// - Database query: 250ms (55% of time)
// - Processing: 200ms (44% of time)
// → Optimize database query first!
Best APM Tools
- Datadog APM - Full-stack observability
- New Relic - Real user monitoring
- Sentry - Error + performance tracking
- AWS X-Ray - Distributed tracing for AWS
Performance Benchmarking
import autocannon from 'autocannon';
// Load test your API
async function benchmarkAPI() {
const result = await autocannon({
url: 'http://localhost:3000/api/users',
connections: 100, // 100 concurrent connections
duration: 30, // 30 seconds
pipelining: 1
});
console.log('Performance Results:');
console.log(`Requests/sec: ${result.requests.average}`);
console.log(`Latency P50: ${result.latency.p50}ms`);
console.log(`Latency P95: ${result.latency.p95}ms`);
console.log(`Latency P99: ${result.latency.p99}ms`);
console.log(`Error rate: ${(result.non2xx / result.requests.total) * 100}%`);
}
// Example output:
// Requests/sec: 5,240
// Latency P50: 15ms
// Latency P95: 48ms
// Latency P99: 120ms
// Error rate: 0.02%
Load Testing Tools
- autocannon (Node.js) - 40K+ req/sec benchmarking
- k6 - Modern load testing, Grafana integration
- Apache JMeter - Enterprise-grade, GUI
- wrk - Minimal, blazing fast
Real-Time Performance Monitoring
import prometheus from 'prom-client';
// Create metrics
const httpRequestDuration = new prometheus.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5] // Response time buckets
});
const httpRequestsTotal = new prometheus.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code']
});
// Middleware to track metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestDuration.observe({
method: req.method,
route: req.route?.path || req.path,
status_code: res.statusCode
}, duration);
httpRequestsTotal.inc({
method: req.method,
route: req.route?.path || req.path,
status_code: res.statusCode
});
});
next();
});
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', prometheus.register.contentType);
res.end(await prometheus.register.metrics());
});
// Visualize in Grafana:
// - Request rate over time
// - P50/P95/P99 latency graphs
// - Error rate trends
// - Slow endpoint identification
Real-World Examples
Example 1: E-Commerce Product API
Before Optimization
app.get('/api/products/:id', async (req, res) => {
const product = await db.product.findUnique({
where: { id: req.params.id },
include: {
reviews: true, // 1,000+ reviews
variants: true, // 50 variants
images: true, // 10 images
relatedProducts: {
include: {
reviews: true, // Another 1,000+ reviews per product
images: true
}
}
}
});
res.json(product);
});
// Response time: 3,200ms
// Payload: 850KB
After Optimization
import NodeCache from 'node-cache';
const cache = new NodeCache({ stdTTL: 300 });
app.get('/api/products/:id', async (req, res) => {
const productId = req.params.id;
// 1. Check cache
const cached = cache.get(productId);
if (cached) {
res.set('X-Cache', 'HIT');
return res.json(cached);
}
// 2. Optimized query
const product = await db.product.findUnique({
where: { id: productId },
select: {
id: true,
name: true,
price: true,
description: true,
images: {
take: 3, // Only first 3 images
select: { url: true, alt: true }
},
reviews: {
take: 5, // Only latest 5 reviews
orderBy: { created_at: 'desc' },
select: { rating: true, comment: true, author: true }
}
}
});
// 3. Aggregate stats instead of loading all data
const [stats] = await db.$queryRaw<{ avg_rating: number; review_count: bigint }[]>`
SELECT
AVG(rating) as avg_rating,
COUNT(*) as review_count
FROM reviews
WHERE product_id = ${productId}
`;
const result = {
...product,
avgRating: stats.avg_rating,
reviewCount: Number(stats.review_count) // COUNT(*) comes back as BigInt
};
// 4. Cache result
cache.set(productId, result);
// 5. HTTP caching
res.set('Cache-Control', 'public, max-age=300');
res.set('X-Cache', 'MISS');
res.json(result);
});
// Response time: 45ms (71x faster!)
// Payload: 12KB (70x smaller!)
Example 2: Analytics Dashboard API
Before Optimization
app.get('/api/analytics/dashboard', async (req, res) => {
const userId = req.user.id;
// Sequential queries (waterfall)
const pageViews = await db.pageView.count({ where: { userId } });
const uniqueVisitors = await db.visitor.count({ where: { userId } });
const revenue = await db.order.aggregate({
where: { userId },
_sum: { total: true }
});
const topPages = await db.pageView.groupBy({
by: ['page'],
where: { userId },
_count: true,
orderBy: { _count: { page: 'desc' } },
take: 10
});
res.json({ pageViews, uniqueVisitors, revenue, topPages });
});
// Response time: 2,800ms
After Optimization
import Redis from 'ioredis';
const redis = new Redis();
app.get('/api/analytics/dashboard', async (req, res) => {
const userId = req.user.id;
const cacheKey = `analytics:${userId}:${new Date().toISOString().split('T')[0]}`;
// 1. Check Redis cache (daily cache)
const cached = await redis.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
// 2. Parallel queries
const [pageViews, uniqueVisitors, revenue, topPages] = await Promise.all([
db.pageView.count({ where: { userId } }),
db.visitor.count({ where: { userId } }),
db.order.aggregate({
where: { userId },
_sum: { total: true }
}),
db.pageView.groupBy({
by: ['page'],
where: { userId },
_count: true,
orderBy: { _count: { page: 'desc' } },
take: 10
})
]);
const result = { pageViews, uniqueVisitors, revenue, topPages };
// 3. Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(result));
res.json(result);
});
// Response time: 180ms first load, 2ms cached (1,400x faster!)
Example 3: Search API
Before Optimization
app.get('/api/search', async (req, res) => {
const query = req.query.q;
// Full-text search across all fields
const results = await db.product.findMany({
where: {
OR: [
{ name: { contains: query } },
{ description: { contains: query } },
{ category: { contains: query } },
{ tags: { has: query } }
]
}
});
res.json(results);
});
// Response time: 4,500ms for 100,000+ products
// No relevance ranking
After Optimization with Elasticsearch
import { Client } from '@elastic/elasticsearch';
const elastic = new Client({ node: 'http://localhost:9200' });
app.get('/api/search', async (req, res) => {
const query = String(req.query.q ?? '');
// Elasticsearch full-text search with relevance ranking
// v8 client: search options are top-level (no body wrapper)
const { hits } = await elastic.search({
index: 'products',
query: {
multi_match: {
query,
fields: ['name^3', 'description', 'category^2', 'tags'],
fuzziness: 'AUTO' // Handle typos
}
},
size: 20,
from: Number(req.query.page || 0) * 20,
highlight: {
fields: {
name: {},
description: {}
}
}
});
const results = hits.hits.map(hit => ({
...hit._source,
score: hit._score,
highlights: hit.highlight
}));
res.json(results);
});
// Response time: 25ms (180x faster!)
// Relevance ranking + typo tolerance + highlighting
Common Mistakes to Avoid
1. Not Using Database Indexes
Symptom: Queries that take seconds on tables with 100K+ rows
Fix: Add indexes on columns used in WHERE, ORDER BY, JOIN
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
2. Fetching All Data When You Need Aggregates
Mistake: Loading 1M records to count them
const users = await db.user.findMany(); // Loads 1M records = 500MB
const count = users.length; // 😱
Fix: Use database aggregation
const count = await db.user.count(); // Returns number only
3. Not Implementing Pagination
Mistake: Returning unlimited results
const products = await db.product.findMany(); // Returns 100,000 products = 50MB
Fix: Always paginate
const products = await db.product.findMany({
take: 50,
skip: (page - 1) * 50
});
4. Caching Everything Forever
Mistake: No cache invalidation strategy
cache.set('key', value); // Cached forever, even when data changes
Fix: Use appropriate TTLs
cache.set('key', value, 300); // 5 minutes for frequently changing data
cache.set('config', configValue, 3600); // 1 hour for rarely changing data
5. Synchronous Operations in Request Path
Mistake: Blocking response for non-critical tasks
await sendEmail(); // Blocks response for 500ms
await updateAnalytics(); // Blocks another 200ms
res.json(result); // User waits 700ms unnecessarily
Fix: Queue background jobs
emailQueue.add({ orderId }); // Returns in 1ms
res.json(result); // User gets response immediately
6. Not Monitoring Performance
Mistake: No visibility into slow endpoints
Fix: Add APM + custom metrics
// Track every endpoint's performance
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
metrics.recordLatency(req.route?.path, duration);
});
next();
});
7. Over-Optimization Too Early
Mistake: Complex caching before measuring actual bottlenecks
Fix:
- Measure first (add metrics)
- Identify bottleneck (slowest 20% of endpoints)
- Optimize only the bottleneck
- Measure again
- Repeat
8. Not Using Connection Pooling
Mistake: Creating new database connections per request
const db = new Database(); // New connection every request = 100ms overhead
Fix: Use connection pool
const pool = new Pool({ max: 10 }); // Reuse connections = 0.5ms
9. Exposing Internal Implementation Details
Mistake: Returning raw database objects
res.json(user); // Includes password_hash, internal_id, etc.
Fix: Use DTOs (Data Transfer Objects)
res.json({
id: user.id,
name: user.name,
email: user.email
// Only public fields
});
10. Not Testing with Production Data Volumes
Mistake: Testing with 100 records when production has 10M
Fix: Load test with realistic data
// Seed 1M+ test records
// Run load tests: 1K concurrent users
// Identify bottlenecks before launch
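A seeding sketch for generating production-like volume before a load test. The `db.user.createMany` shape follows the Prisma usage elsewhere in this guide; the batch size and data are illustrative:

```typescript
// Seed a large, realistic dataset in batches so load tests hit
// production-scale volumes. Batching avoids one enormous INSERT.
async function seedUsers(
  db: { user: { createMany(args: { data: unknown[] }): Promise<unknown> } },
  total = 1_000_000,
  batch = 10_000
): Promise<void> {
  for (let offset = 0; offset < total; offset += batch) {
    const data = Array.from({ length: Math.min(batch, total - offset) }, (_, i) => ({
      email: `user${offset + i}@example.com`,
      name: `User ${offset + i}`,
    }));
    await db.user.createMany({ data });
  }
}
```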
Production Checklist
Database
- Indexes on all WHERE, ORDER BY, JOIN columns
- Connection pooling configured (max 10-20 connections)
- Slow query logging enabled (>1s queries)
- Read replicas for read-heavy workloads
- Query result caching (Redis)
- Regular ANALYZE/VACUUM (PostgreSQL)
Caching
- Redis cache for frequently accessed data
- HTTP caching headers (Cache-Control, ETag)
- CDN for static assets + public API responses
- Cache invalidation strategy documented
- Cache hit/miss metrics tracked
Code
- Parallel operations with Promise.all()
- No N+1 queries (use joins/includes)
- Background jobs for non-critical tasks
- Pagination on all list endpoints
- Field selection (don't return unnecessary data)
Network
- Response compression (gzip/brotli)
- HTTP/2 enabled
- Payload size optimized (<100KB typical response)
- CDN for global distribution
Infrastructure
- Load balancer with health checks
- Auto-scaling configured (2-10 instances)
- Database read replicas (1-3 replicas)
- CDN enabled (Cloudflare or similar)
Monitoring
- APM installed (Datadog, New Relic, or Sentry)
- Custom metrics (request rate, latency, errors)
- Slow query alerts (>1s queries)
- Error rate alerts (>1% errors)
- Latency alerts (P95 >500ms)
- Load testing performed (1K+ concurrent users)
Performance Targets
- P50 latency <100ms
- P95 latency <500ms
- P99 latency <2000ms
- Error rate <0.1%
- Throughput: 1000+ RPS per server
- Cache hit rate >70%
Tools & Resources
Performance Monitoring
- Datadog APM - Full-stack observability
- New Relic - Application performance monitoring
- Sentry - Error + performance tracking
- Grafana + Prometheus - Open-source monitoring stack
Load Testing
- autocannon - Fast HTTP/1.1 benchmarking
- k6 - Modern load testing
- Apache JMeter - Enterprise load testing
- Gatling - Scala-based load testing
Database Tools
- pgAdmin - PostgreSQL management
- DataGrip - Universal database IDE
- PgHero - PostgreSQL performance dashboard
CDN Providers
- Cloudflare - Free tier, 250+ locations
- Fastly - Real-time purge
- AWS CloudFront - AWS integration
- Akamai - Enterprise CDN
Caching
- Redis - In-memory data store
- Memcached - Simple key-value cache
- Varnish - HTTP accelerator
Frequently Asked Questions
How much performance improvement can I expect?
Typical gains from this guide:
- Database optimization: 10-100x faster queries
- Caching: 50-1000x faster repeated requests
- Code optimization: 2-10x faster processing
- Infrastructure: 2-5x more throughput
Real example (e-commerce API):
- Before: 3,200ms response time
- After: 45ms response time
- 71x improvement
Should I optimize everything at once?
No. Follow this order:
- Measure - Add APM to identify bottlenecks
- Database - Usually 80% of performance issues
- Caching - Redis + HTTP caching
- Code - Parallel operations, background jobs
- Infrastructure - Load balancing, CDN
Optimize the slowest 20% first (Pareto principle).
When should I add caching?
Add caching when:
- Same data requested frequently (>10 times/minute)
- Data doesn't change often (every 5+ minutes)
- Database queries are slow (>100ms)
- Traffic is growing (>1000 requests/hour)
Don't cache when:
- Data changes constantly (real-time stock prices)
- Each request is unique (user-specific content)
- Data is already fast (<10ms queries)
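When those conditions hold, the usual implementation is the cache-aside pattern: check the cache, fall back to the data source on a miss, then populate the cache with a TTL. A minimal sketch; an in-memory Map stands in for Redis here so the example is self-contained, and `getOrSet`/`fetchFn` are illustrative names:

```javascript
// Cache-aside with TTL. An in-memory Map stands in for Redis so the
// sketch is self-contained; swap in redis.get/setex in production.
const cache = new Map(); // key -> { value, expiresAt }

async function getOrSet(key, ttlSeconds, fetchFn) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit
  }
  const value = await fetchFn(); // cache miss: hit the database/API
  cache.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}

// Usage: the second call is served from the cache, so fetchFn runs once
(async () => {
  let dbCalls = 0;
  const loadProduct = () => { dbCalls++; return Promise.resolve({ id: 1, name: 'Widget' }); };
  await getOrSet('product:1', 300, loadProduct);
  await getOrSet('product:1', 300, loadProduct);
  console.log('database calls:', dbCalls); // 1
})();
```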
How many database connections should I use?
Formula: (CPU cores × 2) + effective storage spindles
Examples:
- 4-core server with SSD: (4 × 2) + 1 = 9 connections
- 8-core server with 4-disk HDD array: (8 × 2) + 4 = 20 connections
Too many connections = contention, slower queries
Too few connections = request queuing, timeouts
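The formula is small enough to encode directly when sizing a pool. A trivial sketch (the `spindles` term is 1 for a single SSD, roughly the number of disks for an HDD array):

```javascript
// Pool size heuristic: (CPU cores × 2) + effective storage spindles
function poolSize(cpuCores, spindles) {
  return cpuCores * 2 + spindles;
}

console.log(poolSize(4, 1)); // 4-core SSD server -> 9
console.log(poolSize(8, 4)); // 8-core, 4-disk HDD server -> 20
```

With node-postgres, for example, this value would feed the `max` option of `new Pool({ max: poolSize(4, 1) })`.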
What's the difference between caching and CDN?
Caching (Redis, in-memory):
- Stores computed results (database queries, API responses)
- Server-side (your infrastructure)
- Invalidate when data changes
CDN (Cloudflare, Fastly):
- Stores static files + API responses
- Edge locations worldwide (close to users)
- Reduces latency for global users
Use both: Redis for dynamic data, CDN for global distribution
Should I use GraphQL or REST for performance?
GraphQL advantages:
- Client specifies exact fields needed (smaller payloads)
- Single request for multiple resources (no multiple round-trips)
GraphQL challenges:
- N+1 query problem (requires DataLoader)
- Caching harder (no URL-based cache keys)
REST advantages:
- Simpler caching (URL-based)
- Better CDN support
Verdict: Both can be fast with proper optimization. Use REST for public APIs, GraphQL for complex client needs.
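The DataLoader mitigation mentioned above works by collecting the individual `load(id)` calls made during one tick and issuing a single batched fetch. A minimal sketch of the idea, with a hypothetical `batchFetch` standing in for one SQL `WHERE id IN (...)` query (the real `dataloader` npm package adds per-request caching and error handling on top of this):

```javascript
// Minimal DataLoader-style batcher: coalesce load(id) calls made in the
// same tick into a single call to batchFetch(ids).
function createLoader(batchFetch) {
  let queue = []; // pending { id, resolve } entries
  return function load(id) {
    return new Promise((resolve) => {
      queue.push({ id, resolve });
      if (queue.length === 1) {
        // Schedule one flush for everything queued in this tick
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          const results = await batchFetch(batch.map((e) => e.id));
          batch.forEach((e, i) => e.resolve(results[i]));
        });
      }
    });
  };
}

// Usage: three load() calls, but batchFetch runs once with [1, 2, 3]
(async () => {
  const load = createLoader(async (ids) => {
    console.log('batched fetch for ids:', ids);
    return ids.map((id) => ({ id })); // stand-in for one IN (...) query
  });
  const users = await Promise.all([load(1), load(2), load(3)]);
  console.log(users.length); // 3
})();
```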
How do I optimize API calls to third-party services?
Strategies:
1. Cache responses aggressively
// Check Redis first; fall back to the Stripe API only on a cache miss
const cachedResponse = await redis.get(`stripe:customer:${id}`);
if (cachedResponse) return JSON.parse(cachedResponse);
const customer = await stripe.customers.retrieve(id);
await redis.setex(`stripe:customer:${id}`, 3600, JSON.stringify(customer)); // 1-hour TTL
return customer;
2. Batch requests when possible
// Bad: 100 sequential API calls
for (const id of customerIds) {
  await stripe.customers.retrieve(id);
}
// Good: 1 API call fetches a page of up to 100 customers
const customers = await stripe.customers.list({
  limit: 100,
  starting_after: lastId
});
3. Webhooks instead of polling
- Stripe sends webhooks when data changes
- No need to poll for updates every minute
4. Monitor third-party API status
- Use API Status Check to track outages
- Implement circuit breakers for failing APIs
- Have fallback strategies
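A circuit breaker stops calling a failing dependency for a cooldown period instead of piling up timeouts. A minimal sketch (the thresholds and `fallback` are illustrative; libraries such as `opossum` provide a production-ready version):

```javascript
// Minimal circuit breaker: open after `maxFailures` consecutive failures,
// then short-circuit to the fallback until `resetMs` has elapsed.
function circuitBreaker(fn, { maxFailures = 5, resetMs = 30000, fallback } = {}) {
  let failures = 0;
  let openedAt = 0;
  return async function (...args) {
    if (failures >= maxFailures && Date.now() - openedAt < resetMs) {
      return fallback(...args); // circuit open: skip the failing API
    }
    try {
      const result = await fn(...args);
      failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      failures++;
      if (failures >= maxFailures) openedAt = Date.now();
      return fallback(...args);
    }
  };
}

// Usage: after 3 consecutive failures, calls stop hitting the API entirely
const safeFetch = circuitBreaker(
  async () => { throw new Error('API down'); },
  { maxFailures: 3, resetMs: 30000, fallback: () => ({ status: 'cached' }) }
);
```

After `resetMs` the next call is allowed through as a trial; one success closes the circuit again.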
What if my database is still slow after indexing?
Check these:
1. Index is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Look for "Index Scan", not "Seq Scan"
2. Query is optimized
- Avoid SELECT * (fetch only needed columns)
- Use LIMIT for large results
- Check for N+1 queries
3. Database resources
- CPU usage <80%?
- Memory usage <80%?
- Disk I/O not saturated?
4. Consider scaling
- Read replicas for read-heavy workloads
- Vertical scaling (more CPU/RAM)
- Query result caching (Redis)
How do I prevent caching stale data?
Strategies:
1. Time-based invalidation (TTL)
redis.setex('key', 300, value); // Cache for 5 minutes
2. Event-based invalidation
async function updateProduct(id, data) {
  await db.product.update({ where: { id }, data });
  await redis.del(`product:${id}`); // Invalidate immediately
}
3. Cache versioning
const version = await redis.get('cache:version') || 1;
const key = `product:${id}:v${version}`;
// When the data structure changes:
await redis.incr('cache:version'); // Invalidates all existing keys
4. Conditional requests (HTTP)
const etag = generateHash(data);
res.set('ETag', etag);
if (req.headers['if-none-match'] === etag) {
  return res.status(304).end(); // Not modified
}
Next Steps
- Add monitoring - Install APM (Datadog, New Relic, or Sentry)
- Identify bottlenecks - Find slowest 20% of endpoints
- Optimize database - Add indexes, fix N+1 queries
- Add caching - Redis for frequently accessed data
- Load test - Test with realistic traffic (1K+ concurrent users)
- Monitor production - Track P50/P95/P99 latency, error rates
Related guides:
- API Rate Limiting Complete Guide
- API Observability & Distributed Tracing
- API Error Handling Best Practices
- API Caching Strategies Complete Guide
Monitor critical API dependencies:
Check real-time API status at apistatuscheck.com - monitoring 160+ third-party APIs including AI platforms, cloud providers, payment gateways, and developer tools.