API Performance Optimization: Complete Guide for Production Systems

by API Status Check Team

Slow APIs kill user experience, waste infrastructure money, and limit scalability. Akamai found that a 100ms delay can cut conversions by 7%, and one widely cited estimate put the cost of a 1-second slowdown at Amazon at $1.6 billion in annual sales.

This guide covers production-ready performance optimization strategies used by Stripe, Shopify, and AWS to serve millions of requests per day with sub-100ms response times.

Why API Performance Matters

Impact on Business Metrics

User Experience & Conversions

  • 100ms delay → 7% drop in conversions (Akamai study)
  • 1-second delay → 11% fewer page views, 7% loss in conversions (Aberdeen Group)
  • 3-second load time → 53% of mobile users abandon (Google)

Infrastructure Costs

  • Slow APIs = more concurrent requests = higher server costs
  • Example: Reducing response time from 500ms to 100ms → 5x fewer concurrent connections
  • $10,000/month in servers → $2,000/month with proper optimization

Scalability

  • Faster responses = more requests per server
  • Example: 100ms responses = 10 req/sec per connection vs 1 req/sec at 1000ms
  • 10x throughput with same infrastructure

Real-World Impact

Shopify Black Friday 2023

  • 1.8M requests/second peak
  • Sub-100ms P95 response times
  • Result: Zero downtime, $9.3B in sales

Stripe API

  • 99.99% uptime
  • P99 latency <200ms globally
  • Handles $640B annually

Netflix API

  • 1 billion API calls/day
  • P99 latency <100ms
  • Powers 230M subscribers globally

Key Performance Metrics

Essential Metrics to Track

Response Time (Latency)

P50 (median): 50% of requests faster than this
P95: 95% of requests faster than this
P99: 99% of requests faster than this (catches outliers)

Why P99 matters more than average:

  • Average: 100ms (looks good!)
  • P99: 5,000ms (1% of users waiting 5 seconds = terrible UX)
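
You can sanity-check these numbers straight from your own request logs. A minimal sketch using the nearest-rank method (the sample values are made up):

// Compute latency percentiles from raw samples (in ms)
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const index = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, index)];
}

const latencies = [12, 15, 18, 22, 95, 110, 4800]; // example samples
console.log('P50:', percentile(latencies, 50)); // 22
console.log('P99:', percentile(latencies, 99)); // 4800 (the outlier an average would hide)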

Throughput

  • Requests per second (RPS) your API can handle
  • Example targets:
    • Small API: 100-1,000 RPS
    • Medium: 1,000-10,000 RPS
    • Large: 10,000+ RPS

Error Rate

  • 4xx errors: Client mistakes (not your fault)
  • 5xx errors: Server failures (your fault)
  • Target: <0.1% error rate under normal load

Saturation

  • CPU usage (target: <70% average, <90% peak)
  • Memory usage (target: <80%)
  • Database connections (target: <80% of pool)
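
For Node.js services specifically, event-loop delay is another saturation signal worth watching alongside CPU and memory. A minimal sketch using the built-in perf_hooks module (the 100ms alert threshold is an assumption to tune for your workload):

import { monitorEventLoopDelay } from 'perf_hooks';

const loopDelay = monitorEventLoopDelay({ resolution: 20 });
loopDelay.enable();

setInterval(() => {
  const p99Ms = loopDelay.percentile(99) / 1e6; // histogram reports nanoseconds
  if (p99Ms > 100) { // assumed threshold
    console.warn(`Event loop saturated: P99 delay ${p99Ms.toFixed(1)}ms`);
  }
  loopDelay.reset();
}, 10_000);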

Database Optimization

Database queries are the #1 performance bottleneck in most APIs.

Indexing Strategy

Before Indexing (Sequential Scan)

-- Query: Find user by email
SELECT * FROM users WHERE email = 'user@example.com';
-- Execution: 2,500ms (scans 1 million rows)

After Indexing

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Same query now: 8ms (index lookup)
-- 312x faster!

Composite Indexes for Multi-Column Queries

-- Query: Active orders for user in date range
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'active' 
  AND created_at > '2026-01-01';

-- Index order matters!
CREATE INDEX idx_orders_user_status_date 
ON orders(user_id, status, created_at DESC);

-- user_id first (most selective filter)
-- status second (additional filter)
-- created_at last (for sorting)

Check Index Usage

-- PostgreSQL: Explain query plan
EXPLAIN ANALYZE 
SELECT * FROM orders WHERE user_id = 123;

-- Look for:
-- ✅ "Index Scan" or "Index Only Scan"
-- ❌ "Seq Scan" (sequential scan = missing index)

Query Optimization

N+1 Query Problem (Most Common Mistake)

// โŒ BAD: N+1 queries (1 + 100 = 101 database roundtrips)
async function getOrdersWithUsers() {
  const orders = await db.order.findMany(); // 1 query
  
  for (const order of orders) {
    // 100 separate queries!
    order.user = await db.user.findUnique({
      where: { id: order.userId }
    });
  }
  
  return orders;
}
// Response time: 2,500ms

// ✅ GOOD: Single query with join
async function getOrdersWithUsers() {
  const orders = await db.order.findMany({
    include: {
      user: true // Prisma automatically joins
    }
  });
  
  return orders;
}
// Response time: 45ms (55x faster!)

Select Only What You Need

// โŒ BAD: Fetching unnecessary data
const users = await db.user.findMany(); 
// Returns: id, email, password_hash, created_at, updated_at, profile, settings...
// Payload: 50KB per user ร— 100 users = 5MB

// โœ… GOOD: Select specific fields
const users = await db.user.findMany({
  select: {
    id: true,
    email: true,
    name: true
  }
});
// Payload: 2KB per user ร— 100 users = 200KB (25x smaller!)

Pagination for Large Datasets

// โŒ BAD: Loading all records
const allOrders = await db.order.findMany(); // 1 million rows = 500MB = 30 seconds
// OOM crash on large datasets

// ✅ GOOD: Cursor-based pagination
const orders = await db.order.findMany({
  take: 100,
  cursor: lastOrderId ? { id: lastOrderId } : undefined,
  orderBy: { created_at: 'desc' }
});
// Returns 100 rows in 12ms

Connection Pooling

Problem: Creating new database connections is slow

  • New connection: 50-200ms
  • Pooled connection: 0.5ms
  • 400x faster with connection pooling

import { PrismaClient } from '@prisma/client';

// ✅ Prisma sizes its pool via connection-string parameters, e.g.
// DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=30"
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL // connection_limit caps concurrent connections
    }
  }
});

// Connection lifecycle:
// 1. Request arrives → get connection from pool (0.5ms)
// 2. Execute query
// 3. Return connection to pool (reused by next request)

Pool Sizing Formula

Optimal pool size = (core_count × 2) + effective_spindle_count

Example for typical web server:
- 4 CPU cores
- SSD storage (spindle = 1)
- Pool size = (4 × 2) + 1 = 9 connections
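
If you'd rather derive the number at startup than hard-code it, a small sketch (treating an SSD as one effective spindle, per the example above):

import os from 'os';

const effectiveSpindles = 1; // assumption: single SSD
const poolSize = os.cpus().length * 2 + effectiveSpindles;
console.log(`Suggested pool size: ${poolSize}`); // 9 on a 4-core box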

Database Monitoring

Slow Query Logging

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient({
  log: [
    {
      emit: 'event',
      level: 'query'
    }
  ]
});

prisma.$on('query', (e) => {
  if (e.duration > 1000) { // Queries slower than 1 second
    console.warn('Slow query detected:', {
      query: e.query,
      duration: `${e.duration}ms`,
      timestamp: e.timestamp
    });
  }
});

Key Metrics to Track

  • Query execution time (P50, P95, P99)
  • Connection pool utilization
  • Slow query count
  • Lock wait time
  • Cache hit ratio

Caching Strategies

Caching = storing computed results to avoid redundant work.

Cache Hierarchy (Fastest to Slowest)

  1. In-Memory Cache (0.1ms) - Node.js Map/LRU
  2. Redis Cache (1-2ms) - Shared across servers
  3. CDN Cache (10-50ms) - Global edge locations
  4. Database (20-200ms) - No cache
  5. External API (100-2000ms) - Third-party service
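
These tiers compose: check the fastest layer first and fall through, populating caches on the way back up. A hedged two-tier sketch (loadFromDb is a placeholder for your database call):

import Redis from 'ioredis';

const redis = new Redis();
const local = new Map<string, { value: unknown; expires: number }>();

async function tieredGet(
  key: string,
  loadFromDb: () => Promise<unknown>
): Promise<unknown> {
  // Tier 1: in-process memory (~0.1ms)
  const hit = local.get(key);
  if (hit && hit.expires > Date.now()) return hit.value;

  // Tier 2: Redis, shared across servers (~1-2ms)
  const cached = await redis.get(key);
  if (cached) {
    const value = JSON.parse(cached);
    local.set(key, { value, expires: Date.now() + 30_000 }); // short local TTL
    return value;
  }

  // Tier 3: database (20-200ms), then fill both cache tiers
  const value = await loadFromDb();
  await redis.setex(key, 300, JSON.stringify(value));
  local.set(key, { value, expires: Date.now() + 30_000 });
  return value;
}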

In-Memory Caching

import NodeCache from 'node-cache';

// Create cache with 5-minute TTL
const cache = new NodeCache({ stdTTL: 300 });

async function getUser(userId: string) {
  // Check cache first
  const cached = cache.get<User>(`user:${userId}`);
  if (cached) {
    console.log('Cache HIT');
    return cached; // 0.1ms response time
  }
  
  console.log('Cache MISS');
  // Fetch from database
  const user = await db.user.findUnique({ where: { id: userId } });
  
  // Store in cache
  cache.set(`user:${userId}`, user);
  
  return user; // 45ms first time, 0.1ms after
}

Use Cases for In-Memory Cache

  • User sessions
  • Configuration data
  • Frequently accessed lookup tables
  • API responses that rarely change

Limitations

  • Not shared across servers (each server has own cache)
  • Lost on server restart
  • Limited by RAM (max ~1-2GB typically)

Redis Caching (Production Standard)

import Redis from 'ioredis';

const redis = new Redis({
  host: 'localhost',
  port: 6379,
  maxRetriesPerRequest: 3
});

async function getProductWithCache(productId: string) {
  const cacheKey = `product:${productId}`;
  
  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Fetch from database
  const product = await db.product.findUnique({
    where: { id: productId },
    include: { reviews: true }
  });
  
  // Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(product));
  
  return product;
}

Cache Invalidation Strategies

Time-Based (TTL)

// Cache for 5 minutes
await redis.setex('key', 300, value);
// Pros: Simple, prevents stale data
// Cons: May serve outdated data for up to 5 minutes

Event-Based Invalidation

// Invalidate when data changes
async function updateProduct(id: string, data: ProductUpdate) {
  // Update database
  const product = await db.product.update({
    where: { id },
    data
  });
  
  // Invalidate cache immediately
  await redis.del(`product:${id}`);
  
  return product;
}
// Pros: Always fresh data
// Cons: Requires invalidation logic everywhere

Cache-Aside Pattern (Lazy Loading)

// 1. Check cache
// 2. If miss, fetch from DB
// 3. Store in cache
// 4. Return data

// Most common pattern for read-heavy workloads
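
Wrapped as a helper, the pattern looks like this. A sketch only; db.user stands in for your Prisma client as in earlier examples:

import Redis from 'ioredis';

const redis = new Redis();

// Generic cache-aside helper: the loader only runs on a miss
async function cacheAside<T>(
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>
): Promise<T> {
  const cached = await redis.get(key);        // 1. check cache
  if (cached) return JSON.parse(cached) as T;

  const value = await loader();               // 2. miss: fetch from DB
  await redis.setex(key, ttlSeconds, JSON.stringify(value)); // 3. store
  return value;                               // 4. return data
}

// Usage
const user = await cacheAside(`user:${userId}`, 300, () =>
  db.user.findUnique({ where: { id: userId } })
);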

HTTP Caching Headers

import express from 'express';

app.get('/api/products/:id', async (req, res) => {
  const product = await getProduct(req.params.id);
  
  // ✅ Cache in browser for 5 minutes
  res.set('Cache-Control', 'public, max-age=300');
  
  // ✅ ETag for conditional requests
  const etag = generateETag(product);
  res.set('ETag', etag);
  
  // If client's ETag matches, return 304 Not Modified
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end(); // No data transfer!
  }
  
  res.json(product);
});

Cache-Control Directives

public          - Can be cached by browsers + CDNs
private         - Only cached by browser (sensitive data)
no-cache        - Must revalidate with server before using
no-store        - Never cache (credit card data, etc.)
max-age=300     - Cache for 300 seconds (5 minutes)
s-maxage=3600   - CDN can cache for 1 hour

CDN Caching for Global Performance

// Example: Cloudflare caching
app.get('/api/public/products', async (req, res) => {
  const products = await getProducts();
  
  // ✅ Cache at CDN edge for 1 hour
  res.set('Cache-Control', 'public, s-maxage=3600');
  
  // ✅ Cloudflare-specific header
  res.set('CDN-Cache-Control', 'max-age=3600');
  
  res.json(products);
});

Performance Impact of CDN

  • Without CDN: 200ms (US East → Singapore)
  • With CDN: 15ms (cached at Singapore edge)
  • 13x faster for global users

Best CDN Providers for APIs

  • Cloudflare - Free tier, 250+ locations
  • Fastly - Real-time purge, low latency
  • AWS CloudFront - Tight AWS integration
  • Akamai - Enterprise-grade, massive scale

Code-Level Optimization

Async/Await vs Synchronous Operations

// โŒ BAD: Sequential operations (waterfall)
async function getUserData(userId: string) {
  const user = await db.user.findUnique({ where: { id: userId } });     // 50ms
  const orders = await db.order.findMany({ where: { userId } });         // 80ms
  const reviews = await db.review.findMany({ where: { userId } });       // 60ms
  
  return { user, orders, reviews };
}
// Total time: 50 + 80 + 60 = 190ms

// ✅ GOOD: Parallel operations
async function getUserData(userId: string) {
  const [user, orders, reviews] = await Promise.all([
    db.user.findUnique({ where: { id: userId } }),
    db.order.findMany({ where: { userId } }),
    db.review.findMany({ where: { userId } })
  ]);
  
  return { user, orders, reviews };
}
// Total time: max(50, 80, 60) = 80ms (2.4x faster!)

Request Batching (DataLoader Pattern)

import DataLoader from 'dataloader';

// โŒ PROBLEM: N+1 queries in GraphQL/nested requests
// Fetching 100 orders โ†’ 100 separate user queries

// โœ… SOLUTION: Batch requests into single query
const userLoader = new DataLoader(async (userIds: string[]) => {
  // Batch: Load all users in single query
  const users = await db.user.findMany({
    where: { id: { in: userIds } }
  });
  
  // Return in same order as requested IDs
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id));
});

// Usage
async function getOrders() {
  const orders = await db.order.findMany();
  
  // Loads issued in the same tick are batched into one query;
  // awaiting each load inside a loop would serialize them and defeat batching
  const users = await Promise.all(
    orders.map(order => userLoader.load(order.userId))
  );
  orders.forEach((order, i) => (order.user = users[i]));
  
  return orders;
}

// Result:
// - 100 orders fetched: 1 query
// - 100 users fetched: 1 batched query (not 100!)
// - Total: 2 queries instead of 101 (50x fewer DB calls)

JSON Payload Optimization

// โŒ BAD: Returning entire objects with unnecessary fields
app.get('/api/users', async (req, res) => {
  const users = await db.user.findMany({
    include: {
      profile: true,
      settings: true,
      orders: {
        include: {
          items: true,
          shipping: true
        }
      }
    }
  });
  
  res.json(users);
});
// Payload: 500KB per user × 100 users = 50MB response!

// ✅ GOOD: Return only what's needed
app.get('/api/users', async (req, res) => {
  const users = await db.user.findMany({
    select: {
      id: true,
      name: true,
      email: true,
      avatar: true
    }
  });
  
  res.json(users);
});
// Payload: 500 bytes per user × 100 users = 50KB (1000x smaller!)

Avoid Synchronous Operations in Request Path

// โŒ BAD: Blocking operations during request
app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);
  
  // โŒ Blocks response while sending email (500ms)
  await sendOrderConfirmationEmail(order);
  
  // โŒ Blocks response while updating analytics (200ms)
  await updateAnalytics(order);
  
  res.json(order);
});
// Response time: 700ms + database time

// ✅ GOOD: Queue non-critical work
import Queue from 'bull';
const emailQueue = new Queue('emails', redisConfig);

app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);
  
  // ✅ Queue email (non-blocking, 1ms)
  emailQueue.add({ orderId: order.id });
  
  // ✅ Fire and forget analytics
  updateAnalytics(order).catch(console.error);
  
  res.json(order); // Returns immediately!
});
// Response time: database time only (95% faster)
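
The queue still needs a consumer running somewhere. A minimal Bull worker sketch, reusing sendOrderConfirmationEmail and redisConfig from above:

// worker.ts: processes queued emails off the request path
import Queue from 'bull';

const emailQueue = new Queue('emails', redisConfig);

emailQueue.process(async (job) => {
  // The 500ms email send now happens here, with no user waiting on it
  await sendOrderConfirmationEmail(job.data.orderId);
});

// Bull retries failed jobs; log permanent failures for follow-up
emailQueue.on('failed', (job, err) => {
  console.error(`Email job ${job.id} failed:`, err.message);
});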

Network Optimization

Response Compression

import compression from 'compression';
import express from 'express';

const app = express();

// ✅ Enable gzip compression
app.use(compression({
  level: 6,              // Compression level (1-9, 6 is balanced)
  threshold: 1024,       // Only compress responses > 1KB
  filter: (req, res) => {
    // Respect a client opt-out header; compression.filter already
    // skips non-compressible types like images and video
    if (req.headers['x-no-compression']) {
      return false;
    }
    return compression.filter(req, res);
  }
}));

// Impact:
// - JSON response: 100KB → 15KB (85% smaller)
// - Transfer time: 200ms → 30ms over 3G
// - 6.6x faster download

HTTP/2 Server Push (Node.js)

import { createSecureServer } from 'http2';
import { readFileSync } from 'fs';

const server = createSecureServer({
  key: readFileSync('server.key'),
  cert: readFileSync('server.cert')
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] === '/api/dashboard') {
    // ✅ Push critical resources before client requests them
    stream.pushStream({ ':path': '/api/user' }, (err, pushStream) => {
      pushStream.respond({ ':status': 200 });
      pushStream.end(JSON.stringify(userData));
    });
    
    stream.respond({ ':status': 200 });
    stream.end(JSON.stringify(dashboardData));
  }
});

// Impact:
// - HTTP/1.1: Request dashboard (100ms) → Request user (100ms) = 200ms
// - HTTP/2: Request dashboard + pushed user data = 100ms
// - 2x faster
// Note: major browsers have since removed HTTP/2 push support, so this
// mainly helps API-to-API clients; for browsers, prefer preload hints.

Minimize Payload Size

Use Field Selection (GraphQL-style)

// Allow clients to specify which fields they want,
// whitelisted so sensitive columns can't be requested
const ALLOWED_FIELDS = new Set(['id', 'name', 'email', 'avatar']);

app.get('/api/users', async (req, res) => {
  const requested = String(req.query.fields ?? 'id,name,email').split(',');
  const fields = requested.filter(f => ALLOWED_FIELDS.has(f));
  
  const select = fields.reduce((acc, field) => {
    acc[field] = true;
    return acc;
  }, {} as Record<string, boolean>);
  
  const users = await db.user.findMany({ select });
  res.json(users);
});

// Usage: /api/users?fields=id,name,email
// Returns only requested fields (smaller payload)

Remove Null Values

function removeNulls(obj: any): any {
  if (Array.isArray(obj)) {
    return obj.map(removeNulls).filter(v => v != null);
  }
  if (obj !== null && typeof obj === 'object') {
    return Object.entries(obj)
      .filter(([_, v]) => v != null)
      .reduce((acc, [k, v]) => ({ ...acc, [k]: removeNulls(v) }), {});
  }
  return obj;
}

// Before: { id: 1, name: "Alice", bio: null, avatar: null } = 50 bytes
// After:  { id: 1, name: "Alice" } = 24 bytes (52% smaller)

Infrastructure Optimization

Load Balancing

// Simple round-robin load balancer with health checks
import express from 'express';
import axios from 'axios';

const servers = [
  'http://server1:3000',
  'http://server2:3000',
  'http://server3:3000'
];

const app = express();
app.use(express.json());

let currentIndex = 0;
const healthStatus = new Map<string, boolean>();

// Health check every 30 seconds
setInterval(async () => {
  for (const server of servers) {
    try {
      await axios.get(`${server}/health`, { timeout: 1000 });
      healthStatus.set(server, true);
    } catch {
      healthStatus.set(server, false);
      console.warn(`Server ${server} is DOWN`);
    }
  }
}, 30000);

// Proxy requests to healthy servers
app.use(async (req, res) => {
  const healthyServers = servers.filter(s => healthStatus.get(s) !== false);
  
  if (healthyServers.length === 0) {
    return res.status(503).json({ error: 'No healthy servers' });
  }
  
  // Round-robin selection
  const targetServer = healthyServers[currentIndex % healthyServers.length];
  currentIndex++;
  
  try {
    const response = await axios({
      method: req.method,
      url: `${targetServer}${req.path}`,
      data: req.body,
      headers: req.headers,
      timeout: 5000
    });
    
    res.status(response.status).json(response.data);
  } catch (error) {
    res.status(500).json({ error: 'Server error' });
  }
});

Production Load Balancers

  • NGINX - Industry standard, 100K+ req/sec
  • HAProxy - Layer 4/7 balancing, health checks
  • AWS ALB - Managed, auto-scaling
  • Cloudflare Load Balancing - Global, DDoS protection

Auto-Scaling

# Example: Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2        # Always at least 2 pods
  maxReplicas: 10       # Scale up to 10 during traffic spikes
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale when CPU > 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale when memory > 80%

Auto-Scaling Decisions

  • Scale UP: CPU > 70% for 2 minutes
  • Scale DOWN: CPU < 30% for 5 minutes
  • Result: Right-sized infrastructure = optimal cost

Database Read Replicas

import { PrismaClient } from '@prisma/client';

// Primary database (writes)
const prismaWrite = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_PRIMARY_URL }
  }
});

// Read replica (reads only)
const prismaRead = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_REPLICA_URL }
  }
});

// Write operations → primary
async function createOrder(data: OrderInput) {
  return prismaWrite.order.create({ data });
}

// Read operations → replica (reduces primary load)
async function getOrders(userId: string) {
  return prismaRead.order.findMany({
    where: { userId }
  });
}

// Impact:
// - Primary handles writes only
// - Replicas handle all reads
// - If reads are ~75% of traffic, primary load drops by ~75%

Monitoring & Profiling

Application Performance Monitoring (APM)

import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  integrations: [
    new ProfilingIntegration()
  ],
  tracesSampleRate: 0.1, // Sample 10% of requests
  profilesSampleRate: 0.1
});

// Automatic performance tracking
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Custom performance tracking
app.get('/api/expensive-operation', async (req, res) => {
  const transaction = Sentry.startTransaction({
    name: 'Expensive Operation',
    op: 'http.request'
  });
  
  const span1 = transaction.startChild({ op: 'database.query', description: 'Fetch users' });
  const users = await db.user.findMany();
  span1.finish();
  
  const span2 = transaction.startChild({ op: 'processing', description: 'Transform data' });
  const transformed = processUsers(users);
  span2.finish();
  
  transaction.finish();
  
  res.json(transformed);
});

// View in Sentry dashboard:
// - Total request time: 450ms
// - Database query: 250ms (55% of time)
// - Processing: 200ms (44% of time)
// → Optimize database query first!

Best APM Tools

  • Datadog APM - Full-stack observability
  • New Relic - Real user monitoring
  • Sentry - Error + performance tracking
  • AWS X-Ray - Distributed tracing for AWS

Performance Benchmarking

import autocannon from 'autocannon';

// Load test your API
async function benchmarkAPI() {
  const result = await autocannon({
    url: 'http://localhost:3000/api/users',
    connections: 100,    // 100 concurrent connections
    duration: 30,        // 30 seconds
    pipelining: 1
  });
  
  console.log('Performance Results:');
  console.log(`Requests/sec: ${result.requests.average}`);
  console.log(`Latency P50: ${result.latency.p50}ms`);
  console.log(`Latency P95: ${result.latency.p95}ms`);
  console.log(`Latency P99: ${result.latency.p99}ms`);
  console.log(`Error rate: ${(result.non2xx / result.requests.total) * 100}%`);
}

// Example output:
// Requests/sec: 5,240
// Latency P50: 15ms
// Latency P95: 48ms
// Latency P99: 120ms
// Error rate: 0.02%

Load Testing Tools

  • autocannon (Node.js) - 40K+ req/sec benchmarking
  • k6 - Modern load testing, Grafana integration
  • Apache JMeter - Enterprise-grade, GUI
  • wrk - Minimal, blazing fast

Real-Time Performance Monitoring

import prometheus from 'prom-client';

// Create metrics
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5] // Response time buckets
});

const httpRequestsTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

// Middleware to track metrics
app.use((req, res, next) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    
    httpRequestDuration.observe({
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    }, duration);
    
    httpRequestsTotal.inc({
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    });
  });
  
  next();
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

// Visualize in Grafana:
// - Request rate over time
// - P50/P95/P99 latency graphs
// - Error rate trends
// - Slow endpoint identification

Real-World Examples

Example 1: E-Commerce Product API

Before Optimization

app.get('/api/products/:id', async (req, res) => {
  const product = await db.product.findUnique({
    where: { id: req.params.id },
    include: {
      reviews: true,      // 1,000+ reviews
      variants: true,     // 50 variants
      images: true,       // 10 images
      relatedProducts: {
        include: {
          reviews: true,  // Another 1,000+ reviews per product
          images: true
        }
      }
    }
  });
  
  res.json(product);
});
// Response time: 3,200ms
// Payload: 850KB

After Optimization

import NodeCache from 'node-cache';
const cache = new NodeCache({ stdTTL: 300 });

app.get('/api/products/:id', async (req, res) => {
  const productId = req.params.id;
  
  // 1. Check cache
  const cached = cache.get(productId);
  if (cached) {
    res.set('X-Cache', 'HIT');
    return res.json(cached);
  }
  
  // 2. Optimized query
  const product = await db.product.findUnique({
    where: { id: productId },
    select: {
      id: true,
      name: true,
      price: true,
      description: true,
      images: {
        take: 3,  // Only first 3 images
        select: { url: true, alt: true }
      },
      reviews: {
        take: 5,  // Only latest 5 reviews
        orderBy: { created_at: 'desc' },
        select: { rating: true, comment: true, author: true }
      }
    }
  });
  
  // 3. Aggregate stats instead of loading all data
  // $queryRaw returns an array of rows; take the first
  const [stats] = await db.$queryRaw<
    { avg_rating: number; review_count: number }[]
  >`
    SELECT 
      AVG(rating) as avg_rating,
      COUNT(*) as review_count
    FROM reviews 
    WHERE product_id = ${productId}
  `;
  
  const result = {
    ...product,
    avgRating: stats.avg_rating,
    reviewCount: stats.review_count
  };
  
  // 4. Cache result
  cache.set(productId, result);
  
  // 5. HTTP caching
  res.set('Cache-Control', 'public, max-age=300');
  res.set('X-Cache', 'MISS');
  
  res.json(result);
});

// Response time: 45ms (71x faster!)
// Payload: 12KB (70x smaller!)

Example 2: Analytics Dashboard API

Before Optimization

app.get('/api/analytics/dashboard', async (req, res) => {
  const userId = req.user.id;
  
  // Sequential queries (waterfall)
  const pageViews = await db.pageView.count({ where: { userId } });
  const uniqueVisitors = await db.visitor.count({ where: { userId } });
  const revenue = await db.order.aggregate({
    where: { userId },
    _sum: { total: true }
  });
  const topPages = await db.pageView.groupBy({
    by: ['page'],
    where: { userId },
    _count: true,
    orderBy: { _count: { page: 'desc' } },
    take: 10
  });
  
  res.json({ pageViews, uniqueVisitors, revenue, topPages });
});
// Response time: 2,800ms

After Optimization

import Redis from 'ioredis';
const redis = new Redis();

app.get('/api/analytics/dashboard', async (req, res) => {
  const userId = req.user.id;
  const cacheKey = `analytics:${userId}:${new Date().toISOString().split('T')[0]}`;
  
  // 1. Check Redis cache (daily cache)
  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json(JSON.parse(cached));
  }
  
  // 2. Parallel queries
  const [pageViews, uniqueVisitors, revenue, topPages] = await Promise.all([
    db.pageView.count({ where: { userId } }),
    db.visitor.count({ where: { userId } }),
    db.order.aggregate({
      where: { userId },
      _sum: { total: true }
    }),
    db.pageView.groupBy({
      by: ['page'],
      where: { userId },
      _count: true,
      orderBy: { _count: { page: 'desc' } },
      take: 10
    })
  ]);
  
  const result = { pageViews, uniqueVisitors, revenue, topPages };
  
  // 3. Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(result));
  
  res.json(result);
});

// Response time: 180ms first load, 2ms cached (about 1,400x faster than the original 2,800ms)

Example 3: Search API

Before Optimization

app.get('/api/search', async (req, res) => {
  const query = req.query.q;
  
  // Full-text search across all fields
  const results = await db.product.findMany({
    where: {
      OR: [
        { name: { contains: query } },
        { description: { contains: query } },
        { category: { contains: query } },
        { tags: { has: query } }
      ]
    }
  });
  
  res.json(results);
});
// Response time: 4,500ms for 100,000+ products
// No relevance ranking

After Optimization with Elasticsearch

import { Client } from '@elastic/elasticsearch';
const elastic = new Client({ node: 'http://localhost:9200' });

app.get('/api/search', async (req, res) => {
  const query = req.query.q;
  
  // Elasticsearch full-text search with relevance ranking
  const { hits } = await elastic.search({
    index: 'products',
    body: {
      query: {
        multi_match: {
          query,
          fields: ['name^3', 'description', 'category^2', 'tags'],
          fuzziness: 'AUTO' // Handle typos
        }
      },
      size: 20,
      from: Number(req.query.page ?? 0) * 20,
      highlight: {
        fields: {
          name: {},
          description: {}
        }
      }
    }
  });
  
  const results = hits.hits.map(hit => ({
    ...hit._source,
    score: hit._score,
    highlights: hit.highlight
  }));
  
  res.json(results);
});

// Response time: 25ms (180x faster!)
// Relevance ranking + typo tolerance + highlighting

Common Mistakes to Avoid

1. Not Using Database Indexes

Symptom: Queries that take seconds on tables with 100K+ rows

Fix: Add indexes on columns used in WHERE, ORDER BY, JOIN

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

2. Fetching All Data When You Need Aggregates

Mistake: Loading 1M records to count them

const users = await db.user.findMany(); // Loads 1M records = 500MB
const count = users.length; // 😱

Fix: Use database aggregation

const count = await db.user.count(); // Returns number only

3. Not Implementing Pagination

Mistake: Returning unlimited results

const products = await db.product.findMany(); // Returns 100,000 products = 50MB

Fix: Always paginate

const products = await db.product.findMany({
  take: 50,
  skip: (page - 1) * 50
});

4. Caching Everything Forever

Mistake: No cache invalidation strategy

cache.set('key', value); // Cached forever, even when data changes

Fix: Use appropriate TTLs

cache.set('key', value, 300); // 5 minutes for frequently changing data
cache.set('config', configValue, 3600); // 1 hour for rarely changing data

5. Synchronous Operations in Request Path

Mistake: Blocking response for non-critical tasks

await sendEmail(); // Blocks response for 500ms
await updateAnalytics(); // Blocks another 200ms
res.json(result); // User waits 700ms unnecessarily

Fix: Queue background jobs

emailQueue.add({ orderId }); // Returns in 1ms
res.json(result); // User gets response immediately

6. Not Monitoring Performance

Mistake: No visibility into slow endpoints

Fix: Add APM + custom metrics

// Track every endpoint's performance
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.recordLatency(req.route?.path, duration);
  });
  next();
});

7. Over-Optimization Too Early

Mistake: Complex caching before measuring actual bottlenecks

Fix:

  1. Measure first (add metrics)
  2. Identify bottleneck (slowest 20% of endpoints)
  3. Optimize only the bottleneck
  4. Measure again
  5. Repeat

8. Not Using Connection Pooling

Mistake: Creating new database connections per request

const db = new Database(); // New connection every request = 100ms overhead

Fix: Use connection pool

const pool = new Pool({ max: 10 }); // Reuse connections = 0.5ms

9. Exposing Internal Implementation Details

Mistake: Returning raw database objects

res.json(user); // Includes password_hash, internal_id, etc.

Fix: Use DTOs (Data Transfer Objects)

res.json({
  id: user.id,
  name: user.name,
  email: user.email
  // Only public fields
});

10. Not Testing with Production Data Volumes

Mistake: Testing with 100 records when production has 10M

Fix: Load test with realistic data

// Seed 1M+ test records
// Run load tests: 1K concurrent users
// Identify bottlenecks before launch
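
A hedged seeding sketch using Prisma's createMany (the batch size and fields are illustrative):

// Seed 1M fake users in 10K batches before load testing
async function seed() {
  const BATCH = 10_000;
  for (let i = 0; i < 1_000_000; i += BATCH) {
    await db.user.createMany({
      data: Array.from({ length: BATCH }, (_, j) => ({
        email: `user${i + j}@example.com`,
        name: `User ${i + j}`
      })),
      skipDuplicates: true
    });
  }
}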

Production Checklist

Database

  • Indexes on all WHERE, ORDER BY, JOIN columns
  • Connection pooling configured (max 10-20 connections)
  • Slow query logging enabled (>1s queries)
  • Read replicas for read-heavy workloads
  • Query result caching (Redis)
  • Regular ANALYZE/VACUUM (PostgreSQL)

Caching

  • Redis cache for frequently accessed data
  • HTTP caching headers (Cache-Control, ETag)
  • CDN for static assets + public API responses
  • Cache invalidation strategy documented
  • Cache hit/miss metrics tracked

Code

  • Parallel operations with Promise.all()
  • No N+1 queries (use joins/includes)
  • Background jobs for non-critical tasks
  • Pagination on all list endpoints
  • Field selection (don't return unnecessary data)

Network

  • Response compression (gzip/brotli)
  • HTTP/2 enabled
  • Payload size optimized (<100KB typical response)
  • CDN for global distribution

Infrastructure

  • Load balancer with health checks
  • Auto-scaling configured (2-10 instances)
  • Database read replicas (1-3 replicas)
  • CDN enabled (Cloudflare or similar)

Monitoring

  • APM installed (Datadog, New Relic, or Sentry)
  • Custom metrics (request rate, latency, errors)
  • Slow query alerts (>1s queries)
  • Error rate alerts (>1% errors)
  • Latency alerts (P95 >500ms)
  • Load testing performed (1K+ concurrent users)

Performance Targets

  • P50 latency <100ms
  • P95 latency <500ms
  • P99 latency <2000ms
  • Error rate <0.1%
  • Throughput: 1000+ RPS per server
  • Cache hit rate >70%

Tools & Resources

Performance Monitoring

  • Datadog APM - Full-stack observability
  • New Relic - Application performance monitoring
  • Sentry - Error + performance tracking
  • Grafana + Prometheus - Open-source monitoring stack

Load Testing

  • autocannon - Fast HTTP/1.1 benchmarking
  • k6 - Modern load testing
  • Apache JMeter - Enterprise load testing
  • Gatling - Scala-based load testing

Database Tools

  • pgAdmin - PostgreSQL management
  • DataGrip - Universal database IDE
  • PgHero - PostgreSQL performance dashboard

CDN Providers

  • Cloudflare - Free tier, 250+ locations
  • Fastly - Real-time purge
  • AWS CloudFront - AWS integration
  • Akamai - Enterprise CDN

Caching

  • Redis - In-memory data store
  • Memcached - Simple key-value cache
  • Varnish - HTTP accelerator


Frequently Asked Questions

How much performance improvement can I expect?

Typical gains from this guide:

  • Database optimization: 10-100x faster queries
  • Caching: 50-1000x faster repeated requests
  • Code optimization: 2-10x faster processing
  • Infrastructure: 2-5x more throughput

Real example (e-commerce API):

  • Before: 3,200ms response time
  • After: 45ms response time
  • 71x improvement

Should I optimize everything at once?

No. Follow this order:

  1. Measure - Add APM to identify bottlenecks
  2. Database - Usually 80% of performance issues
  3. Caching - Redis + HTTP caching
  4. Code - Parallel operations, background jobs
  5. Infrastructure - Load balancing, CDN

Optimize the slowest 20% first (Pareto principle).

When should I add caching?

Add caching when:

  • Same data requested frequently (>10 times/minute)
  • Data doesn't change often (every 5+ minutes)
  • Database queries are slow (>100ms)
  • Traffic is growing (>1000 requests/hour)

Don't cache when:

  • Data changes constantly (real-time stock prices)
  • Each request is unique (user-specific content)
  • Data is already fast (<10ms queries)

How many database connections should I use?

Formula: (CPU cores × 2) + effective spindle count (SSD ≈ 1)

Examples:

  • 4-core server with SSD: (4 × 2) + 1 = 9 connections
  • 8-core server with 4-disk HDD array: (8 × 2) + 4 = 20 connections

Too many connections cause contention and slower queries; too few cause request queuing and timeouts.

What's the difference between caching and CDN?

Caching (Redis, in-memory):

  • Stores computed results (database queries, API responses)
  • Server-side (your infrastructure)
  • Invalidate when data changes

CDN (Cloudflare, Fastly):

  • Stores static files + API responses
  • Edge locations worldwide (close to users)
  • Reduces latency for global users

Use both: Redis for dynamic data, CDN for global distribution

Should I use GraphQL or REST for performance?

GraphQL advantages:

  • Client specifies exact fields needed (smaller payloads)
  • Single request for multiple resources (no multiple round-trips)

GraphQL challenges:

  • N+1 query problem (requires DataLoader)
  • Caching harder (no URL-based cache keys)

REST advantages:

  • Simpler caching (URL-based)
  • Better CDN support

Verdict: Both can be fast with proper optimization. Use REST for public APIs, GraphQL for complex client needs.

How do I optimize API calls to third-party services?

Strategies:

  1. Cache responses aggressively

const cachedResponse = await redis.get(`stripe:customer:${id}`);
if (cachedResponse) return JSON.parse(cachedResponse);

const customer = await stripe.customers.retrieve(id);
await redis.setex(`stripe:customer:${id}`, 3600, JSON.stringify(customer));

  2. Batch requests when possible

// Bad: 100 API calls
for (const id of customerIds) {
  await stripe.customers.retrieve(id);
}

// Good: 1 API call
const customers = await stripe.customers.list({
  limit: 100,
  starting_after: lastId
});

  3. Webhooks instead of polling

  • Stripe sends webhooks when data changes
  • No need to poll for updates every minute

  4. Monitor third-party API status

  • Use API Status Check to track outages
  • Implement circuit breakers for failing APIs (see the sketch below)
  • Have fallback strategies
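
A circuit breaker can be as small as a failure counter and a cooldown timer. A hedged sketch (the thresholds are illustrative; libraries like opossum package the same idea):

// Minimal circuit breaker: stop calling a failing API, retry after a cooldown
class CircuitBreaker {
  private failures = 0;
  private openedAt = 0;

  constructor(
    private maxFailures = 5,    // open after 5 consecutive failures
    private cooldownMs = 30_000 // try again after 30 seconds
  ) {}

  async call<T>(fn: () => Promise<T>): Promise<T> {
    if (this.failures >= this.maxFailures) {
      if (Date.now() - this.openedAt < this.cooldownMs) {
        throw new Error('Circuit open: skipping call to failing API');
      }
      this.failures = 0; // half-open: let one trial request through
    }
    try {
      const result = await fn();
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures++;
      if (this.failures >= this.maxFailures) this.openedAt = Date.now();
      throw err;
    }
  }
}

// Usage: wrap every third-party call
const stripeBreaker = new CircuitBreaker();
const customer = await stripeBreaker.call(() => stripe.customers.retrieve(id));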

What if my database is still slow after indexing?

Check these:

  1. Index is being used

EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Look for "Index Scan" not "Seq Scan"

  2. Query is optimized

  • Avoid SELECT * (fetch only needed columns)
  • Use LIMIT for large results
  • Check for N+1 queries

  3. Database resources

  • CPU usage <80%?
  • Memory usage <80%?
  • Disk I/O not saturated?

  4. Consider scaling
  • Read replicas for read-heavy workloads
  • Vertical scaling (more CPU/RAM)
  • Query result caching (Redis)

How do I prevent caching stale data?

Strategies:

1. Time-based invalidation (TTL)

redis.setex('key', 300, value); // Cache for 5 minutes

2. Event-based invalidation

async function updateProduct(id, data) {
  await db.product.update({ where: { id }, data });
  await redis.del(`product:${id}`); // Invalidate immediately
}

3. Cache versioning

const version = await redis.get('cache:version') || 1;
const key = `product:${id}:v${version}`;

// When data structure changes:
await redis.incr('cache:version'); // Invalidates all caches

4. Conditional requests (HTTP)

const etag = generateHash(data);
res.set('ETag', etag);
if (req.headers['if-none-match'] === etag) {
  return res.status(304).end(); // Not modified
}

Next Steps

  1. Add monitoring - Install APM (Datadog, New Relic, or Sentry)
  2. Identify bottlenecks - Find slowest 20% of endpoints
  3. Optimize database - Add indexes, fix N+1 queries
  4. Add caching - Redis for frequently accessed data
  5. Load test - Test with realistic traffic (1K+ concurrent users)
  6. Monitor production - Track P50/P95/P99 latency, error rates
