API Performance Optimization: Complete Guide for Production Systems

by API Status Check Team

Slow APIs degrade user experience, inflate infrastructure costs, and limit scalability. Widely cited industry studies suggest a 100ms delay can reduce conversions by roughly 7%, and a 1-second delay has been estimated to cost Amazon $1.6 billion in sales annually.

This guide covers production-ready performance optimization strategies used by Stripe, Shopify, and AWS to serve millions of requests per day with sub-100ms response times.

Why API Performance Matters

Impact on Business Metrics

User Experience & Conversions

  • 100ms delay → ~7% drop in conversions (Akamai retail study)
  • 1-second delay → 11% fewer page views, 7% loss in conversions
  • 3-second load time → 53% of mobile users abandon

Infrastructure Costs

  • Slow APIs hold connections open longer, so the same traffic needs more servers
  • Example: Reducing response time from 500ms to 100ms → 5x fewer concurrent connections
  • In practice, that can shrink a $10,000/month server bill to ~$2,000/month

Scalability

  • Faster responses = more requests per server
  • Example: 100ms responses = 10 req/sec per connection vs 1 req/sec at 1000ms
  • 10x throughput with same infrastructure

Real-World Impact

Shopify Black Friday 2023

  • 1.8M requests/second peak
  • Sub-100ms P95 response times
  • Result: Zero downtime, $9.3B in sales

Stripe API

  • 99.99% uptime
  • P99 latency <200ms globally
  • Handles $640B annually

Netflix API

  • 1 billion API calls/day
  • P99 latency <100ms
  • Powers 230M subscribers globally

Key Performance Metrics

Essential Metrics to Track

Response Time (Latency)

P50 (median): 50% of requests faster than this
P95: 95% of requests faster than this
P99: 99% of requests faster than this (catches outliers)

Why P99 matters more than average:

  • Average: 100ms (looks good!)
  • P99: 5000ms (5% of users waiting 5 seconds = terrible UX)
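
The gap between average and P99 is easy to demonstrate. A minimal sketch using the nearest-rank percentile method over a synthetic latency sample:

```typescript
// Nearest-rank percentile over recorded request latencies (ms):
// sort the samples, then index by rank.
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const idx = Math.min(sorted.length - 1, Math.ceil((p / 100) * sorted.length) - 1);
  return sorted[Math.max(0, idx)];
}

// 95 fast requests and 5 slow outliers: the average looks fine, P99 does not
const latencies = [...Array(95).fill(100), ...Array(5).fill(5000)];
const avg = latencies.reduce((sum, v) => sum + v, 0) / latencies.length;

console.log(avg);                        // 345 — hides the outliers
console.log(percentile(latencies, 50));  // 100
console.log(percentile(latencies, 99));  // 5000
```

Five requests out of a hundred taking 5 seconds barely move the average, but P99 exposes them immediately.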

Throughput

  • Requests per second (RPS) your API can handle
  • Example targets:
    • Small API: 100-1,000 RPS
    • Medium: 1,000-10,000 RPS
    • Large: 10,000+ RPS

Error Rate

  • 4xx errors: Client mistakes (not your fault)
  • 5xx errors: Server failures (your fault)
  • Target: <0.1% error rate under normal load

Saturation

  • CPU usage (target: <70% average, <90% peak)
  • Memory usage (target: <80%)
  • Database connections (target: <80% of pool)

Database Optimization

Database queries are the #1 performance bottleneck in most APIs.

Indexing Strategy

Before Indexing (Sequential Scan)

-- Query: Find user by email
SELECT * FROM users WHERE email = 'user@example.com';
-- Execution: 2,500ms (scans 1 million rows)

After Indexing

-- Create index
CREATE INDEX idx_users_email ON users(email);

-- Same query now: 8ms (index lookup)
-- 312x faster!

Composite Indexes for Multi-Column Queries

-- Query: Active orders for user in date range
SELECT * FROM orders 
WHERE user_id = 123 
  AND status = 'active' 
  AND created_at > '2026-01-01';

-- Index order matters!
CREATE INDEX idx_orders_user_status_date 
ON orders(user_id, status, created_at DESC);

-- user_id first (most selective filter)
-- status second (additional filter)
-- created_at last (for sorting)

Check Index Usage

-- PostgreSQL: Explain query plan
EXPLAIN ANALYZE 
SELECT * FROM orders WHERE user_id = 123;

-- Look for:
-- ✅ "Index Scan" or "Index Only Scan"
-- ❌ "Seq Scan" (sequential scan = missing index)

Query Optimization

N+1 Query Problem (Most Common Mistake)

// ❌ BAD: N+1 queries (1 + 100 = 101 database roundtrips)
async function getOrdersWithUsers() {
  const orders = await db.order.findMany(); // 1 query
  
  for (const order of orders) {
    // 100 separate queries!
    order.user = await db.user.findUnique({
      where: { id: order.userId }
    });
  }
  
  return orders;
}
// Response time: 2,500ms

// ✅ GOOD: Single query with join
async function getOrdersWithUsers() {
  const orders = await db.order.findMany({
    include: {
      user: true // Prisma automatically joins
    }
  });
  
  return orders;
}
// Response time: 45ms (55x faster!)

Select Only What You Need

// ❌ BAD: Fetching unnecessary data
const users = await db.user.findMany(); 
// Returns: id, email, password_hash, created_at, updated_at, profile, settings...
// Payload: 50KB per user × 100 users = 5MB

// ✅ GOOD: Select specific fields
const users = await db.user.findMany({
  select: {
    id: true,
    email: true,
    name: true
  }
});
// Payload: 2KB per user × 100 users = 200KB (25x smaller!)

Pagination for Large Datasets

// ❌ BAD: Loading all records
const allOrders = await db.order.findMany(); // 1 million rows = 500MB = 30 seconds
// OOM crash on large datasets

// ✅ GOOD: Cursor-based pagination
const orders = await db.order.findMany({
  take: 100,
  cursor: lastOrderId ? { id: lastOrderId } : undefined,
  orderBy: { created_at: 'desc' }
});
// Returns 100 rows in 12ms

Connection Pooling

Problem: Creating new database connections is slow

  • New connection: 50-200ms
  • Pooled connection: 0.5ms
  • 400x faster with connection pooling
import { PrismaClient } from '@prisma/client';

// ✅ Prisma configures its connection pool via the datasource URL,
// not a constructor option:
// DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=30"
const prisma = new PrismaClient({
  datasources: {
    db: {
      url: process.env.DATABASE_URL
    }
  }
});

// connection_limit → maximum concurrent connections (default: physical cores × 2 + 1)
// pool_timeout     → seconds to wait for a free connection before erroring

// Connection lifecycle:
// 1. Request arrives → get connection from pool (0.5ms)
// 2. Execute query
// 3. Return connection to pool (reused by next request)

Pool Sizing Formula

Optimal pool size = (core_count × 2) + effective_spindle_count

Example for typical web server:
- 4 CPU cores
- SSD storage (spindle = 1)
- Pool size = (4 × 2) + 1 = 9 connections
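
The formula translates directly into code. A small helper (a sketch, sizing for the current machine via Node's `os` module; SSDs count as roughly one effective spindle):

```typescript
import os from 'os';

// Pool size per the formula above: (core_count × 2) + effective_spindle_count
function optimalPoolSize(coreCount: number, effectiveSpindles = 1): number {
  return coreCount * 2 + effectiveSpindles;
}

console.log(optimalPoolSize(4));                // 9 — matches the worked example
console.log(optimalPoolSize(os.cpus().length)); // sized for this machine
```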

Database Monitoring

Slow Query Logging

import { PrismaClient } from '@prisma/client';

const prisma = new PrismaClient({
  log: [
    {
      emit: 'event',
      level: 'query'
    }
  ]
});

prisma.$on('query', (e) => {
  if (e.duration > 1000) { // Queries slower than 1 second
    console.warn('Slow query detected:', {
      query: e.query,
      duration: `${e.duration}ms`,
      timestamp: e.timestamp
    });
  }
});

Key Metrics to Track

  • Query execution time (P50, P95, P99)
  • Connection pool utilization
  • Slow query count
  • Lock wait time
  • Cache hit ratio

Caching Strategies

Caching = storing computed results to avoid redundant work.

Cache Hierarchy (Fastest to Slowest)

  1. In-Memory Cache (0.1ms) - Node.js Map/LRU
  2. Redis Cache (1-2ms) - Shared across servers
  3. CDN Cache (10-50ms) - Global edge locations
  4. Database (20-200ms) - No cache
  5. External API (100-2000ms) - Third-party service

In-Memory Caching

import NodeCache from 'node-cache';

// Create cache with 5-minute TTL
const cache = new NodeCache({ stdTTL: 300 });

async function getUser(userId: string) {
  // Check cache first
  const cached = cache.get<User>(`user:${userId}`);
  if (cached) {
    console.log('Cache HIT');
    return cached; // 0.1ms response time
  }
  
  console.log('Cache MISS');
  // Fetch from database
  const user = await db.user.findUnique({ where: { id: userId } });
  
  // Store in cache
  cache.set(`user:${userId}`, user);
  
  return user; // 45ms first time, 0.1ms after
}

Use Cases for In-Memory Cache

  • User sessions
  • Configuration data
  • Frequently accessed lookup tables
  • API responses that rarely change

Limitations

  • Not shared across servers (each server has own cache)
  • Lost on server restart
  • Limited by RAM (max ~1-2GB typically)

Redis Caching (Production Standard)

import Redis from 'ioredis';

const redis = new Redis({
  host: 'localhost',
  port: 6379,
  maxRetriesPerRequest: 3
});

async function getProductWithCache(productId: string) {
  const cacheKey = `product:${productId}`;
  
  // Try cache first
  const cached = await redis.get(cacheKey);
  if (cached) {
    return JSON.parse(cached);
  }
  
  // Fetch from database
  const product = await db.product.findUnique({
    where: { id: productId },
    include: { reviews: true }
  });
  
  // Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(product));
  
  return product;
}

Cache Invalidation Strategies

Time-Based (TTL)

// Cache for 5 minutes
await redis.setex('key', 300, value);
// Pros: Simple, prevents stale data
// Cons: May serve outdated data for up to 5 minutes

Event-Based Invalidation

// Invalidate when data changes
async function updateProduct(id: string, data: ProductUpdate) {
  // Update database
  const product = await db.product.update({
    where: { id },
    data
  });
  
  // Invalidate cache immediately
  await redis.del(`product:${id}`);
  
  return product;
}
// Pros: Always fresh data
// Cons: Requires invalidation logic everywhere

Cache-Aside Pattern (Lazy Loading)

// 1. Check cache
// 2. If miss, fetch from DB
// 3. Store in cache
// 4. Return data

// Most common pattern for read-heavy workloads

HTTP Caching Headers

import express from 'express';

app.get('/api/products/:id', async (req, res) => {
  const product = await getProduct(req.params.id);
  
  // ✅ Cache in browser for 5 minutes
  res.set('Cache-Control', 'public, max-age=300');
  
  // ✅ ETag for conditional requests
  const etag = generateETag(product); // e.g. a hash of the serialized body (helper not shown)
  res.set('ETag', etag);
  
  // If client's ETag matches, return 304 Not Modified
  if (req.headers['if-none-match'] === etag) {
    return res.status(304).end(); // No data transfer!
  }
  
  res.json(product);
});

Cache-Control Directives

public          - Can be cached by browsers + CDNs
private         - Only cached by browser (sensitive data)
no-cache        - Must revalidate with server before using
no-store        - Never cache (credit card data, etc.)
max-age=300     - Cache for 300 seconds (5 minutes)
s-maxage=3600   - CDN can cache for 1 hour

CDN Caching for Global Performance

// Example: Cloudflare caching
app.get('/api/public/products', async (req, res) => {
  const products = await getProducts();
  
  // ✅ Cache at CDN edge for 1 hour
  res.set('Cache-Control', 'public, s-maxage=3600');
  
  // ✅ Cloudflare-specific header
  res.set('CDN-Cache-Control', 'max-age=3600');
  
  res.json(products);
});

Performance Impact of CDN

  • Without CDN: 200ms (US East → Singapore)
  • With CDN: 15ms (cached at Singapore edge)
  • 13x faster for global users

Best CDN Providers for APIs

  • Cloudflare - Free tier, 250+ locations
  • Fastly - Real-time purge, low latency
  • AWS CloudFront - Tight AWS integration
  • Akamai - Enterprise-grade, massive scale

Code-Level Optimization

Async/Await vs Synchronous Operations

// ❌ BAD: Sequential operations (waterfall)
async function getUserData(userId: string) {
  const user = await db.user.findUnique({ where: { id: userId } });     // 50ms
  const orders = await db.order.findMany({ where: { userId } });         // 80ms
  const reviews = await db.review.findMany({ where: { userId } });       // 60ms
  
  return { user, orders, reviews };
}
// Total time: 50 + 80 + 60 = 190ms

// ✅ GOOD: Parallel operations
async function getUserData(userId: string) {
  const [user, orders, reviews] = await Promise.all([
    db.user.findUnique({ where: { id: userId } }),
    db.order.findMany({ where: { userId } }),
    db.review.findMany({ where: { userId } })
  ]);
  
  return { user, orders, reviews };
}
// Total time: max(50, 80, 60) = 80ms (2.4x faster!)

Request Batching (DataLoader Pattern)

import DataLoader from 'dataloader';

// ❌ PROBLEM: N+1 queries in GraphQL/nested requests
// Fetching 100 orders → 100 separate user queries

// ✅ SOLUTION: Batch requests into single query
const userLoader = new DataLoader(async (userIds: string[]) => {
  // Batch: Load all users in single query
  const users = await db.user.findMany({
    where: { id: { in: userIds } }
  });
  
  // Return in same order as requested IDs
  const userMap = new Map(users.map(u => [u.id, u]));
  return userIds.map(id => userMap.get(id));
});

// Usage
async function getOrders() {
  const orders = await db.order.findMany();
  
  // Loads issued in the same tick are batched into one query —
  // use Promise.all, not sequential awaits (which would defeat batching)
  const users = await Promise.all(
    orders.map(order => userLoader.load(order.userId))
  );
  orders.forEach((order, i) => { order.user = users[i]; });
  
  return orders;
}

// Result:
// - 100 orders fetched: 1 query
// - 100 users fetched: 1 batched query (not 100!)
// - Total: 2 queries instead of 101 (50x fewer DB calls)

JSON Payload Optimization

// ❌ BAD: Returning entire objects with unnecessary fields
app.get('/api/users', async (req, res) => {
  const users = await db.user.findMany({
    include: {
      profile: true,
      settings: true,
      orders: {
        include: {
          items: true,
          shipping: true
        }
      }
    }
  });
  
  res.json(users);
});
// Payload: 500KB per user × 100 users = 50MB response!

// ✅ GOOD: Return only what's needed
app.get('/api/users', async (req, res) => {
  const users = await db.user.findMany({
    select: {
      id: true,
      name: true,
      email: true,
      avatar: true
    }
  });
  
  res.json(users);
});
// Payload: 500 bytes per user × 100 users = 50KB (1000x smaller!)

Avoid Synchronous Operations in Request Path

// ❌ BAD: Blocking operations during request
app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);
  
  // ❌ Blocks response while sending email (500ms)
  await sendOrderConfirmationEmail(order);
  
  // ❌ Blocks response while updating analytics (200ms)
  await updateAnalytics(order);
  
  res.json(order);
});
// Response time: 700ms + database time

// ✅ GOOD: Queue non-critical work
import Queue from 'bull';
const emailQueue = new Queue('emails', { redis: { host: 'localhost', port: 6379 } });

app.post('/api/orders', async (req, res) => {
  const order = await createOrder(req.body);
  
  // ✅ Queue email (non-blocking, 1ms)
  emailQueue.add({ orderId: order.id });
  
  // ✅ Fire and forget analytics
  updateAnalytics(order).catch(console.error);
  
  res.json(order); // Returns immediately!
});
// Response time: database time only (95% faster)

Network Optimization

Response Compression

import compression from 'compression';
import express from 'express';

const app = express();

// ✅ Enable gzip compression
app.use(compression({
  level: 6,              // Compression level (1-9, 6 is balanced)
  threshold: 1024,       // Only compress responses > 1KB
  filter: (req, res) => {
    // Let clients opt out explicitly; compression.filter already skips
    // content types that are compressed at source (images, video)
    if (req.headers['x-no-compression']) {
      return false;
    }
    return compression.filter(req, res);
  }
}));

// Impact:
// - JSON response: 100KB → 15KB (85% smaller)
// - Transfer time: 200ms → 30ms over 3G
// - 6.6x faster download

HTTP/2 Server Push (Node.js)

Note: major browsers have removed HTTP/2 push support (Chrome dropped it in 2022), so push mainly benefits server-to-server clients; prefer preload hints for browser-facing APIs.

import { createSecureServer } from 'http2';
import { readFileSync } from 'fs';

const server = createSecureServer({
  key: readFileSync('server.key'),
  cert: readFileSync('server.cert')
});

server.on('stream', (stream, headers) => {
  if (headers[':path'] === '/api/dashboard') {
    // ✅ Push critical resources before client requests them
    // (userData and dashboardData assumed to be fetched earlier)
    stream.pushStream({ ':path': '/api/user' }, (err, pushStream) => {
      if (err) return; // client may have disabled push
      pushStream.respond({ ':status': 200 });
      pushStream.end(JSON.stringify(userData));
    });
    
    stream.respond({ ':status': 200 });
    stream.end(JSON.stringify(dashboardData));
  }
});

// Impact:
// - HTTP/1.1: Request dashboard (100ms) → Request user (100ms) = 200ms
// - HTTP/2: Request dashboard + pushed user data = 100ms
// - 2x faster

Minimize Payload Size

Use Field Selection (GraphQL-style)

// Allow clients to specify which fields they want —
// filtered against an allowlist so sensitive columns can't be requested
app.get('/api/users', async (req, res) => {
  const ALLOWED_FIELDS = ['id', 'name', 'email', 'avatar'];
  const requested = String(req.query.fields || 'id,name,email').split(',');
  
  const select = requested
    .filter(field => ALLOWED_FIELDS.includes(field))
    .reduce((acc, field) => {
      acc[field] = true;
      return acc;
    }, {} as Record<string, boolean>);
  
  const users = await db.user.findMany({ select });
  res.json(users);
});

// Usage: /api/users?fields=id,name,email
// Returns only requested fields (smaller payload)

Remove Null Values

function removeNulls(obj: any): any {
  if (Array.isArray(obj)) {
    return obj.map(removeNulls).filter(v => v != null);
  }
  if (obj !== null && typeof obj === 'object') {
    return Object.entries(obj)
      .filter(([_, v]) => v != null)
      .reduce((acc, [k, v]) => ({ ...acc, [k]: removeNulls(v) }), {});
  }
  return obj;
}

// Before: { id: 1, name: "Alice", bio: null, avatar: null } = 50 bytes
// After:  { id: 1, name: "Alice" } = 24 bytes (52% smaller)

Infrastructure Optimization

Load Balancing

// Simple round-robin load balancer with health checks
import express from 'express';
import axios from 'axios';

const app = express();

const servers = [
  'http://server1:3000',
  'http://server2:3000',
  'http://server3:3000'
];

let currentIndex = 0;
const healthStatus = new Map<string, boolean>();

// Health check every 30 seconds
setInterval(async () => {
  for (const server of servers) {
    try {
      await axios.get(`${server}/health`, { timeout: 1000 });
      healthStatus.set(server, true);
    } catch {
      healthStatus.set(server, false);
      console.warn(`Server ${server} is DOWN`);
    }
  }
}, 30000);

// Proxy requests to healthy servers
app.use(async (req, res) => {
  const healthyServers = servers.filter(s => healthStatus.get(s) !== false);
  
  if (healthyServers.length === 0) {
    return res.status(503).json({ error: 'No healthy servers' });
  }
  
  // Round-robin selection
  const targetServer = healthyServers[currentIndex % healthyServers.length];
  currentIndex++;
  
  try {
    const response = await axios({
      method: req.method,
      url: `${targetServer}${req.path}`,
      data: req.body,
      headers: req.headers,
      timeout: 5000
    });
    
    res.status(response.status).json(response.data);
  } catch (error) {
    res.status(500).json({ error: 'Server error' });
  }
});

Production Load Balancers

  • NGINX - Industry standard, 100K+ req/sec
  • HAProxy - Layer 4/7 balancing, health checks
  • AWS ALB - Managed, auto-scaling
  • Cloudflare Load Balancing - Global, DDoS protection

Auto-Scaling

# Example: Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: api-autoscaler
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: api-server
  minReplicas: 2        # Always at least 2 pods
  maxReplicas: 10       # Scale up to 10 during traffic spikes
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70  # Scale when CPU > 70%
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80  # Scale when memory > 80%

Auto-Scaling Decisions

  • Scale UP: CPU > 70% for 2 minutes
  • Scale DOWN: CPU < 30% for 5 minutes
  • Result: Right-sized infrastructure = optimal cost
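
Under the hood, the HPA uses a documented formula: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the min/max bounds. A sketch:

```typescript
// Kubernetes HPA scaling decision (the formula from the HPA documentation),
// clamped to the minReplicas/maxReplicas bounds from the manifest above.
function desiredReplicas(
  current: number,
  currentUtilization: number, // e.g. average CPU %
  targetUtilization: number,  // e.g. 70
  min = 2,
  max = 10
): number {
  const desired = Math.ceil(current * (currentUtilization / targetUtilization));
  return Math.min(max, Math.max(min, desired));
}

console.log(desiredReplicas(4, 90, 70)); // 6 — scale up under load
console.log(desiredReplicas(4, 20, 70)); // 2 — scale down to the floor
```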

Database Read Replicas

import { PrismaClient } from '@prisma/client';

// Primary database (writes)
const prismaWrite = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_PRIMARY_URL }
  }
});

// Read replica (reads only)
const prismaRead = new PrismaClient({
  datasources: {
    db: { url: process.env.DATABASE_REPLICA_URL }
  }
});

// Write operations → primary
async function createOrder(data: OrderInput) {
  return prismaWrite.order.create({ data });
}

// Read operations → replica (reduces primary load)
async function getOrders(userId: string) {
  return prismaRead.order.findMany({
    where: { userId }
  });
}

// Impact:
// - Primary handles writes only
// - Replicas absorb the read traffic
// - In a read-heavy API (e.g. 80% reads), most load moves off the primary

Monitoring & Profiling

Application Performance Monitoring (APM)

import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';

Sentry.init({
  dsn: process.env.SENTRY_DSN,
  integrations: [
    new ProfilingIntegration()
  ],
  tracesSampleRate: 0.1, // Sample 10% of requests
  profilesSampleRate: 0.1
});

// Automatic performance tracking
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());

// Custom performance tracking
app.get('/api/expensive-operation', async (req, res) => {
  const transaction = Sentry.startTransaction({
    name: 'Expensive Operation',
    op: 'http.request'
  });
  
  const span1 = transaction.startChild({ op: 'database.query', description: 'Fetch users' });
  const users = await db.user.findMany();
  span1.finish();
  
  const span2 = transaction.startChild({ op: 'processing', description: 'Transform data' });
  const transformed = processUsers(users);
  span2.finish();
  
  transaction.finish();
  
  res.json(transformed);
});

// View in Sentry dashboard:
// - Total request time: 450ms
// - Database query: 250ms (55% of time)
// - Processing: 200ms (44% of time)
// → Optimize database query first!

Best APM Tools

  • Datadog APM - Full-stack observability
  • New Relic - Real user monitoring
  • Sentry - Error + performance tracking
  • AWS X-Ray - Distributed tracing for AWS

Performance Benchmarking

import autocannon from 'autocannon';

// Load test your API
async function benchmarkAPI() {
  const result = await autocannon({
    url: 'http://localhost:3000/api/users',
    connections: 100,    // 100 concurrent connections
    duration: 30,        // 30 seconds
    pipelining: 1
  });
  
  console.log('Performance Results:');
  console.log(`Requests/sec: ${result.requests.average}`);
  console.log(`Latency P50: ${result.latency.p50}ms`);
  console.log(`Latency P97.5: ${result.latency.p97_5}ms`); // autocannon reports p97_5 rather than p95
  console.log(`Latency P99: ${result.latency.p99}ms`);
  console.log(`Error rate: ${(result.non2xx / result.requests.total) * 100}%`);
}

// Example output:
// Requests/sec: 5,240
// Latency P50: 15ms
// Latency P97.5: 48ms
// Latency P99: 120ms
// Error rate: 0.02%

Load Testing Tools

  • autocannon (Node.js) - 40K+ req/sec benchmarking
  • k6 - Modern load testing, Grafana integration
  • Apache JMeter - Enterprise-grade, GUI
  • wrk - Minimal, blazing fast

Real-Time Performance Monitoring

import prometheus from 'prom-client';

// Create metrics
const httpRequestDuration = new prometheus.Histogram({
  name: 'http_request_duration_seconds',
  help: 'Duration of HTTP requests in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5] // Response time buckets
});

const httpRequestsTotal = new prometheus.Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code']
});

// Middleware to track metrics
app.use((req, res, next) => {
  const start = Date.now();
  
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    
    httpRequestDuration.observe({
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    }, duration);
    
    httpRequestsTotal.inc({
      method: req.method,
      route: req.route?.path || req.path,
      status_code: res.statusCode
    });
  });
  
  next();
});

// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', prometheus.register.contentType);
  res.end(await prometheus.register.metrics());
});

// Visualize in Grafana:
// - Request rate over time
// - P50/P95/P99 latency graphs
// - Error rate trends
// - Slow endpoint identification

Real-World Examples

Example 1: E-Commerce Product API

Before Optimization

app.get('/api/products/:id', async (req, res) => {
  const product = await db.product.findUnique({
    where: { id: req.params.id },
    include: {
      reviews: true,      // 1,000+ reviews
      variants: true,     // 50 variants
      images: true,       // 10 images
      relatedProducts: {
        include: {
          reviews: true,  // Another 1,000+ reviews per product
          images: true
        }
      }
    }
  });
  
  res.json(product);
});
// Response time: 3,200ms
// Payload: 850KB

After Optimization

import NodeCache from 'node-cache';
const cache = new NodeCache({ stdTTL: 300 });

app.get('/api/products/:id', async (req, res) => {
  const productId = req.params.id;
  
  // 1. Check cache
  const cached = cache.get(productId);
  if (cached) {
    res.set('X-Cache', 'HIT');
    return res.json(cached);
  }
  
  // 2. Optimized query
  const product = await db.product.findUnique({
    where: { id: productId },
    select: {
      id: true,
      name: true,
      price: true,
      description: true,
      images: {
        take: 3,  // Only first 3 images
        select: { url: true, alt: true }
      },
      reviews: {
        take: 5,  // Only latest 5 reviews
        orderBy: { created_at: 'desc' },
        select: { rating: true, comment: true, author: true }
      }
    }
  });
  
  // 3. Aggregate stats instead of loading all data
  // ($queryRaw returns an array of rows — take the first row, then read its columns)
  const [stats] = await db.$queryRaw`
    SELECT 
      AVG(rating) as avg_rating,
      COUNT(*) as review_count
    FROM reviews 
    WHERE product_id = ${productId}
  `;
  
  const result = {
    ...product,
    avgRating: stats.avg_rating,
    reviewCount: stats.review_count
  };
  
  // 4. Cache result
  cache.set(productId, result);
  
  // 5. HTTP caching
  res.set('Cache-Control', 'public, max-age=300');
  res.set('X-Cache', 'MISS');
  
  res.json(result);
});

// Response time: 45ms (71x faster!)
// Payload: 12KB (70x smaller!)

Example 2: Analytics Dashboard API

Before Optimization

app.get('/api/analytics/dashboard', async (req, res) => {
  const userId = req.user.id;
  
  // Sequential queries (waterfall)
  const pageViews = await db.pageView.count({ where: { userId } });
  const uniqueVisitors = await db.visitor.count({ where: { userId } });
  const revenue = await db.order.aggregate({
    where: { userId },
    _sum: { total: true }
  });
  const topPages = await db.pageView.groupBy({
    by: ['page'],
    where: { userId },
    _count: true,
    orderBy: { _count: { page: 'desc' } },
    take: 10
  });
  
  res.json({ pageViews, uniqueVisitors, revenue, topPages });
});
// Response time: 2,800ms

After Optimization

import Redis from 'ioredis';
const redis = new Redis();

app.get('/api/analytics/dashboard', async (req, res) => {
  const userId = req.user.id;
  const cacheKey = `analytics:${userId}:${new Date().toISOString().split('T')[0]}`;
  
  // 1. Check Redis cache (daily cache)
  const cached = await redis.get(cacheKey);
  if (cached) {
    return res.json(JSON.parse(cached));
  }
  
  // 2. Parallel queries
  const [pageViews, uniqueVisitors, revenue, topPages] = await Promise.all([
    db.pageView.count({ where: { userId } }),
    db.visitor.count({ where: { userId } }),
    db.order.aggregate({
      where: { userId },
      _sum: { total: true }
    }),
    db.pageView.groupBy({
      by: ['page'],
      where: { userId },
      _count: true,
      orderBy: { _count: { page: 'desc' } },
      take: 10
    })
  ]);
  
  const result = { pageViews, uniqueVisitors, revenue, topPages };
  
  // 3. Cache for 1 hour
  await redis.setex(cacheKey, 3600, JSON.stringify(result));
  
  res.json(result);
});

// Response time: 180ms first load, 2ms cached (1,400x faster than the original 2,800ms)

Example 3: Search API

Before Optimization

app.get('/api/search', async (req, res) => {
  const query = req.query.q;
  
  // Full-text search across all fields
  const results = await db.product.findMany({
    where: {
      OR: [
        { name: { contains: query } },
        { description: { contains: query } },
        { category: { contains: query } },
        { tags: { has: query } }
      ]
    }
  });
  
  res.json(results);
});
// Response time: 4,500ms for 100,000+ products
// No relevance ranking

After Optimization with Elasticsearch

import { Client } from '@elastic/elasticsearch';
const elastic = new Client({ node: 'http://localhost:9200' });

app.get('/api/search', async (req, res) => {
  const query = req.query.q;
  
  // Elasticsearch full-text search with relevance ranking
  const { hits } = await elastic.search({
    index: 'products',
    body: {
      query: {
        multi_match: {
          query,
          fields: ['name^3', 'description', 'category^2', 'tags'],
          fuzziness: 'AUTO' // Handle typos
        }
      },
      size: 20,
      from: (Number(req.query.page) || 0) * 20,
      highlight: {
        fields: {
          name: {},
          description: {}
        }
      }
    }
  });
  
  const results = hits.hits.map(hit => ({
    ...hit._source,
    score: hit._score,
    highlights: hit.highlight
  }));
  
  res.json(results);
});

// Response time: 25ms (180x faster!)
// Relevance ranking + typo tolerance + highlighting

Common Mistakes to Avoid

1. Not Using Database Indexes

Symptom: Queries that take seconds on tables with 100K+ rows

Fix: Add indexes on columns used in WHERE, ORDER BY, JOIN

CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);

2. Fetching All Data When You Need Aggregates

Mistake: Loading 1M records to count them

const users = await db.user.findMany(); // Loads 1M records = 500MB
const count = users.length; // 😱

Fix: Use database aggregation

const count = await db.user.count(); // Returns number only

3. Not Implementing Pagination

Mistake: Returning unlimited results

const products = await db.product.findMany(); // Returns 100,000 products = 50MB

Fix: Always paginate

const products = await db.product.findMany({
  take: 50,
  skip: (page - 1) * 50
});

4. Caching Everything Forever

Mistake: No cache invalidation strategy

cache.set('key', value); // Cached forever, even when data changes

Fix: Use appropriate TTLs

cache.set('key', value, 300); // 5 minutes for frequently changing data
cache.set('config', configValue, 3600); // 1 hour for rarely changing data

5. Synchronous Operations in Request Path

Mistake: Blocking response for non-critical tasks

await sendEmail(); // Blocks response for 500ms
await updateAnalytics(); // Blocks another 200ms
res.json(result); // User waits 700ms unnecessarily

Fix: Queue background jobs

emailQueue.add({ orderId }); // Returns in 1ms
res.json(result); // User gets response immediately

6. Not Monitoring Performance

Mistake: No visibility into slow endpoints

Fix: Add APM + custom metrics

// Track every endpoint's performance
app.use((req, res, next) => {
  const start = Date.now();
  res.on('finish', () => {
    const duration = Date.now() - start;
    metrics.recordLatency(req.route?.path, duration);
  });
  next();
});

7. Over-Optimization Too Early

Mistake: Complex caching before measuring actual bottlenecks

Fix:

  1. Measure first (add metrics)
  2. Identify bottleneck (slowest 20% of endpoints)
  3. Optimize only the bottleneck
  4. Measure again
  5. Repeat

8. Not Using Connection Pooling

Mistake: Creating new database connections per request

const db = new Database(); // New connection every request = 100ms overhead

Fix: Use connection pool

const pool = new Pool({ max: 10 }); // Reuse connections = 0.5ms

9. Exposing Internal Implementation Details

Mistake: Returning raw database objects

res.json(user); // Includes password_hash, internal_id, etc.

Fix: Use DTOs (Data Transfer Objects)

res.json({
  id: user.id,
  name: user.name,
  email: user.email
  // Only public fields
});

10. Not Testing with Production Data Volumes

Mistake: Testing with 100 records when production has 10M

Fix: Load test with realistic data

// Seed 1M+ test records
// Run load tests: 1K concurrent users
// Identify bottlenecks before launch

Production Checklist

Database

  • Indexes on all WHERE, ORDER BY, JOIN columns
  • Connection pooling configured (max 10-20 connections)
  • Slow query logging enabled (>1s queries)
  • Read replicas for read-heavy workloads
  • Query result caching (Redis)
  • Regular ANALYZE/VACUUM (PostgreSQL)

Caching

  • Redis cache for frequently accessed data
  • HTTP caching headers (Cache-Control, ETag)
  • CDN for static assets + public API responses
  • Cache invalidation strategy documented
  • Cache hit/miss metrics tracked

Code

  • Parallel operations with Promise.all()
  • No N+1 queries (use joins/includes)
  • Background jobs for non-critical tasks
  • Pagination on all list endpoints
  • Field selection (don't return unnecessary data)
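The Promise.all() item in practice (the `db.*` calls are Prisma-style stand-ins for your ORM — the point is that the three independent queries start concurrently, so total time tracks the slowest query, not the sum):

```javascript
// Three independent lookups run in parallel instead of one after another.
async function getDashboard(userId, db) {
  const [user, orders, notifications] = await Promise.all([
    db.user.findUnique({ where: { id: userId } }),
    db.order.findMany({ where: { userId }, take: 10 }),
    db.notification.findMany({ where: { userId, read: false } }),
  ]);
  return { user, orders, notifications };
}
```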

Network

  • Response compression (gzip/brotli)
  • HTTP/2 enabled
  • Payload size optimized (<100KB typical response)
  • CDN for global distribution

Infrastructure

  • Load balancer with health checks
  • Auto-scaling configured (2-10 instances)
  • Database read replicas (1-3 replicas)
  • CDN enabled (Cloudflare or similar)

Monitoring

  • APM installed (Datadog, New Relic, or Sentry)
  • Custom metrics (request rate, latency, errors)
  • Slow query alerts (>1s queries)
  • Error rate alerts (>1% errors)
  • Latency alerts (P95 >500ms)
  • Load testing performed (1K+ concurrent users)

Performance Targets

  • P50 latency <100ms
  • P95 latency <500ms
  • P99 latency <2000ms
  • Error rate <0.1%
  • Throughput: 1000+ RPS per server
  • Cache hit rate >70%

Tools & Resources

Performance Monitoring

  • Datadog APM - Full-stack observability
  • New Relic - Application performance monitoring
  • Sentry - Error + performance tracking
  • Grafana + Prometheus - Open-source monitoring stack

Load Testing

  • autocannon - Fast HTTP/1.1 benchmarking
  • k6 - Modern load testing
  • Apache JMeter - Enterprise load testing
  • Gatling - Scala-based load testing

Database Tools

  • pgAdmin - PostgreSQL management
  • DataGrip - Universal database IDE
  • PgHero - PostgreSQL performance dashboard

CDN Providers

  • Cloudflare - Free tier, 250+ locations
  • Fastly - Real-time purge
  • AWS CloudFront - AWS integration
  • Akamai - Enterprise CDN

Caching

  • Redis - In-memory data store
  • Memcached - Simple key-value cache
  • Varnish - HTTP accelerator

Frequently Asked Questions

How much performance improvement can I expect?

Typical gains from this guide:

  • Database optimization: 10-100x faster queries
  • Caching: 50-1000x faster repeated requests
  • Code optimization: 2-10x faster processing
  • Infrastructure: 2-5x more throughput

Real example (e-commerce API):

  • Before: 3,200ms response time
  • After: 45ms response time
  • 71x improvement

Should I optimize everything at once?

No. Follow this order:

  1. Measure - Add APM to identify bottlenecks
  2. Database - Usually 80% of performance issues
  3. Caching - Redis + HTTP caching
  4. Code - Parallel operations, background jobs
  5. Infrastructure - Load balancing, CDN

Optimize the slowest 20% first (Pareto principle).

When should I add caching?

Add caching when:

  • Same data requested frequently (>10 times/minute)
  • Data doesn't change often (every 5+ minutes)
  • Database queries are slow (>100ms)
  • Traffic is growing (>1000 requests/hour)

Don't cache when:

  • Data changes constantly (real-time stock prices)
  • Each request is unique (user-specific content)
  • Data is already fast (<10ms queries)
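The "cache when" criteria above map to a cache-aside helper like this sketch (a Map stands in for Redis so the example is self-contained — in production swap it for redis.get/redis.setex so the cache is shared across instances):

```javascript
// Cache-aside with a TTL: serve hits from memory, fall through to the DB once.
function createCache(ttlMs) {
  const store = new Map();
  return async function cached(key, loadFromDb) {
    const hit = store.get(key);
    if (hit && hit.expires > Date.now()) return hit.value; // cache hit: ~0ms
    const value = await loadFromDb(key); // cache miss: run the slow query once
    store.set(key, { value, expires: Date.now() + ttlMs });
    return value;
  };
}

// Usage: const cachedQuery = createCache(5 * 60 * 1000); // 5-minute TTL
// const product = await cachedQuery(`product:${id}`,
//   () => db.product.findUnique({ where: { id } }));
```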

How many database connections should I use?

Formula (from the PostgreSQL wiki): (CPU cores × 2) + effective spindle count

Examples:

  • 4-core server with SSD: (4 × 2) + 1 = 9 connections
  • 8-core server with 4 HDDs: (8 × 2) + 4 = 20 connections

Too many connections = contention, slower queries. Too few connections = request queuing, timeouts.
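The formula as a one-liner (assumption: "effective spindles" means an SSD counts as roughly 1 and HDDs roughly one per disk):

```javascript
// Pool-size rule of thumb: (CPU cores × 2) + effective spindle count.
function poolSize(cpuCores, effectiveSpindles) {
  return cpuCores * 2 + effectiveSpindles;
}

console.log(poolSize(4, 1)); // 9  — 4-core server with an SSD
console.log(poolSize(8, 4)); // 20 — 8-core server with 4 HDDs
```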

What's the difference between caching and CDN?

Caching (Redis, in-memory):

  • Stores computed results (database queries, API responses)
  • Server-side (your infrastructure)
  • Invalidate when data changes

CDN (Cloudflare, Fastly):

  • Stores static files + API responses
  • Edge locations worldwide (close to users)
  • Reduces latency for global users

Use both: Redis for dynamic data, CDN for global distribution

Should I use GraphQL or REST for performance?

GraphQL advantages:

  • Client specifies exact fields needed (smaller payloads)
  • Single request for multiple resources (no multiple round-trips)

GraphQL challenges:

  • N+1 query problem (requires DataLoader)
  • Caching harder (no URL-based cache keys)

REST advantages:

  • Simpler caching (URL-based)
  • Better CDN support

Verdict: Both can be fast with proper optimization. Use REST for public APIs, GraphQL for complex client needs.

How do I optimize API calls to third-party services?

Strategies:

  1. Cache responses aggressively
const cachedResponse = await redis.get(`stripe:customer:${id}`);
if (cachedResponse) return JSON.parse(cachedResponse);

const customer = await stripe.customers.retrieve(id);
await redis.setex(`stripe:customer:${id}`, 3600, JSON.stringify(customer));
  2. Batch requests when possible
// Bad: 100 API calls
for (const id of customerIds) {
  await stripe.customers.retrieve(id);
}

// Good: 1 API call
const customers = await stripe.customers.list({
  limit: 100,
  starting_after: lastId
});
  3. Webhooks instead of polling
  • Stripe sends webhooks when data changes
  • No need to poll for updates every minute
  4. Monitor third-party API status
  • Use API Status Check to track outages
  • Implement circuit breakers for failing APIs
  • Have fallback strategies
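A minimal circuit-breaker sketch of the fail-fast idea (illustrative only — libraries such as opossum add half-open probing, call timeouts, and metrics on top of this core):

```javascript
// After `threshold` consecutive failures, skip calls for `cooldownMs`
// instead of waiting on an API that is known to be down.
class CircuitBreaker {
  constructor(fn, { threshold = 5, cooldownMs = 30_000 } = {}) {
    this.fn = fn;
    this.threshold = threshold;
    this.cooldownMs = cooldownMs;
    this.failures = 0;
    this.openedAt = 0;
  }
  async call(...args) {
    if (this.failures >= this.threshold &&
        Date.now() - this.openedAt < this.cooldownMs) {
      throw new Error('circuit open: failing fast'); // skip the doomed call
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.failures >= this.threshold) this.openedAt = Date.now();
      throw err; // caller falls back (cached data, default response, etc.)
    }
  }
}
```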

What if my database is still slow after indexing?

Check these:

  1. Index is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Look for "Index Scan" not "Seq Scan"
  2. Query is optimized
  • Avoid SELECT * (fetch only needed columns)
  • Use LIMIT for large results
  • Check for N+1 queries
  3. Database resources
  • CPU usage <80%?
  • Memory usage <80%?
  • Disk I/O not saturated?
  4. Consider scaling
  • Read replicas for read-heavy workloads
  • Vertical scaling (more CPU/RAM)
  • Query result caching (Redis)

How do I prevent caching stale data?

Strategies:

1. Time-based invalidation (TTL)

redis.setex('key', 300, value); // Cache for 5 minutes

2. Event-based invalidation

async function updateProduct(id, data) {
  await db.product.update({ where: { id }, data });
  await redis.del(`product:${id}`); // Invalidate immediately
}

3. Cache versioning

const version = await redis.get('cache:version') || 1;
const key = `product:${id}:v${version}`;

// When data structure changes:
await redis.incr('cache:version'); // Invalidates all caches

4. Conditional requests (HTTP)

const etag = generateHash(data);
res.set('ETag', etag);
if (req.headers['if-none-match'] === etag) {
  return res.status(304).end(); // Not modified
}

Next Steps

  1. Add monitoring - Install APM (Datadog, New Relic, or Sentry)
  2. Identify bottlenecks - Find slowest 20% of endpoints
  3. Optimize database - Add indexes, fix N+1 queries
  4. Add caching - Redis for frequently accessed data
  5. Load test - Test with realistic traffic (1K+ concurrent users)
  6. Monitor production - Track P50/P95/P99 latency, error rates

Monitor critical API dependencies:

Check real-time API status at apistatuscheck.com - monitoring 160+ third-party APIs including AI platforms, cloud providers, payment gateways, and developer tools.
