API Performance Optimization: Complete Guide for Production Systems
Slow APIs kill user experience, inflate infrastructure costs, and limit scalability. A 100ms delay can reduce conversions by roughly 7%, and one widely cited estimate puts the cost of a 1-second delay to Amazon at $1.6 billion in annual sales.
This guide covers production-ready performance optimization strategies used by Stripe, Shopify, and AWS to serve millions of requests per day with sub-100ms response times.
Why API Performance Matters
Impact on Business Metrics
User Experience & Conversions
- 100ms delay → up to 7% drop in conversions (retail performance studies)
- 1-second delay → 11% fewer page views, 7% loss in conversions
- 3-second load time → 53% of mobile users abandon
Infrastructure Costs
- Slow APIs = more concurrent requests = higher server costs
- Example: Reducing response time from 500ms to 100ms → 5x fewer concurrent connections
- Example: a $10,000/month server bill can drop to ~$2,000/month with proper optimization
Scalability
- Faster responses = more requests per server
- Example: 100ms responses = 10 req/sec per connection vs 1 req/sec at 1000ms
- 10x throughput with same infrastructure
Real-World Impact
Shopify Black Friday 2023
- 1.8M requests/second peak
- Sub-100ms P95 response times
- Result: Zero downtime, $9.3B in sales
Stripe API
- 99.99% uptime
- P99 latency <200ms globally
- Handles $640B annually
Netflix API
- 1 billion API calls/day
- P99 latency <100ms
- Powers 230M subscribers globally
Key Performance Metrics
Essential Metrics to Track
Response Time (Latency)
P50 (median): 50% of requests faster than this
P95: 95% of requests faster than this
P99: 99% of requests faster than this (catches outliers)
Why P99 matters more than average:
- Average: 100ms (looks good!)
- P99: 5000ms (1% of users waiting 5 seconds = terrible UX)
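To see why, here's a small sketch (a hypothetical helper, not from any monitoring library) that computes nearest-rank percentiles from raw latency samples. A handful of slow outliers barely move the average but dominate P99:

```typescript
// Nearest-rank percentile over a sample of request latencies (ms).
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, Math.min(rank, sorted.length - 1))];
}

const latencies = [12, 13, 14, 15, 16, 17, 18, 22, 95, 5000];
const avg = latencies.reduce((a, b) => a + b, 0) / latencies.length;
console.log(avg);                        // 522.2 — dominated by one outlier
console.log(percentile(latencies, 50));  // 16 — the typical request
console.log(percentile(latencies, 99));  // 5000 — what the slowest 1% see
```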
Throughput
- Requests per second (RPS) your API can handle
- Example targets:
- Small API: 100-1,000 RPS
- Medium: 1,000-10,000 RPS
- Large: 10,000+ RPS
Error Rate
- 4xx errors: Client mistakes (usually not your fault)
- 5xx errors: Server failures (your fault)
- Target: <0.1% error rate under normal load
Saturation
- CPU usage (target: <70% average, <90% peak)
- Memory usage (target: <80%)
- Database connections (target: <80% of pool)
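A minimal Node.js sketch for sampling the first two of these signals in-process (the function name is illustrative; the ratios map onto the targets above):

```typescript
import * as os from 'os';

// Snapshot process saturation signals. loadavg/cores approximates CPU
// utilization on Linux/macOS (os.loadavg() returns zeros on Windows).
function saturationSnapshot() {
  const cores = os.cpus().length;
  const heap = process.memoryUsage();
  return {
    cpuUtilization: os.loadavg()[0] / cores,       // target: <0.7 average
    heapUsedRatio: heap.heapUsed / heap.heapTotal, // target: <0.8
  };
}
```

In practice you'd ship these numbers to your metrics system on an interval rather than log them ad hoc.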
Database Optimization
Database queries are the #1 performance bottleneck in most APIs.
Indexing Strategy
Before Indexing (Sequential Scan)
-- Query: Find user by email
SELECT * FROM users WHERE email = 'user@example.com';
-- Execution: 2,500ms (scans 1 million rows)
After Indexing
-- Create index
CREATE INDEX idx_users_email ON users(email);
-- Same query now: 8ms (index lookup)
-- 312x faster!
Composite Indexes for Multi-Column Queries
-- Query: Active orders for user in date range
SELECT * FROM orders
WHERE user_id = 123
AND status = 'active'
AND created_at > '2026-01-01';
-- Index order matters!
CREATE INDEX idx_orders_user_status_date
ON orders(user_id, status, created_at DESC);
-- user_id first (most selective filter)
-- status second (additional filter)
-- created_at last (for sorting)
Check Index Usage
-- PostgreSQL: Explain query plan
EXPLAIN ANALYZE
SELECT * FROM orders WHERE user_id = 123;
-- Look for:
-- ✅ "Index Scan" or "Index Only Scan"
-- ❌ "Seq Scan" (sequential scan = missing index)
Query Optimization
N+1 Query Problem (Most Common Mistake)
// ❌ BAD: N+1 queries (1 + 100 = 101 database roundtrips)
async function getOrdersWithUsers() {
const orders = await db.order.findMany(); // 1 query
for (const order of orders) {
// 100 separate queries!
order.user = await db.user.findUnique({
where: { id: order.userId }
});
}
return orders;
}
// Response time: 2,500ms
// ✅ GOOD: Single query with join
async function getOrdersWithUsers() {
const orders = await db.order.findMany({
include: {
user: true // Prisma automatically joins
}
});
return orders;
}
// Response time: 45ms (55x faster!)
Select Only What You Need
// ❌ BAD: Fetching unnecessary data
const users = await db.user.findMany();
// Returns: id, email, password_hash, created_at, updated_at, profile, settings...
// Payload: 50KB per user × 100 users = 5MB
// ✅ GOOD: Select specific fields
const users = await db.user.findMany({
select: {
id: true,
email: true,
name: true
}
});
// Payload: 2KB per user × 100 users = 200KB (25x smaller!)
Pagination for Large Datasets
// ❌ BAD: Loading all records
const allOrders = await db.order.findMany(); // 1 million rows = 500MB = 30 seconds
// OOM crash on large datasets
// ✅ GOOD: Cursor-based pagination
const orders = await db.order.findMany({
take: 100,
cursor: lastOrderId ? { id: lastOrderId } : undefined,
orderBy: { created_at: 'desc' }
});
// Returns 100 rows in 12ms
Connection Pooling
Problem: Creating new database connections is slow
- New connection: 50-200ms
- Pooled connection: 0.5ms
- 400x faster with connection pooling
import { PrismaClient } from '@prisma/client';
// ✅ Connection pool configuration
// Prisma sizes its pool through the connection URL, not a constructor option:
// DATABASE_URL="postgresql://user:pass@host:5432/db?connection_limit=10&pool_timeout=30"
const prisma = new PrismaClient({
datasources: {
db: {
url: process.env.DATABASE_URL
}
}
});
// Connection lifecycle:
// 1. Request arrives → get connection from pool (0.5ms)
// 2. Execute query
// 3. Return connection to pool (reused by next request)
Pool Sizing Formula
Optimal pool size = (core_count × 2) + effective_spindle_count
Example for typical web server:
- 4 CPU cores
- SSD storage (spindle = 1)
- Pool size = (4 × 2) + 1 = 9 connections
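As a sanity check, the formula translates directly into code (a trivial sketch; the function name is made up):

```typescript
// Pool-sizing heuristic from the PostgreSQL wiki: connections beyond this
// point mostly add contention rather than throughput.
function optimalPoolSize(coreCount: number, effectiveSpindles: number): number {
  return coreCount * 2 + effectiveSpindles;
}

console.log(optimalPoolSize(4, 1)); // 9 — the example above
```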
Database Monitoring
Slow Query Logging
import { PrismaClient } from '@prisma/client';
const prisma = new PrismaClient({
log: [
{
emit: 'event',
level: 'query'
}
]
});
prisma.$on('query', (e) => {
if (e.duration > 1000) { // Queries slower than 1 second
console.warn('Slow query detected:', {
query: e.query,
duration: `${e.duration}ms`,
timestamp: e.timestamp
});
}
});
Key Metrics to Track
- Query execution time (P50, P95, P99)
- Connection pool utilization
- Slow query count
- Lock wait time
- Cache hit ratio
Caching Strategies
Caching = storing computed results to avoid redundant work.
Cache Hierarchy (Fastest to Slowest)
- In-Memory Cache (0.1ms) - Node.js Map/LRU
- Redis Cache (1-2ms) - Shared across servers
- CDN Cache (10-50ms) - Global edge locations
- Database (20-200ms) - No cache
- External API (100-2000ms) - Third-party service
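A read path can walk this hierarchy from fastest tier to slowest. The sketch below assumes an in-process `Map`, an ioredis-style client, and a database loader, all passed in as parameters — none of these names come from the guide itself:

```typescript
// Walk the cache hierarchy: in-memory (~0.1ms) → Redis (~1-2ms) → database.
async function getTiered<T>(
  key: string,
  memoryCache: Map<string, T>,
  redis: { get(k: string): Promise<string | null>; setex(k: string, ttl: number, v: string): Promise<unknown> },
  loadFromDb: () => Promise<T>
): Promise<T> {
  // Tier 1: process-local memory (fastest, not shared across servers)
  const local = memoryCache.get(key);
  if (local !== undefined) return local;

  // Tier 2: Redis (shared across servers)
  const remote = await redis.get(key);
  if (remote !== null) {
    const value: T = JSON.parse(remote);
    memoryCache.set(key, value); // promote to the faster tier
    return value;
  }

  // Tier 3: the database; populate both cache tiers on the way out
  const value = await loadFromDb();
  memoryCache.set(key, value);
  await redis.setex(key, 300, JSON.stringify(value));
  return value;
}
```

The promotion step matters: a value warmed in Redis by one server becomes a 0.1ms in-memory hit on every server that reads it afterward.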
In-Memory Caching
import NodeCache from 'node-cache';
// Create cache with 5-minute TTL
const cache = new NodeCache({ stdTTL: 300 });
async function getUser(userId: string) {
// Check cache first
const cached = cache.get<User>(`user:${userId}`);
if (cached) {
console.log('Cache HIT');
return cached; // 0.1ms response time
}
console.log('Cache MISS');
// Fetch from database
const user = await db.user.findUnique({ where: { id: userId } });
// Store in cache
cache.set(`user:${userId}`, user);
return user; // 45ms first time, 0.1ms after
}
Use Cases for In-Memory Cache
- User sessions
- Configuration data
- Frequently accessed lookup tables
- API responses that rarely change
Limitations
- Not shared across servers (each server has own cache)
- Lost on server restart
- Limited by RAM (max ~1-2GB typically)
Redis Caching (Production Standard)
import Redis from 'ioredis';
const redis = new Redis({
host: 'localhost',
port: 6379,
maxRetriesPerRequest: 3
});
async function getProductWithCache(productId: string) {
const cacheKey = `product:${productId}`;
// Try cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Fetch from database
const product = await db.product.findUnique({
where: { id: productId },
include: { reviews: true }
});
// Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(product));
return product;
}
Cache Invalidation Strategies
Time-Based (TTL)
// Cache for 5 minutes
await redis.setex('key', 300, value);
// Pros: Simple, prevents stale data
// Cons: May serve outdated data for up to 5 minutes
Event-Based Invalidation
// Invalidate when data changes
async function updateProduct(id: string, data: ProductUpdate) {
// Update database
const product = await db.product.update({
where: { id },
data
});
// Invalidate cache immediately
await redis.del(`product:${id}`);
return product;
}
// Pros: Always fresh data
// Cons: Requires invalidation logic everywhere
Cache-Aside Pattern (Lazy Loading)
// 1. Check cache
// 2. If miss, fetch from DB
// 3. Store in cache
// 4. Return data
// Most common pattern for read-heavy workloads
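These four steps can be wrapped in one reusable helper so each endpoint doesn't repeat them. A sketch — `redis` stands in for the ioredis client created earlier, and the helper name is made up:

```typescript
// Generic cache-aside wrapper: check cache, fall back to loader, populate.
async function cacheAside<T>(
  redis: { get(k: string): Promise<string | null>; setex(k: string, ttl: number, v: string): Promise<unknown> },
  key: string,
  ttlSeconds: number,
  loader: () => Promise<T>
): Promise<T> {
  const cached = await redis.get(key);            // 1. Check cache
  if (cached !== null) return JSON.parse(cached); // cache hit
  const value = await loader();                   // 2. On miss, fetch from DB
  await redis.setex(key, ttlSeconds, JSON.stringify(value)); // 3. Store
  return value;                                   // 4. Return data
}

// Usage (assuming the product example from above):
// const product = await cacheAside(redis, `product:${id}`, 3600,
//   () => db.product.findUnique({ where: { id } }));
```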
HTTP Caching Headers
import express from 'express';
app.get('/api/products/:id', async (req, res) => {
const product = await getProduct(req.params.id);
// ✅ Cache in browser for 5 minutes
res.set('Cache-Control', 'public, max-age=300');
// ✅ ETag for conditional requests
const etag = generateETag(product);
res.set('ETag', etag);
// If client's ETag matches, return 304 Not Modified
if (req.headers['if-none-match'] === etag) {
return res.status(304).end(); // No data transfer!
}
res.json(product);
});
Cache-Control Directives
public - Can be cached by browsers + CDNs
private - Only cached by browser (sensitive data)
no-cache - Must revalidate with server before using
no-store - Never cache (credit card data, etc.)
max-age=300 - Cache for 300 seconds (5 minutes)
s-maxage=3600 - CDN can cache for 1 hour
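The `generateETag` helper used in the ETag example above isn't defined in this guide. A minimal sketch using Node's crypto module — hash the serialized body and emit a weak validator:

```typescript
import { createHash } from 'crypto';

// Weak ETag: a hash of the serialized response body. Weak (W/) validators
// are fine for JSON APIs, where byte-identical responses aren't required.
function generateETag(body: unknown): string {
  const hash = createHash('sha1').update(JSON.stringify(body)).digest('hex');
  return `W/"${hash}"`;
}
```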
CDN Caching for Global Performance
// Example: Cloudflare caching
app.get('/api/public/products', async (req, res) => {
const products = await getProducts();
// ✅ Cache at CDN edge for 1 hour
res.set('Cache-Control', 'public, s-maxage=3600');
// ✅ Cloudflare-specific header
res.set('CDN-Cache-Control', 'max-age=3600');
res.json(products);
});
Performance Impact of CDN
- Without CDN: 200ms (US East → Singapore)
- With CDN: 15ms (cached at Singapore edge)
- 13x faster for global users
Best CDN Providers for APIs
- Cloudflare - Free tier, 250+ locations
- Fastly - Real-time purge, low latency
- AWS CloudFront - Tight AWS integration
- Akamai - Enterprise-grade, massive scale
Code-Level Optimization
Async/Await vs Synchronous Operations
// ❌ BAD: Sequential operations (waterfall)
async function getUserData(userId: string) {
const user = await db.user.findUnique({ where: { id: userId } }); // 50ms
const orders = await db.order.findMany({ where: { userId } }); // 80ms
const reviews = await db.review.findMany({ where: { userId } }); // 60ms
return { user, orders, reviews };
}
// Total time: 50 + 80 + 60 = 190ms
// ✅ GOOD: Parallel operations
async function getUserData(userId: string) {
const [user, orders, reviews] = await Promise.all([
db.user.findUnique({ where: { id: userId } }),
db.order.findMany({ where: { userId } }),
db.review.findMany({ where: { userId } })
]);
return { user, orders, reviews };
}
// Total time: max(50, 80, 60) = 80ms (2.4x faster!)
Request Batching (DataLoader Pattern)
import DataLoader from 'dataloader';
// ❌ PROBLEM: N+1 queries in GraphQL/nested requests
// Fetching 100 orders → 100 separate user queries
// ✅ SOLUTION: Batch requests into single query
const userLoader = new DataLoader(async (userIds: string[]) => {
// Batch: Load all users in single query
const users = await db.user.findMany({
where: { id: { in: userIds } }
});
// Return in same order as requested IDs
const userMap = new Map(users.map(u => [u.id, u]));
return userIds.map(id => userMap.get(id));
});
// Usage
async function getOrders() {
const orders = await db.order.findMany();
// Kick off all loads in the same tick so DataLoader batches them into
// one query (awaiting inside a loop would dispatch one query per user)
const users = await Promise.all(orders.map(o => userLoader.load(o.userId)));
orders.forEach((order, i) => { order.user = users[i]; });
return orders;
}
// Result:
// - 100 orders fetched: 1 query
// - 100 users fetched: 1 batched query (not 100!)
// - Total: 2 queries instead of 101 (50x fewer DB calls)
JSON Payload Optimization
// ❌ BAD: Returning entire objects with unnecessary fields
app.get('/api/users', async (req, res) => {
const users = await db.user.findMany({
include: {
profile: true,
settings: true,
orders: {
include: {
items: true,
shipping: true
}
}
}
});
res.json(users);
});
// Payload: 500KB per user × 100 users = 50MB response!
// ✅ GOOD: Return only what's needed
app.get('/api/users', async (req, res) => {
const users = await db.user.findMany({
select: {
id: true,
name: true,
email: true,
avatar: true
}
});
res.json(users);
});
// Payload: 500 bytes per user × 100 users = 50KB (1000x smaller!)
Avoid Synchronous Operations in Request Path
// ❌ BAD: Blocking operations during request
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
// ❌ Blocks response while sending email (500ms)
await sendOrderConfirmationEmail(order);
// ❌ Blocks response while updating analytics (200ms)
await updateAnalytics(order);
res.json(order);
});
// Response time: 700ms + database time
// ✅ GOOD: Queue non-critical work
import Queue from 'bull';
const emailQueue = new Queue('emails', redisConfig);
app.post('/api/orders', async (req, res) => {
const order = await createOrder(req.body);
// ✅ Queue email (non-blocking, 1ms)
emailQueue.add({ orderId: order.id });
// ✅ Fire and forget analytics
updateAnalytics(order).catch(console.error);
res.json(order); // Returns immediately!
});
// Response time: database time only (95% faster)
Network Optimization
Response Compression
import compression from 'compression';
import express from 'express';
const app = express();
// ✅ Enable gzip compression
app.use(compression({
level: 6, // Compression level (1-9, 6 is balanced)
threshold: 1024, // Only compress responses > 1KB
filter: (req, res) => {
// Don't compress images/videos (already compressed)
if (req.headers['x-no-compression']) {
return false;
}
return compression.filter(req, res);
}
}));
// Impact:
// - JSON response: 100KB → 15KB (85% smaller)
// - Transfer time: 200ms → 30ms over 3G
// - 6.6x faster download
HTTP/2 Server Push (Node.js)
import { createSecureServer } from 'http2';
import { readFileSync } from 'fs';
const server = createSecureServer({
key: readFileSync('server.key'),
cert: readFileSync('server.cert')
});
server.on('stream', (stream, headers) => {
if (headers[':path'] === '/api/dashboard') {
// ✅ Push critical resources before client requests them
stream.pushStream({ ':path': '/api/user' }, (err, pushStream) => {
pushStream.respond({ ':status': 200 });
pushStream.end(JSON.stringify(userData));
});
stream.respond({ ':status': 200 });
stream.end(JSON.stringify(dashboardData));
}
});
// Impact:
// - HTTP/1.1: Request dashboard (100ms) → Request user (100ms) = 200ms
// - HTTP/2: Request dashboard + pushed user data = 100ms (2x faster)
// Caveat: major browsers have removed HTTP/2 server push support, so for
// browser clients prefer Link: rel=preload or 103 Early Hints; push mainly
// benefits server-to-server HTTP/2 traffic today.
Minimize Payload Size
Use Field Selection (GraphQL-style)
// Allow clients to specify which fields they want
app.get('/api/users', async (req, res) => {
const fields = String(req.query.fields ?? 'id,name,email').split(',');
// In production, validate requested fields against an allow-list
const select = Object.fromEntries(fields.map(f => [f, true])) as any;
const users = await db.user.findMany({ select });
res.json(users);
});
// Usage: /api/users?fields=id,name,email
// Returns only requested fields (smaller payload)
Remove Null Values
function removeNulls(obj: any): any {
if (Array.isArray(obj)) {
return obj.map(removeNulls).filter(v => v != null);
}
if (obj !== null && typeof obj === 'object') {
return Object.entries(obj)
.filter(([_, v]) => v != null)
.reduce((acc, [k, v]) => ({ ...acc, [k]: removeNulls(v) }), {});
}
return obj;
}
// Before: { id: 1, name: "Alice", bio: null, avatar: null } = 50 bytes
// After: { id: 1, name: "Alice" } = 24 bytes (52% smaller)
Infrastructure Optimization
Load Balancing
// Simple round-robin load balancer with health checks
import express from 'express';
import axios from 'axios';
const servers = [
'http://server1:3000',
'http://server2:3000',
'http://server3:3000'
];
let currentIndex = 0;
const healthStatus = new Map<string, boolean>();
// Health check every 30 seconds
setInterval(async () => {
for (const server of servers) {
try {
await axios.get(`${server}/health`, { timeout: 1000 });
healthStatus.set(server, true);
} catch {
healthStatus.set(server, false);
console.warn(`Server ${server} is DOWN`);
}
}
}, 30000);
// Proxy requests to healthy servers
app.use(async (req, res) => {
const healthyServers = servers.filter(s => healthStatus.get(s) !== false);
if (healthyServers.length === 0) {
return res.status(503).json({ error: 'No healthy servers' });
}
// Round-robin selection
const targetServer = healthyServers[currentIndex % healthyServers.length];
currentIndex++;
try {
const response = await axios({
method: req.method,
url: `${targetServer}${req.path}`,
data: req.body,
headers: req.headers,
timeout: 5000
});
res.status(response.status).json(response.data);
} catch (error) {
res.status(500).json({ error: 'Server error' });
}
});
Production Load Balancers
- NGINX - Industry standard, 100K+ req/sec
- HAProxy - Layer 4/7 balancing, health checks
- AWS ALB - Managed, auto-scaling
- Cloudflare Load Balancing - Global, DDoS protection
Auto-Scaling
# Example: Kubernetes Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: api-autoscaler
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: api-server
minReplicas: 2 # Always at least 2 pods
maxReplicas: 10 # Scale up to 10 during traffic spikes
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70 # Scale when CPU > 70%
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80 # Scale when memory > 80%
Auto-Scaling Decisions
- Scale UP: CPU > 70% for 2 minutes
- Scale DOWN: CPU < 30% for 5 minutes
- Result: Right-sized infrastructure = optimal cost
Database Read Replicas
import { PrismaClient } from '@prisma/client';
// Primary database (writes)
const prismaWrite = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_PRIMARY_URL }
}
});
// Read replica (reads only)
const prismaRead = new PrismaClient({
datasources: {
db: { url: process.env.DATABASE_REPLICA_URL }
}
});
// Write operations → primary
async function createOrder(data: OrderInput) {
return prismaWrite.order.create({ data });
}
// Read operations → replica (reduces primary load)
async function getOrders(userId: string) {
return prismaRead.order.findMany({
where: { userId }
});
}
// Impact:
// - Primary handles 100% writes + 0% reads
// - Replica handles 100% reads
// - 3 replicas = 75% load reduction on primary
Monitoring & Profiling
Application Performance Monitoring (APM)
import * as Sentry from '@sentry/node';
import { ProfilingIntegration } from '@sentry/profiling-node';
Sentry.init({
dsn: process.env.SENTRY_DSN,
integrations: [
new ProfilingIntegration()
],
tracesSampleRate: 0.1, // Sample 10% of requests
profilesSampleRate: 0.1
});
// Automatic performance tracking
app.use(Sentry.Handlers.requestHandler());
app.use(Sentry.Handlers.tracingHandler());
// Custom performance tracking
app.get('/api/expensive-operation', async (req, res) => {
const transaction = Sentry.startTransaction({
name: 'Expensive Operation',
op: 'http.request'
});
const span1 = transaction.startChild({ op: 'database.query', description: 'Fetch users' });
const users = await db.user.findMany();
span1.finish();
const span2 = transaction.startChild({ op: 'processing', description: 'Transform data' });
const transformed = processUsers(users);
span2.finish();
transaction.finish();
res.json(transformed);
});
// View in Sentry dashboard:
// - Total request time: 450ms
// - Database query: 250ms (55% of time)
// - Processing: 200ms (44% of time)
// → Optimize database query first!
Best APM Tools
- Datadog APM - Full-stack observability
- New Relic - Real user monitoring
- Sentry - Error + performance tracking
- AWS X-Ray - Distributed tracing for AWS
Performance Benchmarking
import autocannon from 'autocannon';
// Load test your API
async function benchmarkAPI() {
const result = await autocannon({
url: 'http://localhost:3000/api/users',
connections: 100, // 100 concurrent connections
duration: 30, // 30 seconds
pipelining: 1
});
console.log('Performance Results:');
console.log(`Requests/sec: ${result.requests.average}`);
console.log(`Latency P50: ${result.latency.p50}ms`);
console.log(`Latency P95: ${result.latency.p95}ms`);
console.log(`Latency P99: ${result.latency.p99}ms`);
console.log(`Error rate: ${(result.non2xx / result.requests.total) * 100}%`);
}
// Example output:
// Requests/sec: 5,240
// Latency P50: 15ms
// Latency P95: 48ms
// Latency P99: 120ms
// Error rate: 0.02%
Load Testing Tools
- autocannon (Node.js) - 40K+ req/sec benchmarking
- k6 - Modern load testing, Grafana integration
- Apache JMeter - Enterprise-grade, GUI
- wrk - Minimal, blazing fast
Real-Time Performance Monitoring
import prometheus from 'prom-client';
// Create metrics
const httpRequestDuration = new prometheus.Histogram({
name: 'http_request_duration_seconds',
help: 'Duration of HTTP requests in seconds',
labelNames: ['method', 'route', 'status_code'],
buckets: [0.01, 0.05, 0.1, 0.5, 1, 2, 5] // Response time buckets
});
const httpRequestsTotal = new prometheus.Counter({
name: 'http_requests_total',
help: 'Total number of HTTP requests',
labelNames: ['method', 'route', 'status_code']
});
// Middleware to track metrics
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = (Date.now() - start) / 1000;
httpRequestDuration.observe({
method: req.method,
route: req.route?.path || req.path,
status_code: res.statusCode
}, duration);
httpRequestsTotal.inc({
method: req.method,
route: req.route?.path || req.path,
status_code: res.statusCode
});
});
next();
});
// Expose metrics endpoint
app.get('/metrics', async (req, res) => {
res.set('Content-Type', prometheus.register.contentType);
res.end(await prometheus.register.metrics());
});
// Visualize in Grafana:
// - Request rate over time
// - P50/P95/P99 latency graphs
// - Error rate trends
// - Slow endpoint identification
Real-World Examples
Example 1: E-Commerce Product API
Before Optimization
app.get('/api/products/:id', async (req, res) => {
const product = await db.product.findUnique({
where: { id: req.params.id },
include: {
reviews: true, // 1,000+ reviews
variants: true, // 50 variants
images: true, // 10 images
relatedProducts: {
include: {
reviews: true, // Another 1,000+ reviews per product
images: true
}
}
}
});
res.json(product);
});
// Response time: 3,200ms
// Payload: 850KB
After Optimization
import NodeCache from 'node-cache';
const cache = new NodeCache({ stdTTL: 300 });
app.get('/api/products/:id', async (req, res) => {
const productId = req.params.id;
// 1. Check cache
const cached = cache.get(productId);
if (cached) {
res.set('X-Cache', 'HIT');
return res.json(cached);
}
// 2. Optimized query
const product = await db.product.findUnique({
where: { id: productId },
select: {
id: true,
name: true,
price: true,
description: true,
images: {
take: 3, // Only first 3 images
select: { url: true, alt: true }
},
reviews: {
take: 5, // Only latest 5 reviews
orderBy: { created_at: 'desc' },
select: { rating: true, comment: true, author: true }
}
}
});
// 3. Aggregate stats instead of loading all data
const [stats] = await db.$queryRaw<{ avg_rating: number; review_count: bigint }[]>`
SELECT
AVG(rating) as avg_rating,
COUNT(*) as review_count
FROM reviews
WHERE product_id = ${productId}
`;
const result = {
...product,
avgRating: stats.avg_rating,
reviewCount: Number(stats.review_count) // COUNT(*) comes back as BigInt
};
// 4. Cache result
cache.set(productId, result);
// 5. HTTP caching
res.set('Cache-Control', 'public, max-age=300');
res.set('X-Cache', 'MISS');
res.json(result);
});
// Response time: 45ms (71x faster!)
// Payload: 12KB (70x smaller!)
Example 2: Analytics Dashboard API
Before Optimization
app.get('/api/analytics/dashboard', async (req, res) => {
const userId = req.user.id;
// Sequential queries (waterfall)
const pageViews = await db.pageView.count({ where: { userId } });
const uniqueVisitors = await db.visitor.count({ where: { userId } });
const revenue = await db.order.aggregate({
where: { userId },
_sum: { total: true }
});
const topPages = await db.pageView.groupBy({
by: ['page'],
where: { userId },
_count: true,
orderBy: { _count: { page: 'desc' } },
take: 10
});
res.json({ pageViews, uniqueVisitors, revenue, topPages });
});
// Response time: 2,800ms
After Optimization
import Redis from 'ioredis';
const redis = new Redis();
app.get('/api/analytics/dashboard', async (req, res) => {
const userId = req.user.id;
const cacheKey = `analytics:${userId}:${new Date().toISOString().split('T')[0]}`;
// 1. Check Redis cache (daily cache)
const cached = await redis.get(cacheKey);
if (cached) {
return res.json(JSON.parse(cached));
}
// 2. Parallel queries
const [pageViews, uniqueVisitors, revenue, topPages] = await Promise.all([
db.pageView.count({ where: { userId } }),
db.visitor.count({ where: { userId } }),
db.order.aggregate({
where: { userId },
_sum: { total: true }
}),
db.pageView.groupBy({
by: ['page'],
where: { userId },
_count: true,
orderBy: { _count: { page: 'desc' } },
take: 10
})
]);
const result = { pageViews, uniqueVisitors, revenue, topPages };
// 3. Cache for 1 hour
await redis.setex(cacheKey, 3600, JSON.stringify(result));
res.json(result);
});
// Response time: 180ms first load, 2ms cached (1,400x faster!)
Example 3: Search API
Before Optimization
app.get('/api/search', async (req, res) => {
const query = req.query.q;
// Full-text search across all fields
const results = await db.product.findMany({
where: {
OR: [
{ name: { contains: query } },
{ description: { contains: query } },
{ category: { contains: query } },
{ tags: { has: query } }
]
}
});
res.json(results);
});
// Response time: 4,500ms for 100,000+ products
// No relevance ranking
After Optimization with Elasticsearch
import { Client } from '@elastic/elasticsearch';
const elastic = new Client({ node: 'http://localhost:9200' });
app.get('/api/search', async (req, res) => {
const query = String(req.query.q ?? '');
// Elasticsearch full-text search with relevance ranking
// v8 client: search options are top-level (no body wrapper)
const { hits } = await elastic.search({
index: 'products',
query: {
multi_match: {
query,
fields: ['name^3', 'description', 'category^2', 'tags'],
fuzziness: 'AUTO' // Handle typos
}
},
size: 20,
from: Number(req.query.page || 0) * 20,
highlight: {
fields: {
name: {},
description: {}
}
}
});
const results = hits.hits.map(hit => ({
...hit._source,
score: hit._score,
highlights: hit.highlight
}));
res.json(results);
});
// Response time: 25ms (180x faster!)
// Relevance ranking + typo tolerance + highlighting
Common Mistakes to Avoid
1. Not Using Database Indexes
Symptom: Queries that take seconds on tables with 100K+ rows
Fix: Add indexes on columns used in WHERE, ORDER BY, JOIN
CREATE INDEX idx_orders_user_date ON orders(user_id, created_at DESC);
2. Fetching All Data When You Need Aggregates
Mistake: Loading 1M records to count them
const users = await db.user.findMany(); // Loads 1M records = 500MB
const count = users.length; // 😱
Fix: Use database aggregation
const count = await db.user.count(); // Returns number only
3. Not Implementing Pagination
Mistake: Returning unlimited results
const products = await db.product.findMany(); // Returns 100,000 products = 50MB
Fix: Always paginate
const products = await db.product.findMany({
take: 50,
skip: (page - 1) * 50
});
4. Caching Everything Forever
Mistake: No cache invalidation strategy
cache.set('key', value); // Cached forever, even when data changes
Fix: Use appropriate TTLs
cache.set('key', value, 300); // 5 minutes for frequently changing data
cache.set('config', configValue, 3600); // 1 hour for rarely changing data
5. Synchronous Operations in Request Path
Mistake: Blocking response for non-critical tasks
await sendEmail(); // Blocks response for 500ms
await updateAnalytics(); // Blocks another 200ms
res.json(result); // User waits 700ms unnecessarily
Fix: Queue background jobs
emailQueue.add({ orderId }); // Returns in 1ms
res.json(result); // User gets response immediately
6. Not Monitoring Performance
Mistake: No visibility into slow endpoints
Fix: Add APM + custom metrics
// Track every endpoint's performance
app.use((req, res, next) => {
const start = Date.now();
res.on('finish', () => {
const duration = Date.now() - start;
metrics.recordLatency(req.route?.path, duration);
});
next();
});
7. Over-Optimization Too Early
Mistake: Complex caching before measuring actual bottlenecks
Fix:
- Measure first (add metrics)
- Identify bottleneck (slowest 20% of endpoints)
- Optimize only the bottleneck
- Measure again
- Repeat
8. Not Using Connection Pooling
Mistake: Creating new database connections per request
const db = new Database(); // New connection every request = 100ms overhead
Fix: Use connection pool
const pool = new Pool({ max: 10 }); // Reuse connections = 0.5ms
9. Exposing Internal Implementation Details
Mistake: Returning raw database objects
res.json(user); // Includes password_hash, internal_id, etc.
Fix: Use DTOs (Data Transfer Objects)
res.json({
id: user.id,
name: user.name,
email: user.email
// Only public fields
});
10. Not Testing with Production Data Volumes
Mistake: Testing with 100 records when production has 10M
Fix: Load test with realistic data
// Seed 1M+ test records
// Run load tests: 1K concurrent users
// Identify bottlenecks before launch
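A seeding sketch for generating production-like volume before a load test. The `db.user.createMany` shape follows the Prisma usage elsewhere in this guide; the batch size and data are illustrative:

```typescript
// Seed a large, realistic dataset in batches so load tests hit
// production-scale volumes. Batching avoids one enormous INSERT.
async function seedUsers(
  db: { user: { createMany(args: { data: unknown[] }): Promise<unknown> } },
  total = 1_000_000,
  batch = 10_000
): Promise<void> {
  for (let offset = 0; offset < total; offset += batch) {
    const data = Array.from({ length: Math.min(batch, total - offset) }, (_, i) => ({
      email: `user${offset + i}@example.com`,
      name: `User ${offset + i}`,
    }));
    await db.user.createMany({ data });
  }
}
```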
Production Checklist
Database
- Indexes on all WHERE, ORDER BY, JOIN columns
- Connection pooling configured (max 10-20 connections)
- Slow query logging enabled (>1s queries)
- Read replicas for read-heavy workloads
- Query result caching (Redis)
- Regular ANALYZE/VACUUM (PostgreSQL)
Caching
- Redis cache for frequently accessed data
- HTTP caching headers (Cache-Control, ETag)
- CDN for static assets + public API responses
- Cache invalidation strategy documented
- Cache hit/miss metrics tracked
Code
- Parallel operations with Promise.all()
- No N+1 queries (use joins/includes)
- Background jobs for non-critical tasks
- Pagination on all list endpoints
- Field selection (don't return unnecessary data)
Network
- Response compression (gzip/brotli)
- HTTP/2 enabled
- Payload size optimized (<100KB typical response)
- CDN for global distribution
Infrastructure
- Load balancer with health checks
- Auto-scaling configured (2-10 instances)
- Database read replicas (1-3 replicas)
- CDN enabled (Cloudflare or similar)
Monitoring
- APM installed (Datadog, New Relic, or Sentry)
- Custom metrics (request rate, latency, errors)
- Slow query alerts (>1s queries)
- Error rate alerts (>1% errors)
- Latency alerts (P95 >500ms)
- Load testing performed (1K+ concurrent users)
Performance Targets
- P50 latency <100ms
- P95 latency <500ms
- P99 latency <2000ms
- Error rate <0.1%
- Throughput: 1000+ RPS per server
- Cache hit rate >70%
Tools & Resources
Performance Monitoring
- Datadog APM - Full-stack observability
- New Relic - Application performance monitoring
- Sentry - Error + performance tracking
- Grafana + Prometheus - Open-source monitoring stack
Load Testing
- autocannon - Fast HTTP/1.1 benchmarking
- k6 - Modern load testing
- Apache JMeter - Enterprise load testing
- Gatling - Scala-based load testing
Database Tools
- pgAdmin - PostgreSQL management
- DataGrip - Universal database IDE
- PgHero - PostgreSQL performance dashboard
CDN Providers
- Cloudflare - Free tier, 250+ locations
- Fastly - Real-time purge
- AWS CloudFront - AWS integration
- Akamai - Enterprise CDN
Caching
- Redis - In-memory data store
- Memcached - Simple key-value cache
- Varnish - HTTP accelerator
Frequently Asked Questions
How much performance improvement can I expect?
Typical gains from this guide:
- Database optimization: 10-100x faster queries
- Caching: 50-1000x faster repeated requests
- Code optimization: 2-10x faster processing
- Infrastructure: 2-5x more throughput
Real example (e-commerce API):
- Before: 3,200ms response time
- After: 45ms response time
- 71x improvement
Should I optimize everything at once?
No. Follow this order:
- Measure - Add APM to identify bottlenecks
- Database - Usually 80% of performance issues
- Caching - Redis + HTTP caching
- Code - Parallel operations, background jobs
- Infrastructure - Load balancing, CDN
Optimize the slowest 20% first (Pareto principle).
When should I add caching?
Add caching when:
- Same data requested frequently (>10 times/minute)
- Data doesn't change often (every 5+ minutes)
- Database queries are slow (>100ms)
- Traffic is growing (>1000 requests/hour)
Don't cache when:
- Data changes constantly (real-time stock prices)
- Each request is unique (user-specific content)
- Data is already fast (<10ms queries)
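When those conditions hold, the usual implementation is the cache-aside pattern: check the cache, fall back to the data source on a miss, then populate the cache with a TTL. A minimal sketch; an in-memory Map stands in for Redis here so the example is self-contained, and `getOrSet`/`fetchFn` are illustrative names:

```javascript
// Cache-aside with TTL. An in-memory Map stands in for Redis so the
// sketch is self-contained; swap in redis.get/setex in production.
const cache = new Map(); // key -> { value, expiresAt }

async function getOrSet(key, ttlSeconds, fetchFn) {
  const entry = cache.get(key);
  if (entry && entry.expiresAt > Date.now()) {
    return entry.value; // cache hit
  }
  const value = await fetchFn(); // cache miss: hit the database/API
  cache.set(key, { value, expiresAt: Date.now() + ttlSeconds * 1000 });
  return value;
}

// Usage: the second call is served from the cache, so fetchFn runs once
(async () => {
  let dbCalls = 0;
  const loadProduct = () => { dbCalls++; return Promise.resolve({ id: 1, name: 'Widget' }); };
  await getOrSet('product:1', 300, loadProduct);
  await getOrSet('product:1', 300, loadProduct);
  console.log('database calls:', dbCalls); // 1
})();
```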
How many database connections should I use?
Formula: (CPU cores × 2) + effective storage spindles
Examples:
- 4-core server with SSD: (4 × 2) + 1 = 9 connections
- 8-core server with 4-disk HDD array: (8 × 2) + 4 = 20 connections
Too many connections = contention, slower queries
Too few connections = request queuing, timeouts
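The formula is small enough to encode directly when sizing a pool. A trivial sketch (the `spindles` term is 1 for a single SSD, roughly the number of disks for an HDD array):

```javascript
// Pool size heuristic: (CPU cores × 2) + effective storage spindles
function poolSize(cpuCores, spindles) {
  return cpuCores * 2 + spindles;
}

console.log(poolSize(4, 1)); // 4-core SSD server -> 9
console.log(poolSize(8, 4)); // 8-core, 4-disk HDD server -> 20
```

With node-postgres, for example, this value would feed the `max` option of `new Pool({ max: poolSize(4, 1) })`.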
What's the difference between caching and CDN?
Caching (Redis, in-memory):
- Stores computed results (database queries, API responses)
- Server-side (your infrastructure)
- Invalidate when data changes
CDN (Cloudflare, Fastly):
- Stores static files + API responses
- Edge locations worldwide (close to users)
- Reduces latency for global users
Use both: Redis for dynamic data, CDN for global distribution
Should I use GraphQL or REST for performance?
GraphQL advantages:
- Client specifies exact fields needed (smaller payloads)
- Single request for multiple resources (no multiple round-trips)
GraphQL challenges:
- N+1 query problem (requires DataLoader)
- Caching harder (no URL-based cache keys)
REST advantages:
- Simpler caching (URL-based)
- Better CDN support
Verdict: Both can be fast with proper optimization. Use REST for public APIs, GraphQL for complex client needs.
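The DataLoader mitigation mentioned above works by collecting the individual `load(id)` calls made during one tick and issuing a single batched fetch. A minimal sketch of the idea, with a hypothetical `batchFetch` standing in for one SQL `WHERE id IN (...)` query (the real `dataloader` npm package adds per-request caching and error handling on top of this):

```javascript
// Minimal DataLoader-style batcher: coalesce load(id) calls made in the
// same tick into a single call to batchFetch(ids).
function createLoader(batchFetch) {
  let queue = []; // pending { id, resolve } entries
  return function load(id) {
    return new Promise((resolve) => {
      queue.push({ id, resolve });
      if (queue.length === 1) {
        // Schedule one flush for everything queued in this tick
        process.nextTick(async () => {
          const batch = queue;
          queue = [];
          const results = await batchFetch(batch.map((e) => e.id));
          batch.forEach((e, i) => e.resolve(results[i]));
        });
      }
    });
  };
}

// Usage: three load() calls, but batchFetch runs once with [1, 2, 3]
(async () => {
  const load = createLoader(async (ids) => {
    console.log('batched fetch for ids:', ids);
    return ids.map((id) => ({ id })); // stand-in for one IN (...) query
  });
  const users = await Promise.all([load(1), load(2), load(3)]);
  console.log(users.length); // 3
})();
```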
How do I optimize API calls to third-party services?
Strategies:
1. Cache responses aggressively
// Check Redis first; fall back to the Stripe API only on a cache miss
const cachedResponse = await redis.get(`stripe:customer:${id}`);
if (cachedResponse) return JSON.parse(cachedResponse);
const customer = await stripe.customers.retrieve(id);
await redis.setex(`stripe:customer:${id}`, 3600, JSON.stringify(customer)); // 1-hour TTL
return customer;
2. Batch requests when possible
// Bad: 100 sequential API calls
for (const id of customerIds) {
  await stripe.customers.retrieve(id);
}
// Good: 1 API call fetches a page of up to 100 customers
const customers = await stripe.customers.list({
  limit: 100,
  starting_after: lastId
});
3. Webhooks instead of polling
- Stripe sends webhooks when data changes
- No need to poll for updates every minute
4. Monitor third-party API status
- Use API Status Check to track outages
- Implement circuit breakers for failing APIs
- Have fallback strategies
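A circuit breaker stops calling a failing dependency for a cooldown period instead of piling up timeouts. A minimal sketch (the thresholds and `fallback` are illustrative; libraries such as `opossum` provide a production-ready version):

```javascript
// Minimal circuit breaker: open after `maxFailures` consecutive failures,
// then short-circuit to the fallback until `resetMs` has elapsed.
function circuitBreaker(fn, { maxFailures = 5, resetMs = 30000, fallback } = {}) {
  let failures = 0;
  let openedAt = 0;
  return async function (...args) {
    if (failures >= maxFailures && Date.now() - openedAt < resetMs) {
      return fallback(...args); // circuit open: skip the failing API
    }
    try {
      const result = await fn(...args);
      failures = 0; // success closes the circuit
      return result;
    } catch (err) {
      failures++;
      if (failures >= maxFailures) openedAt = Date.now();
      return fallback(...args);
    }
  };
}

// Usage: after 3 consecutive failures, calls stop hitting the API entirely
const safeFetch = circuitBreaker(
  async () => { throw new Error('API down'); },
  { maxFailures: 3, resetMs: 30000, fallback: () => ({ status: 'cached' }) }
);
```

After `resetMs` the next call is allowed through as a trial; one success closes the circuit again.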
What if my database is still slow after indexing?
Check these:
1. Index is being used
EXPLAIN ANALYZE SELECT * FROM orders WHERE user_id = 123;
-- Look for "Index Scan", not "Seq Scan"
2. Query is optimized
- Avoid SELECT * (fetch only needed columns)
- Use LIMIT for large results
- Check for N+1 queries
3. Database resources
- CPU usage <80%?
- Memory usage <80%?
- Disk I/O not saturated?
4. Consider scaling
- Read replicas for read-heavy workloads
- Vertical scaling (more CPU/RAM)
- Query result caching (Redis)
How do I prevent caching stale data?
Strategies:
1. Time-based invalidation (TTL)
redis.setex('key', 300, value); // Cache for 5 minutes
2. Event-based invalidation
async function updateProduct(id, data) {
  await db.product.update({ where: { id }, data });
  await redis.del(`product:${id}`); // Invalidate immediately
}
3. Cache versioning
const version = await redis.get('cache:version') || 1;
const key = `product:${id}:v${version}`;
// When the data structure changes:
await redis.incr('cache:version'); // Invalidates all existing keys
4. Conditional requests (HTTP)
const etag = generateHash(data);
res.set('ETag', etag);
if (req.headers['if-none-match'] === etag) {
  return res.status(304).end(); // Not modified
}
Next Steps
- Add monitoring - Install APM (Datadog, New Relic, or Sentry)
- Identify bottlenecks - Find slowest 20% of endpoints
- Optimize database - Add indexes, fix N+1 queries
- Add caching - Redis for frequently accessed data
- Load test - Test with realistic traffic (1K+ concurrent users)
- Monitor production - Track P50/P95/P99 latency, error rates
Related guides:
- API Rate Limiting Complete Guide
- API Observability & Distributed Tracing
- API Error Handling Best Practices
- API Caching Strategies Complete Guide
Monitor critical API dependencies:
Check real-time API status at apistatuscheck.com - monitoring 160+ third-party APIs including AI platforms, cloud providers, payment gateways, and developer tools.