API Caching Strategies: Complete Implementation Guide for High-Performance APIs
Caching is the single most impactful performance optimization for APIs. A well-designed caching strategy can reduce response times from 500ms to 50ms, cut infrastructure costs by 80%, and handle 10x more traffic without scaling servers.
This guide covers production-ready caching strategies used by high-performance APIs at Stripe, GitHub, AWS, and Cloudflare.
Table of Contents
- Why Caching Matters
- HTTP Caching Fundamentals
- Cache-Control Headers
- ETags and Conditional Requests
- CDN Caching
- Application-Level Caching
- Distributed Caching with Redis
- Cache Invalidation Strategies
- Cache Warming
- Cache Key Design
- Real-World Examples
- Common Mistakes
- Production Checklist
Why Caching Matters
Without caching:
Request → Database Query (500ms) → JSON Serialization (50ms) → Response (550ms total)
1,000 requests/min = 1,000 database queries = expensive, slow, fragile
With caching:
Request → Cache Hit (5ms) → Response (5ms total)
1,000 requests/min = 10 database queries + 990 cache hits = cheap, fast, scalable
Impact metrics from production APIs:
- 99% cache hit rate → 100x reduction in database load (GitHub API)
- 50ms → 5ms response time improvement (Stripe API)
- 80% infrastructure cost savings (AWS CloudFront vs origin servers)
- 10x traffic handling without scaling servers (Shopify during Black Friday)
HTTP Caching Fundamentals
HTTP caching happens at multiple layers:
Client Browser (60s) ← HTTP Proxy (5min) ← CDN (1hr) ← Load Balancer (no cache) ← Origin Server (15min) ← Database (source of truth)
Cache Layers
- Browser Cache: Client-side, user-specific (60s-1hr)
- HTTP Proxy: Shared cache for multiple users (5-15min)
- CDN: Geographic edge caching (1hr-1day)
- Application Cache: In-memory server cache (15min-1hr)
- Database Cache: Query result cache (5-15min)
Cache-Control Headers
The Cache-Control header controls caching behavior at all layers.
Common Directives
// Public, cacheable for 1 hour
res.setHeader('Cache-Control', 'public, max-age=3600');
// Private, only browser can cache
res.setHeader('Cache-Control', 'private, max-age=300');
// Never cache (authentication, sensitive data)
res.setHeader('Cache-Control', 'no-store');
// Cache but revalidate (ensure freshness)
res.setHeader('Cache-Control', 'public, max-age=0, must-revalidate');
// Immutable content (versioned assets)
res.setHeader('Cache-Control', 'public, max-age=31536000, immutable');
Directive Meanings
| Directive | Meaning | Use Case |
|---|---|---|
| `public` | Any cache can store | Public APIs, static content |
| `private` | Only browser can cache | User-specific data |
| `no-store` | Never cache | Sensitive data, authentication |
| `no-cache` | Cache but revalidate first | Dynamic content with ETags |
| `max-age=N` | Cache for N seconds | Freshness lifetime |
| `s-maxage=N` | CDN/proxy cache time (overrides max-age) | Separate client/CDN lifetimes |
| `must-revalidate` | Revalidate when stale | Ensure consistency |
| `immutable` | Never revalidate (versioned content) | `/assets/app.abc123.js` |
Decision Matrix
// Static content (images, CSS, JS with version hashes)
'public, max-age=31536000, immutable'
// API responses (public data, low change frequency)
'public, max-age=300, s-maxage=3600'
// User-specific API responses
'private, max-age=60'
// Real-time data (stock prices, live scores)
'public, max-age=10, s-maxage=60'
// Authentication endpoints
'no-store, no-cache, must-revalidate'
// Frequently changing but cacheable
'public, max-age=0, must-revalidate' + ETag
ETags and Conditional Requests
ETags enable conditional caching: cache content but validate freshness before serving.
How ETags Work
1. Client: GET /api/users/123
2. Server: 200 OK
ETag: "abc123"
{ "name": "Alice", "email": "alice@example.com" }
3. Client caches response + ETag
4. Later request: GET /api/users/123
If-None-Match: "abc123"
5. Server checks if data changed:
- Same ETag → 304 Not Modified (no body)
- Different ETag → 200 OK + new data + new ETag
Bandwidth savings: 304 response = ~100 bytes vs 200 response = 5KB+ (98% reduction)
ETag Implementation
import crypto from 'crypto';
import express from 'express';
const app = express();
function generateETag(data: any): string {
const hash = crypto.createHash('md5');
hash.update(JSON.stringify(data));
return `"${hash.digest('hex')}"`;
}
app.get('/api/users/:id', async (req, res) => {
const user = await db.user.findUnique({
where: { id: req.params.id }
});
if (!user) {
return res.status(404).json({ error: 'User not found' });
}
const etag = generateETag(user);
// Check If-None-Match header
if (req.headers['if-none-match'] === etag) {
return res.status(304).end();
}
res.setHeader('ETag', etag);
res.setHeader('Cache-Control', 'public, max-age=0, must-revalidate');
res.json(user);
});
Strong vs Weak ETags
// Strong ETag: byte-for-byte identical
ETag: "abc123"
// Weak ETag: semantically equivalent (gzip vs uncompressed)
ETag: W/"abc123"
Use weak ETags when:
- Gzip compression changes bytes but not content
- Whitespace formatting differs
- Case-insensitive content
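Serving a correct 304 hinges on comparing ETags properly. A minimal sketch of the RFC 7232 comparison, assuming the `If-None-Match` header may carry a comma-separated list or `*` (the helper name is illustrative):

```typescript
// Sketch of If-None-Match matching per RFC 7232: weak comparison
// ignores the W/ prefix, so W/"abc" matches both "abc" and W/"abc".
function etagsMatch(ifNoneMatch: string, currentETag: string): boolean {
  const strip = (tag: string) => tag.replace(/^W\//, '');
  // If-None-Match may carry a comma-separated list of ETags, or "*"
  return ifNoneMatch
    .split(',')
    .map(t => t.trim())
    .some(t => t === '*' || strip(t) === strip(currentETag));
}

etagsMatch('W/"abc123"', '"abc123"');          // true  (weak match)
etagsMatch('"abc123", "def456"', '"def456"');  // true  (list)
etagsMatch('"abc123"', '"zzz999"');            // false
```

This is the weak comparison; a strong comparison (required for byte-range requests) would additionally reject any tag carrying the `W/` prefix.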
CDN Caching
CDNs cache content at edge locations near users, reducing latency and origin load.
How CDN Caching Works
User in Tokyo → Tokyo CDN Edge (10ms) → Response
↓ (miss)
US Origin Server (200ms) → Response → Cache at Tokyo Edge
Without CDN: Every request travels 200ms to origin
With CDN: First request 200ms, subsequent requests 10ms (95% improvement)
CDN Cache-Control
// Browser caches for 5 minutes, CDN for 1 hour
res.setHeader('Cache-Control', 'public, max-age=300, s-maxage=3600');
// Cloudflare-specific: cache for 2 hours
res.setHeader('Cloudflare-CDN-Cache-Control', 'max-age=7200');
// Fastly-specific: cache for 1 day
res.setHeader('Surrogate-Control', 'max-age=86400');
CDN Providers
| CDN | Use Case | Notable Users |
|---|---|---|
| Cloudflare | Free tier, DDoS protection | Discord, Shopify |
| AWS CloudFront | AWS integration, Lambda@Edge | Netflix, Slack |
| Fastly | Real-time purging, VCL control | GitHub, Stripe |
| Akamai | Enterprise, largest network | Apple, Microsoft |
Cache Key Customization
By default, CDNs cache by full URL. Customize cache keys to improve hit rates:
// Default cache key (separate cache for each query param)
/api/posts?page=1&sort=date&limit=10
/api/posts?limit=10&sort=date&page=1 ← separate cache entry (query order differs)
// Normalized cache key (ignore query order)
cloudfront.createCacheKey({
queryStringsAllowList: ['page', 'sort', 'limit'],
enableAcceptEncodingGzip: true,
headersAllowList: ['Authorization']
});
// Result: both requests share cache
/api/posts?limit=10&page=1&sort=date ← same cache entry
Best practice: Only include query params that actually change the response.
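The normalization above can be sketched as a pure function: keep only allow-listed params and sort them so query order no longer fragments the cache (names are illustrative; a real CDN applies this at the edge):

```typescript
// Normalize a request URL into a deterministic CDN cache key:
// only allow-listed params survive, sorted so query order is irrelevant.
function normalizeCacheKey(url: string, allowList: string[]): string {
  const u = new URL(url, 'https://example.com'); // base for relative paths
  const kept = [...u.searchParams.entries()]
    .filter(([k]) => allowList.includes(k))
    .sort(([a], [b]) => a.localeCompare(b));
  const query = kept.map(([k, v]) => `${k}=${v}`).join('&');
  return query ? `${u.pathname}?${query}` : u.pathname;
}

const allow = ['page', 'sort', 'limit'];
normalizeCacheKey('/api/posts?page=1&sort=date&limit=10', allow);
// → "/api/posts?limit=10&page=1&sort=date"
normalizeCacheKey('/api/posts?limit=10&sort=date&page=1&utm_source=x', allow);
// → same key: order ignored, tracking param dropped
```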
Application-Level Caching
Cache expensive operations in-memory at the application layer.
In-Memory Cache (Single Server)
// Simple LRU cache with node-cache
import NodeCache from 'node-cache';
const cache = new NodeCache({
stdTTL: 600, // 10 minutes default
checkperiod: 120, // Check for expired keys every 2min
useClones: false // Return references (faster)
});
async function getUser(id: string) {
const cacheKey = `user:${id}`;
// Check cache
const cached = cache.get(cacheKey);
if (cached) {
console.log('Cache hit');
return cached;
}
// Cache miss - fetch from database
console.log('Cache miss');
const user = await db.user.findUnique({ where: { id } });
// Store in cache
cache.set(cacheKey, user, 600); // 10 minutes
return user;
}
app.get('/api/users/:id', async (req, res) => {
const user = await getUser(req.params.id);
res.json(user);
});
Limitation: Memory cache is per-server. With multiple servers, cache hit rates drop (each server has separate cache).
Solution: Use distributed caching with Redis.
Distributed Caching with Redis
Redis provides a shared cache across all servers.
Redis Setup
# Install Redis (macOS)
brew install redis
redis-server
# Install Redis (Docker)
docker run -d -p 6379:6379 redis:7-alpine
import { createClient } from 'redis';
const redis = createClient({
url: process.env.REDIS_URL || 'redis://localhost:6379'
});
await redis.connect();
async function getCachedUser(id: string) {
const cacheKey = `user:${id}`;
// Try cache first
const cached = await redis.get(cacheKey);
if (cached) {
return JSON.parse(cached);
}
// Cache miss - fetch from database
const user = await db.user.findUnique({ where: { id } });
// Store in Redis with 10-minute expiration
await redis.setEx(cacheKey, 600, JSON.stringify(user));
return user;
}
Redis Caching Patterns
1. Cache-Aside (Lazy Loading)
Most common pattern: Application checks cache, falls back to database on miss.
async function getPost(id: string) {
const key = `post:${id}`;
// 1. Check cache
const cached = await redis.get(key);
if (cached) return JSON.parse(cached);
// 2. Cache miss - load from database
const post = await db.post.findUnique({ where: { id } });
// 3. Store in cache
await redis.setEx(key, 3600, JSON.stringify(post));
return post;
}
Pros: Simple, only caches requested data
Cons: Cache misses cause latency spikes
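The pattern generalizes into a small reusable wrapper. A sketch with a `Map` standing in for Redis (the helper, types, and loader are illustrative):

```typescript
// Generic cache-aside wrapper; a Map stands in for Redis here.
type Entry<T> = { value: T; expiresAt: number };

function cacheAside<T>(
  cache: Map<string, Entry<T>>,
  ttlMs: number,
  loader: (key: string) => Promise<T>
) {
  return async (key: string): Promise<T> => {
    const hit = cache.get(key);
    if (hit && hit.expiresAt > Date.now()) return hit.value; // fresh hit
    const value = await loader(key);                         // miss: load
    cache.set(key, { value, expiresAt: Date.now() + ttlMs });
    return value;
  };
}

// Usage: the loader runs once; the second read is served from cache.
let loads = 0;
const getPost = cacheAside(new Map(), 60_000, async (id: string) => {
  loads++;
  return { id, title: `Post ${id}` };
});
await getPost('123');
await getPost('123');
// loads === 1
```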
2. Write-Through
Write to cache AND database simultaneously.
async function updatePost(id: string, data: any) {
const key = `post:${id}`;
// 1. Update database
const post = await db.post.update({
where: { id },
data
});
// 2. Update cache
await redis.setEx(key, 3600, JSON.stringify(post));
return post;
}
Pros: Cache always fresh
Cons: Write latency (2 writes per update)
3. Write-Behind (Write-Back)
Write to cache immediately, persist to database asynchronously.
async function updatePost(id: string, data: any) {
const key = `post:${id}`;
// 1. Update cache immediately
await redis.setEx(key, 3600, JSON.stringify(data));
// 2. Queue database write (async)
await queue.add('updateDatabase', { id, data });
return data;
}
Pros: Fastest writes
Cons: Risk of data loss if cache fails before persistence
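The `queue.add` call above assumes a job queue (BullMQ or similar). A minimal in-process sketch of the flush side, with a `Map` standing in for the database; a real deployment would use a durable queue so pending writes survive a crash:

```typescript
// Minimal in-process sketch of the write-behind flush loop.
type WriteJob = { id: string; data: unknown };

const pendingWrites: WriteJob[] = [];
const persisted = new Map<string, unknown>(); // stands in for the database

function queueWrite(job: WriteJob) {
  pendingWrites.push(job); // cache was already updated; persistence is deferred
}

async function flushWrites() {
  while (pendingWrites.length > 0) {
    const job = pendingWrites.shift()!;
    persisted.set(job.id, job.data); // db.post.update(...) in real code
  }
}

queueWrite({ id: 'post:1', data: { title: 'Hello' } });
await flushWrites();
// persisted now holds the write; the queue is drained
```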
Redis Advanced Features
// Atomic increment (counters, rate limiting)
await redis.incr('api:requests:total');
// Hash operations (store objects efficiently)
await redis.hSet('user:123', {
name: 'Alice',
email: 'alice@example.com',
age: '30'
});
const user = await redis.hGetAll('user:123');
// Sorted sets (leaderboards, time-series)
await redis.zAdd('leaderboard', { score: 1500, value: 'user:123' });
const top10 = await redis.zRange('leaderboard', 0, 9, { REV: true });
// Pub/Sub (cache invalidation across servers)
await redis.publish('cache:invalidate', 'user:123');
Cache Invalidation Strategies
"There are only two hard things in Computer Science: cache invalidation and naming things." — Phil Karlton
1. Time-Based Expiration (TTL)
Simplest strategy: Cache expires after N seconds.
// Set with expiration
await redis.setEx('user:123', 600, JSON.stringify(user));
// Check remaining TTL
const ttl = await redis.ttl('user:123'); // 547 seconds remaining
Pros: Simple, predictable
Cons: Stale data until expiration
Best for: Data that changes infrequently (product catalog, blog posts)
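One refinement worth considering: add random jitter to TTLs so keys cached together don't all expire in the same instant, which would otherwise cause a synchronized miss burst. A sketch (the function name is illustrative):

```typescript
// Spread expirations by adding up to `spread` (10% by default) of
// random jitter to the base TTL.
function ttlWithJitter(baseSeconds: number, spread = 0.1): number {
  const jitter = baseSeconds * spread * Math.random();
  return Math.round(baseSeconds + jitter);
}

// e.g. base 600s → a value between 600 and 660 seconds
const ttl = ttlWithJitter(600);
// await redis.setEx('user:123', ttl, JSON.stringify(user));
```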
2. Event-Based Invalidation
Invalidate when data changes (most accurate).
async function updateUser(id: string, data: any) {
// 1. Update database
const user = await db.user.update({
where: { id },
data
});
// 2. Invalidate cache
await redis.del(`user:${id}`);
return user;
}
Pros: Always fresh data
Cons: Requires invalidation logic in all write paths
3. Tag-Based Invalidation
Group related cache entries with tags, invalidate all at once.
// Store with tags
await redis.setEx('post:123', 3600, JSON.stringify(post));
await redis.sAdd('tag:user:456:posts', 'post:123');
await redis.sAdd('tag:category:tech:posts', 'post:123');
// Invalidate all posts by user
async function invalidateUserPosts(userId: string) {
const postKeys = await redis.sMembers(`tag:user:${userId}:posts`);
if (postKeys.length > 0) {
await redis.del(postKeys);
await redis.del(`tag:user:${userId}:posts`);
}
}
Pros: Flexible, invalidate related items
Cons: Complex implementation
4. Versioned Keys
Never invalidate — use versioned cache keys instead.
// Version embedded in the cache key
const version = (await redis.get('user:123:version')) || '1';
const cacheKey = `user:123:v${version}`;
// Bump that user's version to "invalidate" every cached entry for them
async function invalidateUser(id: string) {
  await redis.incr(`user:${id}:version`);
}
Pros: No cache deletion needed, atomic invalidation
Cons: Orphaned cache entries (need cleanup)
5. Surrogate Keys (CDN Invalidation)
Fastly pattern: Group related content with surrogate keys.
// Add surrogate key header
res.setHeader('Surrogate-Key', 'user-123 posts all-posts');
// Purge by surrogate key (invalidates all matching entries)
await fastly.purgeKey('user-123'); // Purges all content tagged with user-123
Pros: Instant CDN purging, flexible grouping
Cons: CDN-specific
Cache Warming
Pre-populate cache before traffic arrives (cold start prevention).
When to Warm Cache
- Application deployment
- Cache server restart
- Scheduled cache expiration
- Known traffic spikes (product launches, sales)
Implementation
// Warm critical data on startup
async function warmCache() {
console.log('Warming cache...');
// Popular posts
const popularPosts = await db.post.findMany({
where: { views: { gt: 10000 } },
take: 100
});
for (const post of popularPosts) {
const key = `post:${post.id}`;
await redis.setEx(key, 3600, JSON.stringify(post));
}
// Homepage data
const homepage = await db.page.findUnique({
where: { slug: 'home' }
});
await redis.setEx('page:home', 600, JSON.stringify(homepage));
console.log(`Cache warmed: ${popularPosts.length} posts + homepage`);
}
// Run on server start
await warmCache();
Progressive Warming
Warm cache gradually to avoid overwhelming database.
import pLimit from 'p-limit';
async function warmCacheProgressive() {
const limit = pLimit(10); // Max 10 concurrent queries
const posts = await db.post.findMany({ take: 1000 });
await Promise.all(
posts.map(post =>
limit(async () => {
const key = `post:${post.id}`;
await redis.setEx(key, 3600, JSON.stringify(post));
})
)
);
}
Cache Key Design
Good cache keys prevent collisions and enable flexible invalidation.
Best Practices
// ❌ Bad: Ambiguous, collision-prone
cache.set('123', user);
cache.set('123', post); // Overwrites user!
// ✅ Good: Namespaced, clear intent
cache.set('user:123', user);
cache.set('post:123', post);
// ✅ Better: Include version/filters
cache.set('user:123:v2', user);
cache.set('posts:category:tech:page:1:limit:10', posts);
// ✅ Best: Hierarchical with separators
cache.set('api:v1:users:123:profile', user);
cache.set('api:v1:posts:category:tech:page:1', posts);
Normalize Query Parameters
function buildCacheKey(baseKey: string, params: Record<string, any>): string {
// Sort params for consistent keys
const sortedParams = Object.keys(params)
.sort()
.map(key => `${key}:${params[key]}`)
.join(':');
return `${baseKey}:${sortedParams}`;
}
// Both produce same key
buildCacheKey('posts', { page: 1, limit: 10, sort: 'date' });
buildCacheKey('posts', { sort: 'date', limit: 10, page: 1 });
// → "posts:limit:10:page:1:sort:date"
Hash Long Keys
import crypto from 'crypto';
function hashKey(key: string): string {
  if (key.length <= 100) return key;
  // A short digest keeps the result under the limit while staying unique
  const hash = crypto.createHash('sha256').update(key).digest('hex').slice(0, 16);
  return `${key.slice(0, 50)}:${hash}`;
}
// Before: "api:users:search:q:long+search+query+with+many+words:page:1:limit:10:sort:relevance"
// After: "api:users:search:q:long+search+query+with+man:abc123..."
Real-World Examples
Stripe API Caching
// Stripe caches idempotent operations with Idempotency-Key header
app.post('/api/charges', async (req, res) => {
const idempotencyKey = req.headers['idempotency-key'];
if (idempotencyKey) {
// Check cache for duplicate request
const cached = await redis.get(`idempotency:${idempotencyKey}`);
if (cached) {
return res.json(JSON.parse(cached));
}
}
// Process charge
const charge = await stripe.charges.create(req.body);
// Cache result for 24 hours
if (idempotencyKey) {
await redis.setEx(`idempotency:${idempotencyKey}`, 86400, JSON.stringify(charge));
}
res.json(charge);
});
Learn more: Stripe API Status
GitHub API Conditional Requests
// GitHub uses ETags + conditional requests
app.get('/api/repos/:owner/:repo', async (req, res) => {
const { owner, repo } = req.params;
const repoData = await db.repo.findUnique({ where: { owner, repo } });
const etag = `"${repoData.updatedAt.getTime()}"`;
if (req.headers['if-none-match'] === etag) {
res.setHeader('X-RateLimit-Remaining', '4999');
return res.status(304).end(); // Doesn't count against rate limit!
}
res.setHeader('ETag', etag);
res.setHeader('Cache-Control', 'private, max-age=60');
res.setHeader('X-RateLimit-Remaining', '4998');
res.json(repoData);
});
Rate limit optimization: 304 responses don't count against GitHub's rate limit.
Learn more: GitHub API Status
Shopify CDN Caching
// Shopify caches product pages at CDN with Vary header
app.get('/products/:id', async (req, res) => {
const product = await db.product.findUnique({
where: { id: req.params.id }
});
// Cache varies by currency
res.setHeader('Vary', 'Accept-Encoding, Cookie');
res.setHeader('Cache-Control', 'public, max-age=300, s-maxage=3600');
res.json(product);
});
Vary header: CDN creates separate cache entries for different cookie/encoding values.
Learn more: Shopify API Status
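Conceptually, a Vary-aware cache derives a secondary key from the listed request headers: one cache entry per distinct combination of their values. An illustrative sketch, not any CDN's actual implementation:

```typescript
// Build a cache key that honors the Vary header: the same URL maps to
// different entries when the varied request headers differ.
function varyCacheKey(
  url: string,
  vary: string,
  requestHeaders: Record<string, string>
): string {
  const parts = vary
    .split(',')
    .map(h => h.trim().toLowerCase())
    .sort()
    .map(h => `${h}=${requestHeaders[h] ?? ''}`);
  return `${url}|${parts.join('|')}`;
}

varyCacheKey('/products/1', 'Accept-Encoding, Cookie', {
  'accept-encoding': 'gzip',
  cookie: 'currency=EUR',
});
// → "/products/1|accept-encoding=gzip|cookie=currency=EUR"
```

A request with `currency=USD` in its cookie produces a different key, so EUR and USD responses never collide.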
Common Mistakes
1. Caching Authenticated Responses Publicly
// ❌ SECURITY RISK: User data cached publicly
app.get('/api/me', authenticate, async (req, res) => {
res.setHeader('Cache-Control', 'public, max-age=300'); // WRONG!
res.json(req.user);
});
// ✅ Correct: Private cache only
app.get('/api/me', authenticate, async (req, res) => {
res.setHeader('Cache-Control', 'private, max-age=300');
res.json(req.user);
});
Impact: User A's data served to User B (data leak!)
2. Not Setting Vary Headers
// ❌ Wrong: Different content, same cache key
app.get('/api/products', async (req, res) => {
const currency = req.headers['x-currency'] || 'USD';
const products = await getProductsInCurrency(currency);
res.setHeader('Cache-Control', 'public, max-age=300');
res.json(products); // USD prices served to EUR users!
});
// ✅ Correct: Vary by currency header
app.get('/api/products', async (req, res) => {
const currency = req.headers['x-currency'] || 'USD';
const products = await getProductsInCurrency(currency);
res.setHeader('Vary', 'X-Currency');
res.setHeader('Cache-Control', 'public, max-age=300');
res.json(products);
});
3. Caching Errors
// ❌ Wrong: 500 errors cached for 1 hour
app.get('/api/posts', async (req, res) => {
res.setHeader('Cache-Control', 'public, max-age=3600');
try {
const posts = await db.post.findMany();
res.json(posts);
} catch (error) {
res.status(500).json({ error: 'Database error' }); // Cached!
}
});
// ✅ Correct: Only cache successful responses
app.get('/api/posts', async (req, res) => {
try {
const posts = await db.post.findMany();
res.setHeader('Cache-Control', 'public, max-age=3600');
res.json(posts);
} catch (error) {
res.setHeader('Cache-Control', 'no-store');
res.status(500).json({ error: 'Database error' });
}
});
4. Not Handling Cache Stampede
Problem: Cache expires → 1,000 concurrent requests → 1,000 database queries → database overwhelmed.
// ❌ Wrong: Every request queries database on cache miss
async function getPopularPosts() {
const cached = await redis.get('popular-posts');
if (cached) return JSON.parse(cached);
// 1,000 concurrent requests all run this query!
const posts = await db.post.findMany({ take: 10 });
await redis.setEx('popular-posts', 600, JSON.stringify(posts));
return posts;
}
// ✅ Correct: Use locking to prevent stampede
import Redlock from 'redlock'; // note: redlock is built around ioredis-style clients
const redlock = new Redlock([redis]);
async function getPopularPosts() {
const cacheKey = 'popular-posts';
const cached = await redis.get(cacheKey);
if (cached) return JSON.parse(cached);
// Acquire lock
const lock = await redlock.acquire([`lock:${cacheKey}`], 5000);
try {
// Double-check cache (another request may have populated it)
const rechecked = await redis.get(cacheKey);
if (rechecked) return JSON.parse(rechecked);
// Only 1 request queries database
const posts = await db.post.findMany({ take: 10 });
await redis.setEx(cacheKey, 600, JSON.stringify(posts));
return posts;
} finally {
await lock.release();
}
}
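Redlock coordinates across servers; within a single process, coalescing concurrent misses onto one in-flight promise achieves the same effect with no extra infrastructure. A sketch of this technique (sometimes called single-flight; names are illustrative):

```typescript
// Concurrent callers for the same key share one in-flight promise,
// so the loader runs once per miss even inside a burst of requests.
const inFlight = new Map<string, Promise<unknown>>();

function singleFlight<T>(key: string, loader: () => Promise<T>): Promise<T> {
  const existing = inFlight.get(key);
  if (existing) return existing as Promise<T>;
  const p = loader().finally(() => inFlight.delete(key));
  inFlight.set(key, p);
  return p;
}

// 1,000 concurrent calls → one loader execution
let queries = 0;
const results = await Promise.all(
  Array.from({ length: 1000 }, () =>
    singleFlight('popular-posts', async () => {
      queries++;
      return ['post:1', 'post:2'];
    })
  )
);
// queries === 1
```

This complements the distributed lock: Redlock stops a stampede across servers, single-flight stops it within each server.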
5. Ignoring Cache Size Limits
// ❌ Wrong: Unbounded cache growth
async function cacheSearchResults(query: string, results: any[]) {
await redis.set(`search:${query}`, JSON.stringify(results));
}
// ✅ Correct: Always set a TTL, and cap Redis memory with an eviction policy
async function cacheSearchResults(query: string, results: any[]) {
  await redis.setEx(`search:${query}`, 600, JSON.stringify(results)); // 10-minute TTL
}
# redis.conf: cap memory and evict when full
maxmemory 2gb
maxmemory-policy allkeys-lru # Evict least recently used keys
Eviction policies:
- allkeys-lru: Evict least recently used (recommended for caching)
- volatile-lru: Evict LRU among keys with TTL
- allkeys-lfu: Evict least frequently used
- volatile-ttl: Evict keys with shortest TTL first
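The LRU policy itself is simple to sketch: a `Map` preserves insertion order, so re-inserting a key on read keeps recently used keys at the back, leaving the front entry as the eviction candidate. An illustrative toy, not Redis's actual implementation (Redis approximates LRU by sampling keys):

```typescript
// Minimal LRU cache using Map insertion order.
class LRUCache<K, V> {
  private map = new Map<K, V>();
  constructor(private maxSize: number) {}

  get(key: K): V | undefined {
    if (!this.map.has(key)) return undefined;
    const value = this.map.get(key)!;
    this.map.delete(key);      // move to most-recently-used position
    this.map.set(key, value);
    return value;
  }

  set(key: K, value: V): void {
    this.map.delete(key);
    this.map.set(key, value);
    if (this.map.size > this.maxSize) {
      const lru = this.map.keys().next().value as K; // oldest entry
      this.map.delete(lru);
    }
  }
}

const cache = new LRUCache<string, number>(2);
cache.set('a', 1);
cache.set('b', 2);
cache.get('a');      // touch 'a', so 'b' is now least recently used
cache.set('c', 3);   // evicts 'b'
cache.get('b');      // → undefined
```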
6. Not Monitoring Cache Hit Rate
// Track cache hits/misses
async function getCached(key: string) {
const value = await redis.get(key);
if (value) {
await redis.incr('cache:hits');
return JSON.parse(value);
} else {
await redis.incr('cache:misses');
return null;
}
}
// Monitor hit rate
app.get('/api/metrics', async (req, res) => {
const hits = parseInt(await redis.get('cache:hits') || '0');
const misses = parseInt(await redis.get('cache:misses') || '0');
const total = hits + misses;
const hitRate = total > 0 ? (hits / total * 100).toFixed(2) : '0.00';
res.json({ hits, misses, hitRate: `${hitRate}%` });
});
Target hit rate: 90%+ for effective caching
Production Checklist
HTTP Caching
- Cache-Control headers set on all routes
- public vs private used correctly
- no-store used for sensitive data (authentication, payments)
- Vary headers set when response varies by header
- ETags implemented for dynamic content
- Conditional requests (If-None-Match) supported
- Immutable directive used for versioned assets
CDN
- CDN deployed (Cloudflare, AWS CloudFront, Fastly)
- s-maxage set for CDN caching
- Cache key normalized (query param order ignored)
- Purge/invalidation API integrated
- Origin shield configured (reduce origin load)
- Custom cache rules tested
Application Cache
- Redis/Memcached deployed
- Cache key naming convention documented
- TTL values tuned per data type
- Cache warming implemented for critical data
- Distributed locking prevents cache stampede
- maxmemory + eviction policy configured
Invalidation
- Event-based invalidation on writes
- Tag-based invalidation for related items
- Surrogate keys for CDN purging
- Invalidation tested in staging
Monitoring
- Cache hit rate tracked (>90% target)
- Cache miss latency monitored
- Memory usage alerted
- Eviction rate tracked
- Slow query logs reviewed
Security
- private used for user-specific data
- no-store used for sensitive responses
- Vary: Cookie prevents cache poisoning
- Cache tested with different users/roles
Testing
- Load tested with realistic traffic
- Cache stampede tested (concurrent cache misses)
- Invalidation tested (data consistency)
- Cold start tested (cache warming)
Conclusion
Effective API caching requires layered strategies:
- HTTP caching (browsers, proxies): 60s-1hr for public data
- CDN caching (Cloudflare, Fastly): 1hr-1day at edge locations
- Application caching (Redis): 5-15min for expensive queries
- Database caching (query cache): 1-5min for frequently accessed data
Start simple:
- Add Cache-Control headers to public routes
- Deploy a CDN (Cloudflare free tier)
- Cache expensive queries in Redis
Measure impact:
- Monitor cache hit rate (>90% target)
- Track response time improvements
- Measure infrastructure cost savings
Monitor real-time API status for Cloudflare, AWS, Redis, Fastly, and 160+ other APIs at API Status Check.