Serverless Monitoring Guide: AWS Lambda, Cold Starts & Observability (2026)
Serverless functions are opaque by default — they crash silently, cold starts add invisible latency, and throttling drops requests without surfacing errors. This guide covers how to achieve full observability for AWS Lambda and other serverless platforms.
TL;DR — Serverless Monitoring Checklist
- ✅ Enable Lambda Enhanced Monitoring (costs $0.01/function/month, worth it)
- ✅ Alert on Throttles > 0 — every throttle is a dropped request
- ✅ Track Init Duration (cold starts) separately from execution duration
- ✅ Monitor p99 duration not average — cold starts are outliers that kill tail latency
- ✅ Enable X-Ray active tracing for distributed service map visibility
- ✅ Add external endpoint check for API Gateway endpoints — Lambda errors don't always surface
Why Serverless Is Hard to Monitor
Traditional monitoring assumes long-running processes with stable memory and CPU metrics. Serverless breaks all these assumptions:
Serverless monitoring challenges
- • Cold starts add 100ms–10s of invisible latency
- • Functions scale to zero — no persistent process to monitor
- • Throttling silently drops requests (async invocations surface no HTTP error to the caller)
- • Log correlation across invocations requires request IDs
- • Ephemeral containers make traditional profiling impractical
- • Cost surprises from runaway retry loops
Serverless monitoring wins
- • Per-invocation billing = built-in cost metrics
- • CloudWatch auto-collects basic metrics for free
- • Structured logs easy with JSON + Lambda Powertools
- • Request isolation makes failures containable
- • Dead letter queues provide automatic failure capture
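The structured-logging win above comes from emitting one JSON object per log line so CloudWatch Logs Insights can filter and join on a request ID. A minimal sketch of the idea (a hypothetical helper, not a specific library — Lambda Powertools does this for you):

```typescript
// Hypothetical helper: emit one JSON object per line so CloudWatch Logs
// Insights can filter and join on requestId across invocations.
type LogRecord = {
  level: 'info' | 'error';
  message: string;
  requestId: string;
  [key: string]: unknown;
};

function logJson(
  level: LogRecord['level'],
  message: string,
  requestId: string,
  extra: Record<string, unknown> = {},
): LogRecord {
  const record: LogRecord = { level, message, requestId, ...extra };
  console.log(JSON.stringify(record)); // one JSON object per log line
  return record;
}

const rec = logJson('info', 'order received', 'req-123', { orderId: 42 });
```

Every invocation that logs the same `requestId` can then be stitched together in a single Insights query.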
Core Lambda Metrics Reference
| Metric | CloudWatch Name | Alert Threshold |
|---|---|---|
| Invocations | Invocations | Anomaly detection vs baseline |
| Errors | Errors | Error rate > 1% for 5m (critical) |
| Throttles | Throttles | > 0 for 1m (warning) — every throttle is a dropped call |
| Duration (p99) | Duration | > 80% of timeout setting (warn before timeout kills requests) |
| Init Duration | InitDuration (Enhanced) | > 1s (warn), > 5s (critical — consider SnapStart) |
| Concurrent Executions | ConcurrentExecutions | > 80% of reserved concurrency limit |
| Memory Used | MaxMemoryUsed (Enhanced) | > 80% of configured memory (OOM risk) |
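As an illustration, the table's alert rules can be encoded as a small evaluation function. This is a hedged sketch: the thresholds mirror the table, but the interface and function names are hypothetical, not an AWS SDK API.

```typescript
// Hypothetical metrics snapshot — in practice these come from CloudWatch.
interface LambdaMetrics {
  errorRate: number;            // errors / invocations over the window
  throttles: number;            // sum over 1 minute
  p99DurationMs: number;        // p99 of Duration
  timeoutMs: number;            // configured function timeout
  initDurationMs: number;       // cold start init time (Enhanced Monitoring)
  concurrentExecutions: number;
  reservedConcurrency: number;
  maxMemoryUsedMb: number;
  configuredMemoryMb: number;
}

// Apply the table's thresholds and return the alerts that would fire.
function evaluateAlerts(m: LambdaMetrics): string[] {
  const alerts: string[] = [];
  if (m.errorRate > 0.01) alerts.push('critical: error rate > 1%');
  if (m.throttles > 0) alerts.push('warning: throttles detected — dropped requests');
  if (m.p99DurationMs > 0.8 * m.timeoutMs) alerts.push('warning: p99 duration near timeout');
  if (m.initDurationMs > 5000) alerts.push('critical: cold start > 5s — consider SnapStart');
  else if (m.initDurationMs > 1000) alerts.push('warning: cold start > 1s');
  if (m.concurrentExecutions > 0.8 * m.reservedConcurrency) alerts.push('warning: nearing concurrency limit');
  if (m.maxMemoryUsedMb > 0.8 * m.configuredMemoryMb) alerts.push('warning: OOM risk');
  return alerts;
}
```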
Cold Start Analysis
Cold starts happen when Lambda allocates a new execution environment — downloading your code, initializing the runtime, and running your initialization code. They add latency that's invisible until you look at p99 duration.
| Runtime | Typical Cold Start | Reduction Strategy |
|---|---|---|
| Node.js 20 | 200–500ms | Reduce package size, lazy imports |
| Python 3.12 | 300–700ms | Reduce import count, use Lambda layers |
| Java 21 (standard) | 4–12 seconds | Use SnapStart (reduces to <1s) |
| Java 21 + SnapStart | 100–500ms | Restore from snapshot; enable in function config |
| Go 1.x | 100–200ms | Already fast; minimize init dependencies |
| Rust (custom runtime) | 5–50ms | Best cold start performance available |
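Because cold starts are rare but large, they show up at p99 while leaving the average almost untouched. A toy calculation (synthetic data, not real measurements) makes the effect concrete:

```typescript
// Build a synthetic latency sample: warm requests take warmMs, cold requests
// take warmMs + initMs, and coldRate is the fraction of cold invocations.
function p99WithColdStarts(
  warmMs: number,
  initMs: number,
  coldRate: number,
  n = 10_000,
): number {
  const latencies: number[] = [];
  const coldCount = Math.round(n * coldRate);
  for (let i = 0; i < n; i++) {
    latencies.push(i < coldCount ? warmMs + initMs : warmMs);
  }
  latencies.sort((a, b) => a - b);
  return latencies[Math.floor(0.99 * n)]; // the 99th-percentile sample
}

// With a 2% cold-start rate, p99 is the cold latency (850ms), not the warm 50ms;
// below 1% cold rate, the cold starts fall outside p99 entirely.
```

This is why the checklist says to monitor p99, not the average: a 2% cold-start rate barely moves the mean but sets the entire p99.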
# CloudWatch Insights query: Identify cold start invocations
fields @timestamp, @requestId, @initDuration, @duration, @memorySize
| filter ispresent(@initDuration)
| sort @timestamp desc
| limit 100
# Lambda Powertools for structured logging with cold start detection
import { Logger } from '@aws-lambda-powertools/logger';
import type { APIGatewayEvent, Context } from 'aws-lambda';

const logger = new Logger({ serviceName: 'order-service' });

// addContext() attaches the request ID, function metadata, and a cold_start
// flag to every log line — filter on cold_start=true in CloudWatch Logs Insights
export const handler = async (event: APIGatewayEvent, context: Context) => {
  logger.addContext(context);
  logger.info('Processing request', { event });
};
CloudWatch Alarms Setup
# Terraform: Essential Lambda CloudWatch alarms
resource "aws_cloudwatch_metric_alarm" "lambda_errors" {
  alarm_name          = "${var.function_name}-errors"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 2
  metric_name         = "Errors"
  namespace           = "AWS/Lambda"
  period              = 300
  statistic           = "Sum"
  threshold           = var.invocations_per_period * 0.01 # 1% error rate

  dimensions = {
    FunctionName = var.function_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "lambda_throttles" {
  alarm_name          = "${var.function_name}-throttles"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 1
  metric_name         = "Throttles"
  namespace           = "AWS/Lambda"
  period              = 60
  statistic           = "Sum"
  threshold           = 0 # Alert on any throttle

  dimensions = {
    FunctionName = var.function_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}

resource "aws_cloudwatch_metric_alarm" "lambda_duration_p99" {
  alarm_name          = "${var.function_name}-duration-p99"
  comparison_operator = "GreaterThanThreshold"
  evaluation_periods  = 3
  extended_statistic  = "p99"
  metric_name         = "Duration"
  namespace           = "AWS/Lambda"
  period              = 300
  threshold           = var.timeout_ms * 0.8 # 80% of timeout

  dimensions = {
    FunctionName = var.function_name
  }

  alarm_actions = [aws_sns_topic.alerts.arn]
}
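The evaluation logic behind these alarms can be sketched in a few lines. This is a simplification: real CloudWatch also supports "M out of N" datapoints-to-alarm and missing-data handling, both omitted here.

```typescript
// Simplified CloudWatch alarm evaluation: the alarm fires only when the last
// `evaluationPeriods` consecutive datapoints all breach the threshold.
function alarmState(
  datapoints: number[],
  threshold: number,
  evaluationPeriods: number,
): 'ALARM' | 'OK' {
  const recent = datapoints.slice(-evaluationPeriods);
  const breaching =
    recent.length === evaluationPeriods && recent.every((v) => v > threshold);
  return breaching ? 'ALARM' : 'OK';
}

// The throttles alarm above uses threshold 0 with a single evaluation period,
// so one throttled minute is enough to fire it.
```

This is also why the error-rate alarm uses two periods: a single noisy five-minute window won't page anyone, but two in a row will.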
Distributed Tracing with X-Ray
AWS X-Ray traces requests across Lambda functions, API Gateway, DynamoDB, SQS, and other AWS services. Enable it at the Lambda configuration level (zero code changes), then add subsegment annotations for key operations.
# Enable X-Ray in SAM template
Resources:
  OrderFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active # Enables X-Ray
# Enable in Terraform
resource "aws_lambda_function" "order" {
  # ... function_name, role, and other required arguments omitted
  tracing_config {
    mode = "Active"
  }
}
# Node.js — instrument AWS SDK calls
import AWSXRay from 'aws-xray-sdk';
import { DynamoDB } from '@aws-sdk/client-dynamodb';
// Wrap SDK client to auto-create X-Ray subsegments
const dynamodb = AWSXRay.captureAWSv3Client(new DynamoDB({}));
// Add custom annotations for business context
export const handler = async (event: any) => {
  const segment = AWSXRay.getSegment();
  const subsegment = segment?.addNewSubsegment('order-validation');
  subsegment?.addAnnotation('orderId', event.orderId);
  subsegment?.addAnnotation('customerId', event.customerId);
  subsegment?.addMetadata('orderDetails', event);
  try {
    await validateOrder(event);
    subsegment?.close();
  } catch (err) {
    subsegment?.close(err as Error);
    throw err;
  }
};
Serverless Monitoring Tools Comparison
| Tool | Best For | Lambda Feature | Pricing |
|---|---|---|---|
| AWS CloudWatch | Native AWS monitoring | All Lambda metrics, Insights query, alarms | First 5GB logs free; $0.50/GB after |
| Better Stack | Log + uptime monitoring | Lambda log ingestion, API Gateway uptime checks | Free + $20/mo |
| Datadog Serverless | Enterprise full-stack | Lambda forwarder, enhanced metrics, flamegraphs | $5/million invocations |
| Lumigo | Serverless-specific | Auto-trace, cost insights, cold start visualization | Free 150K traces/mo + $0.50/1M traces |
| New Relic Serverless | Full-stack APM + Lambda | Lambda layer, distributed traces, free 100GB/mo | Free + $0.35/GB |
FAQ
What metrics should I monitor for AWS Lambda?
The seven critical Lambda metrics: Invocations, Errors, Throttles, Duration (p99), Init Duration (cold starts), ConcurrentExecutions, and MaxMemoryUsed. Throttles and cold start duration are the most Lambda-specific concerns — throttles are silent dropped requests, and cold starts create p99 latency outliers invisible in average duration stats.
How do I reduce Lambda cold start times?
Five effective strategies: (1) Lambda SnapStart for Java (reduces 8-12s to under 1s), (2) Keep package size minimal via tree-shaking and esbuild, (3) Move heavy initialization outside the handler (runs once per container), (4) Provisioned Concurrency for latency-critical functions, (5) Consider Rust or Go custom runtimes for sub-100ms cold starts. SnapStart is the biggest single win for Java Lambda.
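Strategy (3) is worth a concrete sketch: anything at module scope runs once per execution environment (i.e., once per cold start), while the handler body runs on every invocation. The counters and client below are illustrative placeholders for expensive setup like database connections.

```typescript
// Counters to demonstrate when each scope runs (illustrative only).
let initCount = 0;
let invocationCount = 0;

// Expensive setup (DB clients, config fetch) belongs here, at module scope:
// it executes once per cold start and is reused by every warm invocation.
const client = (() => {
  initCount++;
  return { query: (q: string) => `ok:${q}` }; // stand-in for a real client
})();

// In a real function this would be `export const handler = ...`
const handler = async () => {
  invocationCount++; // per-invocation work only
  return client.query('select 1');
};
```

Three invocations against one warm environment run the init block once and the handler three times; moving that setup inside the handler would repeat it on every call.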
What is Lambda throttling and how do I fix it?
Throttling occurs when invocations exceed your concurrency limit (default 1,000/region). Synchronous invocations return 429 to the caller; asynchronous invocations are retried silently and dropped after the retry window expires. Fix: request a limit increase, add SQS as a buffer between trigger and function (SQS handles retries), set reserved concurrency intentionally to protect downstream databases, and add exponential backoff in invokers.
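The exponential backoff mentioned above is straightforward to compute. A hedged sketch (the function name and defaults are illustrative; full jitter is a common practice for avoiding retry storms):

```typescript
// Compute retry delays for throttled (429) invokes: exponential growth from
// baseMs, capped at capMs, with optional full jitter to spread out retries.
function backoffDelays(
  attempts: number,
  baseMs = 100,
  capMs = 10_000,
  jitter = true,
): number[] {
  return Array.from({ length: attempts }, (_, i) => {
    const exp = Math.min(capMs, baseMs * 2 ** i);
    return jitter ? Math.random() * exp : exp; // full jitter: uniform in [0, exp)
  });
}

// backoffDelays(5, 100, 10_000, false) → [100, 200, 400, 800, 1600]
```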
How do I trace serverless requests across multiple Lambda functions?
Enable AWS X-Ray active tracing (zero code changes needed). For a vendor-neutral option, use the OpenTelemetry Lambda layer, which auto-instruments function calls and exports to any OTLP backend. You can also pass correlation IDs through event payloads to link logs across invocations even without formal tracing.
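The correlation-ID approach is simple enough to sketch directly (hypothetical helper names; the ID format is an assumption):

```typescript
// Carry a correlation ID through event payloads so logs from chained Lambda
// invocations can be joined in CloudWatch even without X-Ray.
interface CorrelatedEvent {
  correlationId?: string;
  [key: string]: unknown;
}

function withCorrelationId<T extends CorrelatedEvent>(
  event: T,
  newId: () => string = () => `corr-${Date.now()}`, // assumed ID format
): T & { correlationId: string } {
  // Reuse the upstream ID if present; mint one only at the entry point.
  return { ...event, correlationId: event.correlationId ?? newId() };
}

// The entry-point function enriches the event once, then every downstream
// invoke passes the enriched payload along unchanged.
```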
How do I monitor Lambda costs?
Lambda costs = invocations × $0.0000002 + GB-seconds × $0.0000166667. Key optimizations: right-size memory using AWS Lambda Power Tuning tool (sometimes more memory = lower cost due to less duration), enable Cost Explorer tags per function, alert on cost anomalies. A runaway retry loop can multiply costs 100x in minutes — alert on unusual invocation rate.
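The pricing formula above translates directly into code. A hedged sketch (the constants are the x86 on-demand rates quoted in this answer; check current AWS pricing before relying on them):

```typescript
// Estimate monthly Lambda cost from the formula:
// invocations × $0.0000002 + GB-seconds × $0.0000166667
function lambdaCostUsd(
  invocations: number,
  avgDurationMs: number,
  memoryMb: number,
): number {
  const requestCost = invocations * 0.0000002;
  const gbSeconds = invocations * (avgDurationMs / 1000) * (memoryMb / 1024);
  const computeCost = gbSeconds * 0.0000166667;
  return requestCost + computeCost;
}

// 1M invocations at 100ms average on 512MB ≈ $1.03
```

Plugging in a runaway retry loop makes the danger obvious: 100M invocations at the same settings is roughly $103, reached in minutes if retries fan out unchecked.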
Related Guides
Cloud Monitoring Guide
AWS, GCP, and Azure infrastructure monitoring beyond Lambda.
Distributed Tracing Guide
X-Ray, OpenTelemetry, Jaeger — trace requests across services.
API Monitoring at Scale
Monitor API Gateway + Lambda at production traffic levels.
Best Log Management Tools 2026
Compare CloudWatch, Better Stack, Loki for Lambda logs.