Introduction
AWS Lambda is the foundation of serverless computing on AWS — you deploy code, AWS runs it, you pay per invocation. Sounds simple. But in production, Lambda has enough quirks that getting it wrong leads to slow APIs, runaway bills, and mysterious failures. This guide covers the patterns that separate amateur serverless code from production-grade deployments.
Cold Starts — The Most Misunderstood Problem
A cold start happens when Lambda needs to provision a new execution environment because no warm instance is available. The full cold start sequence is:
1. Download the function package from S3 (or the image from ECR)
2. Start a new execution environment
3. Initialize the runtime (JVM, Node.js, Python interpreter)
4. Run your initialization code (outside the handler)
5. Run your handler
Steps 1-3 are controlled by AWS. Steps 4-5 are your responsibility. Typical cold start durations by runtime: Go/Rust under ~100ms, Python/Node.js ~100-300ms, Java/JVM ~1-3s.
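Because module-level code runs exactly once per execution environment, you can measure your own init cost and flag cold starts without any extra tooling. A minimal sketch (the log format and return shape are illustrative):

```python
import time

# Module scope runs once per execution environment, so this timestamp
# marks when your initialization code (step 4 above) began.
_INIT_STARTED = time.monotonic()
_IS_FIRST_INVOCATION = True

def handler(event, context):
    global _IS_FIRST_INVOCATION
    was_cold = _IS_FIRST_INVOCATION
    _IS_FIRST_INVOCATION = False
    if was_cold:
        init_ms = (time.monotonic() - _INIT_STARTED) * 1000
        print(f"cold start: handler first ran {init_ms:.1f} ms after init began")
    return {"cold_start": was_cold}
```

In CloudWatch Logs you would see the cold-start line once per environment, which makes it easy to count how often users actually hit one.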
Cold Start Optimization Strategies
1. Keep the package small
```bash
# Check your deployment package size
zip -r function.zip . -x "*.pyc" "__pycache__/*" "tests/*"
ls -lh function.zip
# Target: under 3MB for Python/Node, under 50MB for JVM
# Use Lambda Layers to externalize large dependencies
```
2. Move initialization outside the handler
```python
import boto3
import json

# GOOD: initialized once per execution environment, reused across invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')
ssm_client = boto3.client('ssm')

# Cache config at cold start
def _load_config():
    response = ssm_client.get_parameter(Name='/app/config', WithDecryption=True)
    return json.loads(response['Parameter']['Value'])

CONFIG = _load_config()

def handler(event, context):
    # Use pre-initialized clients — no per-invocation setup cost here
    result = table.get_item(Key={'id': event['id']})
    return result.get('Item', {})
```
3. Provisioned Concurrency for latency-sensitive endpoints
```bash
# Provision 10 warm instances at all times
aws lambda put-provisioned-concurrency-config \
  --function-name my-api \
  --qualifier prod \
  --provisioned-concurrent-executions 10
```
Provisioned Concurrency costs more but eliminates cold starts entirely for those pre-warmed instances. Use it for synchronous APIs where p99 latency matters, not for async batch jobs.
Memory Configuration — Performance vs Cost
Lambda allocates CPU in proportion to memory (at 1,769 MB you get one full vCPU). Doubling memory roughly doubles CPU, which for CPU-bound code often halves execution time, so the math can make higher memory cheaper. Illustrative compute costs at $0.0000166667 per GB-second (us-east-1, x86; excludes the flat $0.20 per 1M request charge):
| Memory | Duration | Compute cost per 1M invocations |
|---|---|---|
| 128 MB | 800 ms | $1.67 |
| 256 MB | 400 ms | $1.67 |
| 512 MB | 200 ms | $1.67 |
| 1024 MB | 80 ms | $1.33 |
Use the AWS Lambda Power Tuning tool to find the optimal memory configuration for your specific function. Don't guess.
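The arithmetic behind that table is simple enough to script yourself; a quick sketch, assuming the us-east-1 x86 prices of $0.0000166667 per GB-second plus $0.20 per million requests (check your region's rates):

```python
PRICE_PER_GB_SECOND = 0.0000166667   # us-east-1, x86; region-dependent
PRICE_PER_MILLION_REQUESTS = 0.20    # flat per-request charge

def cost_per_million(memory_mb: int, duration_ms: float) -> float:
    """Total Lambda cost in USD for 1M invocations at the given config."""
    gb_seconds = (memory_mb / 1024) * (duration_ms / 1000)
    compute = gb_seconds * PRICE_PER_GB_SECOND * 1_000_000
    return compute + PRICE_PER_MILLION_REQUESTS

# 1024 MB at 80 ms is cheaper than 128 MB at 800 ms despite 8x the memory
print(round(cost_per_million(128, 800), 2))
print(round(cost_per_million(1024, 80), 2))
```

This is exactly the trade-off Power Tuning explores empirically: it invokes your real function at several memory sizes and plots the measured cost/latency curve instead of relying on an assumed duration.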
Lambda Layers — Sharing Dependencies
A Layer is a ZIP archive that is mounted at /opt in your function's execution environment. Use layers to:
- Share large dependencies (numpy, pandas, Pillow) across multiple functions
- Keep your deployment packages small (faster deploys, faster cold starts)
- Separate runtime code from business logic
```bash
# Create a Python layer with dependencies
mkdir -p python/lib/python3.12/site-packages
# boto3 ships with the Lambda runtime; bundle it only if you need to pin a version
pip install requests -t python/lib/python3.12/site-packages/
zip -r dependencies-layer.zip python/

# Publish the layer
aws lambda publish-layer-version \
  --layer-name my-dependencies \
  --zip-file fileb://dependencies-layer.zip \
  --compatible-runtimes python3.12 \
  --compatible-architectures x86_64 arm64
```
```yaml
# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Layers:
        - !Ref DependenciesLayer
        - arn:aws:lambda:us-east-1:017000801446:layer:AWSLambdaPowertoolsPythonV2:51
```
Environment Variables and Secrets
Never hardcode secrets in Lambda functions. Use environment variables for non-sensitive configuration and AWS Systems Manager Parameter Store or Secrets Manager for sensitive values:
```python
import os
import json
from functools import lru_cache

import boto3

# Non-sensitive config: use environment variables
DATABASE_URL = os.environ['DATABASE_URL']
LOG_LEVEL = os.environ.get('LOG_LEVEL', 'INFO')

# Sensitive secrets: fetch from Secrets Manager at cold start
@lru_cache(maxsize=1)
def get_secret(secret_name: str) -> dict:
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

def handler(event, context):
    secret = get_secret(os.environ['SECRET_NAME'])
    api_key = secret['api_key']
    # ... use api_key
```
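One caveat with `lru_cache`: the secret is cached for the lifetime of the execution environment, so a rotated secret isn't picked up until the container is recycled. If your secrets rotate, a time-bounded cache is safer. A sketch (the TTL value is illustrative, and `fetch` stands in for your Secrets Manager lookup):

```python
import time

_CACHE: dict = {}          # secret name -> (expires_at, value)
SECRET_TTL_SECONDS = 300   # illustrative: re-fetch at most every 5 minutes

def get_secret_with_ttl(name: str, fetch, ttl: float = SECRET_TTL_SECONDS) -> dict:
    """Return the cached secret, re-fetching via fetch(name) once the TTL expires."""
    now = time.monotonic()
    entry = _CACHE.get(name)
    if entry is None or entry[0] <= now:
        _CACHE[name] = (now + ttl, fetch(name))
    return _CACHE[name][1]
```

If you would rather not hand-roll this, AWS ships the Parameters and Secrets Lambda Extension, which provides an in-environment caching proxy with a configurable TTL.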
Error Handling and Retries
Lambda has different retry behaviors depending on the invocation type:
- Synchronous (API Gateway, ALB): no automatic retries. The caller gets the error immediately.
- Asynchronous (S3, SNS): retries twice by default. Configure a Dead Letter Queue.
- Stream-based (Kinesis, DynamoDB Streams): retries until success or record expiration (up to 7 days).
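The asynchronous retry behavior is configurable per function: you can lower the retry count, cap the event age, and route failures to a destination instead of (or alongside) a DLQ. A sketch with the AWS CLI, where the SQS queue ARN is a placeholder:

```shell
# Retry once instead of twice, drop events older than 1 hour,
# and send failed events to an SQS queue for inspection
aws lambda put-function-event-invoke-config \
  --function-name my-function \
  --maximum-retry-attempts 1 \
  --maximum-event-age-in-seconds 3600 \
  --destination-config '{"OnFailure":{"Destination":"arn:aws:sqs:us-east-1:123456789012:my-dlq"}}'
```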
```python
import json
import logging
import traceback

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    try:
        result = process_event(event)  # your business logic
        return {
            'statusCode': 200,
            'body': json.dumps(result)
        }
    except ValueError as e:
        # Client error — don't retry
        logger.warning(f"Validation error: {e}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': str(e)})
        }
    except Exception:
        # Server error — log the full traceback
        logger.error(f"Unhandled exception: {traceback.format_exc()}")
        # Re-raise so async invocations retry; API Gateway turns an
        # unhandled Lambda error into a 502 for synchronous callers
        raise
```
Structured Logging with AWS Lambda Powertools
AWS Lambda Powertools is the official toolkit for production Lambda functions. Install it as a Lambda Layer and get structured logging, tracing, and metrics for free:
```python
from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="payment-service")
tracer = Tracer(service="payment-service")
metrics = Metrics(namespace="MyApp", service="payment-service")

@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context):
    logger.info("Processing payment", extra={"amount": event.get("amount")})
    metrics.add_metric(name="PaymentsProcessed", unit=MetricUnit.Count, value=1)
    return {"status": "ok"}
```
Cost Control
- Set concurrency limits on non-critical functions to prevent runaway costs from bugs or floods
- Use ARM64 (Graviton2) — 20% cheaper and often 10-15% faster than x86
- Set appropriate timeouts — a function that should run in 100ms should time out at 500ms, not 15 minutes
- Enable X-Ray sampling instead of tracing every invocation at high volume
```bash
# Set reserved concurrency (0 = disable the function, N = cap at N)
aws lambda put-function-concurrency \
  --function-name my-function \
  --reserved-concurrent-executions 100
```
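The timeout advice above is also a one-liner. A sketch, where the 5-second value is illustrative (a common rule of thumb is roughly 3-5x your p99 duration):

```shell
# Cap a fast API function at 5 seconds instead of the 15-minute maximum
aws lambda update-function-configuration \
  --function-name my-function \
  --timeout 5
```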
DevKits Tools for Serverless Development
Build and debug your Lambda functions faster with these DevKits tools:
- JSON Formatter — format and validate Lambda event payloads and responses
- Base64 Encoder/Decoder — decode Kinesis/SQS message bodies
Summary
Production Lambda excellence comes down to a few key habits: keep packages small, initialize outside the handler, right-size memory, externalize secrets, and instrument everything with structured logging. Lambda is cheap when you know how it works and expensive when you don't.