AWS Lambda Best Practices — Cold Starts, Memory, Layers, and Production Patterns

AWS Lambda best practices for production: cold start optimization, memory configuration, Lambda Layers, environment variables, error handling, and cost control.

Introduction

AWS Lambda is the foundation of serverless computing on AWS — you deploy code, AWS runs it, you pay per invocation. Sounds simple. But in production, Lambda has enough quirks that getting it wrong leads to slow APIs, runaway bills, and mysterious failures. This guide covers the patterns that separate amateur serverless code from production-grade deployments.

Cold Starts — The Most Misunderstood Problem

A cold start happens when Lambda needs to provision a new execution environment because no warm instance is available. The full cold start sequence is:

  1. Download the function package from S3
  2. Start a new container
  3. Initialize the runtime (JVM, Node.js, Python interpreter)
  4. Run your initialization code (outside the handler)
  5. Run your handler

Steps 1-3 are controlled by AWS. Steps 4-5 are your responsibility. Typical cold start durations by runtime: Go/Rust tens of milliseconds, Python/Node ~100-300ms, Java/JVM ~1-3s.
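You can observe cold starts empirically with a module-level flag: module scope runs once per execution environment, so the flag is only true on the first invocation. A minimal sketch:

```python
# Module scope runs once per execution environment, i.e. on cold start.
_cold_start = True

def handler(event, context):
    global _cold_start
    is_cold = _cold_start
    _cold_start = False  # every later invocation in this environment is warm
    return {"cold_start": is_cold}
```

Logging this flag (or using Powertools' cold-start metric, shown later) tells you how often your traffic actually hits a cold environment.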

Cold Start Optimization Strategies

1. Keep the package small

# Check your deployment package size
zip -r function.zip . -x "*.pyc" "__pycache__/*" "tests/*"
ls -lh function.zip

# Target: under 3MB for Python/Node, under 50MB for JVM
# Use Lambda Layers to externalize large dependencies

2. Move initialization outside the handler

import boto3
import json

# GOOD: initialized once, reused across invocations
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('my-table')
ssm_client = boto3.client('ssm')

# Cache config at cold start
def _load_config():
    response = ssm_client.get_parameter(Name='/app/config', WithDecryption=True)
    return json.loads(response['Parameter']['Value'])

CONFIG = _load_config()

def handler(event, context):
    # Use pre-initialized clients — no cold start penalty here
    result = table.get_item(Key={'id': event['id']})
    return result.get('Item', {})

3. Provisioned Concurrency for latency-sensitive endpoints

# Provision 10 warm instances at all times
aws lambda put-provisioned-concurrency-config \
    --function-name my-api \
    --qualifier prod \
    --provisioned-concurrent-executions 10

Provisioned Concurrency costs more but eliminates cold starts entirely for those pre-warmed instances. Use it for synchronous APIs where p99 latency matters, not for async batch jobs.

Memory Configuration — Performance vs Cost

Lambda allocates CPU proportionally to memory. Doubling memory roughly doubles CPU, which often halves execution time for CPU-bound workloads. The math can make higher memory cheaper:

Memory     Duration   Cost per 1M invocations
128 MB     800 ms     $1.33
256 MB     400 ms     $1.33
512 MB     200 ms     $1.33
1024 MB    80 ms      $1.07
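The table's arithmetic falls out of Lambda's GB-second pricing. A quick sketch of the calculation, assuming the arm64 rate of roughly $0.0000133334 per GB-second (rates vary by region and exclude the per-request fee):

```python
GB_SECOND_PRICE = 0.0000133334  # approximate arm64 rate; varies by region

def cost_per_million(memory_mb: int, duration_ms: int) -> float:
    """Compute charge for 1M invocations (compute only, no request fee)."""
    gb_seconds_per_invocation = (memory_mb / 1024) * (duration_ms / 1000)
    return gb_seconds_per_invocation * 1_000_000 * GB_SECOND_PRICE

# 128 MB at 800 ms and 512 MB at 200 ms bill identical GB-seconds:
print(round(cost_per_million(128, 800), 2))   # ~1.33
print(round(cost_per_million(512, 200), 2))   # ~1.33
print(round(cost_per_million(1024, 80), 2))   # ~1.07
```

If more memory shrinks duration faster than it raises the GB-second rate, the bill goes down, which is exactly what the 1024 MB row shows.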

Use the AWS Lambda Power Tuning tool to find the optimal memory configuration for your specific function. Don't guess.

Lambda Layers — Sharing Dependencies

A Layer is a ZIP archive that is mounted at /opt in your function's execution environment. Use layers to:

  • Share large dependencies (numpy, pandas, Pillow) across multiple functions
  • Keep your deployment packages small (faster deploys, faster cold starts)
  • Separate runtime code from business logic

# Create a Python layer with dependencies
mkdir -p python/lib/python3.12/site-packages
pip install requests boto3 -t python/lib/python3.12/site-packages/
zip -r dependencies-layer.zip python/

# Publish the layer
aws lambda publish-layer-version \
    --layer-name my-dependencies \
    --zip-file fileb://dependencies-layer.zip \
    --compatible-runtimes python3.12 \
    --compatible-architectures x86_64 arm64

# SAM template: attach layers to a function
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Layers:
        - !Ref DependenciesLayer
        - arn:aws:lambda:us-east-1:017000801446:layer:AWSLambdaPowertoolsPythonV2:51

Environment Variables and Secrets

Never hardcode secrets in Lambda functions. Use environment variables for non-sensitive configuration and AWS Systems Manager Parameter Store or Secrets Manager for sensitive values:

import os
import boto3
import json
from functools import lru_cache

# Non-sensitive config: use environment variables
DATABASE_URL = os.environ['DATABASE_URL']
LOG_LEVEL = os.environ.get('LOG_LEVEL', 'INFO')

# Sensitive secrets: fetch from Secrets Manager at cold start
@lru_cache(maxsize=1)
def get_secret(secret_name: str) -> dict:
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

def handler(event, context):
    secret = get_secret(os.environ['SECRET_NAME'])
    api_key = secret['api_key']
    # ... use api_key
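One caveat: `lru_cache` never expires, so after a secret rotation the function serves the stale value until its execution environment is recycled. A time-bounded cache avoids that; this sketch injects the `fetch` callable (in real code it would wrap the Secrets Manager call) so it runs without AWS credentials:

```python
import time

_secret_cache: dict = {}
SECRET_TTL_SECONDS = 300  # re-fetch after 5 minutes so rotations are picked up

def get_secret_with_ttl(secret_name: str, fetch) -> dict:
    """Time-bounded variant of get_secret: serve the cached value until it
    is older than SECRET_TTL_SECONDS, then call `fetch` again.
    `fetch(secret_name) -> dict` is injected; in production it would call
    secretsmanager get_secret_value and json-decode the SecretString."""
    now = time.monotonic()
    entry = _secret_cache.get(secret_name)
    if entry is not None and now - entry[0] < SECRET_TTL_SECONDS:
        return entry[1]  # still fresh, no network call
    value = fetch(secret_name)
    _secret_cache[secret_name] = (now, value)
    return value
```

Pick a TTL shorter than your rotation window; warm invocations inside the TTL still pay zero latency for the secret.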

Error Handling and Retries

Lambda has different retry behaviors depending on the invocation type:

  • Synchronous (API Gateway, ALB): no automatic retries. The caller gets the error immediately.
  • Asynchronous (S3, SNS): retries twice by default. Configure a Dead Letter Queue or on-failure destination to capture events that exhaust their retries.
  • Stream-based (Kinesis, DynamoDB Streams): retries until success or record expiration (24 hours by default; Kinesis retention is extendable), so one poison record can block a shard.

import json
import logging
import traceback

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    try:
        result = process_event(event)
        return {
            'statusCode': 200,
            'body': json.dumps(result)
        }
    except ValueError as e:
        # Client error — don't retry
        logger.warning(f"Validation error: {e}")
        return {
            'statusCode': 400,
            'body': json.dumps({'error': str(e)})
        }
    except Exception as e:
        # Server error — log full traceback
        logger.error(f"Unhandled exception: {traceback.format_exc()}")
        raise  # Re-raise to trigger Lambda retry (for async) / return 500 (for sync)
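For stream-based sources, the retry behavior above improves markedly if you report partial batch failures: with ReportBatchItemFailures enabled on the event source mapping, returning the failed record identifiers makes Lambda retry only those records instead of the whole batch. A sketch for a Kinesis batch (the `process_record` helper is hypothetical):

```python
def process_record(record):
    # Hypothetical per-record business logic; raises on bad payloads.
    if record["kinesis"]["data"] == "bad":
        raise ValueError("unprocessable record")

def batch_handler(event, context):
    failures = []
    for record in event["Records"]:
        try:
            process_record(record)
        except Exception:
            # Report just this record; with ReportBatchItemFailures enabled
            # on the event source mapping, Lambda retries only these IDs.
            failures.append(
                {"itemIdentifier": record["kinesis"]["sequenceNumber"]}
            )
    return {"batchItemFailures": failures}
```

For SQS sources the identifier is the record's `messageId` instead of the Kinesis sequence number.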

Structured Logging with AWS Lambda Powertools

AWS Lambda Powertools is the official toolkit for production Lambda functions. Install it as a Lambda Layer and get structured logging, tracing, and metrics for free:

from aws_lambda_powertools import Logger, Tracer, Metrics
from aws_lambda_powertools.metrics import MetricUnit

logger = Logger(service="payment-service")
tracer = Tracer(service="payment-service")
metrics = Metrics(namespace="MyApp", service="payment-service")

@logger.inject_lambda_context(log_event=True)
@tracer.capture_lambda_handler
@metrics.log_metrics(capture_cold_start_metric=True)
def handler(event, context):
    logger.info("Processing payment", extra={"amount": event.get("amount")})
    metrics.add_metric(name="PaymentsProcessed", unit=MetricUnit.Count, value=1)
    return {"status": "ok"}

Cost Control

  • Set concurrency limits on non-critical functions to prevent runaway costs from bugs or floods
  • Use ARM64 (Graviton2) — 20% cheaper and often 10-15% faster than x86
  • Set appropriate timeouts — a function that should run in 100ms should time out at 500ms, not 15 minutes
  • Enable X-Ray sampling instead of tracing every invocation at high volume

# Set reserved concurrency (0 = disable, N = cap at N)
aws lambda put-function-concurrency \
    --function-name my-function \
    --reserved-concurrent-executions 100
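The timeout advice pairs well with a runtime deadline check: `context.get_remaining_time_in_millis()` reports how long until Lambda kills the invocation, so a long-running loop can stop cleanly and hand back unfinished work instead of dying mid-record. A sketch (the `drain_queue` helper and safety margin are illustrative choices):

```python
SAFETY_MARGIN_MS = 5_000  # stop early, leaving time to return/checkpoint

def drain_queue(items, context):
    """Process items until done or until the invocation nears its timeout.
    Returns (processed, remaining) so a retry can resume where we stopped."""
    processed = []
    for i, item in enumerate(items):
        if context.get_remaining_time_in_millis() < SAFETY_MARGIN_MS:
            return processed, items[i:]  # hand back the unprocessed tail
        processed.append(item)
    return processed, []
```

The same pattern guards recursive "continuation" Lambdas that re-invoke themselves with the leftover work.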

Summary

Production Lambda excellence comes down to a few key habits: keep packages small, initialize outside the handler, right-size memory, externalize secrets, and instrument everything with structured logging. Lambda is cheap when you know how it works and expensive when you don't.
