AWS Serverless EDA
AWS serverless and event-driven architecture skill for Lambda, API Gateway, DynamoDB, Step Functions, EventBridge, SQS, and SNS systems.
What this skill does
Build scalable applications that absorb traffic spikes automatically, without server maintenance. Create reliable systems and automated workflows using proven AWS best practices for cost efficiency and growth. Use this skill whenever you need resilient software and want to focus on your product rather than on managing servers.
name: aws-serverless-eda
description: AWS serverless and event-driven architecture expert based on Well-Architected Framework. Use when building serverless APIs, Lambda functions, REST APIs, microservices, or async workflows. Covers Lambda with TypeScript/Python, API Gateway (REST/HTTP), DynamoDB, Step Functions, EventBridge, SQS, SNS, and serverless patterns. Essential when user mentions serverless, Lambda, API Gateway, event-driven, async processing, queues, pub/sub, or wants to build scalable serverless applications with AWS best practices.
context: fork
skills:
- aws-mcp-setup
- aws-cdk-development
allowed-tools:
- mcp__aws-mcp__*
- mcp__awsdocs__*
- mcp__cdk__*
- Bash(sam *)
- Bash(aws lambda *)
- Bash(aws apigateway *)
- Bash(aws apigatewayv2 *)
- Bash(aws dynamodb *)
- Bash(aws stepfunctions *)
- Bash(aws events *)
- Bash(aws sqs *)
- Bash(aws sns *)
- Bash(aws sts get-caller-identity)
hooks:
PreToolUse:
- matcher: Bash(sam deploy*)
  command: aws sts get-caller-identity --query Account --output text
  once: true
AWS Serverless & Event-Driven Architecture
This skill provides comprehensive guidance for building serverless applications and event-driven architectures on AWS based on Well-Architected Framework principles.
AWS Documentation Requirement
Always verify AWS facts using MCP tools (mcp__aws-mcp__* or mcp__awsdocs__*) before answering. The aws-mcp-setup dependency is auto-loaded; if MCP tools are unavailable, guide the user through that skill's setup flow.
Serverless MCP Servers
This skill leverages the CDK MCP server (provided via aws-cdk-development dependency) and AWS Documentation MCP for serverless guidance.
Note: The following AWS MCP servers are available separately via the Full AWS MCP Server (see the aws-mcp-setup skill) and are not bundled with this plugin:
- AWS Serverless MCP — SAM CLI lifecycle (init, deploy, local test)
- AWS Lambda Tool MCP — Direct Lambda invocation
- AWS Step Functions MCP — Workflow orchestration
- Amazon SNS/SQS MCP — Messaging and queue management
When to Use This Skill
Use this skill when:
- Building serverless applications with Lambda
- Designing event-driven architectures
- Implementing microservices patterns
- Creating asynchronous processing workflows
- Orchestrating multi-service transactions
- Building real-time data processing pipelines
- Implementing saga patterns for distributed transactions
- Designing for scale and resilience
AWS Well-Architected Serverless Design Principles
1. Speedy, Simple, Singular
Functions should be concise and single-purpose
// ✅ GOOD - Single purpose, focused function
export const processOrder = async (event: OrderEvent) => {
// Only handles order processing
const order = await validateOrder(event);
await saveOrder(order);
await publishOrderCreatedEvent(order);
return { statusCode: 200, body: JSON.stringify({ orderId: order.id }) };
};
// ❌ BAD - Function does too much
export const handleEverything = async (event: any) => {
// Handles orders, inventory, payments, shipping...
// Too many responsibilities
};
Keep functions resource-efficient and cost-aware:
- Minimize cold start times
- Optimize memory allocation
- Use provisioned concurrency only when needed
- Leverage connection reuse
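The connection-reuse point can be sketched without any AWS dependency: initialize expensive resources at module scope so warm invocations reuse them. Here `createClient` and the counter are illustrative stand-ins for a real SDK client or connection pool:

```typescript
// Module scope: this runs once per cold start and is then reused by
// every warm invocation in the same execution environment.
let initCount = 0;

// Stand-in for an expensive resource (createClient is hypothetical).
function createClient() {
  initCount += 1;
  return { query: async (input: string) => `result:${input}` };
}

const client = createClient(); // initialized outside the handler

export const handler = async (event: { input: string }) => {
  // Warm invocations reuse `client` instead of reconnecting.
  const result = await client.query(event.input);
  return { result, initCount };
};
```

Invoking the handler repeatedly leaves `initCount` at 1, which is the behavior you want from a real SDK client held at module scope.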
2. Think Concurrent Requests, Not Total Requests
Design for concurrency, not volume
Lambda scales horizontally, so design should focus on:
- Concurrent execution limits
- Downstream service throttling
- Shared resource contention
- Connection pool sizing
// Consider concurrent Lambda executions accessing DynamoDB
const table = new dynamodb.Table(this, 'Table', {
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST, // Auto-scales with load
});
// Or with provisioned capacity + auto-scaling
const provisionedTable = new dynamodb.Table(this, 'ProvisionedTable', {
  billingMode: dynamodb.BillingMode.PROVISIONED,
  readCapacity: 5,
  writeCapacity: 5,
});
// Enable auto-scaling for concurrent load
provisionedTable.autoScaleReadCapacity({ minCapacity: 5, maxCapacity: 100 });
provisionedTable.autoScaleWriteCapacity({ minCapacity: 5, maxCapacity: 100 });
3. Share Nothing
Function runtime environments are short-lived
// ❌ BAD - Relying on local file system
export const handler = async (event: any) => {
fs.writeFileSync('/tmp/data.json', JSON.stringify(data)); // Lost after execution
};
// ✅ GOOD - Use persistent storage
export const handler = async (event: any) => {
await s3.putObject({
Bucket: process.env.BUCKET_NAME,
Key: 'data.json',
Body: JSON.stringify(data),
});
};
State management:
- Use DynamoDB for persistent state
- Use Step Functions for workflow state
- Use ElastiCache for session state
- Use S3 for file storage
4. Assume No Hardware Affinity
Applications must be hardware-agnostic
Infrastructure can change without notice:
- Lambda functions can run on different hardware
- Container instances can be replaced
- No assumption about underlying infrastructure
Design for portability:
- Use environment variables for configuration
- Avoid hardware-specific optimizations
- Test across different environments
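As a minimal sketch of environment-driven configuration (variable names and defaults here are illustrative):

```typescript
// Read configuration from the environment with safe defaults so the
// same artifact runs unchanged in any environment or on any hardware.
interface AppConfig {
  tableName: string;
  logLevel: string;
  maxRetries: number;
}

export function loadConfig(
  env: Record<string, string | undefined>
): AppConfig {
  return {
    tableName: env.TABLE_NAME ?? 'local-table',
    logLevel: env.LOG_LEVEL ?? 'INFO',
    maxRetries: Number(env.MAX_RETRIES ?? '3'),
  };
}
```

In a handler module you would call `loadConfig(process.env)` once at module scope and pass the result where needed.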
5. Orchestrate with State Machines, Not Function Chaining
Use Step Functions for orchestration
// ❌ BAD - Lambda function chaining
export const handler1 = async (event: any) => {
const result = await processStep1(event);
await lambda.invoke({
FunctionName: 'handler2',
Payload: JSON.stringify(result),
});
};
// ✅ GOOD - Step Functions orchestration
const stateMachine = new stepfunctions.StateMachine(this, 'OrderWorkflow', {
definition: stepfunctions.Chain
.start(validateOrder)
.next(processPayment)
.next(shipOrder)
.next(sendConfirmation),
});
Benefits of Step Functions:
- Visual workflow representation
- Built-in error handling and retries
- Execution history and debugging
- Parallel and sequential execution
- Service integrations without code
6. Use Events to Trigger Transactions
Event-driven over synchronous request/response
// Pattern: Event-driven processing
const bucket = new s3.Bucket(this, 'DataBucket');
bucket.addEventNotification(
s3.EventType.OBJECT_CREATED,
new s3n.LambdaDestination(processFunction),
{ prefix: 'uploads/' }
);
// Pattern: EventBridge integration
const rule = new events.Rule(this, 'OrderRule', {
eventPattern: {
source: ['orders'],
detailType: ['OrderPlaced'],
},
});
rule.addTarget(new targets.LambdaFunction(processOrderFunction));
Benefits:
- Loose coupling between services
- Asynchronous processing
- Better fault tolerance
- Independent scaling
7. Design for Failures and Duplicates
Operations must be idempotent
// ✅ GOOD - Idempotent operation
export const handler = async (event: SQSEvent) => {
for (const record of event.Records) {
const orderId = JSON.parse(record.body).orderId;
// Check if already processed (idempotency)
const existing = await dynamodb.getItem({
TableName: process.env.TABLE_NAME,
Key: { orderId },
});
if (existing.Item) {
console.log('Order already processed:', orderId);
continue; // Skip duplicate
}
// Process order
await processOrder(orderId);
// Mark as processed
await dynamodb.putItem({
TableName: process.env.TABLE_NAME,
Item: { orderId, processedAt: Date.now() },
});
}
};
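Note that a read-then-write check like the one above can race when duplicate messages arrive concurrently. A DynamoDB conditional write makes the check atomic; this sketch only builds the `PutItem` parameters (the actual call goes through the DynamoDB client, and the table and attribute names are illustrative):

```typescript
// Build PutItem parameters whose ConditionExpression makes the
// "already processed?" check atomic: the write succeeds only when no
// item with this orderId exists, so two concurrent duplicate
// deliveries cannot both pass the check.
export function buildIdempotentPut(tableName: string, orderId: string) {
  return {
    TableName: tableName,
    Item: { orderId, processedAt: Date.now() },
    // A duplicate raises ConditionalCheckFailedException, which the
    // handler can catch and treat as "already processed".
    ConditionExpression: 'attribute_not_exists(orderId)',
  };
}
```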
Implement retry logic with exponential backoff:
async function withRetry<T>(fn: () => Promise<T>, maxRetries = 3): Promise<T> {
for (let i = 0; i < maxRetries; i++) {
try {
return await fn();
} catch (error) {
if (i === maxRetries - 1) throw error;
await new Promise(resolve => setTimeout(resolve, Math.pow(2, i) * 1000));
}
}
throw new Error('Max retries exceeded');
}
Architecture Patterns
For detailed implementation patterns with full code examples, see the reference documentation:
Event-Driven Architecture Patterns
File: references/eda-patterns.md
- Event Router with EventBridge (custom event bus, schema registry, rule-based routing)
- Queue-Based Processing with SQS (standard/FIFO, DLQ, Lambda consumers)
- Pub/Sub Fan-Out with SNS + SQS (multi-consumer, filtering)
- Saga Pattern with Step Functions (distributed transactions, compensating actions)
- Event Sourcing with DynamoDB Streams (append-only event store, projections)
Serverless Architecture Patterns
File: references/serverless-patterns.md
- API-Driven Microservices (REST API + Lambda backend)
- Stream Processing with Kinesis (real-time, batch windowing, bisect on error)
- Async Task Processing with SQS (background jobs, concurrency control)
- Scheduled Jobs with EventBridge (cron/rate schedules)
- Webhook Processing (signature validation, async queue forwarding)
Important: When using CDK code examples from references, avoid hardcoding resource names (e.g., restApiName, eventBusName). Let CDK generate unique names automatically to enable reusability and parallel deployments. See the aws-cdk-development skill for details.
Best Practices
Error Handling
Implement comprehensive error handling:
export const handler = async (event: SQSEvent) => {
const failures: SQSBatchItemFailure[] = [];
for (const record of event.Records) {
try {
await processRecord(record);
} catch (error) {
console.error('Failed to process record:', record.messageId, error);
failures.push({ itemIdentifier: record.messageId });
}
}
// Return partial batch failures for retry
return { batchItemFailures: failures };
};
Dead Letter Queues
Always configure DLQs for error handling:
const dlq = new sqs.Queue(this, 'DLQ', {
retentionPeriod: Duration.days(14),
});
const queue = new sqs.Queue(this, 'Queue', {
deadLetterQueue: {
queue: dlq,
maxReceiveCount: 3,
},
});
// Monitor DLQ depth
new cloudwatch.Alarm(this, 'DLQAlarm', {
metric: dlq.metricApproximateNumberOfMessagesVisible(),
threshold: 1,
evaluationPeriods: 1,
alarmDescription: 'Messages in DLQ require attention',
});
Observability
Enable tracing and monitoring:
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
tracing: lambda.Tracing.ACTIVE, // X-Ray tracing
environment: {
POWERTOOLS_SERVICE_NAME: 'order-service',
POWERTOOLS_METRICS_NAMESPACE: 'MyApp',
LOG_LEVEL: 'INFO',
},
});
Using MCP Servers Effectively
Use the CDK MCP server (via aws-cdk-development dependency) for construct recommendations and CDK-specific guidance when building serverless infrastructure.
Use AWS Documentation MCP to verify service features, regional availability, and API specifications before implementing.
Additional Resources
This skill includes comprehensive reference documentation based on AWS best practices:
- Serverless Patterns: references/serverless-patterns.md
  - Core serverless architectures and API patterns
  - Data processing and integration patterns
  - Orchestration with Step Functions
  - Anti-patterns to avoid
- Event-Driven Architecture Patterns: references/eda-patterns.md
  - Event routing and processing patterns
  - Event sourcing and saga patterns
  - Idempotency and error handling
  - Message ordering and deduplication
- Security Best Practices: references/security-best-practices.md
  - Shared responsibility model
  - IAM least privilege patterns
  - Data protection and encryption
  - Network security with VPC
- Observability Best Practices: references/observability-best-practices.md
  - Three pillars: metrics, logs, traces
  - Structured logging with Lambda Powertools
  - X-Ray distributed tracing
  - CloudWatch alarms and dashboards
- Performance Optimization: references/performance-optimization.md
  - Cold start optimization techniques
  - Memory and CPU optimization
  - Package size reduction
  - Provisioned concurrency patterns
- Deployment Best Practices: references/deployment-best-practices.md
  - CI/CD pipeline design
  - Testing strategies (unit, integration, load)
  - Deployment strategies (canary, blue/green)
  - Rollback and safety mechanisms
External Resources:
- AWS Well-Architected Serverless Lens: https://docs.aws.amazon.com/wellarchitected/latest/serverless-applications-lens/
- ServerlessLand.com: Pre-built serverless patterns
- AWS Serverless Workshops: https://serverlessland.com/learn?type=Workshops
For detailed implementation patterns, anti-patterns, and code examples, refer to the comprehensive references in the skill directory.
Serverless Deployment Best Practices
Deployment best practices for serverless applications including CI/CD, testing, and deployment strategies.
Table of Contents
- Software Release Process
- Infrastructure as Code
- CI/CD Pipeline Design
- Testing Strategies
- Deployment Strategies
- Rollback and Safety
Software Release Process
Four Stages of Release
1. Source Phase:
- Developers commit code changes
- Code review (peer review)
- Version control (Git)
2. Build Phase:
- Compile code
- Run unit tests
- Style checking and linting
- Create deployment packages
- Build container images
3. Test Phase:
- Integration tests with other systems
- Load testing
- UI testing
- Security testing (penetration testing)
- Acceptance testing
4. Production Phase:
- Deploy to production environment
- Monitor for errors
- Validate deployment success
- Rollback if needed
CI/CD Maturity Levels
Continuous Integration (CI):
- Automated build on code commit
- Automated unit testing
- Manual deployment to test/production
Continuous Delivery (CD):
- Automated deployment to test environments
- Manual approval for production
- Automated testing in non-prod
Continuous Deployment:
- Fully automated pipeline
- Automated deployment to production
- No manual intervention after code commit
Infrastructure as Code
Framework Selection
AWS SAM (Serverless Application Model):
# template.yaml
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
OrderFunction:
Type: AWS::Serverless::Function
Properties:
Handler: app.handler
Runtime: nodejs20.x
CodeUri: src/
Events:
Api:
Type: Api
Properties:
Path: /orders
Method: post
Benefits:
- Simple, serverless-focused syntax
- Built-in best practices
- SAM CLI for local testing
- Integrates with CodeDeploy
AWS CDK:
const orderFunction = new NodejsFunction(this, 'OrderFunction', {
  entry: 'src/orders/handler.ts',
  environment: {
    TABLE_NAME: ordersTable.tableName,
  },
});
ordersTable.grantReadWriteData(orderFunction);
Benefits:
- Type-safe, programmatic
- Reusable constructs
- Rich AWS service support
- Better for complex infrastructure
When to use:
- SAM: Serverless-only applications, simpler projects
- CDK: Complex infrastructure, multiple services, reusable patterns
Environment Management
Separate environments:
// CDK App
const app = new cdk.App();
new ServerlessStack(app, 'DevStack', {
env: { account: '111111111111', region: 'us-east-1' },
environment: 'dev',
logLevel: 'DEBUG',
});
new ServerlessStack(app, 'ProdStack', {
env: { account: '222222222222', region: 'us-east-1' },
environment: 'prod',
logLevel: 'INFO',
});
SAM with parameters:
Parameters:
Environment:
Type: String
Default: dev
AllowedValues:
- dev
- staging
- prod
Resources:
Function:
Type: AWS::Serverless::Function
Properties:
Environment:
Variables:
ENVIRONMENT: !Ref Environment
LOG_LEVEL: !If [IsProd, INFO, DEBUG]
CI/CD Pipeline Design
AWS CodePipeline
Comprehensive pipeline:
import * as codepipeline from 'aws-cdk-lib/aws-codepipeline';
import * as codepipeline_actions from 'aws-cdk-lib/aws-codepipeline-actions';
const sourceOutput = new codepipeline.Artifact();
const buildOutput = new codepipeline.Artifact();
const pipeline = new codepipeline.Pipeline(this, 'Pipeline', {
pipelineName: 'serverless-pipeline',
});
// Source stage
pipeline.addStage({
stageName: 'Source',
actions: [
new codepipeline_actions.CodeStarConnectionsSourceAction({
actionName: 'GitHub_Source',
owner: 'myorg',
repo: 'myrepo',
branch: 'main',
output: sourceOutput,
connectionArn: githubConnection.connectionArn,
}),
],
});
// Build stage
pipeline.addStage({
stageName: 'Build',
actions: [
new codepipeline_actions.CodeBuildAction({
actionName: 'Build',
project: buildProject,
input: sourceOutput,
outputs: [buildOutput],
}),
],
});
// Test stage
pipeline.addStage({
stageName: 'Test',
actions: [
new codepipeline_actions.CloudFormationCreateUpdateStackAction({
actionName: 'Deploy_Test',
templatePath: buildOutput.atPath('packaged.yaml'),
stackName: 'test-stack',
adminPermissions: true,
}),
new codepipeline_actions.CodeBuildAction({
actionName: 'Integration_Tests',
project: testProject,
input: buildOutput,
runOrder: 2,
}),
],
});
// Production stage (with manual approval)
pipeline.addStage({
stageName: 'Production',
actions: [
new codepipeline_actions.ManualApprovalAction({
actionName: 'Approve',
}),
new codepipeline_actions.CloudFormationCreateUpdateStackAction({
actionName: 'Deploy_Prod',
templatePath: buildOutput.atPath('packaged.yaml'),
stackName: 'prod-stack',
adminPermissions: true,
runOrder: 2,
}),
],
});
GitHub Actions
Serverless deployment workflow:
# .github/workflows/deploy.yml
name: Deploy Serverless Application
on:
push:
branches: [main, develop]  # develop triggers the Dev deploy below
jobs:
build-and-deploy:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- name: Setup Node.js
uses: actions/setup-node@v3
with:
node-version: '20'
- name: Install dependencies
run: npm ci
- name: Run tests
run: npm test
- name: Setup SAM CLI
uses: aws-actions/setup-sam@v2
- name: Build SAM application
run: sam build
- name: Deploy to Dev
if: github.ref != 'refs/heads/main'
run: |
sam deploy \
--no-confirm-changeset \
--no-fail-on-empty-changeset \
--stack-name dev-stack \
--parameter-overrides Environment=dev
- name: Run integration tests
run: npm run test:integration
- name: Deploy to Prod
if: github.ref == 'refs/heads/main'
run: |
sam deploy \
--no-confirm-changeset \
--no-fail-on-empty-changeset \
--stack-name prod-stack \
--parameter-overrides Environment=prod
Testing Strategies
Unit Testing
Test business logic independently:
// handler.ts
export const processOrder = (order: Order): ProcessedOrder => {
// Pure business logic (easily testable)
validateOrder(order);
calculateTotal(order);
return transformOrder(order);
};
export const handler = async (event: any) => {
const order = parseEvent(event);
const processed = processOrder(order); // Testable function
await saveToDatabase(processed);
return formatResponse(processed);
};
// handler.test.ts
import { processOrder } from './handler';
describe('processOrder', () => {
it('calculates total correctly', () => {
const order = {
items: [
{ price: 10, quantity: 2 },
{ price: 5, quantity: 3 },
],
};
const result = processOrder(order);
expect(result.total).toBe(35);
});
it('throws on invalid order', () => {
const invalid = { items: [] };
expect(() => processOrder(invalid)).toThrow();
});
});
Integration Testing
Test in actual AWS environment:
// integration.test.ts
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
import { DynamoDBClient, GetItemCommand } from '@aws-sdk/client-dynamodb';
describe('Order Processing Integration', () => {
const lambda = new LambdaClient({});
const dynamodb = new DynamoDBClient({});
it('processes order end-to-end', async () => {
// Invoke Lambda
const response = await lambda.send(new InvokeCommand({
FunctionName: process.env.FUNCTION_NAME,
Payload: JSON.stringify({
orderId: 'test-123',
items: [{ productId: 'prod-1', quantity: 2 }],
}),
}));
const result = JSON.parse(Buffer.from(response.Payload!).toString());
expect(result.statusCode).toBe(200);
// Verify database write
const dbResult = await dynamodb.send(new GetItemCommand({
TableName: process.env.TABLE_NAME,
Key: { orderId: { S: 'test-123' } },
}));
expect(dbResult.Item).toBeDefined();
expect(dbResult.Item?.status.S).toBe('PROCESSED');
});
});
Local Testing with SAM
Test locally before deployment:
# Start local API
sam local start-api
# Invoke function locally
sam local invoke OrderFunction -e events/create-order.json
# Generate sample events
sam local generate-event apigateway aws-proxy > event.json
# Debug locally
sam local invoke OrderFunction -d 5858
# Test with Docker
sam local start-api --docker-network my-network
Load Testing
Test under production load:
# Install Artillery
npm install -g artillery
# Create load test
cat > load-test.yml <<EOF
config:
target: https://api.example.com
phases:
- duration: 300 # 5 minutes
arrivalRate: 50 # 50 requests/second
rampTo: 200 # Ramp to 200 req/sec
scenarios:
- flow:
- post:
url: /orders
json:
orderId: "{{ $randomString() }}"
EOF
# Run load test
artillery run load-test.yml --output report.json
# Generate HTML report
artillery report report.json
Deployment Strategies
All-at-Once Deployment
Simple, fast, risky:
# SAM template
Resources:
OrderFunction:
Type: AWS::Serverless::Function
Properties:
DeploymentPreference:
Type: AllAtOnce # Deploy immediately
Use for:
- Development environments
- Non-critical applications
- Quick hotfixes (with caution)
Blue/Green Deployment
Zero-downtime deployment:
Resources:
OrderFunction:
Type: AWS::Serverless::Function
Properties:
AutoPublishAlias: live
DeploymentPreference:
Type: Linear10PercentEvery1Minute
Alarms:
- !Ref ErrorAlarm
- !Ref LatencyAlarm
Deployment types:
- Linear10PercentEvery1Minute: 10% traffic shift every minute
- Linear10PercentEvery2Minutes: Slower, more conservative
- Linear10PercentEvery3Minutes: Even slower
- Linear10PercentEvery10Minutes: Very gradual
- Canary10Percent5Minutes: 10% for 5 min, then 100%
- Canary10Percent10Minutes: 10% for 10 min, then 100%
- Canary10Percent30Minutes: 10% for 30 min, then 100%
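To compare these options quantitatively, a small helper can estimate minutes until 100% of traffic is on the new version, parsed from the deployment type name (illustrative string parsing only, no AWS API, and assuming no alarm stops the shift):

```typescript
// Estimate minutes until the full traffic shift completes for a
// CodeDeploy Lambda deployment type name.
export function minutesToFullShift(deploymentType: string): number {
  const linear = deploymentType.match(/^Linear(\d+)PercentEvery(\d+)Minutes?$/);
  if (linear) {
    // e.g. Linear10PercentEvery1Minute: 100/10 = 10 steps, 1 minute apart
    return (100 / Number(linear[1])) * Number(linear[2]);
  }
  const canary = deploymentType.match(/^Canary(\d+)Percent(\d+)Minutes?$/);
  if (canary) {
    // e.g. Canary10Percent5Minutes: bake for 5 minutes, then shift the rest
    return Number(canary[2]);
  }
  return 0; // AllAtOnce shifts immediately
}
```

This makes the trade-off concrete: Linear10PercentEvery10Minutes takes roughly 100 minutes to complete, while Canary10Percent5Minutes finishes in about 5.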
Canary Deployment
Test with subset of traffic:
Resources:
OrderFunction:
Type: AWS::Serverless::Function
Properties:
AutoPublishAlias: live
DeploymentPreference:
Type: Canary10Percent10Minutes
Alarms:
- !Ref ErrorAlarm
- !Ref LatencyAlarm
Hooks:
PreTraffic: !Ref PreTrafficHook
PostTraffic: !Ref PostTrafficHook
PreTrafficHook:
Type: AWS::Serverless::Function
Properties:
Handler: hooks.pre_traffic
Runtime: python3.12
# Runs before traffic shift
# Validates new version
PostTrafficHook:
Type: AWS::Serverless::Function
Properties:
Handler: hooks.post_traffic
Runtime: python3.12
# Runs after traffic shift
# Validates deployment success
CDK with CodeDeploy:
import * as codedeploy from 'aws-cdk-lib/aws-codedeploy';
const alias = fn.currentVersion.addAlias('live');
new codedeploy.LambdaDeploymentGroup(this, 'DeploymentGroup', {
alias,
deploymentConfig: codedeploy.LambdaDeploymentConfig.CANARY_10PERCENT_10MINUTES,
alarms: [errorAlarm, latencyAlarm],
autoRollback: {
failedDeployment: true,
stoppedDeployment: true,
deploymentInAlarm: true,
},
});
Deployment Hooks
Pre-traffic hook (validation):
# hooks.py
import json
import os
import boto3
lambda_client = boto3.client('lambda')
codedeploy = boto3.client('codedeploy')
def pre_traffic(event, context):
    """Validate the new version before the traffic shift."""
    # CodeDeploy passes only the deployment and hook-execution IDs;
    # the version to validate is supplied via an environment variable
    # set by the deployment template.
    new_version_arn = os.environ['NEW_VERSION_ARN']
    try:
        # Invoke the new version with a test payload
        response = lambda_client.invoke(
            FunctionName=new_version_arn,
            InvocationType='RequestResponse',
            Payload=json.dumps({'test': True})
        )
        # Validate response
        if response['StatusCode'] == 200:
            codedeploy.put_lifecycle_event_hook_execution_status(
                deploymentId=event['DeploymentId'],
                lifecycleEventHookExecutionId=event['LifecycleEventHookExecutionId'],
                status='Succeeded'
            )
        else:
            raise Exception('Validation failed')
    except Exception as e:
        print(f'Pre-traffic validation failed: {e}')
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=event['DeploymentId'],
            lifecycleEventHookExecutionId=event['LifecycleEventHookExecutionId'],
            status='Failed'
        )
Post-traffic hook (verification):
import os
from datetime import datetime, timedelta
def post_traffic(event, context):
    """Verify deployment success after the traffic shift."""
    # Function name and lookback window supplied via environment/config
    function_name = os.environ['FUNCTION_NAME']
    deployment_start_time = datetime.utcnow() - timedelta(minutes=15)
    try:
        # Check CloudWatch metrics
        cloudwatch = boto3.client('cloudwatch')
        metrics = cloudwatch.get_metric_statistics(
            Namespace='AWS/Lambda',
            MetricName='Errors',
            Dimensions=[{'Name': 'FunctionName', 'Value': function_name}],
            StartTime=deployment_start_time,
            EndTime=datetime.utcnow(),
            Period=300,
            Statistics=['Sum']
        )
        # Validate no errors
        total_errors = sum(point['Sum'] for point in metrics['Datapoints'])
        if total_errors == 0:
            codedeploy.put_lifecycle_event_hook_execution_status(
                deploymentId=event['DeploymentId'],
                lifecycleEventHookExecutionId=event['LifecycleEventHookExecutionId'],
                status='Succeeded'
            )
        else:
            raise Exception(f'{total_errors} errors detected')
    except Exception as e:
        print(f'Post-traffic verification failed: {e}')
        codedeploy.put_lifecycle_event_hook_execution_status(
            deploymentId=event['DeploymentId'],
            lifecycleEventHookExecutionId=event['LifecycleEventHookExecutionId'],
            status='Failed'
        )
Rollback and Safety
Automatic Rollback
Configure rollback triggers:
DeploymentPreference:
Type: Canary10Percent10Minutes
Alarms:
- !Ref ErrorAlarm
- !Ref LatencyAlarm
# Automatically rolls back if alarms trigger
Rollback scenarios:
- CloudWatch alarm triggers during deployment
- Pre-traffic hook fails
- Post-traffic hook fails
- Deployment manually stopped
CloudWatch Alarms for Deployment
Critical alarms during deployment:
// Error rate alarm
const errorAlarm = new cloudwatch.Alarm(this, 'ErrorAlarm', {
metric: fn.metricErrors({
statistic: 'Sum',
period: Duration.minutes(1),
}),
threshold: 5,
evaluationPeriods: 2,
treatMissingData: cloudwatch.TreatMissingData.NOT_BREACHING,
});
// Duration alarm (regression)
const durationAlarm = new cloudwatch.Alarm(this, 'DurationAlarm', {
metric: fn.metricDuration({
statistic: 'Average',
period: Duration.minutes(1),
}),
threshold: previousAvgDuration * 1.2, // 20% increase
evaluationPeriods: 2,
comparisonOperator: cloudwatch.ComparisonOperator.GREATER_THAN_THRESHOLD,
});
// Throttle alarm
const throttleAlarm = new cloudwatch.Alarm(this, 'ThrottleAlarm', {
metric: fn.metricThrottles({
statistic: 'Sum',
period: Duration.minutes(1),
}),
threshold: 1,
evaluationPeriods: 1,
});
Version Management
Use Lambda versions and aliases:
const version = fn.currentVersion;
const prodAlias = version.addAlias('prod');
const devAlias = version.addAlias('dev');
// Gradual rollout with weighted aliases
new lambda.Alias(this, 'LiveAlias', {
aliasName: 'live',
version: newVersion,
additionalVersions: [
{ version: oldVersion, weight: 0.9 }, // 90% old
// 10% automatically goes to main version (new)
],
});
Best Practices Checklist
Pre-Deployment
- Code review completed
- Unit tests passing
- Integration tests passing
- Security scan completed
- Dependencies updated
- Infrastructure validated (CDK synth, SAM validate)
- Environment variables configured
Deployment
- Use IaC (SAM, CDK, Terraform)
- Separate environments (dev, staging, prod)
- Automate deployments via CI/CD
- Use gradual deployment (canary or linear)
- Configure CloudWatch alarms
- Enable automatic rollback
- Use deployment hooks for validation
Post-Deployment
- Monitor CloudWatch metrics
- Check CloudWatch Logs for errors
- Verify X-Ray traces
- Validate business metrics
- Check alarm status
- Review deployment logs
- Document any issues
Rollback Preparation
- Keep previous version available
- Document rollback procedure
- Test rollback in non-prod
- Configure automatic rollback
- Monitor during rollback
- Communication plan for rollback
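One concrete form of "keep previous version available" is rolling back by repointing the traffic alias at a known-good version. This sketch only builds the UpdateAlias input; the actual call would use `UpdateAliasCommand` from `@aws-sdk/client-lambda`, and the alias name `live` is an assumption:

```typescript
// Build the UpdateAlias input that repoints a production alias to the
// previously deployed version, effecting an immediate rollback.
export function buildRollbackInput(functionName: string, previousVersion: string) {
  return {
    FunctionName: functionName,
    Name: 'live',                     // alias receiving production traffic
    FunctionVersion: previousVersion, // known-good version to restore
  };
}
```

Because aliases shift traffic instantly, this is usually faster than redeploying the old artifact through the pipeline.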
Deployment Patterns
Multi-Region Deployment
Active-Passive:
// Primary region
new ServerlessStack(app, 'PrimaryStack', {
env: { region: 'us-east-1' },
isPrimary: true,
});
// Secondary region (standby)
new ServerlessStack(app, 'SecondaryStack', {
env: { region: 'us-west-2' },
isPrimary: false,
});
// Route 53 health check and failover
const healthCheck = new route53.CfnHealthCheck(this, 'HealthCheck', {
type: 'HTTPS',
resourcePath: '/health',
fullyQualifiedDomainName: 'api.example.com',
});
Active-Active:
// Deploy to multiple regions
const regions = ['us-east-1', 'us-west-2', 'eu-west-1'];
for (const region of regions) {
new ServerlessStack(app, `Stack-${region}`, {
env: { region },
});
}
// Route 53 geolocation routing
new route53.ARecord(this, 'GeoRecord', {
zone: hostedZone,
recordName: 'api',
target: route53.RecordTarget.fromAlias(
new targets.ApiGatewayDomain(domain)
),
geoLocation: route53.GeoLocation.country('US'),
});
Feature Flags with AppConfig
Safe feature rollout:
import { AppConfigData } from '@aws-sdk/client-appconfigdata';
const appconfig = new AppConfigData({});
let token: string | undefined;
export const handler = async (event: any) => {
  // Start a configuration session once per execution environment,
  // then reuse the rotating token on subsequent polls
  if (!token) {
    const session = await appconfig.startConfigurationSession({
      ApplicationIdentifier: process.env.APPCONFIG_APP,
      EnvironmentIdentifier: process.env.APPCONFIG_ENV,
      ConfigurationProfileIdentifier: process.env.APPCONFIG_PROFILE,
    });
    token = session.InitialConfigurationToken;
  }
  const config = await appconfig.getLatestConfiguration({
    ConfigurationToken: token,
  });
  token = config.NextPollConfigurationToken;
  // Configuration is empty when unchanged since the last poll
  const features = JSON.parse(new TextDecoder().decode(config.Configuration));
  if (features.newFeatureEnabled) {
    return newFeatureHandler(event);
  }
  return legacyHandler(event);
};
Summary
- IaC: Use SAM or CDK for all deployments
- Environments: Separate dev, staging, production
- CI/CD: Automate build, test, and deployment
- Testing: Unit, integration, and load testing
- Gradual Deployment: Use canary or linear for production
- Alarms: Configure and monitor during deployment
- Rollback: Enable automatic rollback on failures
- Hooks: Validate before and after traffic shifts
- Versioning: Use Lambda versions and aliases
- Multi-Region: Plan for disaster recovery
Event-Driven Architecture Patterns
Comprehensive patterns for building event-driven systems on AWS with serverless technologies.
Table of Contents
- Core EDA Concepts
- Event Routing Patterns
- Event Processing Patterns
- Event Sourcing Patterns
- Saga Patterns
- Best Practices
Core EDA Concepts
Event Types
Domain Events: Represent business facts
{
"source": "orders",
"detailType": "OrderPlaced",
"detail": {
"orderId": "12345",
"customerId": "customer-1",
"amount": 100.00,
"timestamp": "2025-01-15T10:30:00Z"
}
}
System Events: Technical occurrences
{
"source": "aws.s3",
"detailType": "Object Created",
"detail": {
"bucket": "my-bucket",
"key": "data/file.json"
}
}
Event Contracts
Define clear contracts between producers and consumers:
// schemas/order-events.ts
export interface OrderPlacedEvent {
orderId: string;
customerId: string;
items: Array<{
productId: string;
quantity: number;
price: number;
}>;
totalAmount: number;
timestamp: string;
}
// Register the schema with the EventBridge schema registry
// (L1 constructs from aws-cdk-lib/aws-eventschemas)
import * as eventschemas from 'aws-cdk-lib/aws-eventschemas';
const registry = new eventschemas.CfnRegistry(this, 'SchemaRegistry', {
  registryName: 'order-events',
});
new eventschemas.CfnSchema(this, 'OrderPlacedSchema', {
  registryName: registry.attrRegistryName,
  type: 'JSONSchemaDraft4',
  content: JSON.stringify({ /* JSON Schema */ }),
});
Event Routing Patterns
Pattern 1: Content-Based Routing
Route events based on content:
// Route by order amount
new events.Rule(this, 'HighValueOrders', {
eventPattern: {
source: ['orders'],
detailType: ['OrderPlaced'],
detail: {
totalAmount: [{ numeric: ['>', 1000] }],
},
},
targets: [new targets.LambdaFunction(highValueOrderFunction)],
});
new events.Rule(this, 'StandardOrders', {
eventPattern: {
source: ['orders'],
detailType: ['OrderPlaced'],
detail: {
totalAmount: [{ numeric: ['<=', 1000] }],
},
},
targets: [new targets.LambdaFunction(standardOrderFunction)],
});
Pattern 2: Event Filtering
Filter events before processing:
// Filter by multiple criteria
new events.Rule(this, 'FilteredRule', {
eventPattern: {
source: ['inventory'],
detailType: ['StockUpdate'],
detail: {
warehouseId: ['WH-1', 'WH-2'], // Specific warehouses
quantity: [{ numeric: ['<', 10] }], // Low stock only
productCategory: ['electronics'], // Specific category
},
},
targets: [new targets.LambdaFunction(reorderFunction)],
});
Pattern 3: Event Replay and Archive
Store events for replay and audit:
// Archive all events on the bus (a source event bus is required)
const archive = new events.Archive(this, 'EventArchive', {
  sourceEventBus: eventBus,
  eventPattern: {
    account: [this.account],
  },
  retention: Duration.days(365),
});
// Replay events when needed
// Use AWS Console or CLI to replay from archive
Pattern 4: Cross-Account Event Routing
Route events to other AWS accounts:
// Event bus in Account A
const eventBus = new events.EventBus(this, 'SharedBus');
// Grant permission to Account B
eventBus.addToResourcePolicy(new iam.PolicyStatement({
  sid: 'AllowAccountBPutEvents', // event bus policies require a statement sid
  effect: iam.Effect.ALLOW,
  principals: [new iam.AccountPrincipal('ACCOUNT-B-ID')],
  actions: ['events:PutEvents'],
  resources: [eventBus.eventBusArn],
}));
// Rule forwards to Account B event bus
new events.Rule(this, 'ForwardToAccountB', {
eventBus,
eventPattern: {
source: ['shared-service'],
},
targets: [new targets.EventBus(
events.EventBus.fromEventBusArn(
this,
'AccountBBus',
'arn:aws:events:us-east-1:ACCOUNT-B-ID:event-bus/default'
)
)],
});
Event Processing Patterns
Pattern 1: Event Transformation
Transform events before routing:
// EventBridge input transformer
new events.Rule(this, 'TransformRule', {
eventPattern: {
source: ['orders'],
},
targets: [new targets.LambdaFunction(processFunction, {
event: events.RuleTargetInput.fromObject({
orderId: events.EventField.fromPath('$.detail.orderId'),
customerEmail: events.EventField.fromPath('$.detail.customer.email'),
amount: events.EventField.fromPath('$.detail.totalAmount'),
// Transformed structure
}),
})],
});
Pattern 2: Event Aggregation
Aggregate multiple events:
// DynamoDB stores partial results
export const handler = async (event: any) => {
const { transactionId, step, data } = event;
// Store step result
await dynamodb.updateItem({
TableName: process.env.TABLE_NAME,
Key: { transactionId },
UpdateExpression: 'SET #step = :data',
ExpressionAttributeNames: { '#step': step },
ExpressionAttributeValues: { ':data': data },
});
// Check if all steps complete
const item = await dynamodb.getItem({
TableName: process.env.TABLE_NAME,
Key: { transactionId },
});
if (allStepsComplete(item)) {
// Trigger final processing
await eventBridge.putEvents({
Entries: [{
Source: 'aggregator',
DetailType: 'AllStepsComplete',
Detail: JSON.stringify(item),
}],
});
}
};
Pattern 3: Event Enrichment
Enrich events with additional data:
export const enrichEvent = async (event: any) => {
const { customerId } = event.detail;
// Fetch additional customer data
const customer = await dynamodb.getItem({
TableName: process.env.CUSTOMER_TABLE,
Key: { customerId },
});
// Publish enriched event
await eventBridge.putEvents({
Entries: [{
Source: 'orders',
DetailType: 'OrderEnriched',
Detail: JSON.stringify({
...event.detail,
customerName: customer.Item?.name,
customerTier: customer.Item?.tier,
customerEmail: customer.Item?.email,
}),
}],
});
};
Pattern 4: Event Fork and Join
Process event multiple ways then aggregate:
// Step Functions parallel + aggregation
const parallel = new stepfunctions.Parallel(this, 'ForkProcessing');
parallel.branch(new tasks.LambdaInvoke(this, 'ValidateInventory', {
lambdaFunction: inventoryFunction,
resultPath: '$.inventory',
}));
parallel.branch(new tasks.LambdaInvoke(this, 'CheckCredit', {
lambdaFunction: creditFunction,
resultPath: '$.credit',
}));
parallel.branch(new tasks.LambdaInvoke(this, 'CalculateShipping', {
lambdaFunction: shippingFunction,
resultPath: '$.shipping',
}));
const definition = parallel.next(
new tasks.LambdaInvoke(this, 'AggregateResults', {
lambdaFunction: aggregateFunction,
})
);
Event Sourcing Patterns
Pattern: Event Store with DynamoDB
Store all events as source of truth:
const eventStore = new dynamodb.Table(this, 'EventStore', {
partitionKey: { name: 'aggregateId', type: dynamodb.AttributeType.STRING },
sortKey: { name: 'version', type: dynamodb.AttributeType.NUMBER },
stream: dynamodb.StreamViewType.NEW_IMAGE,
pointInTimeRecovery: true, // Important for audit
});
// Append events
export const appendEvent = async (aggregateId: string, event: any) => {
const version = await getNextVersion(aggregateId);
await dynamodb.putItem({
TableName: process.env.EVENT_STORE,
Item: {
aggregateId,
version,
eventType: event.type,
eventData: event.data,
timestamp: Date.now(),
userId: event.userId,
},
ConditionExpression: 'attribute_not_exists(version)', // Optimistic locking
});
};
// Rebuild state from events
export const rebuildState = async (aggregateId: string) => {
const events = await dynamodb.query({
TableName: process.env.EVENT_STORE,
KeyConditionExpression: 'aggregateId = :id',
ExpressionAttributeValues: { ':id': aggregateId },
ScanIndexForward: true, // Chronological order
});
let state = initialState();
for (const event of events.Items) {
state = applyEvent(state, event);
}
return state;
};
Pattern: Materialized Views
Create read-optimized projections:
// Event store stream triggers projection
eventStore.grantStreamRead(projectionFunction);
new lambda.EventSourceMapping(this, 'Projection', {
target: projectionFunction,
eventSourceArn: eventStore.tableStreamArn,
startingPosition: lambda.StartingPosition.LATEST,
});
// Projection function updates read model
export const updateProjection = async (event: DynamoDBStreamEvent) => {
for (const record of event.Records) {
if (record.eventName !== 'INSERT') continue;
const eventData = record.dynamodb?.NewImage;
const aggregateId = eventData?.aggregateId.S;
// Rebuild current state
const currentState = await rebuildState(aggregateId);
// Update read model
await readModelTable.putItem({
TableName: process.env.READ_MODEL_TABLE,
Item: currentState,
});
}
};
Pattern: Snapshots
Optimize event replay with snapshots:
export const createSnapshot = async (aggregateId: string) => {
// Rebuild state from all events
const state = await rebuildState(aggregateId);
const version = await getLatestVersion(aggregateId);
// Store snapshot
await snapshotTable.putItem({
TableName: process.env.SNAPSHOT_TABLE,
Item: {
aggregateId,
version,
state: JSON.stringify(state),
createdAt: Date.now(),
},
});
};
// Rebuild from snapshot + newer events
export const rebuildFromSnapshot = async (aggregateId: string) => {
// Get latest snapshot
const snapshot = await getLatestSnapshot(aggregateId);
let state = JSON.parse(snapshot.state);
const snapshotVersion = snapshot.version;
// Apply only events after snapshot
const events = await getEventsSinceVersion(aggregateId, snapshotVersion);
for (const event of events) {
state = applyEvent(state, event);
}
return state;
};
Saga Patterns
Pattern: Choreography-Based Saga
Services coordinate through events:
// Order Service publishes event
export const placeOrder = async (order: Order) => {
await saveOrder(order);
await eventBridge.putEvents({
Entries: [{
Source: 'orders',
DetailType: 'OrderPlaced',
Detail: JSON.stringify({ orderId: order.id }),
}],
});
};
// Inventory Service reacts to event
new events.Rule(this, 'ReserveInventory', {
eventPattern: {
source: ['orders'],
detailType: ['OrderPlaced'],
},
targets: [new targets.LambdaFunction(reserveInventoryFunction)],
});
// Inventory Service publishes result
export const reserveInventory = async (event: any) => {
const { orderId } = event.detail;
try {
await reserve(orderId);
await eventBridge.putEvents({
Entries: [{
Source: 'inventory',
DetailType: 'InventoryReserved',
Detail: JSON.stringify({ orderId }),
}],
});
} catch (error) {
await eventBridge.putEvents({
Entries: [{
Source: 'inventory',
DetailType: 'InventoryReservationFailed',
Detail: JSON.stringify({ orderId, error: error.message }),
}],
});
}
};
// Payment Service reacts to inventory event
new events.Rule(this, 'ProcessPayment', {
eventPattern: {
source: ['inventory'],
detailType: ['InventoryReserved'],
},
targets: [new targets.LambdaFunction(processPaymentFunction)],
});
Pattern: Orchestration-Based Saga
Central coordinator manages saga:
// Step Functions orchestrates saga
const definition = new tasks.LambdaInvoke(this, 'ReserveInventory', {
lambdaFunction: reserveInventoryFunction,
resultPath: '$.inventory',
})
.next(new tasks.LambdaInvoke(this, 'ProcessPayment', {
lambdaFunction: processPaymentFunction,
resultPath: '$.payment',
}))
.next(new tasks.LambdaInvoke(this, 'ShipOrder', {
lambdaFunction: shipOrderFunction,
resultPath: '$.shipment',
}))
.addCatch(
// Compensation flow
new tasks.LambdaInvoke(this, 'RefundPayment', {
lambdaFunction: refundFunction,
})
.next(new tasks.LambdaInvoke(this, 'ReleaseInventory', {
lambdaFunction: releaseFunction,
})),
{
errors: ['States.TaskFailed'],
resultPath: '$.error',
}
);
new stepfunctions.StateMachine(this, 'OrderSaga', {
definition,
tracingEnabled: true,
});
Comparison:
| Aspect | Choreography | Orchestration |
|---|---|---|
| Coordination | Decentralized | Centralized |
| Coupling | Loose | Tighter |
| Visibility | Distributed logs | Single execution history |
| Debugging | Harder (trace across services) | Easier (single workflow) |
| Best for | Simple flows | Complex flows |
Best Practices
Idempotency
Always make event handlers idempotent:
// Use idempotency keys
export const handler = async (event: any) => {
  const idempotencyKey = event.requestId || event.messageId;
  // Check if already processed (a missing item returns an empty result; getItem does not throw)
  const existing = await dynamodb.getItem({
    TableName: process.env.IDEMPOTENCY_TABLE,
    Key: { idempotencyKey },
  });
  if (existing.Item) {
    console.log('Already processed:', idempotencyKey);
    return existing.Item.result; // Return cached result
  }
  // Process event
  const result = await processEvent(event);
  // Store result (for a race-free variant, use a conditional write as in the
  // Deduplication section below)
  await dynamodb.putItem({
    TableName: process.env.IDEMPOTENCY_TABLE,
    Item: {
      idempotencyKey,
      result,
      processedAt: Date.now(),
      ttl: Math.floor(Date.now() / 1000) + 86400, // TTL attribute: expire after 24 hours
    },
  });
  return result;
};
Event Versioning
Handle event schema evolution:
// Version events
interface OrderPlacedEventV1 {
version: '1.0';
orderId: string;
amount: number;
}
interface OrderPlacedEventV2 {
version: '2.0';
orderId: string;
amount: number;
currency: string; // New field
}
// Handler supports multiple versions
export const handler = async (event: any) => {
const eventVersion = event.detail.version || '1.0';
switch (eventVersion) {
case '1.0':
return processV1(event.detail as OrderPlacedEventV1);
case '2.0':
return processV2(event.detail as OrderPlacedEventV2);
default:
throw new Error(`Unsupported event version: ${eventVersion}`);
}
};
const processV1 = async (event: OrderPlacedEventV1) => {
// Upgrade to V2 internally
const v2Event: OrderPlacedEventV2 = {
...event,
version: '2.0',
currency: 'USD', // Default value
};
return processV2(v2Event);
};
Eventual Consistency
Design for eventual consistency:
// Service A writes to its database
export const createOrder = async (order: Order) => {
// Write to Order database
await orderTable.putItem({ Item: order });
// Publish event
await eventBridge.putEvents({
Entries: [{
Source: 'orders',
DetailType: 'OrderCreated',
Detail: JSON.stringify({ orderId: order.id }),
}],
});
};
// Service B eventually updates its database
export const onOrderCreated = async (event: any) => {
const { orderId } = event.detail;
// Fetch additional data
const orderDetails = await getOrderDetails(orderId);
// Update inventory database (eventual consistency)
await inventoryTable.updateItem({
Key: { productId: orderDetails.productId },
UpdateExpression: 'SET reserved = reserved + :qty',
ExpressionAttributeValues: { ':qty': orderDetails.quantity },
});
};
Error Handling in EDA
Comprehensive error handling strategy:
// Dead Letter Queue for failed events
const dlq = new sqs.Queue(this, 'EventDLQ', {
retentionPeriod: Duration.days(14),
});
// EventBridge rule with DLQ
new events.Rule(this, 'ProcessRule', {
eventPattern: { /* ... */ },
targets: [
new targets.LambdaFunction(processFunction, {
deadLetterQueue: dlq,
maxEventAge: Duration.hours(2),
retryAttempts: 2,
}),
],
});
// Monitor DLQ
new cloudwatch.Alarm(this, 'DLQAlarm', {
metric: dlq.metricApproximateNumberOfMessagesVisible(),
threshold: 1,
evaluationPeriods: 1,
});
// DLQ processor for manual review
new lambda.EventSourceMapping(this, 'DLQProcessor', {
target: dlqProcessorFunction,
eventSourceArn: dlq.queueArn,
enabled: false, // Enable manually when reviewing
});
Message Ordering
When order matters:
// SQS FIFO for strict ordering
const fifoQueue = new sqs.Queue(this, 'OrderedQueue', {
fifo: true,
contentBasedDeduplication: true,
deduplicationScope: sqs.DeduplicationScope.MESSAGE_GROUP,
fifoThroughputLimit: sqs.FifoThroughputLimit.PER_MESSAGE_GROUP_ID,
});
// Publish with message group ID
await sqs.sendMessage({
QueueUrl: process.env.QUEUE_URL,
MessageBody: JSON.stringify(event),
MessageGroupId: customerId, // All messages for same customer in order
MessageDeduplicationId: eventId, // Prevent duplicates
});
// Kinesis for ordered streams
const stream = new kinesis.Stream(this, 'Stream', {
shardCount: 1, // Single shard = strict ordering
});
// Partition key ensures same partition
await kinesis.putRecord({
StreamName: process.env.STREAM_NAME,
Data: Buffer.from(JSON.stringify(event)),
PartitionKey: customerId, // Same key = same shard
});
Deduplication
Prevent duplicate event processing:
// Content-based deduplication with SQS FIFO
const queue = new sqs.Queue(this, 'Queue', {
fifo: true,
contentBasedDeduplication: true, // Hash of message body
});
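Content-based deduplication derives the deduplication ID from a SHA-256 hash of the message body. A small sketch of the idea using Node's crypto module (an illustration of the concept, not the service's internals):

```typescript
import { createHash } from 'crypto';

// Derive a deduplication ID from the message body, as SQS content-based
// deduplication does with a SHA-256 hash of the body.
const deduplicationId = (body: string): string =>
  createHash('sha256').update(body).digest('hex');

const first = deduplicationId(JSON.stringify({ orderId: '1' }));
const retry = deduplicationId(JSON.stringify({ orderId: '1' }));
console.log(first === retry); // true — identical bodies map to one ID
```

Within the deduplication interval (5 minutes), SQS drops messages whose deduplication ID matches one it has already accepted.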
// Manual deduplication with DynamoDB
export const handler = async (event: any) => {
const eventId = event.id || event.messageId;
try {
// Conditional write (fails if exists)
await dynamodb.putItem({
TableName: process.env.DEDUP_TABLE,
Item: {
eventId,
processedAt: Date.now(),
ttl: Math.floor(Date.now() / 1000) + 86400, // 24h TTL
},
ConditionExpression: 'attribute_not_exists(eventId)',
});
// Event is unique, process it
await processEvent(event);
} catch (error) {
if (error.name === 'ConditionalCheckFailedException') { // SDK v3; v2 exposes error.code
console.log('Duplicate event ignored:', eventId);
return; // Already processed
}
throw error; // Other error
}
};
Backpressure Handling
Prevent overwhelming downstream systems:
// Control Lambda concurrency
const consumerFunction = new lambda.Function(this, 'Consumer', {
reservedConcurrentExecutions: 10, // Max 10 concurrent
});
// SQS visibility timeout + retry logic
const queue = new sqs.Queue(this, 'Queue', {
visibilityTimeout: Duration.seconds(300), // 5 minutes
receiveMessageWaitTime: Duration.seconds(20), // Long polling
});
new lambda.EventSourceMapping(this, 'Consumer', {
target: consumerFunction,
eventSourceArn: queue.queueArn,
batchSize: 10,
maxConcurrency: 5, // Process 5 batches concurrently
reportBatchItemFailures: true,
});
// Circuit breaker pattern
// Note: this counter lives in the warm execution environment, so the breaker
// is per-container; use DynamoDB or ElastiCache for state shared across instances
let consecutiveFailures = 0;
const FAILURE_THRESHOLD = 5;
export const handler = async (event: any) => {
// Check circuit breaker
if (consecutiveFailures >= FAILURE_THRESHOLD) {
console.error('Circuit breaker open, skipping processing');
throw new Error('Circuit breaker open');
}
try {
await processEvent(event);
consecutiveFailures = 0; // Reset on success
} catch (error) {
consecutiveFailures++;
throw error;
}
};
Advanced Patterns
Pattern: Event Replay
Replay events for recovery or testing:
// Archive events for replay
const archive = new events.Archive(this, 'Archive', {
sourceEventBus: eventBus,
eventPattern: {
account: [this.account],
},
retention: Duration.days(365),
});
// Replay programmatically
export const replayEvents = async (startTime: Date, endTime: Date) => {
// Use AWS SDK to start replay
await eventBridge.startReplay({
ReplayName: `replay-${Date.now()}`,
EventSourceArn: archive.archiveArn,
EventStartTime: startTime,
EventEndTime: endTime,
Destination: {
Arn: eventBus.eventBusArn,
},
});
};
Pattern: Event Time vs Processing Time
Handle late-arriving events:
// Include event timestamp
interface Event {
eventId: string;
eventTime: string; // When event occurred
processingTime?: string; // When event processed
data: any;
}
// Windowed aggregation
// Helper: truncate a timestamp to the start of its fixed-size window
const getWindowForTime = (time: Date, windowMs: number): Date =>
  new Date(Math.floor(time.getTime() / windowMs) * windowMs);

export const aggregateWindow = async (events: Event[]) => {
  // Group by event time window (not processing time)
  const windows = new Map<string, Event[]>();
  for (const event of events) {
    const window = getWindowForTime(new Date(event.eventTime), 5 * 60 * 1000); // 5-minute windows
    const key = window.toISOString();
    if (!windows.has(key)) {
      windows.set(key, []);
    }
    windows.get(key)!.push(event);
  }
  // Process each window
  for (const [window, eventsInWindow] of windows) {
    await processWindow(window, eventsInWindow);
  }
};
Pattern: Transactional Outbox
Ensure event publishing with database writes:
// Single DynamoDB transaction
export const createOrderWithEvent = async (order: Order) => {
await dynamodb.transactWriteItems({
TransactItems: [
{
// Write order
Put: {
TableName: process.env.ORDERS_TABLE,
Item: marshall(order),
},
},
{
// Write outbox event
Put: {
TableName: process.env.OUTBOX_TABLE,
Item: marshall({
eventId: uuid(),
eventType: 'OrderPlaced',
eventData: order,
status: 'PENDING',
createdAt: Date.now(),
}),
},
},
],
});
};
// Separate Lambda processes outbox
new lambda.EventSourceMapping(this, 'OutboxProcessor', {
target: outboxFunction,
eventSourceArn: outboxTable.tableStreamArn,
startingPosition: lambda.StartingPosition.LATEST,
});
export const processOutbox = async (event: DynamoDBStreamEvent) => {
for (const record of event.Records) {
if (record.eventName !== 'INSERT') continue;
const outboxEvent = unmarshall(record.dynamodb?.NewImage);
// Publish to EventBridge
await eventBridge.putEvents({
Entries: [{
Source: 'orders',
DetailType: outboxEvent.eventType,
Detail: JSON.stringify(outboxEvent.eventData),
}],
});
// Mark as processed
await dynamodb.updateItem({
TableName: process.env.OUTBOX_TABLE,
Key: { eventId: outboxEvent.eventId },
UpdateExpression: 'SET #status = :status',
ExpressionAttributeNames: { '#status': 'status' },
ExpressionAttributeValues: { ':status': 'PUBLISHED' },
});
}
};
Testing Event-Driven Systems
Pattern: Event Replay for Testing
// Publish test events
export const publishTestEvents = async () => {
const testEvents = [
  { Source: 'orders', DetailType: 'OrderPlaced', Detail: JSON.stringify({ orderId: '1' }) },
  { Source: 'orders', DetailType: 'OrderPlaced', Detail: JSON.stringify({ orderId: '2' }) },
];
for (const event of testEvents) {
await eventBridge.putEvents({ Entries: [event] });
}
};
// Monitor processing
export const verifyProcessing = async () => {
// Check downstream databases
const order1 = await orderTable.getItem({ Key: { orderId: '1' } });
const order2 = await orderTable.getItem({ Key: { orderId: '2' } });
expect(order1.Item).toBeDefined();
expect(order2.Item).toBeDefined();
};
Pattern: Event Mocking
// Mock EventBridge in tests
const mockEventBridge = {
putEvents: jest.fn().mockResolvedValue({}),
};
// Test event publishing
test('publishes event on order creation', async () => {
await createOrder(mockEventBridge, order);
expect(mockEventBridge.putEvents).toHaveBeenCalledWith({
Entries: [
expect.objectContaining({
Source: 'orders',
DetailType: 'OrderPlaced',
}),
],
});
});
Summary
- Loose Coupling: Services communicate via events, not direct calls
- Async Processing: Use queues and event buses for asynchronous workflows
- Idempotency: Always handle duplicate events gracefully
- Dead Letter Queues: Configure DLQs for error handling
- Event Contracts: Define clear schemas for events
- Observability: Enable tracing and monitoring across services
- Eventual Consistency: Design for it, don't fight it
- Saga Patterns: Use for distributed transactions
- Event Sourcing: Store events as source of truth when needed
Serverless Observability Best Practices
Comprehensive observability patterns for serverless applications based on AWS best practices.
Three Pillars of Observability
Metrics
Numeric data measured at intervals (time series)
- Request rate, error rate, duration
- CPU%, memory%, disk%
- Custom business metrics
- Service Level Indicators (SLIs)
Logs
Timestamped records of discrete events
- Application events and errors
- State transformations
- Debugging information
- Audit trails
Traces
A single request's journey across services
- Request flow through distributed system
- Service dependencies
- Latency breakdown
- Error propagation
Metrics
CloudWatch Metrics for Lambda
Out-of-the-box metrics (automatically available):
- Invocations
- Errors
- Throttles
- Duration
- ConcurrentExecutions
- IteratorAge (for streams)
CDK Configuration:
const fn = new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
});
// Create alarms on metrics
new cloudwatch.Alarm(this, 'ErrorAlarm', {
metric: fn.metricErrors({
statistic: 'Sum',
period: Duration.minutes(5),
}),
threshold: 10,
evaluationPeriods: 1,
});
new cloudwatch.Alarm(this, 'DurationAlarm', {
metric: fn.metricDuration({
statistic: 'p99',
period: Duration.minutes(5),
}),
threshold: 1000, // 1 second
evaluationPeriods: 2,
});
Custom Metrics
Use CloudWatch Embedded Metric Format (EMF):
export const handler = async (event: any) => {
const startTime = Date.now();
try {
const result = await processOrder(event);
// Emit custom metrics
console.log(JSON.stringify({
_aws: {
Timestamp: Date.now(),
CloudWatchMetrics: [{
Namespace: 'MyApp/Orders',
Dimensions: [['ServiceName', 'Operation']],
Metrics: [
{ Name: 'ProcessingTime', Unit: 'Milliseconds' },
{ Name: 'OrderValue', Unit: 'None' },
],
}],
},
ServiceName: 'OrderService',
Operation: 'ProcessOrder',
ProcessingTime: Date.now() - startTime,
OrderValue: result.amount,
}));
return result;
} catch (error) {
// Emit error metric
console.log(JSON.stringify({
_aws: {
CloudWatchMetrics: [{
Namespace: 'MyApp/Orders',
Dimensions: [['ServiceName']],
Metrics: [{ Name: 'Errors', Unit: 'Count' }],
}],
},
ServiceName: 'OrderService',
Errors: 1,
}));
throw error;
}
};
Using Lambda Powertools:
import { Metrics, MetricUnits } from '@aws-lambda-powertools/metrics';
const metrics = new Metrics({
namespace: 'MyApp',
serviceName: 'OrderService',
});
export const handler = async (event: any) => {
metrics.addMetric('Invocation', MetricUnits.Count, 1);
const startTime = Date.now();
try {
const result = await processOrder(event);
metrics.addMetric('Success', MetricUnits.Count, 1);
metrics.addMetric('ProcessingTime', MetricUnits.Milliseconds, Date.now() - startTime);
metrics.addMetric('OrderValue', MetricUnits.None, result.amount);
return result;
} catch (error) {
metrics.addMetric('Error', MetricUnits.Count, 1);
throw error;
} finally {
metrics.publishStoredMetrics();
}
};
Logging
Structured Logging
Use JSON format for logs:
// ✅ GOOD - Structured JSON logging
export const handler = async (event: any, context: Context) => {
  const startTime = Date.now();
  console.log(JSON.stringify({
    level: 'INFO',
    message: 'Processing order',
    orderId: event.orderId,
    customerId: event.customerId,
    timestamp: new Date().toISOString(),
    requestId: context.awsRequestId,
  }));
try {
const result = await processOrder(event);
console.log(JSON.stringify({
level: 'INFO',
message: 'Order processed successfully',
orderId: event.orderId,
duration: Date.now() - startTime,
timestamp: new Date().toISOString(),
}));
return result;
} catch (error) {
console.error(JSON.stringify({
level: 'ERROR',
message: 'Order processing failed',
orderId: event.orderId,
error: {
name: error.name,
message: error.message,
stack: error.stack,
},
timestamp: new Date().toISOString(),
}));
throw error;
}
};
// ❌ BAD - Unstructured logging
console.log('Processing order ' + orderId + ' for customer ' + customerId);
Using Lambda Powertools Logger:
import { Logger } from '@aws-lambda-powertools/logger';
const logger = new Logger({
serviceName: 'OrderService',
logLevel: 'INFO',
});
export const handler = async (event: any, context: Context) => {
logger.addContext(context);
logger.info('Processing order', {
orderId: event.orderId,
customerId: event.customerId,
});
try {
const result = await processOrder(event);
logger.info('Order processed', {
orderId: event.orderId,
amount: result.amount,
});
return result;
} catch (error) {
logger.error('Order processing failed', {
orderId: event.orderId,
error,
});
throw error;
}
};
Log Levels
Use appropriate log levels:
- ERROR: Errors requiring immediate attention
- WARN: Warnings or recoverable errors
- INFO: Important business events
- DEBUG: Detailed debugging information (disable in production)
const logger = new Logger({
serviceName: 'OrderService',
logLevel: process.env.LOG_LEVEL || 'INFO',
});
logger.debug('Detailed processing info', { data });
logger.info('Business event occurred', { event });
logger.warn('Recoverable error', { error });
logger.error('Critical failure', { error });
Log Insights Queries
Common CloudWatch Logs Insights queries:
# Find errors in last hour
fields @timestamp, @message, level, error.message
| filter level = "ERROR"
| sort @timestamp desc
| limit 100
# Count errors by type
stats count(*) as errorCount by error.name
| sort errorCount desc
# Calculate p99 latency
stats pct(duration, 99) as p99Duration by serviceName
# Find slow requests
fields @timestamp, orderId, duration
| filter duration > 1000
| sort duration desc
| limit 50
# Track specific customer requests
fields @timestamp, @message, orderId
| filter customerId = "customer-123"
| sort @timestamp desc
Tracing
Enable X-Ray Tracing
Configure X-Ray for Lambda:
const fn = new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
tracing: lambda.Tracing.ACTIVE, // Enable X-Ray
});
// API Gateway tracing
const api = new apigateway.RestApi(this, 'Api', {
deployOptions: {
tracingEnabled: true,
},
});
// Step Functions tracing
new stepfunctions.StateMachine(this, 'StateMachine', {
definition,
tracingEnabled: true,
});
Instrument application code:
import { captureAWSv3Client } from 'aws-xray-sdk-core';
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
// Wrap AWS SDK clients
const client = captureAWSv3Client(new DynamoDBClient({}));
// Custom segments
import AWSXRay from 'aws-xray-sdk-core';
export const handler = async (event: any) => {
const segment = AWSXRay.getSegment();
// Custom subsegment
const subsegment = segment.addNewSubsegment('ProcessOrder');
try {
// Add annotations (indexed for filtering)
subsegment.addAnnotation('orderId', event.orderId);
subsegment.addAnnotation('customerId', event.customerId);
// Add metadata (not indexed, detailed info)
subsegment.addMetadata('orderDetails', event);
const result = await processOrder(event);
subsegment.addAnnotation('status', 'success');
subsegment.close();
return result;
} catch (error) {
subsegment.addError(error);
subsegment.close();
throw error;
}
};
Using Lambda Powertools Tracer:
import { Tracer } from '@aws-lambda-powertools/tracer';
const tracer = new Tracer({ serviceName: 'OrderService' });
export const handler = async (event: any) => {
const segment = tracer.getSegment();
// Automatically captures and traces
const result = await tracer.captureAWSv3Client(dynamodb).getItem({
TableName: process.env.TABLE_NAME,
Key: { orderId: event.orderId },
});
// Custom annotation
tracer.putAnnotation('orderId', event.orderId);
tracer.putMetadata('orderDetails', event);
return result;
};
Service Map
Visualize service dependencies with X-Ray:
- Shows service-to-service communication
- Identifies latency bottlenecks
- Highlights error rates between services
- Tracks downstream dependencies
Distributed Tracing Best Practices
- Enable tracing everywhere: Lambda, API Gateway, Step Functions
- Use annotations for filtering: Indexed fields for queries
- Use metadata for details: Non-indexed detailed information
- Sample appropriately: 100% for low traffic, sampled for high traffic
- Correlate with logs: Include trace ID in log entries
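The sampling bullet can be sketched as a fixed-rate decision. X-Ray's real sampling rules also include a reservoir and per-rule matching; this hypothetical helper shows only the fixed-rate part, made deterministic per trace so a trace is never half-sampled:

```typescript
// Fixed-rate sampling decision: keep roughly `rate` of all traces.
// Hashing the trace ID keeps the decision stable for a given trace.
function shouldSample(traceId: string, rate: number): boolean {
  let hash = 0;
  for (const ch of traceId) {
    hash = (hash * 31 + ch.charCodeAt(0)) >>> 0; // simple 32-bit rolling hash
  }
  return (hash % 10000) / 10000 < rate;
}

// rate 1.0 keeps every trace; rate 0 keeps none
console.log(shouldSample('1-581cf771-a006649127e371903a2de979', 1.0)); // true
console.log(shouldSample('1-581cf771-a006649127e371903a2de979', 0));   // false
```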
Unified Observability
Correlation Between Pillars
Include trace ID in logs:
export const handler = async (event: any, context: Context) => {
const traceId = process.env._X_AMZN_TRACE_ID;
console.log(JSON.stringify({
level: 'INFO',
message: 'Processing order',
traceId,
requestId: context.awsRequestId,
orderId: event.orderId,
}));
};
CloudWatch ServiceLens
Unified view of traces and metrics:
- Automatically correlates X-Ray traces with CloudWatch metrics
- Shows service map with metrics overlay
- Identifies performance and availability issues
- Provides end-to-end request view
Lambda Powertools Integration
All three pillars in one:
import { Logger } from '@aws-lambda-powertools/logger';
import { Tracer } from '@aws-lambda-powertools/tracer';
import { Metrics, MetricUnits } from '@aws-lambda-powertools/metrics';
const logger = new Logger({ serviceName: 'OrderService' });
const tracer = new Tracer({ serviceName: 'OrderService' });
const metrics = new Metrics({ namespace: 'MyApp', serviceName: 'OrderService' });
export const handler = async (event: any, context: Context) => {
// Automatically adds trace context to logs
logger.addContext(context);
logger.info('Processing order', { orderId: event.orderId });
// Add trace annotations
tracer.putAnnotation('orderId', event.orderId);
// Add metrics
metrics.addMetric('Invocation', MetricUnits.Count, 1);
const startTime = Date.now();
try {
const result = await processOrder(event);
metrics.addMetric('Success', MetricUnits.Count, 1);
metrics.addMetric('Duration', MetricUnits.Milliseconds, Date.now() - startTime);
logger.info('Order processed', { orderId: event.orderId });
return result;
} catch (error) {
metrics.addMetric('Error', MetricUnits.Count, 1);
logger.error('Processing failed', { orderId: event.orderId, error });
throw error;
} finally {
metrics.publishStoredMetrics();
}
};
Alerting
Effective Alerting Strategy
Alert on what matters:
- Critical: Customer-impacting issues (errors, high latency)
- Warning: Approaching thresholds (80% capacity)
- Info: Trends and anomalies (cost spikes)
Alarm fatigue prevention:
- Tune thresholds based on actual patterns
- Use composite alarms to reduce noise
- Set appropriate evaluation periods
- Include clear remediation steps
CloudWatch Alarms
Common alarm patterns:
// Error rate alarm
new cloudwatch.Alarm(this, 'ErrorRateAlarm', {
metric: new cloudwatch.MathExpression({
expression: 'errors / invocations * 100',
usingMetrics: {
errors: fn.metricErrors({ statistic: 'Sum' }),
invocations: fn.metricInvocations({ statistic: 'Sum' }),
},
}),
threshold: 1, // 1% error rate
evaluationPeriods: 2,
alarmDescription: 'Error rate exceeded 1%',
});
// Latency alarm (p99)
new cloudwatch.Alarm(this, 'LatencyAlarm', {
metric: fn.metricDuration({
statistic: 'p99',
period: Duration.minutes(5),
}),
threshold: 1000, // 1 second
evaluationPeriods: 2,
alarmDescription: 'p99 latency exceeded 1 second',
});
// Concurrent executions approaching limit
new cloudwatch.Alarm(this, 'ConcurrencyAlarm', {
metric: fn.metricConcurrentExecutions({
statistic: 'Maximum',
}),
threshold: 800, // 80% of 1000 default limit
evaluationPeriods: 1,
alarmDescription: 'Approaching concurrency limit',
});
Composite Alarms
Reduce alert noise:
const errorAlarm = new cloudwatch.Alarm(this, 'Errors', {
metric: fn.metricErrors(),
threshold: 10,
evaluationPeriods: 1,
});
const throttleAlarm = new cloudwatch.Alarm(this, 'Throttles', {
metric: fn.metricThrottles(),
threshold: 5,
evaluationPeriods: 1,
});
const latencyAlarm = new cloudwatch.Alarm(this, 'Latency', {
metric: fn.metricDuration({ statistic: 'p99' }),
threshold: 2000,
evaluationPeriods: 2,
});
// Composite alarm (any of the above)
new cloudwatch.CompositeAlarm(this, 'ServiceHealthAlarm', {
compositeAlarmName: 'order-service-health',
alarmRule: cloudwatch.AlarmRule.anyOf(
errorAlarm,
throttleAlarm,
latencyAlarm
),
alarmDescription: 'Overall service health degraded',
});
Dashboard Best Practices
Service Dashboard Layout
Recommended sections:
Overview:
- Total invocations
- Error rate percentage
- P50, P95, P99 latency
- Availability percentage
Resource Utilization:
- Concurrent executions
- Memory utilization
- Duration distribution
- Throttles
Business Metrics:
- Orders processed
- Revenue per minute
- Customer activity
- Feature usage
Errors and Alerts:
- Error count by type
- Active alarms
- DLQ message count
- Failed transactions
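The Overview row reports latency as percentiles rather than averages. As a reminder of what p50/p95/p99 mean, a minimal nearest-rank sketch (CloudWatch's internal estimation algorithm may differ):

```typescript
// Nearest-rank percentile over a sample of request durations (ms).
function percentile(values: number[], p: number): number {
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

const durations = [11, 12, 12, 13, 14, 15, 15, 16, 90, 300];
console.log(percentile(durations, 50)); // 14  — the typical request
console.log(percentile(durations, 99)); // 300 — the tail, dominated by one outlier
```

This is why dashboards track p99 alongside p50: the average of this sample (~49 ms) hides both the typical experience and the worst one.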
CloudWatch Dashboard CDK
const dashboard = new cloudwatch.Dashboard(this, 'ServiceDashboard', {
dashboardName: 'order-service',
});
dashboard.addWidgets(
// Row 1: Overview
new cloudwatch.GraphWidget({
title: 'Invocations',
left: [fn.metricInvocations()],
}),
new cloudwatch.SingleValueWidget({
title: 'Error Rate',
metrics: [
new cloudwatch.MathExpression({
expression: 'errors / invocations * 100',
usingMetrics: {
errors: fn.metricErrors({ statistic: 'Sum' }),
invocations: fn.metricInvocations({ statistic: 'Sum' }),
},
}),
],
}),
new cloudwatch.GraphWidget({
title: 'Latency (p50, p95, p99)',
left: [
fn.metricDuration({ statistic: 'p50', label: 'p50' }),
fn.metricDuration({ statistic: 'p95', label: 'p95' }),
fn.metricDuration({ statistic: 'p99', label: 'p99' }),
],
})
);
// Row 2: Errors
dashboard.addWidgets(
new cloudwatch.LogQueryWidget({
title: 'Recent Errors',
logGroupNames: [fn.logGroup.logGroupName],
queryLines: [
'fields @timestamp, @message',
'filter level = "ERROR"',
'sort @timestamp desc',
'limit 20',
],
})
);
Monitoring Serverless Architectures
End-to-End Monitoring
Monitor the entire flow:
API Gateway → Lambda → DynamoDB → EventBridge → Lambda
     ↓           ↓         ↓           ↓           ↓
  Metrics     Traces    Metrics     Metrics      Logs
Key metrics per service:
| Service | Key Metrics |
|---|---|
| API Gateway | Count, 4XXError, 5XXError, Latency, CacheHitCount |
| Lambda | Invocations, Errors, Duration, Throttles, ConcurrentExecutions |
| DynamoDB | ConsumedReadCapacity, ConsumedWriteCapacity, UserErrors, SystemErrors |
| SQS | NumberOfMessagesSent, NumberOfMessagesReceived, ApproximateAgeOfOldestMessage |
| EventBridge | Invocations, FailedInvocations, TriggeredRules |
| Step Functions | ExecutionsStarted, ExecutionsFailed, ExecutionTime |
Synthetic Monitoring
Use CloudWatch Synthetics for API monitoring:
import { Canary, Code, Runtime, Schedule, Test } from '@aws-cdk/aws-synthetics-alpha';
import { Duration } from 'aws-cdk-lib';
new Canary(this, 'ApiCanary', {
canaryName: 'api-health-check',
schedule: Schedule.rate(Duration.minutes(5)),
test: Test.custom({
code: Code.fromInline(`
const synthetics = require('Synthetics');
const apiCanaryBlueprint = async function () {
const response = await synthetics.executeHttpStep('Verify API', {
url: 'https://api.example.com/health',
method: 'GET',
});
return response.statusCode === 200 ? 'success' : 'failure';
};
exports.handler = async () => {
return await apiCanaryBlueprint();
};
`),
handler: 'index.handler',
}),
runtime: Runtime.SYNTHETICS_NODEJS_PUPPETEER_6_2,
});
OpenTelemetry Integration
Amazon Distro for OpenTelemetry (ADOT)
Use ADOT for vendor-neutral observability:
// Lambda Layer with ADOT
const adotLayer = lambda.LayerVersion.fromLayerVersionArn(
this,
'AdotLayer',
`arn:aws:lambda:${this.region}:901920570463:layer:aws-otel-nodejs-amd64-ver-1-18-1:4`
);
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
layers: [adotLayer],
tracing: lambda.Tracing.ACTIVE,
environment: {
AWS_LAMBDA_EXEC_WRAPPER: '/opt/otel-handler',
OPENTELEMETRY_COLLECTOR_CONFIG_FILE: '/var/task/collector.yaml',
},
});
Benefits of ADOT:
- Vendor-neutral (works with Datadog, New Relic, Honeycomb, etc.)
- Automatic instrumentation
- Consistent format across services
- Export to multiple backends
Best Practices Summary
Metrics
- ✅ Use CloudWatch Embedded Metric Format (EMF)
- ✅ Track business metrics, not just technical metrics
- ✅ Set alarms on error rate, latency, and throughput
- ✅ Use p99 for latency, not average
- ✅ Create dashboards for key services
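The EMF bullet above can be made concrete with a minimal payload. This is a sketch of the documented EMF envelope; the namespace and dimension names are illustrative, not from the original text:

```typescript
// Minimal CloudWatch Embedded Metric Format (EMF) payload, emitted as one log line.
// CloudWatch extracts the metric from the _aws envelope asynchronously.
function emf(metricName: string, value: number): string {
  return JSON.stringify({
    _aws: {
      Timestamp: Date.now(),
      CloudWatchMetrics: [{
        Namespace: 'OrderService', // illustrative namespace
        Dimensions: [['Service']],
        Metrics: [{ Name: metricName, Unit: 'Count' }],
      }],
    },
    Service: 'orders',
    [metricName]: value, // the actual value lives at the top level
  });
}

console.log(emf('OrdersProcessed', 1));
```

In practice, Lambda Powertools Metrics produces this format for you; the sketch only shows what ends up in the log stream.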
Logging
- ✅ Use structured JSON logging
- ✅ Include correlation IDs (request ID, trace ID)
- ✅ Use appropriate log levels
- ✅ Never log sensitive data (PII, secrets)
- ✅ Use CloudWatch Logs Insights for analysis
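A minimal sketch of such a structured log line with correlation IDs (field names are illustrative assumptions, not a fixed schema):

```typescript
// One JSON object per line keeps CloudWatch Logs Insights queries simple
function logLine(
  level: 'DEBUG' | 'INFO' | 'WARN' | 'ERROR',
  message: string,
  ids: { requestId: string; traceId?: string },
  extra: Record<string, unknown> = {},
): string {
  return JSON.stringify({
    timestamp: new Date().toISOString(),
    level,
    message,
    ...ids,   // correlation IDs for joining logs across services
    ...extra, // business context, never PII or secrets
  });
}

console.log(logLine('ERROR', 'order failed', { requestId: 'req-123' }, { orderId: 'o-1' }));
```

Lambda Powertools Logger provides the same shape out of the box, plus automatic request/trace ID injection.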
Tracing
- ✅ Enable X-Ray tracing on all services
- ✅ Instrument AWS SDK calls
- ✅ Add custom annotations for business context
- ✅ Use service map to understand dependencies
- ✅ Correlate traces with logs and metrics
Alerting
- ✅ Alert on customer-impacting issues
- ✅ Tune thresholds to reduce false positives
- ✅ Use composite alarms to reduce noise
- ✅ Include clear remediation steps
- ✅ Escalate critical alarms appropriately
Tools
- ✅ Use Lambda Powertools for unified observability
- ✅ Use CloudWatch ServiceLens for service view
- ✅ Use Synthetics for proactive monitoring
- ✅ Consider ADOT for vendor-neutral observability
Serverless Performance Optimization
Performance optimization best practices for AWS Lambda and serverless architectures.
Table of Contents
- Lambda Execution Lifecycle
- Cold Start Optimization
- Memory and CPU Optimization
- Package Size Optimization
- Initialization Optimization
- Runtime Performance
Lambda Execution Lifecycle
Execution Environment Phases
Three phases of Lambda execution:
Init Phase (Cold Start):
- Download and unpack function package
- Create execution environment
- Initialize runtime
- Execute initialization code (outside handler)
Invoke Phase:
- Execute handler code
- Return response
- Freeze execution environment
Shutdown Phase:
- Runtime shutdown (after period of inactivity)
- Execution environment destroyed
Concurrency and Scaling
Key concepts:
- Concurrency: Number of execution environments serving requests simultaneously
- One event per environment: Each environment processes one event at a time
- Automatic scaling: Lambda creates new environments as needed
- Environment reuse: Warm starts reuse existing environments
Example:
- Function takes 100ms to execute
- A single environment can handle 10 requests/second
- 100 requests/second therefore requires ~10 concurrent environments (concurrency = rate × duration)
- Default account limit: 1,000 concurrent executions (can be raised)
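The arithmetic in the example is Little's law (required concurrency ≈ arrival rate × mean duration); a tiny sketch:

```typescript
// Little's law sketch: concurrent environments ≈ requests/second × mean duration
function requiredEnvironments(requestsPerSecond: number, meanDurationMs: number): number {
  return Math.ceil(requestsPerSecond * (meanDurationMs / 1000));
}

// 100 req/s at 100 ms each needs about 10 concurrent environments
const envs = requiredEnvironments(100, 100);
```

This is an estimate for steady traffic; bursts and cold starts push the real number higher.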
Cold Start Optimization
Understanding Cold Starts
Cold start components:
Total Cold Start = Download Package + Init Environment + Init Code + Handler
Cold start frequency:
- Development: Every code change creates new environments (frequent)
- Production: Typically < 1% of invocations
- Optimize for p95/p99 latency, not average
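Computing p95/p99 from sampled durations can be sketched with a nearest-rank helper (a generic utility, not an AWS API):

```typescript
// Nearest-rank percentile over sampled latencies (milliseconds)
function percentile(samples: number[], p: number): number {
  const sorted = [...samples].sort((a, b) => a - b); // numeric sort, not lexicographic
  const rank = Math.ceil((p / 100) * sorted.length) - 1;
  return sorted[Math.max(0, rank)];
}

const latencies = [120, 95, 310, 101, 99, 870, 105, 98, 102, 100];
const p99 = percentile(latencies, 99); // dominated by the rare slow (cold-start) samples
```

Note how a single 870 ms outlier barely moves the average but defines the p99, which is exactly why tail percentiles matter here.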
Package Size Optimization
Minimize deployment package:
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
bundling: {
minify: true, // Minify production code
sourceMap: false, // Disable in production
externalModules: [
'@aws-sdk/*', // Use AWS SDK from runtime
],
// Tree-shaking removes unused code
},
});
Tools for optimization:
- esbuild: Automatic tree-shaking and minification
- Webpack: Bundle optimization
- Maven: Dependency analysis
- Gradle: Unused dependency detection
Best practices:
- Avoid monolithic functions
- Bundle only required dependencies
- Use tree-shaking to remove unused code
- Minify production code
- Exclude AWS SDK (provided by runtime)
Provisioned Concurrency
Pre-initialize environments for predictable latency:
const fn = new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
});
// Static provisioned concurrency
fn.currentVersion.addAlias('live', {
provisionedConcurrentExecutions: 10,
});
// Auto-scaling provisioned concurrency
const alias = fn.currentVersion.addAlias('prod');
const scaling = alias.addAutoScaling({
minCapacity: 10,
maxCapacity: 100,
});
scaling.scaleOnUtilization({
utilizationTarget: 0.7, // 70% utilization
});
When to use:
- Consistent traffic patterns: Predictable load
- Latency-sensitive APIs: Sub-100ms requirements
- Cost consideration: Compare cold start frequency vs. provisioned cost
Cost comparison:
- On-demand: Pay only for actual usage
- Provisioned: Pay for provisioned capacity + invocations
- Breakeven: When cold starts > ~20% of invocations
Lambda SnapStart (Java)
Instant cold starts for Java:
new lambda.Function(this, 'JavaFunction', {
runtime: lambda.Runtime.JAVA_17,
code: lambda.Code.fromAsset('target/function.jar'),
handler: 'com.example.Handler::handleRequest',
snapStart: lambda.SnapStartConf.ON_PUBLISHED_VERSIONS,
});
Benefits:
- Up to 10x faster cold starts for Java
- No code changes required
- Works with published versions
- No additional cost
Memory and CPU Optimization
Memory = CPU Allocation
Key principle: Memory and CPU are proportionally allocated
| Memory | vCPU |
|---|---|
| 128 MB | 0.07 vCPU |
| 512 MB | 0.28 vCPU |
| 1,024 MB | 0.57 vCPU |
| 1,769 MB | 1.00 vCPU |
| 3,538 MB | 2.00 vCPU |
| 10,240 MB | 6.00 vCPU |
Cost vs. Performance Balancing
Example - Compute-intensive function:
| Memory | Duration | Cost |
|---|---|---|
| 128 MB | 11.72s | $0.0246 |
| 256 MB | 6.68s | $0.0280 |
| 512 MB | 3.19s | $0.0268 |
| 1024 MB | 1.46s | $0.0246 |
Key insight: More memory = faster execution = similar or lower cost
Formula:
GB-seconds = Allocated Memory (GB) × Execution Time (seconds)
Cost = GB-seconds × Number of Invocations × Price per GB-second
Finding Optimal Memory
Use Lambda Power Tuning:
# Deploy power tuning state machine
sam deploy --template-file template.yml --stack-name lambda-power-tuning
# Run power tuning
aws lambda invoke \
--function-name powerTuningFunction \
--payload '{"lambdaARN": "arn:aws:lambda:...", "powerValues": [128, 256, 512, 1024, 1536, 3008]}' \
response.json
Manual testing approach:
- Test function at different memory levels
- Measure execution time at each level
- Calculate cost for each configuration
- Choose optimal balance for your use case
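The manual approach boils down to applying the GB-second formula at each memory level. A hedged sketch, with prices assumed from typical us-east-1 x86 rates (check current regional pricing before relying on the numbers):

```typescript
// Lambda cost per batch of invocations: GB-seconds plus the per-request charge.
// Prices are assumptions: $0.0000166667 per GB-second, $0.20 per 1M requests.
function lambdaCostUsd(memoryMb: number, durationSec: number, invocations: number): number {
  const gbSeconds = (memoryMb / 1024) * durationSec * invocations;
  return gbSeconds * 0.0000166667 + (invocations / 1_000_000) * 0.20;
}

// 1,024 MB at 1.46 s for 1,000 invocations lands near the table's $0.0246
const cost = lambdaCostUsd(1024, 1.46, 1000);
```

Run the function across the memory/duration pairs you measured and pick the cheapest configuration that meets your latency target.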
Multi-Core Optimization
Leverage multiple vCPUs (at 1,769 MB+):
// Use Worker Threads for parallel processing
import { Worker } from 'worker_threads';
export const handler = async (event: any) => {
const items = event.items;
// Process in parallel using multiple cores
const workers = items.map(item =>
new Promise((resolve, reject) => {
const worker = new Worker('./worker.js', {
workerData: item,
});
worker.on('message', resolve);
worker.on('error', reject);
})
);
const results = await Promise.all(workers);
return results;
};
Python multiprocessing:
from multiprocessing import Pipe, Process

def _worker(conn, item):
    # Pool and Queue rely on /dev/shm, which Lambda lacks - use Process + Pipe
    conn.send(process_item(item))

def handler(event, context):
    pairs = [(Pipe(), item) for item in event['items']]
    procs = [Process(target=_worker, args=(child, item)) for (parent, child), item in pairs]
    for proc in procs:
        proc.start()
    results = [parent.recv() for (parent, _), _ in pairs]
    for proc in procs:
        proc.join()
    return {'results': results}
Initialization Optimization
Code Outside Handler
Initialize once, reuse across invocations:
// ✅ GOOD - Initialize outside handler
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { S3Client } from '@aws-sdk/client-s3';
// Initialized once per execution environment
const dynamodb = new DynamoDBClient({});
const s3 = new S3Client({});
// Connection pool initialized once
const pool = createConnectionPool({
host: process.env.DB_HOST,
max: 1, // One connection per execution environment
});
export const handler = async (event: any) => {
// Reuse connections across invocations
const data = await dynamodb.getItem({ /* ... */ });
const file = await s3.getObject({ /* ... */ });
return processData(data, file);
};
// ❌ BAD - Initialize in handler
export const handler = async (event: any) => {
const dynamodb = new DynamoDBClient({}); // Created every invocation
const s3 = new S3Client({}); // Created every invocation
// ...
};
Lazy Loading
Load dependencies only when needed:
// ✅ GOOD - Conditional loading
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
// Load heavy PDF library only when needed
const pdfLib = await import('./pdf-generator');
return pdfLib.generatePDF(event.data);
}
if (event.operation === 'processImage') {
const sharp = await import('sharp');
return processImage(sharp, event.data);
}
// Default operation (no heavy dependencies)
return processDefault(event);
};
// ❌ BAD - Load everything upfront
import pdfLib from './pdf-generator'; // 50MB
import sharp from 'sharp'; // 20MB
// Even if not used!
export const handler = async (event: any) => {
if (event.operation === 'generatePDF') {
return pdfLib.generatePDF(event.data);
}
};
Connection Reuse
Enable connection reuse:
import { DynamoDBClient } from '@aws-sdk/client-dynamodb';
import { NodeHttpHandler } from '@smithy/node-http-handler';
import { Agent } from 'https';
const client = new DynamoDBClient({
// SDK v3 enables keep-alive by default; tune the agent and timeouts explicitly
requestHandler: new NodeHttpHandler({
httpsAgent: new Agent({ keepAlive: true }),
connectionTimeout: 3000,
socketTimeout: 3000,
}),
});
// AWS SDK v2 only (v3 reuses connections by default):
process.env.AWS_NODEJS_CONNECTION_REUSE_ENABLED = '1';
Runtime Performance
Choose the Right Runtime
Runtime comparison:
| Runtime | Cold Start | Execution Speed | Ecosystem | Best For |
|---|---|---|---|---|
| Node.js 20 | Fast | Fast | Excellent | APIs, I/O-bound |
| Python 3.12 | Fast | Medium | Excellent | Data processing |
| Java 17 + SnapStart | Fast (w/SnapStart) | Fast | Good | Enterprise apps |
| .NET 8 | Medium | Fast | Good | Enterprise apps |
| Go | Very Fast | Very Fast | Good | High performance |
| Rust | Very Fast | Very Fast | Growing | High performance |
Optimize Handler Code
Efficient code patterns:
// ✅ GOOD - Batch operations
const items = ['item1', 'item2', 'item3'];
// Single batch write
await dynamodb.batchWriteItem({
RequestItems: {
[tableName]: items.map(item => ({
PutRequest: { Item: item },
})),
},
});
// ❌ BAD - Multiple single operations
for (const item of items) {
await dynamodb.putItem({
TableName: tableName,
Item: item,
}); // Slow, multiple round trips
}
Async Processing
Use async/await effectively:
// ✅ GOOD - Parallel async operations
const [userData, orderData, inventoryData] = await Promise.all([
getUserData(userId),
getOrderData(orderId),
getInventoryData(productId),
]);
// ❌ BAD - Sequential async operations
const userData = await getUserData(userId);
const orderData = await getOrderData(orderId); // Waits unnecessarily
const inventoryData = await getInventoryData(productId); // Waits unnecessarily
Caching Strategies
Cache frequently accessed data:
// In-memory cache (persists in warm environments; unbounded - add a TTL or size cap for long-lived data)
const cache = new Map<string, any>();
export const handler = async (event: any) => {
const key = event.key;
// Check cache first
if (cache.has(key)) {
console.log('Cache hit');
return cache.get(key);
}
// Fetch from database
const data = await fetchFromDatabase(key);
// Store in cache
cache.set(key, data);
return data;
};
ElastiCache for shared cache:
import Redis from 'ioredis';
// Initialize once
const redis = new Redis({
host: process.env.REDIS_HOST,
port: 6379,
lazyConnect: true,
enableOfflineQueue: false,
});
export const handler = async (event: any) => {
const key = `order:${event.orderId}`;
// Try cache
const cached = await redis.get(key);
if (cached) {
return JSON.parse(cached);
}
// Fetch and cache
const data = await fetchOrder(event.orderId);
await redis.setex(key, 300, JSON.stringify(data)); // 5 min TTL
return data;
};
Performance Testing
Load Testing
Use Artillery for load testing:
# load-test.yml
config:
target: https://api.example.com
phases:
- duration: 60
arrivalRate: 10
rampTo: 100 # Ramp from 10 to 100 req/sec
scenarios:
- flow:
- post:
url: /orders
json:
orderId: "{{ $randomString() }}"
amount: "{{ $randomNumber(10, 1000) }}"
artillery run load-test.yml
Benchmarking
Test different configurations:
// benchmark.ts
import { Lambda } from '@aws-sdk/client-lambda';
const lambda = new Lambda({});
const testConfigurations = [
{ memory: 128, name: 'Function-128' },
{ memory: 256, name: 'Function-256' },
{ memory: 512, name: 'Function-512' },
{ memory: 1024, name: 'Function-1024' },
];
for (const config of testConfigurations) {
const times: number[] = [];
// Warm up
for (let i = 0; i < 5; i++) {
await lambda.invoke({ FunctionName: config.name });
}
// Measure
for (let i = 0; i < 100; i++) {
const start = Date.now();
await lambda.invoke({ FunctionName: config.name });
times.push(Date.now() - start);
}
const sorted = times.sort((a, b) => a - b); // numeric sort, not lexicographic
const p99 = sorted[98]; // 99th percentile of 100 samples
const avg = times.reduce((a, b) => a + b) / times.length;
console.log(`${config.memory}MB - Avg: ${avg}ms, p99: ${p99}ms`);
}
Cost Optimization
Right-Sizing Memory
Balance cost and performance:
CPU-bound workloads:
- More memory = more CPU = faster execution
- Often results in lower cost overall
- Test at 1769MB (1 vCPU) and above
I/O-bound workloads:
- Less sensitive to memory allocation
- May not benefit from higher memory
- Test at lower memory levels (256-512MB)
Simple operations:
- Minimal CPU required
- Use minimum memory (128-256MB)
- Fast execution despite low resources
Billing Granularity
Lambda bills in 1ms increments:
- Precise billing (7ms execution = 7ms cost)
- Optimize even small improvements
- Consider trade-offs carefully
Cost calculation:
Cost = (Memory GB) × (Duration seconds) × (Invocations) × ($0.0000166667/GB-second)
+ (Invocations) × ($0.20/1M requests)
Cost Reduction Strategies
- Optimize execution time: Faster = cheaper
- Right-size memory: Balance CPU needs with cost
- Reduce invocations: Batch processing, caching
- Use Graviton2 (ARM64): up to 20% lower duration cost
- Reserved Concurrency: Only when needed
- Compression: Reduce data transfer costs
Advanced Optimization
Lambda Extensions
Use extensions for cross-cutting concerns:
// Lambda layer with extension
const extensionLayer = lambda.LayerVersion.fromLayerVersionArn(
this,
'Extension',
'arn:aws:lambda:us-east-1:123456789:layer:my-extension:1'
);
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
layers: [extensionLayer],
});
Common extensions:
- Secrets caching
- Configuration caching
- Custom logging
- Security scanning
- Performance monitoring
Graviton2 Architecture
Up to 20% lower duration cost, often with better performance:
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
architecture: lambda.Architecture.ARM_64, // Graviton2
});
Considerations:
- Most runtimes support ARM64
- Test thoroughly before migrating
- Dependencies must support ARM64
- Native extensions may need recompilation
VPC Optimization
Hyperplane ENIs (automatic since 2019):
- No ENI per function
- Faster cold starts in VPC
- Scales instantly
// Modern VPC configuration (fast)
new NodejsFunction(this, 'VpcFunction', {
entry: 'src/handler.ts',
vpc,
vpcSubnets: { subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS },
// Fast scaling, no ENI limitations
});
Performance Monitoring
Key Metrics
Monitor these metrics:
- Duration: p50, p95, p99, max
- Cold Start Rate: invocations with an Init phase / total invocations
- Error Rate: Errors / Invocations
- Throttles: Indicates concurrency limit reached
- Iterator Age: For stream processing lag
Performance Dashboards
const dashboard = new cloudwatch.Dashboard(this, 'PerformanceDashboard');
dashboard.addWidgets(
new cloudwatch.GraphWidget({
title: 'Latency Distribution',
left: [
fn.metricDuration({ statistic: 'p50', label: 'p50' }),
fn.metricDuration({ statistic: 'p95', label: 'p95' }),
fn.metricDuration({ statistic: 'p99', label: 'p99' }),
fn.metricDuration({ statistic: 'Maximum', label: 'max' }),
],
}),
new cloudwatch.GraphWidget({
title: 'Duration and Errors', // Lambda emits no memory metric without Lambda Insights
left: [fn.metricDuration()],
right: [fn.metricErrors()],
})
);
Summary
- Cold Starts: Optimize package size, use provisioned concurrency for critical paths
- Memory: More memory often = faster execution = lower cost
- Initialization: Initialize connections outside handler
- Lazy Loading: Load dependencies only when needed
- Connection Reuse: Enable for AWS SDK clients
- Testing: Test at different memory levels to find optimal configuration
- Monitoring: Track p99 latency, not average
- Graviton2: Consider ARM64 for better price/performance
- Batch Operations: Reduce round trips to services
- Caching: Cache frequently accessed data
Serverless Security Best Practices
Security best practices for serverless applications based on AWS Well-Architected Framework.
Table of Contents
- Shared Responsibility Model
- Identity and Access Management
- Function Security
- API Security
- Data Protection
- Network Security
Shared Responsibility Model
Serverless Shifts Responsibility to AWS
With serverless, AWS takes on more security responsibilities:
AWS Responsibilities:
- Compute infrastructure
- Execution environment
- Runtime language and patches
- Networking infrastructure
- Server software and OS
- Physical hardware and facilities
- Automatic security patches (like Log4Shell mitigation)
Customer Responsibilities:
- Function code and dependencies
- Resource configuration
- Identity and Access Management (IAM)
- Data encryption (at rest and in transit)
- Application-level security
- Secure coding practices
Benefits of Shifted Responsibility
- Automatic Patching: AWS applies security patches automatically (e.g., Log4Shell fixed within 3 days)
- Infrastructure Security: No OS patching, server hardening, or vulnerability scanning
- Operational Agility: Quick security response at scale
- Focus on Code: Spend time on business logic, not infrastructure security
Identity and Access Management
Least Privilege Principle
Always use least privilege IAM policies:
// ✅ GOOD - Specific grant
const table = new dynamodb.Table(this, 'Table', {
partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
});
const fn = new NodejsFunction(this, 'Function', { entry: 'src/handler.ts' });
table.grantReadData(fn); // Only read access
// ❌ BAD - Overly broad
fn.addToRolePolicy(new iam.PolicyStatement({
actions: ['dynamodb:*'],
resources: ['*'],
}));
Function Execution Role
Separate roles per function:
// ✅ GOOD - Each function has its own role
const readFunction = new NodejsFunction(this, 'ReadFunction', {
entry: 'src/read.ts',
// Gets its own execution role
});
const writeFunction = new NodejsFunction(this, 'WriteFunction', {
entry: 'src/write.ts',
// Gets its own execution role
});
table.grantReadData(readFunction);
table.grantReadWriteData(writeFunction);
// ❌ BAD - Shared role with excessive permissions
const sharedRole = new iam.Role(this, 'SharedRole', {
assumedBy: new iam.ServicePrincipal('lambda.amazonaws.com'),
managedPolicies: [
iam.ManagedPolicy.fromAwsManagedPolicyName('AdministratorAccess'), // Too broad!
],
});
Resource-Based Policies
Control who can invoke functions:
// Allow API Gateway to invoke function
myFunction.grantInvoke(new iam.ServicePrincipal('apigateway.amazonaws.com'));
// Allow specific account
myFunction.addPermission('AllowAccountInvoke', {
principal: new iam.AccountPrincipal('123456789012'),
action: 'lambda:InvokeFunction',
});
// Conditional invoke (sourceArn must be an ARN, e.g. an API's execute-api ARN)
myFunction.addPermission('AllowApiInvoke', {
principal: new iam.ServicePrincipal('apigateway.amazonaws.com'),
action: 'lambda:InvokeFunction',
sourceArn: api.arnForExecuteApi(),
});
IAM Policies Best Practices
- Use grant methods: Prefer .grantXxx() over manual policies
- Condition keys: Use IAM conditions for fine-grained control
- Resource ARNs: Always specify resource ARNs, avoid wildcards
- Session policies: Use for temporary elevated permissions
- Service Control Policies (SCPs): Enforce organization-wide guardrails
Function Security
Lambda Isolation Model
Each function runs in isolated sandbox:
- Built on Firecracker microVMs
- Dedicated execution environment per function
- No shared memory between functions
- Isolated file system and network namespace
- Strong workload isolation
Execution Environment Security:
- One concurrent invocation per environment
- Environment may be reused (warm starts)
- /tmp storage persists between invocations
- Sensitive data in memory may persist
Secure Coding Practices
Handle sensitive data securely:
// ✅ GOOD - Keep secrets out of logs and responses
export const handler = async (event: any) => {
  const apiKey = process.env.API_KEY; // better: fetch from Secrets Manager (below)
  const result = await callApi(apiKey);
  return result; // never include apiKey in log output or the response
};
// ✅ GOOD - Use Secrets Manager
import { SecretsManagerClient, GetSecretValueCommand } from '@aws-sdk/client-secrets-manager';
const secretsClient = new SecretsManagerClient({});
export const handler = async (event: any) => {
const secret = await secretsClient.send(
new GetSecretValueCommand({ SecretId: process.env.SECRET_ARN })
);
const apiKey = secret.SecretString;
// Use apiKey
};
Dependency Management
Scan dependencies for vulnerabilities:
// package.json
{
"scripts": {
"audit": "npm audit",
"audit:fix": "npm audit fix"
},
"devDependencies": {
"snyk": "^1.0.0"
}
}
Keep dependencies updated:
- Run npm audit or pip-audit regularly
- Use Dependabot or Snyk for automated scanning
- Update dependencies promptly when vulnerabilities found
- Use minimal dependency sets
Environment Variable Security
Never store secrets in environment variables:
// ❌ BAD - Secret in environment variable
new NodejsFunction(this, 'Function', {
environment: {
API_KEY: 'sk-1234567890abcdef', // Never do this!
},
});
// ✅ GOOD - Reference to secret
const myFunction = new NodejsFunction(this, 'Function', {
environment: {
SECRET_ARN: secret.secretArn,
},
});
secret.grantRead(myFunction);
API Security
API Gateway Security
Authentication and Authorization:
// Cognito User Pool authorizer
const authorizer = new apigateway.CognitoUserPoolsAuthorizer(this, 'Authorizer', {
cognitoUserPools: [userPool],
});
api.root.addMethod('GET', integration, {
authorizer,
authorizationType: apigateway.AuthorizationType.COGNITO,
});
// Lambda authorizer for custom auth
const customAuthorizer = new apigateway.TokenAuthorizer(this, 'CustomAuth', {
handler: authorizerFunction,
resultsCacheTtl: Duration.minutes(5),
});
// IAM authorization for service-to-service
api.root.addMethod('POST', integration, {
authorizationType: apigateway.AuthorizationType.IAM,
});
Request Validation
Validate requests at API Gateway:
const validator = new apigateway.RequestValidator(this, 'Validator', {
api,
validateRequestBody: true,
validateRequestParameters: true,
});
const model = api.addModel('Model', {
schema: {
type: apigateway.JsonSchemaType.OBJECT,
required: ['email', 'name'],
properties: {
email: {
type: apigateway.JsonSchemaType.STRING,
format: 'email',
},
name: {
type: apigateway.JsonSchemaType.STRING,
minLength: 1,
maxLength: 100,
},
},
},
});
resource.addMethod('POST', integration, {
requestValidator: validator,
requestModels: {
'application/json': model,
},
});
Rate Limiting and Throttling
const api = new apigateway.RestApi(this, 'Api', {
deployOptions: {
throttlingRateLimit: 1000, // requests per second
throttlingBurstLimit: 2000, // burst capacity
// Per-method throttling overrides, keyed by resource path and method
methodOptions: {
'/orders/POST': {
throttlingRateLimit: 100,
throttlingBurstLimit: 200,
},
},
},
});
API Keys and Usage Plans
const apiKey = api.addApiKey('ApiKey', {
apiKeyName: 'customer-key',
});
const plan = api.addUsagePlan('UsagePlan', {
name: 'Standard',
throttle: {
rateLimit: 100,
burstLimit: 200,
},
quota: {
limit: 10000,
period: apigateway.Period.MONTH,
},
});
plan.addApiKey(apiKey);
plan.addApiStage({
stage: api.deploymentStage,
});
Data Protection
Encryption at Rest
DynamoDB encryption:
// Default is an AWS-owned key (no additional cost); AWS_MANAGED uses an AWS managed CMK
const table = new dynamodb.Table(this, 'Table', {
partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
encryption: dynamodb.TableEncryption.AWS_MANAGED,
});
// Customer-managed CMK (for compliance)
const kmsKey = new kms.Key(this, 'Key', {
enableKeyRotation: true,
});
const cmkTable = new dynamodb.Table(this, 'CmkTable', {
partitionKey: { name: 'pk', type: dynamodb.AttributeType.STRING },
encryption: dynamodb.TableEncryption.CUSTOMER_MANAGED,
encryptionKey: kmsKey,
});
S3 encryption:
// SSE-S3 (default, no additional cost)
const bucket = new s3.Bucket(this, 'Bucket', {
encryption: s3.BucketEncryption.S3_MANAGED,
});
// SSE-KMS (for fine-grained access control)
const kmsBucket = new s3.Bucket(this, 'KmsBucket', {
encryption: s3.BucketEncryption.KMS,
encryptionKey: kmsKey,
});
SQS/SNS encryption:
const queue = new sqs.Queue(this, 'Queue', {
encryption: sqs.QueueEncryption.KMS,
encryptionMasterKey: kmsKey,
});
const topic = new sns.Topic(this, 'Topic', {
masterKey: kmsKey,
});
Encryption in Transit
All AWS service APIs use TLS:
- API Gateway endpoints use HTTPS by default
- Lambda to AWS service communication encrypted
- EventBridge, SQS, SNS use TLS
- Custom domains can use ACM certificates
// API Gateway with custom domain
const certificate = new acm.Certificate(this, 'Certificate', {
domainName: 'api.example.com',
validation: acm.CertificateValidation.fromDns(hostedZone),
});
const api = new apigateway.RestApi(this, 'Api', {
domainName: {
domainName: 'api.example.com',
certificate,
},
});
Data Sanitization
Validate and sanitize inputs:
import DOMPurify from 'isomorphic-dompurify';
import { z } from 'zod';
// Schema validation
const OrderSchema = z.object({
orderId: z.string().uuid(),
amount: z.number().positive(),
email: z.string().email(),
description: z.string(), // free-text field, sanitized below
});
export const handler = async (event: any) => {
const body = JSON.parse(event.body);
// Validate schema
const result = OrderSchema.safeParse(body);
if (!result.success) {
return {
statusCode: 400,
body: JSON.stringify({ error: result.error }),
};
}
// Sanitize HTML inputs
const sanitized = {
...result.data,
description: DOMPurify.sanitize(result.data.description),
};
await processOrder(sanitized);
};
Network Security
VPC Configuration
Lambda in VPC for private resources:
const vpc = new ec2.Vpc(this, 'Vpc', {
maxAzs: 2,
natGateways: 1,
});
// Security group for Lambda (defined before the function that uses it)
const securityGroup = new ec2.SecurityGroup(this, 'LambdaSG', {
vpc,
description: 'Security group for Lambda function',
allowAllOutbound: false, // Restrict outbound
});
// Only allow access to RDS
securityGroup.addEgressRule(
ec2.Peer.securityGroupId(rdsSecurityGroup.securityGroupId),
ec2.Port.tcp(3306),
'Allow MySQL access'
);
// Lambda in private subnet
const vpcFunction = new NodejsFunction(this, 'VpcFunction', {
entry: 'src/handler.ts',
vpc,
vpcSubnets: {
subnetType: ec2.SubnetType.PRIVATE_WITH_EGRESS,
},
securityGroups: [securityGroup],
});
VPC Endpoints
Use VPC endpoints for AWS services:
// S3 VPC endpoint (gateway endpoint, no cost)
vpc.addGatewayEndpoint('S3Endpoint', {
service: ec2.GatewayVpcEndpointAwsService.S3,
});
// DynamoDB VPC endpoint (gateway endpoint, no cost)
vpc.addGatewayEndpoint('DynamoDBEndpoint', {
service: ec2.GatewayVpcEndpointAwsService.DYNAMODB,
});
// Secrets Manager VPC endpoint (interface endpoint, cost applies)
vpc.addInterfaceEndpoint('SecretsManagerEndpoint', {
service: ec2.InterfaceVpcEndpointAwsService.SECRETS_MANAGER,
privateDnsEnabled: true,
});
Security Groups
Principle of least privilege for network access:
// Lambda security group
const lambdaSG = new ec2.SecurityGroup(this, 'LambdaSG', {
vpc,
allowAllOutbound: false,
});
// RDS security group
const rdsSG = new ec2.SecurityGroup(this, 'RDSSG', {
vpc,
allowAllOutbound: false,
});
// Allow Lambda to access RDS only
rdsSG.addIngressRule(
ec2.Peer.securityGroupId(lambdaSG.securityGroupId),
ec2.Port.tcp(3306),
'Allow Lambda access'
);
lambdaSG.addEgressRule(
ec2.Peer.securityGroupId(rdsSG.securityGroupId),
ec2.Port.tcp(3306),
'Allow RDS access'
);
Security Monitoring
CloudWatch Logs
Enable and encrypt logs:
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
// An explicit logGroup replaces the deprecated logRetention prop
logGroup: new logs.LogGroup(this, 'LogGroup', {
encryptionKey: kmsKey, // Encrypt logs
retention: logs.RetentionDays.ONE_WEEK,
}),
});
CloudTrail
Enable CloudTrail for audit:
const trail = new cloudtrail.Trail(this, 'Trail', {
isMultiRegionTrail: true,
includeGlobalServiceEvents: true,
managementEvents: cloudtrail.ReadWriteType.ALL,
});
// Log data events for specific Lambda functions
trail.addLambdaEventSelector([myFunction], {
readWriteType: cloudtrail.ReadWriteType.ALL,
});
GuardDuty
Enable GuardDuty for threat detection:
- Analyzes VPC Flow Logs, DNS logs, CloudTrail events
- Detects unusual API activity
- Identifies compromised credentials
- Monitors for cryptocurrency mining
Security Best Practices Checklist
Development
- Validate and sanitize all inputs
- Scan dependencies for vulnerabilities
- Use least privilege IAM permissions
- Store secrets in Secrets Manager or Parameter Store
- Never log sensitive data
- Enable encryption for all data stores
- Use environment variables for configuration, not secrets
Deployment
- Enable CloudTrail in all regions
- Configure VPC for sensitive workloads
- Use VPC endpoints for AWS service access
- Enable GuardDuty for threat detection
- Implement resource-based policies
- Use AWS WAF for API protection
- Enable access logging for API Gateway
Operations
- Monitor CloudTrail for unusual activity
- Set up alarms for security events
- Rotate secrets regularly
- Review IAM policies periodically
- Audit function permissions
- Monitor GuardDuty findings
- Implement automated security responses
Testing
- Test with least privilege policies
- Validate error handling for security failures
- Test input validation and sanitization
- Verify encryption configurations
- Test with malicious payloads
- Audit logs for security events
Summary
- Shared Responsibility: AWS handles infrastructure, you handle application security
- Least Privilege: Use IAM grant methods, avoid wildcards
- Encryption: Enable encryption at rest and in transit
- Input Validation: Validate and sanitize all inputs
- Dependency Security: Scan and update dependencies regularly
- Monitoring: Enable CloudTrail, GuardDuty, and CloudWatch
- Secrets Management: Use Secrets Manager, never environment variables
- Network Security: Use VPC, security groups, and VPC endpoints appropriately
Serverless Architecture Patterns
Comprehensive patterns for building serverless applications on AWS based on Well-Architected Framework principles.
Table of Contents
- Core Serverless Patterns
- API Patterns
- Data Processing Patterns
- Integration Patterns
- Orchestration Patterns
- Anti-Patterns
Core Serverless Patterns
Pattern: Serverless Microservices
Use case: Independent, scalable services with separate databases
Architecture:
API Gateway → Lambda Functions → DynamoDB/RDS
↓ (events)
EventBridge → Other Services
CDK Implementation:
// User Service
const userTable = new dynamodb.Table(this, 'Users', {
partitionKey: { name: 'userId', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
const userFunction = new NodejsFunction(this, 'UserHandler', {
entry: 'src/services/users/handler.ts',
environment: {
TABLE_NAME: userTable.tableName,
},
});
userTable.grantReadWriteData(userFunction);
// Order Service (separate database)
const orderTable = new dynamodb.Table(this, 'Orders', {
partitionKey: { name: 'orderId', type: dynamodb.AttributeType.STRING },
billingMode: dynamodb.BillingMode.PAY_PER_REQUEST,
});
const orderFunction = new NodejsFunction(this, 'OrderHandler', {
entry: 'src/services/orders/handler.ts',
environment: {
TABLE_NAME: orderTable.tableName,
EVENT_BUS: eventBus.eventBusName,
},
});
orderTable.grantReadWriteData(orderFunction);
eventBus.grantPutEventsTo(orderFunction);
Benefits:
- Independent deployment and scaling
- Database per service (data isolation)
- Technology diversity
- Fault isolation
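The events the order service publishes to EventBridge follow a standard `PutEvents` entry shape. A plain-TypeScript sketch of building one (the `order.service` source and `OrderCreated` detail-type are illustrative names, not defined by AWS):

```typescript
interface Order {
  orderId: string;
  total: number;
}

// Build a PutEvents entry: Detail must be a JSON string,
// while Source and DetailType are what consumer rules match on.
export function buildOrderCreatedEntry(order: Order, eventBusName: string) {
  return {
    EventBusName: eventBusName,
    Source: 'order.service',
    DetailType: 'OrderCreated',
    Detail: JSON.stringify(order),
  };
}
```

Keeping event construction in one function makes the source/detail-type contract between services explicit and testable.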
Pattern: Serverless API Backend
Use case: REST or GraphQL API with serverless compute
REST API with API Gateway:
const api = new apigateway.RestApi(this, 'Api', {
restApiName: 'serverless-api',
deployOptions: {
stageName: 'prod',
tracingEnabled: true,
loggingLevel: apigateway.MethodLoggingLevel.INFO,
dataTraceEnabled: true,
metricsEnabled: true,
},
defaultCorsPreflightOptions: {
allowOrigins: apigateway.Cors.ALL_ORIGINS,
allowMethods: apigateway.Cors.ALL_METHODS,
},
});
// Resource-based routing
const items = api.root.addResource('items');
items.addMethod('GET', new apigateway.LambdaIntegration(listFunction));
items.addMethod('POST', new apigateway.LambdaIntegration(createFunction));
const item = items.addResource('{id}');
item.addMethod('GET', new apigateway.LambdaIntegration(getFunction));
item.addMethod('PUT', new apigateway.LambdaIntegration(updateFunction));
item.addMethod('DELETE', new apigateway.LambdaIntegration(deleteFunction));
GraphQL API with AppSync:
const api = new appsync.GraphqlApi(this, 'Api', {
name: 'serverless-graphql-api',
schema: appsync.SchemaFile.fromAsset('schema.graphql'),
authorizationConfig: {
defaultAuthorization: {
authorizationType: appsync.AuthorizationType.API_KEY,
},
},
xrayEnabled: true,
});
// Lambda resolver
const dataSource = api.addLambdaDataSource('lambda-ds', resolverFunction);
dataSource.createResolver('QueryGetItem', {
typeName: 'Query',
fieldName: 'getItem',
});
Pattern: Serverless Data Lake
Use case: Ingest, process, and analyze large-scale data
Architecture:
S3 (raw data) → Lambda (transform) → S3 (processed)
↓ (catalog)
AWS Glue → Athena (query)
Implementation:
const rawBucket = new s3.Bucket(this, 'RawData');
const processedBucket = new s3.Bucket(this, 'ProcessedData');
// Trigger Lambda on file upload
rawBucket.addEventNotification(
s3.EventType.OBJECT_CREATED,
new s3n.LambdaDestination(transformFunction),
{ prefix: 'incoming/' }
);
// Transform function
export const transform = async (event: S3Event) => {
for (const record of event.Records) {
const key = record.s3.object.key;
// Get raw data
const raw = await s3.getObject({
Bucket: record.s3.bucket.name,
Key: key,
});
// Transform data
const transformed = await transformData(raw.Body);
// Write to processed bucket
await s3.putObject({
Bucket: process.env.PROCESSED_BUCKET,
Key: `processed/${key}`,
Body: JSON.stringify(transformed),
});
}
};
API Patterns
Pattern: Authorizer Pattern
Use case: Custom authentication and authorization
// Lambda authorizer
const authorizer = new apigateway.TokenAuthorizer(this, 'Authorizer', {
handler: authorizerFunction,
identitySource: 'method.request.header.Authorization',
resultsCacheTtl: Duration.minutes(5),
});
// Apply to API methods
const resource = api.root.addResource('protected');
resource.addMethod('GET', new apigateway.LambdaIntegration(protectedFunction), {
authorizer,
});
Pattern: Request Validation
Use case: Validate requests before Lambda invocation
const requestModel = api.addModel('RequestModel', {
contentType: 'application/json',
schema: {
type: apigateway.JsonSchemaType.OBJECT,
required: ['name', 'email'],
properties: {
name: { type: apigateway.JsonSchemaType.STRING, minLength: 1 },
email: { type: apigateway.JsonSchemaType.STRING, format: 'email' },
},
},
});
resource.addMethod('POST', integration, {
requestValidator: new apigateway.RequestValidator(this, 'Validator', {
api,
validateRequestBody: true,
validateRequestParameters: true,
}),
requestModels: {
'application/json': requestModel,
},
});
Pattern: Response Caching
Use case: Reduce backend load and improve latency
const api = new apigateway.RestApi(this, 'Api', {
deployOptions: {
cachingEnabled: true,
cacheTtl: Duration.minutes(5),
cacheClusterEnabled: true,
cacheClusterSize: '0.5', // GB
},
});
// Enable caching per method
resource.addMethod('GET', integration, {
methodResponses: [{
statusCode: '200',
responseParameters: {
'method.response.header.Cache-Control': true,
},
}],
});
Data Processing Patterns
Pattern: S3 Event Processing
Use case: Process files uploaded to S3
const bucket = new s3.Bucket(this, 'DataBucket');
// Process images
bucket.addEventNotification(
s3.EventType.OBJECT_CREATED,
new s3n.LambdaDestination(imageProcessingFunction),
{ suffix: '.jpg' }
);
// Process CSV files
bucket.addEventNotification(
s3.EventType.OBJECT_CREATED,
new s3n.LambdaDestination(csvProcessingFunction),
{ suffix: '.csv' }
);
// Large file processing with Step Functions
bucket.addEventNotification(
s3.EventType.OBJECT_CREATED,
new s3n.SfnDestination(processingStateMachine),
{ prefix: 'large-files/' }
);
Pattern: DynamoDB Streams Processing
Use case: React to database changes
const table = new dynamodb.Table(this, 'Table', {
partitionKey: { name: 'id', type: dynamodb.AttributeType.STRING },
stream: dynamodb.StreamViewType.NEW_AND_OLD_IMAGES,
});
// Process stream changes
new lambda.EventSourceMapping(this, 'StreamConsumer', {
target: streamProcessorFunction,
eventSourceArn: table.tableStreamArn,
startingPosition: lambda.StartingPosition.LATEST,
batchSize: 100,
maxBatchingWindow: Duration.seconds(5),
bisectBatchOnError: true,
retryAttempts: 3,
});
// Example: Sync to search index
export const processStream = async (event: DynamoDBStreamEvent) => {
for (const record of event.Records) {
if (record.eventName === 'INSERT' || record.eventName === 'MODIFY') {
const newImage = record.dynamodb?.NewImage;
await elasticSearch.index({
index: 'items',
id: newImage?.id.S,
body: unmarshall(newImage),
});
} else if (record.eventName === 'REMOVE') {
await elasticSearch.delete({
index: 'items',
id: record.dynamodb?.Keys?.id.S,
});
}
}
};
Pattern: Kinesis Stream Processing
Use case: Real-time data streaming and analytics
const stream = new kinesis.Stream(this, 'EventStream', {
shardCount: 2,
streamMode: kinesis.StreamMode.PROVISIONED,
});
// Fan-out with multiple consumers
const consumer1 = new lambda.EventSourceMapping(this, 'Analytics', {
target: analyticsFunction,
eventSourceArn: stream.streamArn,
startingPosition: lambda.StartingPosition.LATEST,
batchSize: 100,
parallelizationFactor: 10, // Process 10 batches per shard in parallel
});
const consumer2 = new lambda.EventSourceMapping(this, 'Alerting', {
target: alertingFunction,
eventSourceArn: stream.streamArn,
startingPosition: lambda.StartingPosition.LATEST,
filters: [
lambda.FilterCriteria.filter({
eventName: lambda.FilterRule.isEqual('CRITICAL_EVENT'),
}),
],
});
Integration Patterns
Pattern: Service Integration with EventBridge
Use case: Decouple services with events
const eventBus = new events.EventBus(this, 'AppBus');
// Service A publishes events
const serviceA = new NodejsFunction(this, 'ServiceA', {
entry: 'src/services/a/handler.ts',
environment: {
EVENT_BUS: eventBus.eventBusName,
},
});
eventBus.grantPutEventsTo(serviceA);
// Service B subscribes to events
new events.Rule(this, 'ServiceBRule', {
eventBus,
eventPattern: {
source: ['service.a'],
detailType: ['EntityCreated'],
},
targets: [new targets.LambdaFunction(serviceBFunction)],
});
// Service C subscribes to same events
new events.Rule(this, 'ServiceCRule', {
eventBus,
eventPattern: {
source: ['service.a'],
detailType: ['EntityCreated'],
},
targets: [new targets.LambdaFunction(serviceCFunction)],
});
Pattern: API Gateway + SQS Integration
Use case: Async API requests without Lambda
const queue = new sqs.Queue(this, 'RequestQueue');
const api = new apigateway.RestApi(this, 'Api');
// Direct SQS integration (no Lambda)
const sqsIntegration = new apigateway.AwsIntegration({
service: 'sqs',
path: `${Aws.ACCOUNT_ID}/${queue.queueName}`, // Aws pseudo-parameter from 'aws-cdk-lib'
integrationHttpMethod: 'POST',
options: {
credentialsRole: sqsRole,
requestParameters: {
'integration.request.header.Content-Type': "'application/x-www-form-urlencoded'",
},
requestTemplates: {
'application/json': 'Action=SendMessage&MessageBody=$input.body',
},
integrationResponses: [{
statusCode: '200',
}],
},
});
api.root.addMethod('POST', sqsIntegration, {
methodResponses: [{ statusCode: '200' }],
});
Pattern: EventBridge + Step Functions
Use case: Event-triggered workflow orchestration
// State machine for order processing
const orderStateMachine = new stepfunctions.StateMachine(this, 'OrderFlow', {
definition: /* ... */,
});
// EventBridge triggers state machine
new events.Rule(this, 'OrderPlacedRule', {
eventPattern: {
source: ['orders'],
detailType: ['OrderPlaced'],
},
targets: [new targets.SfnStateMachine(orderStateMachine)],
});
Orchestration Patterns
Pattern: Sequential Workflow
Use case: Multi-step process with dependencies
const definition = new tasks.LambdaInvoke(this, 'Step1', {
lambdaFunction: step1Function,
outputPath: '$.Payload',
})
.next(new tasks.LambdaInvoke(this, 'Step2', {
lambdaFunction: step2Function,
outputPath: '$.Payload',
}))
.next(new tasks.LambdaInvoke(this, 'Step3', {
lambdaFunction: step3Function,
outputPath: '$.Payload',
}));
new stepfunctions.StateMachine(this, 'Sequential', {
definition,
});
Pattern: Parallel Execution
Use case: Execute independent tasks concurrently
const parallel = new stepfunctions.Parallel(this, 'ParallelProcessing');
parallel.branch(new tasks.LambdaInvoke(this, 'ProcessA', {
lambdaFunction: functionA,
}));
parallel.branch(new tasks.LambdaInvoke(this, 'ProcessB', {
lambdaFunction: functionB,
}));
parallel.branch(new tasks.LambdaInvoke(this, 'ProcessC', {
lambdaFunction: functionC,
}));
const definition = parallel.next(new tasks.LambdaInvoke(this, 'Aggregate', {
lambdaFunction: aggregateFunction,
}));
new stepfunctions.StateMachine(this, 'Parallel', { definition });
Pattern: Map State (Dynamic Parallelism)
Use case: Process array of items in parallel
const mapState = new stepfunctions.Map(this, 'ProcessItems', {
maxConcurrency: 10,
itemsPath: '$.items',
});
mapState.iterator(new tasks.LambdaInvoke(this, 'ProcessItem', {
lambdaFunction: processItemFunction,
}));
const definition = mapState.next(new tasks.LambdaInvoke(this, 'Finalize', {
lambdaFunction: finalizeFunction,
}));
Pattern: Choice State (Conditional Logic)
Use case: Branching logic based on input
const choice = new stepfunctions.Choice(this, 'OrderType');
choice.when(
stepfunctions.Condition.stringEquals('$.orderType', 'STANDARD'),
standardProcessing
);
choice.when(
stepfunctions.Condition.stringEquals('$.orderType', 'EXPRESS'),
expressProcessing
);
choice.otherwise(defaultProcessing);
Pattern: Wait State
Use case: Delay between steps or wait for callbacks
// Fixed delay
const wait = new stepfunctions.Wait(this, 'Wait30Seconds', {
time: stepfunctions.WaitTime.duration(Duration.seconds(30)),
});
// Wait until timestamp
const waitUntil = new stepfunctions.Wait(this, 'WaitUntil', {
time: stepfunctions.WaitTime.timestampPath('$.expiryTime'),
});
// Wait for callback (.waitForTaskToken)
const waitForCallback = new tasks.LambdaInvoke(this, 'WaitForApproval', {
lambdaFunction: approvalFunction,
integrationPattern: stepfunctions.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
payload: stepfunctions.TaskInput.fromObject({
token: stepfunctions.JsonPath.taskToken,
data: stepfunctions.JsonPath.entirePayload,
}),
});
Anti-Patterns
❌ Lambda Monolith
Problem: Single Lambda handling all operations
// BAD
export const handler = async (event: any) => {
switch (event.operation) {
case 'createUser': return createUser(event);
case 'getUser': return getUser(event);
case 'updateUser': return updateUser(event);
case 'deleteUser': return deleteUser(event);
case 'createOrder': return createOrder(event);
// ... 20 more operations
}
};
Solution: Separate Lambda functions per operation
// GOOD - Separate functions
export const createUser = async (event: any) => { /* ... */ };
export const getUser = async (event: any) => { /* ... */ };
export const updateUser = async (event: any) => { /* ... */ };
❌ Recursive Lambda Pattern
Problem: Lambda invoking itself (runaway costs)
// BAD
export const handler = async (event: any) => {
await processItem(event);
if (hasMoreItems()) {
await lambda.invoke({
FunctionName: process.env.AWS_LAMBDA_FUNCTION_NAME,
InvocationType: 'Event',
Payload: JSON.stringify({ /* next batch */ }),
});
}
};
Solution: Use SQS or Step Functions
// GOOD - Use SQS for iteration
export const handler = async (event: SQSEvent) => {
for (const record of event.Records) {
await processItem(record);
}
// SQS handles iteration automatically
};
❌ Lambda Chaining
Problem: Lambda directly invoking another Lambda
// BAD
export const handler1 = async (event: any) => {
const result = await processStep1(event);
// Directly invoking next Lambda
await lambda.invoke({
FunctionName: 'handler2',
Payload: JSON.stringify(result),
});
};
Solution: Use EventBridge, SQS, or Step Functions
// GOOD - Publish to EventBridge
export const handler1 = async (event: any) => {
const result = await processStep1(event);
await eventBridge.putEvents({
Entries: [{
Source: 'service.step1',
DetailType: 'Step1Completed',
Detail: JSON.stringify(result),
}],
});
};
❌ Synchronous Waiting in Lambda
Problem: Lambda waiting for slow operations
// BAD - Blocking on slow operation
export const handler = async (event: any) => {
await startBatchJob(); // Returns immediately
// Wait for job to complete (wastes Lambda time)
while (true) {
const status = await checkJobStatus();
if (status === 'COMPLETE') break;
await sleep(1000);
}
};
Solution: Use Step Functions with callback pattern
// GOOD - Step Functions waits, not Lambda
const waitForJob = new tasks.LambdaInvoke(this, 'StartJob', {
lambdaFunction: startJobFunction,
integrationPattern: stepfunctions.IntegrationPattern.WAIT_FOR_TASK_TOKEN,
payload: stepfunctions.TaskInput.fromObject({
token: stepfunctions.JsonPath.taskToken,
}),
});
❌ Large Deployment Packages
Problem: Large Lambda packages increase cold start time
Solution:
- Use layers for shared dependencies
- Externalize AWS SDK
- Minimize bundle size
new NodejsFunction(this, 'Function', {
entry: 'src/handler.ts',
bundling: {
minify: true,
externalModules: ['@aws-sdk/*'], // Provided by runtime
nodeModules: ['only-needed-deps'], // Selective bundling
},
});
Performance Optimization
Cold Start Optimization
Techniques:
- Minimize package size
- Use provisioned concurrency for critical paths
- Lazy load dependencies
- Reuse connections outside handler
- Use Lambda SnapStart (Java)
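One technique from this list — reuse connections outside the handler — relies on Lambda keeping module scope warm across invocations in the same execution environment. A minimal dependency-free sketch, where `createClient` stands in for any expensive SDK client or connection pool:

```typescript
// Stand-in for an expensive client (e.g., a database connection pool).
let clientInitCount = 0;
function createClient() {
  clientInitCount += 1;
  return { query: async (sql: string) => `result for ${sql}` };
}

// Created once per execution environment, not once per invocation.
const client = createClient();

export const handler = async () => {
  // Every invocation in this warm environment reuses the same client.
  return client.query('SELECT 1');
};

// Exposed only so the behavior is observable in a test.
export const initCount = () => clientInitCount;
```

Had `createClient()` been called inside `handler`, every invocation would pay the initialization cost again.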
// For latency-sensitive APIs
const apiFunction = new NodejsFunction(this, 'ApiFunction', {
entry: 'src/api.ts',
memorySize: 1769, // 1 vCPU for faster initialization
});
const alias = apiFunction.addAlias('live');
alias.addAutoScaling({
minCapacity: 2,
maxCapacity: 10,
}).scaleOnUtilization({
utilizationTarget: 0.7,
});
Right-Sizing Memory
Test different memory configurations:
// CPU-bound workload
new NodejsFunction(this, 'ComputeFunction', {
memorySize: 1769, // 1 vCPU
timeout: Duration.seconds(30),
});
// I/O-bound workload
new NodejsFunction(this, 'IOFunction', {
memorySize: 512, // Less CPU needed
timeout: Duration.seconds(60),
});
// Simple operations
new NodejsFunction(this, 'SimpleFunction', {
memorySize: 256,
timeout: Duration.seconds(10),
});
Concurrent Execution Control
// Protect downstream services
new NodejsFunction(this, 'Function', {
reservedConcurrentExecutions: 10, // Max 10 concurrent
});
// Unreserved concurrency (shared pool)
new NodejsFunction(this, 'Function', {
// Uses unreserved account concurrency
});
Testing Strategies
Unit Testing
Test business logic separate from AWS services:
// handler.ts
export const processOrder = async (order: Order): Promise<Result> => {
// Business logic (easily testable)
const validated = validateOrder(order);
const priced = calculatePrice(validated);
return transformResult(priced);
};
export const handler = async (event: any): Promise<any> => {
const order = parseEvent(event);
const result = await processOrder(order);
await saveToDatabase(result);
return formatResponse(result);
};
// handler.test.ts
test('processOrder calculates price correctly', async () => {
const order = { items: [{ price: 10, quantity: 2 }] };
const result = await processOrder(order);
expect(result.total).toBe(20);
});
Integration Testing
Test with actual AWS services:
// integration.test.ts
import { LambdaClient, InvokeCommand } from '@aws-sdk/client-lambda';
test('Lambda processes order correctly', async () => {
const lambda = new LambdaClient({});
const response = await lambda.send(new InvokeCommand({
FunctionName: process.env.FUNCTION_NAME,
Payload: JSON.stringify({ orderId: '123' }),
}));
const result = JSON.parse(Buffer.from(response.Payload!).toString());
expect(result.statusCode).toBe(200);
});
Local Testing with SAM
# Test API locally
sam local start-api
# Invoke function locally
sam local invoke MyFunction -e events/test-event.json
# Generate sample event
sam local generate-event apigateway aws-proxy > event.json
Summary
- Single Purpose: One function, one responsibility
- Concurrent Design: Think concurrency, not volume
- Stateless: Use external storage for state
- State Machines: Orchestrate with Step Functions
- Event-Driven: Use events over direct calls
- Idempotent: Handle failures and duplicates gracefully
- Observability: Enable tracing and structured logging
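The "Idempotent" point deserves a sketch: dedupe on a stable event id before doing the work. Here an in-memory `Map` stands in for what would normally be a DynamoDB table written with a conditional put — a simplifying assumption for illustration:

```typescript
// Stand-in for a DynamoDB table keyed by event id (in a real handler,
// use PutItem with a condition like attribute_not_exists(eventId)).
const processed = new Map<string, string>();

// Run `work` at most once per eventId; duplicate deliveries
// get the recorded result instead of a second side effect.
export async function handleOnce(
  eventId: string,
  work: () => Promise<string>,
): Promise<string> {
  const prior = processed.get(eventId);
  if (prior !== undefined) {
    return prior;
  }
  const result = await work();
  processed.set(eventId, result);
  return result;
}
```

Because SQS, SNS, and EventBridge all deliver at-least-once, this guard is what makes retries and duplicate deliveries safe.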
Install this Skill
npx skills add zxkane/aws-skills --skill plugins/serverless-eda
Details
- Category: Development
- License: MIT
- Author: @zxkane
- Source: GitHub
- Source file: plugins/serverless-eda/skills/aws-serverless-eda/SKILL.md