Coffee, tech and community


Building Production AI Agents with Strands SDK and Amazon Bedrock AgentCore

AI agents are the next evolution of generative AI—systems that don’t just respond to prompts but can reason, plan, and take actions autonomously. With the release of Strands Agents SDK and Amazon Bedrock AgentCore, building and deploying production-ready AI agents has become significantly easier.

In this guide, I’ll walk you through building a complete AI agent and deploying it to AWS, from development to production.

What is Strands?

Strands is an open-source SDK from AWS for building AI agents. Unlike simple chatbots, Strands agents can:

  • Use tools – Call APIs, query databases, execute code
  • Maintain context – Remember conversation history and state
  • Plan and reason – Break complex tasks into steps
  • Stream responses – Provide real-time feedback

Key differentiators:

  • Native AWS integration (Bedrock, Lambda, S3, etc.)
  • Built-in observability with OpenTelemetry
  • Production-ready deployment to Bedrock AgentCore
  • Support for multiple LLM providers

Architecture Overview

  • Application layer – Your application: the code that uses the agent
  • SDK layer – Strands Agents SDK: tools (actions), memory (context), planning (reasoning)
  • Runtime layer – Amazon Bedrock AgentCore: runtime (Fargate), scaling (auto), monitoring (CloudWatch)
  • Model layer – Amazon Bedrock: Claude 3.5 Sonnet, Titan, Llama, Mistral

Getting Started

Prerequisites

  • Python 3.10+
  • AWS account with Bedrock access
  • AWS CLI configured

Installation

pip install strands-agents bedrock-agentcore

Building Your First Agent

Step 1: Basic Agent

from strands import Agent

# Create a simple agent
agent = Agent(
    system_prompt="You are a helpful assistant that specializes in AWS cloud architecture."
)

# Run the agent
response = agent("What's the best way to set up a VPC for a microservices application?")
print(response.message)

Step 2: Adding Tools

Tools give your agent the ability to take actions. Let’s add tools for AWS cost analysis:

from datetime import datetime

import boto3
from strands import Agent, tool


@tool
def get_ec2_instances(region: str = "us-east-1") -> dict:
    """Get a list of all EC2 instances in the specified region.

    Args:
        region: AWS region to query (default: us-east-1)

    Returns:
        Dictionary containing instance information
    """
    ec2 = boto3.client('ec2', region_name=region)
    response = ec2.describe_instances()
    instances = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instances.append({
                'id': instance['InstanceId'],
                'type': instance['InstanceType'],
                'state': instance['State']['Name'],
                'launch_time': str(instance.get('LaunchTime', 'N/A'))
            })
    return {'instances': instances, 'count': len(instances)}


@tool
def get_monthly_cost(service: str = None) -> dict:
    """Get AWS cost for the current month.

    Args:
        service: Optional AWS service name to filter (e.g., 'Amazon EC2')

    Returns:
        Dictionary containing cost information
    """
    ce = boto3.client('ce', region_name='us-east-1')
    end = datetime.now()
    start = end.replace(day=1)
    params = {
        'TimePeriod': {
            'Start': start.strftime('%Y-%m-%d'),
            'End': end.strftime('%Y-%m-%d')
        },
        'Granularity': 'MONTHLY',
        'Metrics': ['UnblendedCost']
    }
    if service:
        params['Filter'] = {
            'Dimensions': {
                'Key': 'SERVICE',
                'Values': [service]
            }
        }
    response = ce.get_cost_and_usage(**params)
    total = 0
    for result in response['ResultsByTime']:
        total += float(result['Total']['UnblendedCost']['Amount'])
    return {
        'total_cost': round(total, 2),
        'currency': 'USD',
        'period': f"{start.strftime('%Y-%m-%d')} to {end.strftime('%Y-%m-%d')}"
    }


# Create agent with tools
agent = Agent(
    system_prompt="""You are an AWS cost optimization expert.
    Use the available tools to analyze infrastructure and provide recommendations.
    Always provide specific, actionable advice.""",
    tools=[get_ec2_instances, get_monthly_cost]
)

# The agent can now query AWS
response = agent("How much am I spending this month and what EC2 instances are running?")
print(response.message)

Step 3: Streaming Responses

For better UX, stream responses in real-time:

import asyncio

async def chat_with_agent():
    agent = Agent(
        system_prompt="You are a helpful AWS architect.",
        tools=[get_ec2_instances, get_monthly_cost]
    )
    async for event in agent.stream_async("Analyze my AWS costs"):
        if hasattr(event, 'content'):
            print(event.content, end='', flush=True)
        elif hasattr(event, 'tool_use'):
            print(f"\n[Using tool: {event.tool_use.name}]")

# Run
asyncio.run(chat_with_agent())

Deploying to Bedrock AgentCore

Option 1: Quick Deploy with Starter Toolkit

The fastest way to get to production:

# Install the toolkit
pip install bedrock-agentcore-starter-toolkit

# Configure your agent
agentcore configure --entrypoint my_agent.py

# Test locally (requires Docker)
agentcore launch --local

# Deploy to AWS
agentcore launch

# Test the deployed agent
agentcore invoke '{"prompt": "Analyze my AWS costs"}'

Your agent file (my_agent.py):

from bedrock_agentcore.runtime import BedrockAgentCoreApp
from strands import Agent, tool
import boto3

app = BedrockAgentCoreApp()

@tool
def get_ec2_instances(region: str = "us-east-1") -> dict:
    """Get EC2 instances in the specified region."""
    ec2 = boto3.client('ec2', region_name=region)
    response = ec2.describe_instances()
    instances = []
    for reservation in response['Reservations']:
        for instance in reservation['Instances']:
            instances.append({
                'id': instance['InstanceId'],
                'type': instance['InstanceType'],
                'state': instance['State']['Name']
            })
    return {'instances': instances}

agent = Agent(
    system_prompt="You are an AWS infrastructure expert.",
    tools=[get_ec2_instances]
)

@app.entrypoint
def invoke(payload):
    user_message = payload.get("prompt", "Hello")
    result = agent(user_message)
    return {"result": result.message}

@app.entrypoint
async def stream(payload):
    user_message = payload.get("prompt", "Hello")
    async for event in agent.stream_async(user_message):
        yield event

if __name__ == "__main__":
    app.run()

Option 2: Custom Deployment with Docker

For more control, build your own container:

agent.py:

import json

import boto3
from fastapi import FastAPI, HTTPException
from fastapi.responses import StreamingResponse
from pydantic import BaseModel
from strands import Agent, tool

app = FastAPI(title="AWS Cost Agent", version="1.0.0")

# Define tools
@tool
def analyze_costs() -> dict:
    """Analyze current AWS spending."""
    ce = boto3.client('ce', region_name='us-east-1')
    # ... implementation
    return {"total": 1234.56, "currency": "USD"}

@tool
def get_recommendations() -> dict:
    """Get cost optimization recommendations."""
    # ... implementation
    return {"recommendations": [...]}

# Initialize agent
agent = Agent(
    system_prompt="You are an AWS cost optimization expert.",
    tools=[analyze_costs, get_recommendations]
)

class InvocationRequest(BaseModel):
    input: dict

class InvocationResponse(BaseModel):
    output: dict

@app.post("/invocations", response_model=InvocationResponse)
async def invoke_agent(request: InvocationRequest):
    """Synchronous agent invocation."""
    try:
        prompt = request.input.get("prompt", "")
        if not prompt:
            raise HTTPException(status_code=400, detail="No prompt provided")
        result = agent(prompt)
        return InvocationResponse(output={"message": result.message})
    except HTTPException:
        # Let deliberate HTTP errors (like the 400 above) pass through unchanged
        raise
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

@app.post("/invocations/stream")
async def stream_agent(request: InvocationRequest):
    """Streaming agent invocation."""
    async def generate():
        prompt = request.input.get("prompt", "")
        async for event in agent.stream_async(prompt):
            if hasattr(event, 'content'):
                yield f"data: {json.dumps({'content': event.content})}\n\n"
        yield "data: [DONE]\n\n"
    return StreamingResponse(
        generate(),
        media_type="text/event-stream"
    )

@app.get("/ping")
async def ping():
    """Health check endpoint (required by AgentCore)."""
    return {"status": "healthy"}

Dockerfile:

FROM --platform=linux/arm64 ghcr.io/astral-sh/uv:python3.11-bookworm-slim

WORKDIR /app

# Install dependencies
COPY pyproject.toml uv.lock ./
RUN uv sync --frozen --no-cache

# Copy application
COPY agent.py ./

# AgentCore requires port 8080
EXPOSE 8080

# Run with OpenTelemetry for observability
CMD ["opentelemetry-instrument", "uv", "run", "uvicorn", "agent:app", "--host", "0.0.0.0", "--port", "8080"]

pyproject.toml:

[project]
name = "aws-cost-agent"
version = "1.0.0"
requires-python = ">=3.11"
dependencies = [
    "strands-agents>=0.1.0",
    "fastapi>=0.109.0",
    "uvicorn[standard]>=0.27.0",
    "boto3>=1.34.0",
    "aws-opentelemetry-distro>=0.10.1",
]

Build and deploy:

# Build for ARM64 (required by AgentCore)
docker buildx build --platform linux/arm64 -t aws-cost-agent:latest .

# Test locally
docker run -p 8080:8080 \
  -e AWS_ACCESS_KEY_ID="$AWS_ACCESS_KEY_ID" \
  -e AWS_SECRET_ACCESS_KEY="$AWS_SECRET_ACCESS_KEY" \
  -e AWS_REGION="us-east-1" \
  aws-cost-agent:latest

# Push to ECR
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin $ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com

docker tag aws-cost-agent:latest $ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/aws-cost-agent:latest
docker push $ACCOUNT_ID.dkr.ecr.us-east-1.amazonaws.com/aws-cost-agent:latest

Create AgentCore Runtime with Terraform

# agentcore.tf
resource "aws_iam_role" "agent_runtime" {
  name = "AgentCoreRuntime"

  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action = "sts:AssumeRole"
      Effect = "Allow"
      Principal = {
        Service = "bedrock-agentcore.amazonaws.com"
      }
    }]
  })
}

resource "aws_iam_role_policy" "agent_permissions" {
  name = "AgentPermissions"
  role = aws_iam_role.agent_runtime.id

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [
      {
        Effect = "Allow"
        Action = [
          "bedrock:InvokeModel",
          "bedrock:InvokeModelWithResponseStream"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "ce:GetCostAndUsage",
          "ec2:DescribeInstances"
        ]
        Resource = "*"
      },
      {
        Effect = "Allow"
        Action = [
          "logs:CreateLogGroup",
          "logs:CreateLogStream",
          "logs:PutLogEvents"
        ]
        Resource = "*"
      }
    ]
  })
}

# Note: AgentCore resources may need boto3 deployment
# as Terraform provider support is still evolving

Invoking Your Deployed Agent

import json
import uuid

import boto3

def invoke_agent(prompt: str, session_id: str = None):
    """Invoke the deployed agent."""
    client = boto3.client('bedrock-agentcore', region_name='us-east-1')
    if not session_id:
        session_id = str(uuid.uuid4())
    response = client.invoke_agent_runtime(
        agentRuntimeArn='arn:aws:bedrock-agentcore:us-east-1:123456789:runtime/aws-cost-agent',
        runtimeSessionId=session_id,
        payload=json.dumps({
            "input": {"prompt": prompt}
        }),
        qualifier="DEFAULT"
    )
    result = json.loads(response['payload'].read())
    return result

# Use it
response = invoke_agent("What's my AWS spending this month?")
print(response['output']['message'])

Observability with CloudWatch

AgentCore integrates with CloudWatch Transaction Search for tracing:

  1. Enable Transaction Search in the CloudWatch console
  2. Add OpenTelemetry to your agent:

# In your Dockerfile CMD
CMD ["opentelemetry-instrument", "uvicorn", "agent:app", "--host", "0.0.0.0", "--port", "8080"]

  3. View traces in CloudWatch Application Signals

Key metrics to monitor:

  • Invocation latency
  • Tool execution time
  • Token usage
  • Error rates
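As a sketch of how those metrics could be pulled programmatically, the helper below assembles query entries for CloudWatch's GetMetricData API. The namespace `AgentCore/Runtime` and the metric names are illustrative assumptions, not documented AgentCore names; substitute whatever your traces actually emit.

```python
def build_metric_query(metric_name: str, namespace: str = "AgentCore/Runtime",
                       stat: str = "p95", period: int = 300) -> dict:
    """Build one GetMetricData query entry for a (hypothetical) agent metric."""
    return {
        # Query ids must start with a lowercase letter
        "Id": metric_name.lower().replace(".", "_"),
        "MetricStat": {
            "Metric": {"Namespace": namespace, "MetricName": metric_name},
            "Period": period,   # seconds per datapoint
            "Stat": stat,       # e.g. "p95", "Average", "Sum"
        },
    }


queries = [build_metric_query(name) for name in
           ["InvocationLatency", "ToolExecutionTime", "TokenUsage", "Errors"]]

# These would then be passed to CloudWatch, e.g.:
# cloudwatch = boto3.client("cloudwatch")
# cloudwatch.get_metric_data(MetricDataQueries=queries,
#                            StartTime=one_hour_ago, EndTime=now)
print(queries[0]["Id"])  # invocationlatency
```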

Best Practices

1. Tool Design

@tool
def good_tool(specific_param: str, optional_param: int = 10) -> dict:
    """Clear description of what this tool does.

    Args:
        specific_param: Exactly what this parameter expects
        optional_param: What this controls (default: 10)

    Returns:
        Dictionary with well-defined structure
    """
    # Implementation with error handling
    try:
        result = do_something(specific_param)
        return {"status": "success", "data": result}
    except Exception as e:
        return {"status": "error", "message": str(e)}

2. Error Handling

import logging

import botocore.exceptions
from fastapi import HTTPException

logger = logging.getLogger(__name__)

@app.post("/invocations")
async def invoke_agent(request: InvocationRequest):
    try:
        result = agent(request.input.get("prompt", ""))
        return {"output": {"message": result.message}}
    except (botocore.exceptions.BotoCoreError, botocore.exceptions.ClientError) as e:
        # AWS-specific errors (service calls surface as botocore exceptions)
        raise HTTPException(status_code=503, detail=f"AWS error: {str(e)}")
    except Exception:
        # Log for debugging
        logger.exception("Agent invocation failed")
        raise HTTPException(status_code=500, detail="Internal error")

3. Session Management

# Use consistent session IDs for conversation continuity
from strands import Agent
from strands.memory import ConversationMemory

agent = Agent(
    system_prompt="…",
    memory=ConversationMemory(max_messages=20)
)

# Each session maintains its own context
session_agents = {}

def get_agent_for_session(session_id: str) -> Agent:
    if session_id not in session_agents:
        session_agents[session_id] = Agent(
            system_prompt="…",
            memory=ConversationMemory()
        )
    return session_agents[session_id]
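One caveat with a plain per-session dict like the one above: it grows without bound. A minimal sketch of time-based eviction, generic over any agent factory (the 30-minute TTL is an arbitrary choice, not a Strands or AgentCore default):

```python
import time


class SessionStore:
    """Keep one agent per session, evicting sessions idle longer than ttl_seconds."""

    def __init__(self, factory, ttl_seconds: float = 1800.0):
        self.factory = factory          # callable that builds a fresh agent
        self.ttl = ttl_seconds
        self._sessions = {}             # session_id -> (agent, last_used)

    def get(self, session_id: str):
        now = time.monotonic()
        # Drop sessions that have been idle past the TTL
        expired = [sid for sid, (_, last) in self._sessions.items()
                   if now - last > self.ttl]
        for sid in expired:
            del self._sessions[sid]
        agent, _ = self._sessions.get(session_id, (None, None))
        if agent is None:
            agent = self.factory()
        self._sessions[session_id] = (agent, now)
        return agent
```

Usage would mirror `get_agent_for_session`: build the store once with `SessionStore(lambda: Agent(...))` and call `store.get(session_id)` per request.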

Cost Considerations

  • Bedrock AgentCore – per invocation + compute time
  • Bedrock LLM (Claude 3.5 Sonnet) – $3/M input tokens, $15/M output tokens
  • ECR Storage – ~$0.10/GB/month
  • CloudWatch Logs – $0.50/GB ingested
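Using the Claude 3.5 Sonnet rates above, here's a quick back-of-the-envelope estimator for the LLM portion of the bill (AgentCore compute, ECR, and CloudWatch charges come on top):

```python
# Claude 3.5 Sonnet on Bedrock: USD per 1M tokens (rates from the table above)
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00


def llm_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the LLM token cost in USD for a workload."""
    return round(
        input_tokens / 1_000_000 * INPUT_RATE
        + output_tokens / 1_000_000 * OUTPUT_RATE,
        2,
    )


# Example: 10,000 invocations x (2,000 input + 500 output tokens each)
print(llm_cost(10_000 * 2_000, 10_000 * 500))  # 135.0
```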

Optimization tips:

  • Use a smaller, cheaper model (e.g., Claude 3 Haiku) for simple tasks
  • Implement response caching
  • Set appropriate timeouts
  • Monitor token usage
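The caching tip can start as simply as memoizing exact-match prompts in front of the agent. A sketch (here `agent` is any callable, like the Strands agents above; a real deployment would bound the cache and think about prompt normalization and staleness):

```python
from functools import lru_cache


def make_cached_agent(agent, maxsize: int = 256):
    """Wrap an agent callable with an exact-match prompt cache."""

    @lru_cache(maxsize=maxsize)
    def cached(prompt: str):
        return agent(prompt)

    return cached


# Example with a stand-in "agent": repeated prompts skip the expensive call
calls = []

def fake_agent(prompt):
    calls.append(prompt)
    return f"answer to: {prompt}"

cached = make_cached_agent(fake_agent)
cached("What's my spend?")
cached("What's my spend?")
print(len(calls))  # 1 (the second call was served from the cache)
```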

What’s Next?

  • Multi-agent systems – Agents that collaborate
  • Knowledge bases – RAG with Bedrock Knowledge Bases
  • Guardrails – Content filtering and safety
  • Custom models – Fine-tuned models for specific domains

Have you built agents with Strands? Share your use cases in the comments!


About the author: David Petrocelli is a Senior Cloud Architect at Caylent, holds a PhD in Computer Science, and is a university professor. He specializes in AWS architecture and generative AI applications.
