Amazon Bedrock: From Zero to Production in 30 Minutes

If you’ve been curious about Generative AI but haven’t dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.

In this guide, I’ll take you from zero to a working application that you can actually deploy to production.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."

Available models include:

  • Claude 4 & Claude 3.5 (Anthropic) – Best for complex reasoning and long documents
  • Titan (Amazon) – Cost-effective for general tasks
  • Llama 3 (Meta) – Open-source performance
  • Mistral Large – Fast inference, great for code and chat
  • Stable Diffusion 3 (Stability AI) – Image generation

Setting Up Your Environment

1. Enable Bedrock Models

First, request access to the models you want to use:

  1. Go to Amazon Bedrock in the AWS Console
  2. Navigate to "Model access"
  3. Click "Manage model access"
  4. Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)
  5. Submit the request

Most models are approved instantly. Some (like Claude 4) may take a few minutes.
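
Once access is granted, you can confirm the exact model IDs available in your Region from code. Here's a minimal sketch; note that list_foundation_models returns the Region's model catalog, not your per-account access grants:

import boto3

# The 'bedrock' control-plane client (not 'bedrock-runtime') handles model metadata
bedrock_ctl = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock_ctl.list_foundation_models()["modelSummaries"]:
    print(f"{model['modelId']}  ({model['providerName']})")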

2. Configure IAM Permissions

Grant your application or role permission to invoke the models you enabled:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan*"
      ]
    }
  ]
}

3. Install Dependencies

pip install boto3 langchain-aws

Your First Bedrock Application

Let’s build a simple text generator:

import boto3
import json

# Initialize the client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

def generate_text(prompt: str, max_tokens: int = 500) -> str:
    """Generate text using Claude 3.5 Sonnet."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )

    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']

# Test it
result = generate_text("Explain Kubernetes in 3 sentences for a beginner.")
print(result)

Output:

Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.

Streaming Responses

For better user experience, stream the response:

def generate_text_streaming(prompt: str):
    """Stream text generation for real-time output."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json"
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Use it
for text_chunk in generate_text_streaming("Write a haiku about cloud computing"):
    print(text_chunk, end='', flush=True)

Using LangChain for Production Apps

For more complex applications, LangChain provides a cleaner interface:

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    model_kwargs={
        "max_tokens": 1000,
        "temperature": 0.7
    }
)

# Simple chat
response = llm.invoke([
    SystemMessage(content="You are a helpful AWS architect."),
    HumanMessage(content="What's the best way to set up a VPC?")
])
print(response.content)

Building a RAG Application

Retrieval-Augmented Generation (RAG) lets you query your own documents:

from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Initialize embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1"
)

# 2. Load and split your documents
documents = [...]  # Your documents here
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Create RAG chain
template = """Answer based on the following context:
Context: {context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# 5. Query your documents
answer = rag_chain.invoke("What is our refund policy?")
print(answer.content)

Cost Optimization Tips

Bedrock pricing is based on input/output tokens. Here’s how to optimize:

1. Choose the Right Model

Use Case                     Recommended Model       Cost
Simple Q&A                   Titan Lite              $
General chat                 Claude 3.5 Haiku        $$
Complex reasoning            Claude 3.5 Sonnet       $$$
Advanced code & reasoning    Claude 4 Sonnet/Opus    $$$$
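
To reason about cost concretely, read the token counts straight from the response: Anthropic models on Bedrock return a usage object with input_tokens and output_tokens in the response body. Here's a rough estimator sketch; the per-1K-token prices below are placeholders, so check the current Bedrock pricing page for real numbers:

# Placeholder prices per 1K tokens as (input, output) -- not current list prices
PRICES = {
    "anthropic.claude-3-5-sonnet-20241022-v2:0": (0.003, 0.015),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough per-call cost estimate in USD."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

# Token counts come back in the invoke_model response, e.g. response_body['usage']
print(estimate_cost("anthropic.claude-3-5-sonnet-20241022-v2:0", 1200, 400))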

2. Use Provisioned Throughput for High Volume

# For production workloads with consistent traffic
response = bedrock.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model",
    body=body
)

3. Cache Frequent Responses

import hashlib
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_generate(prompt_hash: str, prompt: str) -> str:
    # lru_cache keys on both arguments; the hash provides a compact, stable key
    return generate_text(prompt)

def generate_with_cache(prompt: str) -> str:
    prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
    return cached_generate(prompt_hash, prompt)

Security Best Practices

1. Use VPC Endpoints

resource "aws_vpc_endpoint" "bedrock" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.bedrock-runtime"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.bedrock_endpoint.id]
private_dns_enabled = true
}

2. Enable Model Invocation Logging

# CloudWatch logging for compliance
bedrock_client = boto3.client('bedrock')

bedrock_client.put_model_invocation_logging_configuration(
    loggingConfig={
        'cloudWatchConfig': {
            'logGroupName': '/aws/bedrock/invocations',
            'roleArn': 'arn:aws:iam::123456789:role/BedrockLogging'
        },
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False
    }
)

3. Use Guardrails

Amazon Bedrock Guardrails help filter harmful content:

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="DRAFT"
)
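
Guardrails themselves are created through the console or the control-plane API. Here's a minimal sketch with a single content filter; the name and filter choices are purely illustrative, and you should tune the policies to your use case:

bedrock_ctl = boto3.client('bedrock')

# Illustrative guardrail with one content filter
guardrail = bedrock_ctl.create_guardrail(
    name='demo-guardrail',
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that answer.",
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'}
        ]
    }
)
print(guardrail['guardrailId'])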

Real-World Architecture

Here’s a production-ready architecture I use for enterprise clients:

Layer       Components
Frontend    CloudFront → API Gateway
Compute     Lambda (Chat) | Lambda (RAG) | Lambda (Streaming)
Backend     Bedrock (Foundation Models) | OpenSearch (Vector) | DynamoDB (Sessions)
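
As a minimal sketch of the compute layer, a chat Lambda behind API Gateway can be as small as this (a proxy-integration event shape is assumed, and error handling is omitted):

import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string
    prompt = json.loads(event['body'])['prompt']
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body
    )
    answer = json.loads(response['body'].read())['content'][0]['text']
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}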

What’s Next?

Now that you have the basics, here are some directions to explore:

  1. Agents for Bedrock – Create autonomous agents that can use tools
  2. Knowledge Bases – Managed RAG with automatic chunking and embeddings (see the sketch below)
  3. Fine-tuning – Customize models with your own data
  4. Multi-modal – Work with images and PDFs using Claude's vision capabilities
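
To give a taste of Knowledge Bases: once one is set up, querying it is a single call to the bedrock-agent-runtime client. A minimal sketch, where the knowledge base ID is a placeholder for your own resource:

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 'KB12345678' is a placeholder for your knowledge base ID
response = agent_runtime.retrieve_and_generate(
    input={'text': 'What is our refund policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB12345678',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0'
        }
    }
)
print(response['output']['text'])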

Have questions about implementing Bedrock in your architecture? Drop a comment below!


About the author: David Petrocelli is a Senior Cloud Architect at Caylent, holds a PhD in Computer Science, and is a university professor specializing in cloud architecture and generative AI applications.
