Amazon Bedrock: From Zero to Production in 30 Minutes

If you’ve been curious about Generative AI but haven’t dived in yet, Amazon Bedrock is the easiest way to start. No model training, no GPU management, no ML expertise required—just API calls to state-of-the-art foundation models.

In this guide, I’ll take you from zero to a working application that you can actually deploy to production.

What is Amazon Bedrock?

Amazon Bedrock is a fully managed service that provides access to foundation models (FMs) from leading AI companies through a unified API. Think of it as "LLMs as a Service."

Available models include:

  • Claude 4 & Claude 3.5 (Anthropic) – Best for complex reasoning and long documents
  • Titan (Amazon) – Cost-effective for general tasks
  • Llama 3 (Meta) – Open-source performance
  • Mistral Large – Fast inference, great for code and chat
  • Stable Diffusion 3 (Stability AI) – Image generation

Setting Up Your Environment

1. Enable Bedrock Models

First, request access to the models you want to use:

  1. Go to Amazon Bedrock in the AWS Console
  2. Navigate to "Model access"
  3. Click "Manage model access"
  4. Select the models you need (I recommend starting with Claude 3.5 Sonnet or Claude 4 Sonnet)
  5. Submit the request

Most models are approved instantly. Some (like Claude 4) may take a few minutes.
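
Once access is granted, you can confirm the exact model IDs available in your Region from code. Here's a minimal sketch; note that list_foundation_models returns the Region's model catalog, not your per-account access grants:

import boto3

# The 'bedrock' control-plane client (not 'bedrock-runtime') handles model metadata
bedrock_ctl = boto3.client("bedrock", region_name="us-east-1")

for model in bedrock_ctl.list_foundation_models()["modelSummaries"]:
    print(f"{model['modelId']}  ({model['providerName']})")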

2. Configure IAM Permissions

Grant your application or role permission to invoke the models you enabled:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "bedrock:InvokeModel",
        "bedrock:InvokeModelWithResponseStream"
      ],
      "Resource": [
        "arn:aws:bedrock:*::foundation-model/anthropic.claude-*",
        "arn:aws:bedrock:*::foundation-model/amazon.titan*"
      ]
    }
  ]
}

3. Install Dependencies

pip install boto3 langchain-aws

Your First Bedrock Application

Let’s build a simple text generator:

import boto3
import json

# Initialize the client
bedrock = boto3.client(
    service_name='bedrock-runtime',
    region_name='us-east-1'
)

def generate_text(prompt: str, max_tokens: int = 500) -> str:
    """Generate text using Claude 3.5 Sonnet."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": [
            {
                "role": "user",
                "content": prompt
            }
        ]
    })

    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json",
        accept="application/json"
    )

    response_body = json.loads(response['body'].read())
    return response_body['content'][0]['text']

# Test it
result = generate_text("Explain Kubernetes in 3 sentences for a beginner.")
print(result)

Output:

Kubernetes is a system that helps you run and manage applications in containers
across multiple computers automatically. It handles tasks like starting your
applications, restarting them if they crash, and distributing traffic between
them. Think of it as an automated IT team that keeps your applications running
24/7 without manual intervention.

Streaming Responses

For better user experience, stream the response:

def generate_text_streaming(prompt: str):
    """Stream text generation for real-time output."""
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 1000,
        "messages": [{"role": "user", "content": prompt}]
    })

    response = bedrock.invoke_model_with_response_stream(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body,
        contentType="application/json"
    )

    for event in response['body']:
        chunk = json.loads(event['chunk']['bytes'])
        if chunk['type'] == 'content_block_delta':
            yield chunk['delta'].get('text', '')

# Use it
for text_chunk in generate_text_streaming("Write a haiku about cloud computing"):
    print(text_chunk, end='', flush=True)

Using LangChain for Production Apps

For more complex applications, LangChain provides a cleaner interface:

from langchain_aws import ChatBedrock
from langchain_core.messages import HumanMessage, SystemMessage

# Initialize the model
llm = ChatBedrock(
    model_id="anthropic.claude-3-5-sonnet-20241022-v2:0",
    model_kwargs={
        "max_tokens": 1000,
        "temperature": 0.7
    }
)

# Simple chat
response = llm.invoke([
    SystemMessage(content="You are a helpful AWS architect."),
    HumanMessage(content="What's the best way to set up a VPC?")
])
print(response.content)

Building a RAG Application

Retrieval-Augmented Generation (RAG) lets you query your own documents:

from langchain_aws import BedrockEmbeddings
from langchain_community.vectorstores import FAISS
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough

# 1. Initialize embeddings
embeddings = BedrockEmbeddings(
    model_id="amazon.titan-embed-text-v1"
)

# 2. Load and split your documents
documents = [...]  # Your documents here
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
splits = text_splitter.split_documents(documents)

# 3. Create vector store
vectorstore = FAISS.from_documents(splits, embeddings)
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 4. Create RAG chain
template = """Answer based on the following context:
Context: {context}
Question: {question}
Answer:"""
prompt = ChatPromptTemplate.from_template(template)

rag_chain = (
    {"context": retriever, "question": RunnablePassthrough()}
    | prompt
    | llm
)

# 5. Query your documents
answer = rag_chain.invoke("What is our refund policy?")
print(answer.content)

Cost Optimization Tips

Bedrock pricing is based on input/output tokens. Here’s how to optimize:

1. Choose the Right Model

Use Case                     Recommended Model       Cost
Simple Q&A                   Titan Lite              $
General chat                 Claude 3.5 Haiku        $$
Complex reasoning            Claude 3.5 Sonnet       $$$
Advanced code & reasoning    Claude 4 Sonnet/Opus    $$$$
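
To reason about cost concretely, read the token counts straight from the response: Anthropic models on Bedrock return a usage object with input_tokens and output_tokens in the response body. Here's a rough estimator sketch; the per-1K-token prices below are placeholders, so check the current Bedrock pricing page for real numbers:

# Placeholder prices per 1K tokens as (input, output) -- not current list prices
PRICES = {
    "anthropic.claude-3-5-sonnet-20241022-v2:0": (0.003, 0.015),
}

def estimate_cost(model_id: str, input_tokens: int, output_tokens: int) -> float:
    """Rough per-call cost estimate in USD."""
    in_price, out_price = PRICES[model_id]
    return (input_tokens / 1000) * in_price + (output_tokens / 1000) * out_price

# Token counts come back in the invoke_model response, e.g. response_body['usage']
print(estimate_cost("anthropic.claude-3-5-sonnet-20241022-v2:0", 1200, 400))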

2. Use Provisioned Throughput for High Volume

# For production workloads with consistent traffic
response = bedrock.invoke_model(
    modelId="arn:aws:bedrock:us-east-1:123456789:provisioned-model/my-model",
    body=body
)

3. Cache Frequent Responses

import hashlib
from functools import lru_cache

@lru_cache(maxsize=1000)
def cached_generate(prompt_hash: str, prompt: str) -> str:
    # lru_cache keys on both arguments; the hash provides a compact, stable key
    return generate_text(prompt)

def generate_with_cache(prompt: str) -> str:
    prompt_hash = hashlib.md5(prompt.encode()).hexdigest()
    return cached_generate(prompt_hash, prompt)

Security Best Practices

1. Use VPC Endpoints

resource "aws_vpc_endpoint" "bedrock" {
vpc_id = aws_vpc.main.id
service_name = "com.amazonaws.us-east-1.bedrock-runtime"
vpc_endpoint_type = "Interface"
subnet_ids = aws_subnet.private[*].id
security_group_ids = [aws_security_group.bedrock_endpoint.id]
private_dns_enabled = true
}

2. Enable Model Invocation Logging

# CloudWatch logging for compliance
bedrock_client = boto3.client('bedrock')

bedrock_client.put_model_invocation_logging_configuration(
    loggingConfig={
        'cloudWatchConfig': {
            'logGroupName': '/aws/bedrock/invocations',
            'roleArn': 'arn:aws:iam::123456789:role/BedrockLogging'
        },
        'textDataDeliveryEnabled': True,
        'imageDataDeliveryEnabled': False
    }
)

3. Use Guardrails

Amazon Bedrock Guardrails help filter harmful content:

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
    body=body,
    guardrailIdentifier="my-guardrail-id",
    guardrailVersion="DRAFT"
)
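
Guardrails themselves are created through the console or the control-plane API. Here's a minimal sketch with a single content filter; the name and filter choices are purely illustrative, and you should tune the policies to your use case:

bedrock_ctl = boto3.client('bedrock')

# Illustrative guardrail with one content filter
guardrail = bedrock_ctl.create_guardrail(
    name='demo-guardrail',
    blockedInputMessaging="Sorry, I can't help with that request.",
    blockedOutputsMessaging="Sorry, I can't provide that answer.",
    contentPolicyConfig={
        'filtersConfig': [
            {'type': 'HATE', 'inputStrength': 'HIGH', 'outputStrength': 'HIGH'}
        ]
    }
)
print(guardrail['guardrailId'])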

Real-World Architecture

Here’s a production-ready architecture I use for enterprise clients:

Layer       Components
Frontend    CloudFront → API Gateway
Compute     Lambda (Chat) | Lambda (RAG) | Lambda (Streaming)
Backend     Bedrock (Foundation Models) | OpenSearch (Vector) | DynamoDB (Sessions)
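
As a minimal sketch of the compute layer, a chat Lambda behind API Gateway can be as small as this (a proxy-integration event shape is assumed, and error handling is omitted):

import json
import boto3

bedrock = boto3.client('bedrock-runtime')

def handler(event, context):
    # API Gateway proxy integration delivers the request body as a JSON string
    prompt = json.loads(event['body'])['prompt']
    body = json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": 500,
        "messages": [{"role": "user", "content": prompt}]
    })
    response = bedrock.invoke_model(
        modelId="anthropic.claude-3-5-sonnet-20241022-v2:0",
        body=body
    )
    answer = json.loads(response['body'].read())['content'][0]['text']
    return {"statusCode": 200, "body": json.dumps({"answer": answer})}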

What’s Next?

Now that you have the basics, here are some directions to explore:

  1. Agents for Bedrock – Create autonomous agents that can use tools
  2. Knowledge Bases – Managed RAG with automatic chunking and embeddings (see the sketch below)
  3. Fine-tuning – Customize models with your own data
  4. Multi-modal – Work with images and PDFs using Claude's vision capabilities
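
To give a taste of Knowledge Bases: once one is set up, querying it is a single call to the bedrock-agent-runtime client. A minimal sketch, where the knowledge base ID is a placeholder for your own resource:

import boto3

agent_runtime = boto3.client('bedrock-agent-runtime', region_name='us-east-1')

# 'KB12345678' is a placeholder for your knowledge base ID
response = agent_runtime.retrieve_and_generate(
    input={'text': 'What is our refund policy?'},
    retrieveAndGenerateConfiguration={
        'type': 'KNOWLEDGE_BASE',
        'knowledgeBaseConfiguration': {
            'knowledgeBaseId': 'KB12345678',
            'modelArn': 'arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-5-sonnet-20241022-v2:0'
        }
    }
)
print(response['output']['text'])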

Have questions about implementing Bedrock in your architecture? Drop a comment below!


About the author: David Petrocelli is a Senior Cloud Architect at Caylent, holds a PhD in Computer Science, and is a university professor specializing in cloud architecture and generative AI applications.
