How to Build a Production RAG System on AWS From Scratch (Complete Beginner's Guide)

RAG (Retrieval-Augmented Generation) lets AI answer questions using YOUR organisation's documents, not just what it was trained on. This guide teaches you to build a production-ready RAG system on Amazon Bedrock from scratch. No ML experience needed. Every command included.

The Problem RAG Solves, and Why Every Organisation Needs It

Here is a scenario that plays out in every organisation.

A new employee joins your company. They have a question: "What is our policy on expense reimbursements?" They search the internal wiki. They get 47 results. They ask a colleague. The colleague is not sure and points them to a SharePoint folder with 200 documents. They spend 45 minutes reading through PDFs before finding the answer buried in paragraph 12 of a document called HR-Policy-V3-FINAL-updated-2024.pdf.

Now multiply that by every new employee, every contractor, every team member who needs to find information they know exists somewhere. McKinsey estimates employees spend an average of 2.5 hours per day searching for information. In an organisation of 500 people, that is 1,250 hours of lost productivity every single day.

RAG fixes this. With a properly built RAG system, that same employee types "What is our expense reimbursement policy?" and gets an accurate, cited answer in under 3 seconds, drawn directly from your actual policy documents.

This is not theoretical. This is deployed and working at enterprises globally right now. And by the end of this article, you will have built exactly this for your organisation.

What Is RAG? (Explained Simply)

RAG stands for Retrieval-Augmented Generation. The name sounds complicated. The concept is simple.

A standard AI model (like Claude or GPT-4) knows only what it was trained on: information up to its training cutoff date, from public sources on the internet. It knows nothing about your company's internal documents, your products, your policies, or your customers.
RAG solves this by adding a retrieval step before the AI generates an answer:

WITHOUT RAG:
User asks question → AI answers from training data only
Problem: AI knows nothing about your organisation

WITH RAG:
User asks question
        ↓
Search your documents for relevant content
        ↓
Give relevant content + question to AI
        ↓
AI answers using YOUR documents
Result: Accurate answers grounded in your actual information

The AI does not guess. It reads the relevant part of your document and answers based on what it finds. If the answer is not in your documents, it is instructed to say so rather than make something up.

What We Are Building

A complete, production-ready RAG system that:

- Ingests your documents (PDFs, Word files, text files) from S3
- Chunks and embeds them into a searchable vector knowledge base
- Accepts natural language questions via an API
- Retrieves the most relevant document sections
- Generates accurate answers with citations showing which document the answer came from
- Runs serverlessly on AWS, with no servers to manage

Architecture:

Your Documents (PDF, Word, TXT)
        ↓
Amazon S3 (document storage)
        ↓
Bedrock Knowledge Base
  - Chunks documents into sections
  - Embeds each section into vectors
  - Stores vectors in OpenSearch Serverless
        ↓
Query API (Lambda + API Gateway)
        ↓
User gets answer + citations

AWS services used:

- Amazon S3: stores your documents
- Amazon Bedrock Knowledge Bases: managed RAG (chunking, embedding, and retrieval)
- Amazon OpenSearch Serverless: vector database (created automatically by Bedrock)
- AWS Lambda: handles queries and formats responses
- Amazon API Gateway: gives the Lambda an HTTPS endpoint
- Amazon Titan Embeddings: converts text to vectors for search

What you need:

- AWS account (free tier is fine, but note that OpenSearch Serverless costs ~$0.24/hour while it is active)
- AWS CLI configured
- Some PDF or text documents to test with
- About 90 minutes

Part 1: Understanding the Key Concepts

Before writing a single command, let us understand the three concepts that make RAG work.
You do not need to understand the math, just what each step does.

Concept 1: Chunking

Your documents are too long to fit in a single AI prompt. A 50-page policy document might be 25,000 words. AI models have context limits (how much text they can process at once), and more importantly, sending 50 pages for every question is expensive and slow.

Chunking splits your documents into smaller pieces, typically 300–500 words each, with a small overlap between chunks so no sentence loses its context at a boundary.

Original document (50 pages):
"Section 1: Introduction... Section 2: Policy... Section 3: Procedures..."

After chunking (each ~400 words with 50-word overlap):
Chunk 1: "Section 1: Introduction... [first 400 words]"
Chunk 2: "[last 50 words of chunk 1]... Section 2: Policy... [next 350 words]"
Chunk 3: "[last 50 words of chunk 2]... [next 400 words]"
...and so on

Concept 2: Embeddings

Once your document is chunked, each chunk is converted into a vector: a list of numbers that represents its meaning mathematically.

The magic is that text with similar meaning produces similar vectors, even if the words are different. So the chunk about "expense reimbursement policy" will have a vector close to the question "how do I get reimbursed for travel costs", even though those exact words do not appear together.

This is what makes semantic search possible: finding relevant content by meaning, not just keyword matching.

Concept 3: Retrieval

When a user asks a question, the question is also converted to a vector. The system then searches the vector database for the chunks whose vectors are closest to the question vector. These are the most semantically relevant chunks.

The top 3-5 chunks are retrieved and included in the prompt sent to the AI model.

Question: "How do I claim expenses for a business trip?"
        ↓
Question → vector → [0.234, -0.891, 0.127, ...]
        ↓
Search vector DB for similar vectors
        ↓
Top 3 matching chunks retrieved:
- Chunk from HR-Policy.pdf: "Section 4.2: Travel Expense Claims..."
- Chunk from Finance-Guide.pdf: "Business Travel Reimbursement..."
- Chunk from FAQ.pdf: "Q: What receipts do I need for expense claims..."
        ↓
AI reads these 3 chunks + question → generates answer with citations

Now you understand RAG. Let us build it.

Part 2: Prepare Your Documents

Step 1: Create the S3 Bucket

bash
# Set your variables: change these to match your setup
REGION="eu-west-1"
ACCOUNT_ID=$(aws sts get-caller-identity --query Account --output text)
BUCKET_NAME="my-rag-documents-$(date +%s)"

# Create the S3 bucket
aws s3 mb s3://$BUCKET_NAME --region $REGION

# Block all public access: documents should never be public.
# The configuration is passed as one quoted argument so that no whitespace
# from a line continuation can split it.
aws s3api put-public-access-block \
  --bucket $BUCKET_NAME \
  --public-access-block-configuration \
  "BlockPublicAcls=true,IgnorePublicAcls=true,BlockPublicPolicy=true,RestrictPublicBuckets=true"

echo "S3 bucket created: $BUCKET_NAME"
echo "Save this: export BUCKET_NAME=$BUCKET_NAME"

Step 2: Upload Your Documents

If you have your own documents (PDFs, Word files, text files), upload them now. If not, create some sample documents to test with:

bash
# Create sample documents if you do not have your own
mkdir -p sample-docs

cat > sample-docs/expense-policy.txt << 'EOF'
EXPENSE REIMBURSEMENT POLICY
Last Updated: January 2026

1. OVERVIEW
This policy governs the reimbursement of business expenses incurred by employees in the course of their work duties. All expenses must be pre-approved where indicated and submitted within 30 days of being incurred.

2.
ELIGIBLE EXPENSES The following expenses are eligible for reimbursement: - Business travel (flights, trains, taxis to/from client sites) - Accommodation (up to £150 per night in London, £100 elsewhere in the UK) - Business meals (up to £50 per person, must have 2+ attendees) - Client entertainment (pre-approval required, up to £100 per person) - Home office equipment (pre-approval required for items over £200) - Professional development courses (pre-approval required) 3. HOW TO SUBMIT CLAIMS All expense claims must be submitted through the Expenses portal at expenses.company.internal within 30 days of the expense being incurred. Required documentation: - Original receipts for all expenses over £10 - Business justification for each expense - Names of attendees for meals and entertainment - Manager approval for expenses over £500 4. PAYMENT TIMELINE Approved expenses are reimbursed in the next monthly payroll run, provided the claim is submitted by the 15th of the month. Claims submitted after the 15th will be processed the following month. 5. INELIGIBLE EXPENSES The following will not be reimbursed: - Personal travel or accommodation - Alcohol consumed outside of approved client entertainment - Fines, penalties, or legal fees - Personal mobile phone contracts (BYOD allowance is separate) - First-class travel without VP-level approval EOF cat > sample-docs/remote-work-policy.txt << 'EOF' REMOTE WORK POLICY Last Updated: March 2026 1. ELIGIBILITY All permanent employees who have completed their 3-month probationary period are eligible for remote work arrangements. Contractors and temporary staff require manager approval on a case-by-case basis. 2. 
HYBRID WORK ARRANGEMENT The company operates a hybrid model requiring employees to be in the office: - Minimum 3 days per week for team members - Minimum 2 days per week for senior individual contributors - As required for managers (typically 4 days per week) Office days must include Tuesday and Wednesday (core collaboration days). 3. HOME OFFICE REQUIREMENTS Employees working remotely must have: - A dedicated workspace free from significant distractions - Reliable broadband connection (minimum 25 Mbps download) - Company-issued laptop (personal devices not permitted for security reasons) - A webcam and headset suitable for video calls 4. EQUIPMENT AND EXPENSES The company provides: - Laptop and peripherals (mouse, keyboard) upon joining - £400 home office setup allowance (one-time, claim through expenses) - £30 per month broadband contribution (add to monthly expenses) Employees are responsible for their own desk and chair. 5. AVAILABILITY REQUIREMENTS Remote employees must: - Be available during core hours: 9am-5pm in their local timezone - Respond to messages within 2 hours during working hours - Attend all required meetings with camera on unless exceptional circumstances - Notify their manager in advance if unavailable during core hours EOF cat > sample-docs/annual-leave-policy.txt << 'EOF' ANNUAL LEAVE POLICY Last Updated: February 2026 1. ENTITLEMENT Full-time permanent employees receive: - 25 days annual leave per year (pro-rated for part-time employees) - 8 UK bank holidays (fixed days off) - 1 additional day for each year of service, up to 5 additional days - Birthday leave (1 day, to be taken within the birthday month) 2. HOW TO REQUEST LEAVE Annual leave must be requested through the HR portal at hr.company.internal. Notice requirements: - Up to 3 days: minimum 1 week notice - 4-9 days: minimum 2 weeks notice - 10+ consecutive days: minimum 4 weeks notice Leave requests are subject to manager approval and team capacity. 3. 
CARRY OVER
Up to 5 days of unused annual leave may be carried over to the following year. Carried-over leave must be used by 31 March of the following year or it is forfeited. Employees may purchase up to 5 additional days of leave per year through salary sacrifice (request by 1 December for the following year).

4. SICKNESS DURING ANNUAL LEAVE
If an employee falls ill during a period of annual leave and provides a medical certificate, the days of illness may be recredited as annual leave. The employee must notify their manager on the first day of illness.

5. LEAVING THE COMPANY
On leaving the company, employees will be paid for any unused annual leave accrued in the current leave year. Employees who have taken more leave than accrued will have the excess deducted from their final salary.
EOF

# Upload documents to S3
aws s3 cp sample-docs/ s3://$BUCKET_NAME/documents/ --recursive

echo "Documents uploaded:"
aws s3 ls s3://$BUCKET_NAME/documents/

Part 3: Create the Bedrock Knowledge Base

This is the core of the RAG system. Bedrock Knowledge Bases handles everything: chunking, embedding, and storing your documents in a searchable vector database.

Step 1: Create the IAM Role for Bedrock

Bedrock needs permission to read from your S3 bucket, call the embedding model, and access the vector store.
bash
# Create trust policy for Bedrock.
# The heredoc is unquoted so the shell substitutes $ACCOUNT_ID directly,
# which avoids a separate sed step (sed -i behaves differently on macOS and Linux).
cat > bedrock-trust-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "bedrock.amazonaws.com" },
    "Action": "sts:AssumeRole",
    "Condition": {
      "StringEquals": { "aws:SourceAccount": "$ACCOUNT_ID" }
    }
  }]
}
EOF

# Create the role
aws iam create-role \
  --role-name bedrock-knowledge-base-role \
  --assume-role-policy-document file://bedrock-trust-policy.json

# Create permissions policy
cat > bedrock-kb-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "s3:GetObject", "s3:ListBucket" ],
      "Resource": [
        "arn:aws:s3:::$BUCKET_NAME",
        "arn:aws:s3:::$BUCKET_NAME/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": [ "bedrock:InvokeModel" ],
      "Resource": "arn:aws:bedrock:$REGION::foundation-model/amazon.titan-embed-text-v2:0"
    },
    {
      "Effect": "Allow",
      "Action": [ "aoss:APIAccessAll" ],
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name bedrock-knowledge-base-role \
  --policy-name bedrock-kb-permissions \
  --policy-document file://bedrock-kb-policy.json

KB_ROLE_ARN=$(aws iam get-role \
  --role-name bedrock-knowledge-base-role \
  --query 'Role.Arn' \
  --output text)

echo "Bedrock role ARN: $KB_ROLE_ARN"

Step 2: Create the Knowledge Base via AWS Console

The Knowledge Base creation is easiest via the console because it automatically sets up OpenSearch Serverless for you:

1. Open AWS Console → Amazon Bedrock → Knowledge Bases → Create knowledge base
2. Knowledge base details:
   - Name: company-knowledge-base
   - Description: Internal company policies and documentation
   - IAM Role: bedrock-knowledge-base-role (select the one you created)
3. Data source:
   - Type: Amazon S3
   - S3 URI: s3://YOUR_BUCKET_NAME/documents/
   - Name: company-documents
4. Embeddings model:
   - Select: Titan Text Embeddings V2 (this converts your text into vectors)
5. Vector store:
   - Select: Quick create a new vector store
   - Type: Amazon OpenSearch Serverless (Bedrock creates and configures this automatically)
6. Review and create → Create knowledge base

Wait 3-5 minutes for creation to complete.

Get the Knowledge Base ID:

bash
KB_ID=$(aws bedrock-agent list-knowledge-bases \
  --region $REGION \
  --query 'knowledgeBaseSummaries[?name==`company-knowledge-base`].knowledgeBaseId' \
  --output text)

echo "Knowledge Base ID: $KB_ID"
echo "Save this: export KB_ID=$KB_ID"

Step 3: Sync Your Documents

bash
# Get the data source ID
DATA_SOURCE_ID=$(aws bedrock-agent list-data-sources \
  --knowledge-base-id $KB_ID \
  --region $REGION \
  --query 'dataSourceSummaries[0].dataSourceId' \
  --output text)

echo "Data Source ID: $DATA_SOURCE_ID"

# Start the ingestion job (chunks, embeds, and indexes your documents)
INGESTION_JOB_ID=$(aws bedrock-agent start-ingestion-job \
  --knowledge-base-id $KB_ID \
  --data-source-id $DATA_SOURCE_ID \
  --region $REGION \
  --query 'ingestionJob.ingestionJobId' \
  --output text)

echo "Ingestion job started: $INGESTION_JOB_ID"
echo "Waiting for ingestion to complete..."

# Poll until complete
while true; do
  STATUS=$(aws bedrock-agent get-ingestion-job \
    --knowledge-base-id $KB_ID \
    --data-source-id $DATA_SOURCE_ID \
    --ingestion-job-id $INGESTION_JOB_ID \
    --region $REGION \
    --query 'ingestionJob.status' \
    --output text)

  echo "Status: $STATUS"

  if [ "$STATUS" = "COMPLETE" ]; then
    echo "Ingestion complete. Your documents are now searchable."
    break
  elif [ "$STATUS" = "FAILED" ]; then
    echo "Ingestion failed. Check the AWS console for details."
    exit 1
  fi

  sleep 15
done

Part 4: Test the Knowledge Base Directly

Before building the API, test that the knowledge base works:

bash
# Test: ask a question directly using the AWS CLI
aws bedrock-agent-runtime retrieve-and-generate \
  --region $REGION \
  --input '{"text": "What is the expense reimbursement policy for business travel?"}' \
  --retrieve-and-generate-configuration "{
    \"type\": \"KNOWLEDGE_BASE\",
    \"knowledgeBaseConfiguration\": {
      \"knowledgeBaseId\": \"$KB_ID\",
      \"modelArn\": \"arn:aws:bedrock:$REGION::foundation-model/anthropic.claude-3-haiku-20240307-v1:0\"
    }
  }" | python3 -m json.tool

You should see an answer like:

json
{
  "output": {
    "text": "For business travel, you can claim reimbursement for flights, trains, and taxis to client sites. Accommodation is reimbursed up to £150 per night in London and £100 per night elsewhere in the UK. All claims must be submitted within 30 days through the Expenses portal."
  },
  "citations": [
    {
      "retrievedReferences": [
        {
          "content": {
            "text": "Business travel (flights, trains, taxis to/from client sites)..."
          },
          "location": {
            "s3Location": {
              "uri": "s3://your-bucket/documents/expense-policy.txt"
            }
          }
        }
      ]
    }
  ]
}

The citation shows exactly which document the answer came from. This is one of the most valuable features of RAG: your users can verify the source.

Part 5: Build the Query Lambda Function

Now we wrap the knowledge base in a Lambda function that handles validation, formats responses cleanly, and logs everything.
python
# rag_handler.py
# Production RAG query handler
import boto3
import json
import logging
import os
import time
from datetime import datetime, timezone

logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Lambda sets AWS_REGION automatically; the fallback is only for local runs
REGION = os.environ.get('AWS_REGION', 'eu-west-1')

# Client is initialised outside the handler so warm invocations reuse it
bedrock_agent = boto3.client('bedrock-agent-runtime', region_name=REGION)

# Configuration from environment variables
KNOWLEDGE_BASE_ID = os.environ.get('KNOWLEDGE_BASE_ID', '')
MODEL_ARN = f"arn:aws:bedrock:{REGION}::foundation-model/anthropic.claude-3-haiku-20240307-v1:0"

MAX_QUESTION_LENGTH = 1000
MIN_QUESTION_LENGTH = 3
NUM_RESULTS = 5  # How many document chunks to retrieve


def log_event(level: str, event: str, **kwargs):
    """Structured JSON logging for CloudWatch"""
    entry = {
        "level": level,
        "event": event,
        "timestamp": datetime.now(timezone.utc).isoformat(),
        **kwargs
    }
    # level must name a logger method (info, warning, error); unknown levels fall back to info
    getattr(logger, level.lower(), logger.info)(json.dumps(entry))


def validate_question(question: str) -> tuple[bool, str]:
    """Validate the user's question. Returns (ok, cleaned question or error message)."""
    if not question or not isinstance(question, str):
        return False, "question must be a non-empty string"

    question = question.strip()

    if len(question) < MIN_QUESTION_LENGTH:
        return False, f"question must be at least {MIN_QUESTION_LENGTH} characters"

    if len(question) > MAX_QUESTION_LENGTH:
        return False, f"question must not exceed {MAX_QUESTION_LENGTH} characters"

    return True, question


def format_citations(citations: list) -> list:
    """
    Extract and format citation information from Bedrock response.
    Returns a clean list of sources that users can reference.
    """
    formatted = []
    seen_sources = set()  # Avoid duplicate citations

    for citation in citations:
        for ref in citation.get('retrievedReferences', []):
            # Get source document location
            location = ref.get('location', {})
            s3_uri = location.get('s3Location', {}).get('uri', '')

            # Extract just the filename from the full S3 URI
            # s3://bucket-name/documents/expense-policy.txt → expense-policy.txt
            if s3_uri and s3_uri not in seen_sources:
                seen_sources.add(s3_uri)
                filename = s3_uri.split('/')[-1]

                # Get a short excerpt from the retrieved content
                content_text = ref.get('content', {}).get('text', '')
                excerpt = content_text[:200].strip()
                if len(content_text) > 200:
                    excerpt += '...'

                formatted.append({
                    'document': filename,
                    'source_uri': s3_uri,
                    'excerpt': excerpt
                })

    return formatted


def query_knowledge_base(question: str) -> dict:
    """
    Query the Bedrock Knowledge Base and return answer with citations.
    """
    # Custom prompt template to improve answer quality
    # This instructs Claude on how to use the retrieved context
    prompt_template = """You are a helpful assistant for company employees.
Answer the question using ONLY the information provided in the search results below.

Important rules:
- If the answer is clearly in the search results, provide it directly and concisely
- If the search results do not contain enough information to answer the question, say: "I don't have specific information about that in the available documents. Please contact HR or your manager."
- Never make up information not found in the search results
- Keep answers professional and easy to understand
- If there are specific numbers, dates, or limits mentioned, include them exactly

$search_results$

Question: $query$

Answer:"""

    response = bedrock_agent.retrieve_and_generate(
        input={'text': question},
        retrieveAndGenerateConfiguration={
            'type': 'KNOWLEDGE_BASE',
            'knowledgeBaseConfiguration': {
                'knowledgeBaseId': KNOWLEDGE_BASE_ID,
                'modelArn': MODEL_ARN,
                'retrievalConfiguration': {
                    'vectorSearchConfiguration': {
                        'numberOfResults': NUM_RESULTS
                    }
                },
                'generationConfiguration': {
                    'promptTemplate': {
                        'textPromptTemplate': prompt_template
                    },
                    'inferenceConfig': {
                        'textInferenceConfig': {
                            'maxTokens': 800,
                            'temperature': 0.1  # Low temperature = factual, consistent answers
                        }
                    }
                }
            }
        }
    )

    answer = response['output']['text']
    citations = format_citations(response.get('citations', []))

    return {
        'answer': answer,
        'citations': citations,
        'session_id': response.get('sessionId', '')
    }


def build_response(status_code: int, body: dict, request_id: str) -> dict:
    """Build a consistent HTTP response"""
    return {
        'statusCode': status_code,
        'headers': {
            'Content-Type': 'application/json',
            'X-Request-ID': request_id,
            'Access-Control-Allow-Origin': '*',
            'Access-Control-Allow-Headers': 'Content-Type',
            'Access-Control-Allow-Methods': 'POST,OPTIONS'
        },
        'body': json.dumps(body)
    }


def lambda_handler(event, context):
    """Main Lambda handler"""
    request_id = context.aws_request_id
    start_time = time.time()

    # Handle CORS preflight
    http_method = event.get('requestContext', {}).get('http', {}).get('method', '')
    if http_method == 'OPTIONS':
        return build_response(200, {}, request_id)

    log_event("INFO", "query_received", request_id=request_id)

    # Parse request body. A missing body arrives as None, so fall back to '{}'.
    try:
        body = json.loads(event.get('body') or '{}')
        if not isinstance(body, dict):
            raise ValueError("body is not a JSON object")
    except (ValueError, TypeError):  # json.JSONDecodeError is a ValueError
        return build_response(400, {
            'success': False,
            'error': 'Request body must be valid JSON'
        }, request_id)

    question = body.get('question', '')

    # Validate
    is_valid, result = validate_question(question)
    if not is_valid:
        return build_response(400, {
            'success': False,
            'error': result
        }, request_id)

    question = result  # cleaned question

    # Query the knowledge base
    try:
        rag_result = query_knowledge_base(question)
        duration_ms = int((time.time() - start_time) * 1000)

        log_event("INFO", "query_completed",
                  request_id=request_id,
                  question_length=len(question),
                  answer_length=len(rag_result['answer']),
                  citation_count=len(rag_result['citations']),
                  duration_ms=duration_ms)

        return build_response(200, {
            'success': True,
            'answer': rag_result['answer'],
            'citations': rag_result['citations'],
            'metadata': {
                'citation_count': len(rag_result['citations']),
                'duration_ms': duration_ms,
                'request_id': request_id
            }
        }, request_id)

    except bedrock_agent.exceptions.ThrottlingException:
        # "WARNING" maps to logger.warning; logger.warn is deprecated
        log_event("WARNING", "throttling", request_id=request_id)
        return build_response(503, {
            'success': False,
            'error': 'Service temporarily busy. Please retry in a moment.'
        }, request_id)

    except Exception as e:
        log_event("ERROR", "query_failed",
                  request_id=request_id,
                  error_type=type(e).__name__,
                  error_message=str(e))
        return build_response(500, {
            'success': False,
            'error': 'Something went wrong. Please try again.'
        }, request_id)

Part 6: Deploy the Lambda and API Gateway

bash
# Package and deploy Lambda
zip -j rag-function.zip rag_handler.py

# Create Lambda IAM role
cat > lambda-rag-trust.json << 'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"Service": "lambda.amazonaws.com"},
    "Action": "sts:AssumeRole"
  }]
}
EOF

aws iam create-role \
  --role-name rag-lambda-role \
  --assume-role-policy-document file://lambda-rag-trust.json

aws iam attach-role-policy \
  --role-name rag-lambda-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole

# Add Bedrock Knowledge Base permission.
# bedrock:RetrieveAndGenerate gets its own statement on "*": it does not appear
# to support resource-level scoping, so an ARN-restricted statement may not grant it.
cat > rag-lambda-policy.json << EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [ "bedrock:Retrieve", "bedrock:InvokeModel" ],
      "Resource": [
        "arn:aws:bedrock:$REGION:$ACCOUNT_ID:knowledge-base/$KB_ID",
        "arn:aws:bedrock:$REGION::foundation-model/*"
      ]
    },
    {
      "Effect": "Allow",
      "Action": "bedrock:RetrieveAndGenerate",
      "Resource": "*"
    }
  ]
}
EOF

aws iam put-role-policy \
  --role-name rag-lambda-role \
  --policy-name rag-bedrock-access \
  --policy-document file://rag-lambda-policy.json

LAMBDA_ROLE_ARN=$(aws iam get-role \
  --role-name rag-lambda-role \
  --query 'Role.Arn' --output text)

# Wait for role propagation
sleep 10

# Create Lambda function
aws lambda create-function \
  --function-name rag-query-handler \
  --runtime python3.12 \
  --role $LAMBDA_ROLE_ARN \
  --handler rag_handler.lambda_handler \
  --zip-file fileb://rag-function.zip \
  --timeout 30 \
  --memory-size 512 \
  --region $REGION \
  --environment "Variables={KNOWLEDGE_BASE_ID=$KB_ID}"

echo "Lambda deployed"

# Create API Gateway.
# The CORS settings are passed as a single JSON argument; separate
# space-delimited key=value arguments are not parsed as one structure.
API_ID=$(aws apigatewayv2 create-api \
  --name "rag-api" \
  --protocol-type HTTP \
  --cors-configuration '{"AllowOrigins":["*"],"AllowHeaders":["Content-Type"],"AllowMethods":["POST","OPTIONS"]}' \
  --region $REGION \
  --query 'ApiId' --output text)

# Create integration
INTEGRATION_ID=$(aws apigatewayv2 create-integration \
  --api-id $API_ID \
  --integration-type AWS_PROXY \
  --integration-uri arn:aws:lambda:$REGION:$ACCOUNT_ID:function:rag-query-handler \
  --payload-format-version 2.0 \
  --region $REGION \
  --query 'IntegrationId' --output text)

# Create route
aws apigatewayv2 create-route \
  --api-id $API_ID \
  --route-key 'POST /ask' \
  --target integrations/$INTEGRATION_ID \
  --region $REGION

# Deploy
aws apigatewayv2 create-stage \
  --api-id $API_ID \
  --stage-name production \
  --auto-deploy \
  --region $REGION

# Permission for API Gateway to invoke Lambda
aws lambda add-permission \
  --function-name rag-query-handler \
  --statement-id allow-api-gateway \
  --action lambda:InvokeFunction \
  --principal apigateway.amazonaws.com \
  --source-arn "arn:aws:execute-api:$REGION:$ACCOUNT_ID:$API_ID/*/*" \
  --region $REGION

API_URL=$(aws apigatewayv2 get-api \
  --api-id $API_ID \
  --region $REGION \
  --query 'ApiEndpoint' --output text)

echo ""
echo "==============================="
echo "RAG API is live at:"
echo "$API_URL/production/ask"
echo "==============================="

Part 7: Test Your RAG System

bash
API_ENDPOINT="$API_URL/production/ask"

echo "=== Test 1: Expense policy question ==="
curl -s -X POST $API_ENDPOINT \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the maximum hotel rate I can claim for a trip to London?"}' \
  | python3 -m json.tool

echo ""
echo "=== Test 2: Annual leave question ==="
curl -s -X POST $API_ENDPOINT \
  -H "Content-Type: application/json" \
  -d '{"question": "How many days notice do I need to give for a 2-week holiday?"}' \
  | python3 -m json.tool

echo ""
echo "=== Test 3: Remote work question ==="
curl -s -X POST $API_ENDPOINT \
  -H "Content-Type: application/json" \
  -d '{"question": "Do I need to be in the office on Tuesdays?"}' \
  | python3 -m json.tool

echo ""
echo "=== Test 4: Question not in documents ==="
curl -s -X POST $API_ENDPOINT \
  -H "Content-Type: application/json" \
  -d '{"question": "What is the capital of France?"}' \
  | python3 -m json.tool

Expected response for Test 1:

json
{
  "success": true,
  "answer": "For business trips to London, accommodation is reimbursed up to £150 per night. For other UK locations, the limit is £100 per night. All accommodation claims must be submitted through the Expenses portal within 30 days.",
  "citations": [
    {
      "document": "expense-policy.txt",
      "source_uri": "s3://your-bucket/documents/expense-policy.txt",
      "excerpt": "Accommodation (up to £150 per night in London, £100 elsewhere in the UK)..."
    }
  ],
  "metadata": {
    "citation_count": 1,
    "duration_ms": 2341,
    "request_id": "abc123"
  }
}

Expected response for Test 4 (not in documents):

json
{
  "success": true,
  "answer": "I don't have specific information about that in the available documents. Please contact HR or your manager.",
  "citations": [],
  "metadata": {
    "citation_count": 0,
    "duration_ms": 1823,
    "request_id": "xyz789"
  }
}

This is the critical difference from a standard AI: when the answer is not in your documents, the system says so honestly rather than guessing.

Part 8: Add More Documents and Keep It Current

The RAG system only knows about documents you have ingested. Add new documents any time and re-sync:

bash
# Upload a new document
aws s3 cp new-policy.pdf s3://$BUCKET_NAME/documents/

# Trigger a new ingestion job to index the new document
aws bedrock-agent start-ingestion-job \
  --knowledge-base-id $KB_ID \
  --data-source-id $DATA_SOURCE_ID \
  --region $REGION

echo "New document ingestion started"

For production, set up an automatic sync using EventBridge:

bash
# Create a rule that fires every night at midnight UTC
aws events put-rule \
  --name rag-nightly-sync \
  --schedule-expression "cron(0 0 * * ? *)" \
  --state ENABLED \
  --region $REGION

A rule on its own only defines the schedule; it does nothing until it has a target. Attach one with aws events put-targets, for example a small Lambda function that calls start-ingestion-job for your knowledge base. Once the target is in place, new documents added to S3 are indexed overnight without manual intervention.

Part 9: Build a Simple Web Interface

Your API is working.
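If other scripts or services need to call the endpoint, a small Python client is one way to do it. The sketch below uses only the standard library and assumes the response shape shown in Part 7 (`success`, `answer`, `citations`, `error`). The function names `build_request`, `summarise`, and `ask` are illustrative, not part of any AWS SDK, and the URL you pass in is your own API Gateway address.

```python
import json
import urllib.error
import urllib.request


def build_request(api_url: str, question: str) -> urllib.request.Request:
    """Build the POST request that the /ask route expects."""
    body = json.dumps({"question": question}).encode("utf-8")
    return urllib.request.Request(
        api_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def summarise(payload: dict) -> str:
    """Turn the API's JSON payload into a short printable summary."""
    if not payload.get("success"):
        return f"Error: {payload.get('error', 'unknown error')}"
    sources = ", ".join(c["document"] for c in payload.get("citations", []))
    return f"{payload['answer']}\nSources: {sources or 'none'}"


def ask(api_url: str, question: str, timeout: float = 30.0) -> str:
    """Send one question and return a readable summary of the reply."""
    req = build_request(api_url, question)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as res:
            payload = json.loads(res.read().decode("utf-8"))
    except urllib.error.HTTPError as err:
        # The handler returns a JSON error body on 4xx/5xx; this assumes
        # the gateway itself did not replace it with a non-JSON page.
        payload = json.loads(err.read().decode("utf-8"))
    return summarise(payload)


# Example (replace the URL with the one printed at the end of Part 6):
# print(ask("https://<api-id>.execute-api.eu-west-1.amazonaws.com/production/ask",
#           "How many days of annual leave do I get?"))
```

Keeping request building and response formatting in separate functions means both can be checked without a network call.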
Now give it a user interface so anyone in your organisation can use it without writing code: html <!DOCTYPE html> <html lang="en"> <head> <meta charset="UTF-8"> <meta name="viewport" content="width=device-width, initial-scale=1.0"> <title>Company Knowledge Base</title> <style> * { box-sizing: border-box; margin: 0; padding: 0; } body { font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', sans-serif; background: #f8fafc; min-height: 100vh; padding: 40px 20px; } .container { max-width: 720px; margin: 0 auto; } h1 { font-size: 1.75rem; color: #0f172a; margin-bottom: 6px; } .subtitle { color: #64748b; margin-bottom: 32px; } .search-box { display: flex; gap: 10px; margin-bottom: 24px; } input { flex: 1; padding: 14px 16px; border: 1px solid #e2e8f0; border-radius: 8px; font-size: 1rem; outline: none; transition: border-color 0.2s; } input:focus { border-color: #3b82f6; box-shadow: 0 0 0 3px rgba(59,130,246,0.1); } button { padding: 14px 24px; background: #3b82f6; color: white; border: none; border-radius: 8px; font-size: 1rem; font-weight: 600; cursor: pointer; white-space: nowrap; } button:hover { background: #2563eb; } button:disabled { background: #94a3b8; cursor: not-allowed; } .answer-card { background: white; border: 1px solid #e2e8f0; border-radius: 10px; overflow: hidden; display: none; } .answer-card.visible { display: block; } .answer-header { padding: 14px 20px; background: #f1f5f9; border-bottom: 1px solid #e2e8f0; font-size: 0.85rem; font-weight: 600; color: #475569; text-transform: uppercase; letter-spacing: 0.5px; } .answer-body { padding: 20px; line-height: 1.7; color: #334155; } .citations { padding: 16px 20px; border-top: 1px solid #e2e8f0; background: #f8fafc; } .citations h3 { font-size: 0.8rem; color: #64748b; text-transform: uppercase; letter-spacing: 0.5px; margin-bottom: 10px; } .citation-item { padding: 10px 12px; background: white; border: 1px solid #e2e8f0; border-radius: 6px; margin-bottom: 8px; font-size: 0.85rem; } .citation-doc { 
font-weight: 600; color: #3b82f6; margin-bottom: 4px; } .citation-excerpt { color: #64748b; font-style: italic; } .loading { color: #3b82f6; padding: 20px; text-align: center; display: none; } .error { padding: 16px 20px; background: #fef2f2; border: 1px solid #fecaca; border-radius: 8px; color: #dc2626; display: none; } .no-citations { color: #64748b; font-size: 0.85rem; font-style: italic; } </style> </head> <body> <div class="container"> <h1> Company Knowledge Base</h1> <p class="subtitle">Ask any question about company policies, procedures, and guidelines.</p> <div class="search-box"> <input type="text" id="questionInput" placeholder="e.g. How many days notice do I need for annual leave?" onkeypress="if(event.key==='Enter') askQuestion()" /> <button onclick="askQuestion()" id="askBtn">Ask</button> </div> <div class="error" id="errorDiv"></div> <div class="loading" id="loadingDiv"> Searching company documents...</div> <div class="answer-card" id="answerCard"> <div class="answer-header">Answer</div> <div class="answer-body" id="answerBody"></div> <div class="citations" id="citationsDiv"> <h3> Sources</h3> <div id="citationsList"></div> </div> </div> </div> <script> // Replace with your actual API URL const API_URL = 'YOUR_API_GATEWAY_URL/production/ask'; async function askQuestion() { const question = document.getElementById('questionInput').value.trim(); if (!question) return; const btn = document.getElementById('askBtn'); const loading = document.getElementById('loadingDiv'); const error = document.getElementById('errorDiv'); const card = document.getElementById('answerCard'); btn.disabled = true; loading.style.display = 'block'; error.style.display = 'none'; card.classList.remove('visible'); try { const res = await fetch(API_URL, { method: 'POST', headers: { 'Content-Type': 'application/json' }, body: JSON.stringify({ question }) }); const data = await res.json(); if (!res.ok || !data.success) throw new Error(data.error || 'Request failed'); 
        document.getElementById('answerBody').textContent = data.answer;

        const citationsList = document.getElementById('citationsList');
        citationsList.innerHTML = '';
        if (data.citations && data.citations.length > 0) {
          // Build nodes with textContent so document text is never parsed as HTML
          for (const c of data.citations) {
            const item = document.createElement('div');
            item.className = 'citation-item';

            const doc = document.createElement('div');
            doc.className = 'citation-doc';
            doc.textContent = `📄 ${c.document}`;

            const excerpt = document.createElement('div');
            excerpt.className = 'citation-excerpt';
            excerpt.textContent = `"${c.excerpt}"`;

            item.append(doc, excerpt);
            citationsList.appendChild(item);
          }
        } else {
          citationsList.innerHTML = '<p class="no-citations">No specific sources cited for this answer.</p>';
        }

        card.classList.add('visible');
      } catch (err) {
        error.textContent = `Error: ${err.message}`;
        error.style.display = 'block';
      } finally {
        btn.disabled = false;
        loading.style.display = 'none';
      }
    }
  </script>
</body>
</html>

Save this as index.html, replace YOUR_API_GATEWAY_URL, and host it on S3 static website hosting (covered in Article 4 of this series).

What You Have Built, and What It Means for Your Organisation

Let us step back and look at what this system does:

Before RAG: Employee has a question → 45 minutes searching documents → finds the answer (if lucky)

After RAG: Employee has a question → types it → gets a cited answer in 3 seconds

For a 100-person organisation where employees each save 30 minutes per day searching for information, that is 50 person-hours saved per day. At eight hours per working day, that is roughly 6 full-time employees' worth of time redirected from searching to actual work.

The citations are not just nice to have. They are essential for enterprise trust. Your employees can see exactly which document the answer came from and verify it themselves. The AI is not guessing; it is reading your actual documents and reporting what they say.

Common Issues and How to Fix Them

"The knowledge base is not finding relevant documents"
The chunking strategy might not be optimal for your document types. In the AWS console → Knowledge Base → Data Source → Edit → change the chunking strategy to "Semantic chunking" for better results with long documents.
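Before changing the chunking strategy, it can help to look at what the retrieval step actually returns for a failing question. The sketch below calls the `retrieve` operation of the `bedrock-agent-runtime` client, which runs the vector search without generating an answer, and prints each chunk's relevance score, source file, and a short preview. The helper names `format_results` and `inspect_retrieval` are illustrative, and the default region is an assumption carried over from the rest of this guide.

```python
def format_results(response: dict, preview_chars: int = 120) -> list[str]:
    """Render one line per retrieved chunk: score, source URI, text preview."""
    lines = []
    for hit in response.get("retrievalResults", []):
        score = hit.get("score", 0.0)
        uri = hit.get("location", {}).get("s3Location", {}).get("uri", "unknown")
        text = hit.get("content", {}).get("text", "")
        preview = " ".join(text.split())[:preview_chars]  # collapse whitespace
        lines.append(f"{score:.3f}  {uri}  {preview}")
    return lines


def inspect_retrieval(kb_id: str, question: str,
                      region: str = "eu-west-1", top_k: int = 5) -> list[str]:
    """Run retrieval only (no generation) and return the formatted hits."""
    import boto3  # imported here so format_results can be tried without AWS access

    client = boto3.client("bedrock-agent-runtime", region_name=region)
    response = client.retrieve(
        knowledgeBaseId=kb_id,
        retrievalQuery={"text": question},
        retrievalConfiguration={
            "vectorSearchConfiguration": {"numberOfResults": top_k}
        },
    )
    return format_results(response)


# Example (needs AWS credentials and your own knowledge base ID):
# for line in inspect_retrieval("YOUR_KB_ID", "How do I claim travel expenses?"):
#     print(line)
```

If the right document never appears in the list, the problem is on the retrieval side (chunking or ingestion). If it appears but the answer is still wrong, the prompt template is the more likely place to look.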
"The AI is making up answers not in the documents" Increase the strictness of the prompt template. Add: "If you are not 100% certain the answer is in the provided context, say you do not know." "Ingestion is taking a long time" Large PDF files with many images can be slow to process. For best performance, use text-based PDFs or convert Word documents to plain text before uploading. "I get throttling errors" Amazon Bedrock has per-account quotas. For production scale, request a quota increase in the AWS Service Quotas console. \

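While a quota increase is pending, retrying throttled calls with exponential backoff and jitter usually smooths over short bursts. The sketch below is a generic wrapper, not part of boto3: `call_with_backoff` and `is_bedrock_throttle` are illustrative names, and the delays are assumptions chosen to stay well inside the 30-second Lambda timeout used in this guide.

```python
import random
import time


def is_bedrock_throttle(exc: Exception) -> bool:
    """True if the exception looks like a botocore throttling error."""
    # botocore's ClientError carries the service error code in .response
    code = getattr(exc, "response", {}).get("Error", {}).get("Code", "")
    return code in ("ThrottlingException", "TooManyRequestsException")


def call_with_backoff(fn, is_retryable=is_bedrock_throttle, max_attempts=4,
                      base_delay=0.5, max_delay=4.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait (with full jitter) and try again."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except Exception as exc:
            # Give up on the last attempt or on any non-throttling error
            if attempt == max_attempts or not is_retryable(exc):
                raise
            # Exponential cap: 0.5s, 1s, 2s, ... up to max_delay
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


# Example inside the handler from Part 5:
# rag_result = call_with_backoff(lambda: query_knowledge_base(question))
```

boto3 also has built-in retry behaviour that can be tuned through `botocore.config.Config(retries={...})`; an explicit wrapper like this one is mainly useful when you want the retry policy visible and testable in your own code.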
View original source — Hacker Noon ↗
