How Amazon Finance streamlines regulatory inquiries by using generative AI on AWS

Amazon’s Finance Technology (FinTech) teams build and operate the systems that Amazon teams use to manage regulatory inquiries across jurisdictions. These teams process regulatory inquiries from authorities, each presenting different requirements, document formats, and complexity levels.

Processing these regulatory inquiries involves reviewing documentation, extracting relevant information, retrieving supporting data from multiple systems within Amazon’s infrastructure, and compiling responses within regulatory timeframes. As inquiry frequency and business complexity grew, Amazon needed a more scalable approach.

In this post, we demonstrate how Amazon FinTech teams are using Amazon Bedrock and other AWS services to build a scalable AI application to transform how regulatory inquiries are handled. Each team using this solution creates and maintains its own dedicated knowledge base, populated with that team’s specific documents and reference materials.

Challenges

The scale and complexity of managing regulatory inquiries presented several interconnected challenges:

Knowledge fragmentation and retrieval complexity

Regulatory inquiries require synthesizing information from thousands of historical documents. These documents exist in various formats (PDF, PPT, Word, CSV) and contain domain-specific terminology. Teams needed a way to quickly locate relevant precedents and supporting information across this vast corpus while maintaining accuracy and regulatory compliance.

Conversational context and state management

Regulatory inquiries require multi-turn conversations where context from earlier interactions is essential for generating accurate responses. Maintaining conversational state across sessions and tracking response evolution as team members refine answers through iterative interactions presents significant complexity.

Observability and continuous improvement

With generative AI systems, understanding why a particular response was generated is as important as the response itself. Teams required comprehensive visibility into the retrieval process, model decisions, and user interactions to identify areas for improvement and maintain compliance with responsible AI principles. For example, teams must detect when the model hallucinates information that isn’t present in source documents, or catch when the system retrieves outdated compliance guidelines that could lead to regulatory violations. AI systems experience accuracy drift over time as models, prompts, and the document corpus change, requiring continuous monitoring.

Solution overview

To address these challenges, the Amazon FinTech team built an intelligent regulatory response automation system using Amazon Bedrock, AWS Lambda, and supporting AWS services. The solution implements Retrieval Augmented Generation (RAG) with Amazon Bedrock Knowledge Bases and Amazon OpenSearch Serverless for vector storage, enabling information retrieval from thousands of historical documents. Real-time chat interactions powered by Claude Sonnet 4.5 through the Converse Stream API, combined with Amazon DynamoDB for conversation history management, provide contextually aware multi-turn conversations. Comprehensive observability through OpenTelemetry and self-hosted Langfuse ensures continuous monitoring and improvement of the AI system’s performance. The system doesn’t cache large language model (LLM) responses or intermediate results because regulatory inquiries are highly contextual and would yield a low cache hit rate.

The following diagram shows how you can use Amazon Bedrock Knowledge Bases in a workflow, alongside the Converse API and other tools, to provide the necessary information for regulatory inquiries:

Solution architecture for the regulatory inquiry response system

Knowledge base ingestion flow

The knowledge base ingestion flow is an automated document processing pipeline that starts when a user uploads a document and ends with that document’s content embedded in an Amazon Bedrock knowledge base. You can use this workflow to upload documents in bulk and transform them into searchable vector embeddings, as illustrated in the preceding figure. The flow works as follows (hedged code sketches of the key API calls appear after the list):

  1. Document Upload by User: Users upload documents through the client application.
  2. Pre-Signed URL Generation: The client application sends a request to Amazon API Gateway, which invokes the knowledge base ingestion AWS Lambda function to generate a pre-signed S3 URL.
  3. Document Upload: The client application uses the generated pre-signed URL to upload the document.
  4. Ingestion Trigger and Data Processing: After the document is successfully uploaded to Amazon Simple Storage Service (Amazon S3), the client application calls Amazon API Gateway to invoke the document processing AWS Lambda function, which handles format conversion and manages the concurrent ingestion of documents. We don’t need to pre-process the images, charts, and tables in these documents because the Amazon Bedrock knowledge base is configured with Amazon Bedrock Data Automation (BDA) to extract this multimodal content. The AWS Lambda function then starts an Amazon Bedrock Knowledge Bases ingestion job.
  5. Vector Storage: The Amazon Bedrock Knowledge Base chunks the document content using a hierarchical chunking strategy, generates embeddings using Amazon Titan Text Embeddings, and stores the resulting vectors in OpenSearch Serverless. Hierarchical chunking creates nested parent-child relationships that mirror the sectioned structure of financial documents. This strategy works well for structured and complex documents because it indexes small chunks for precise retrieval while returning larger parent chunks to provide sufficient context for coherent responses.
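
The following is a minimal Python (boto3) sketch of steps 2 and 4: generating the pre-signed upload URL and starting the knowledge base ingestion job. The bucket, knowledge base, and data source identifiers are placeholders, and the sketch is illustrative rather than the team’s production code.

```python
import boto3

s3 = boto3.client("s3")
bedrock_agent = boto3.client("bedrock-agent")

# Placeholder identifiers -- substitute your own resources.
UPLOAD_BUCKET = "regulatory-inquiry-documents"
KNOWLEDGE_BASE_ID = "KB_ID_PLACEHOLDER"
DATA_SOURCE_ID = "DS_ID_PLACEHOLDER"


def generate_upload_url(document_key: str) -> str:
    """Step 2: return a pre-signed S3 URL the client can use to PUT the document."""
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": UPLOAD_BUCKET, "Key": document_key},
        ExpiresIn=900,  # URL is valid for 15 minutes
    )


def trigger_ingestion() -> str:
    """Step 4: sync the S3 data source into the Amazon Bedrock knowledge base."""
    response = bedrock_agent.start_ingestion_job(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        dataSourceId=DATA_SOURCE_ID,
    )
    return response["ingestionJob"]["ingestionJobId"]
```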

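For step 5, the hierarchical chunking and multimodal parsing behavior is declared on the knowledge base data source rather than in Lambda code. The following is a hedged sketch of what that configuration could look like with boto3; the chunk sizes, parsing settings, and resource names are assumptions for illustration, not the team’s published values.

```python
import boto3

bedrock_agent = boto3.client("bedrock-agent")

# Illustrative configuration -- tune chunk sizes and parsing options to your documents.
bedrock_agent.create_data_source(
    knowledgeBaseId="KB_ID_PLACEHOLDER",
    name="regulatory-documents",
    dataSourceConfiguration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::regulatory-inquiry-documents"},
    },
    vectorIngestionConfiguration={
        # Hierarchical chunking: small child chunks for precise retrieval,
        # with larger parent chunks returned to provide context.
        "chunkingConfiguration": {
            "chunkingStrategy": "HIERARCHICAL",
            "hierarchicalChunkingConfiguration": {
                "levelConfigurations": [
                    {"maxTokens": 1500},  # parent chunks
                    {"maxTokens": 300},   # child chunks
                ],
                "overlapTokens": 60,
            },
        },
        # Parse images, charts, and tables with Amazon Bedrock Data Automation.
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_DATA_AUTOMATION",
            "bedrockDataAutomationConfiguration": {"parsingModality": "MULTIMODAL"},
        },
    },
)
```
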
Building an automated ingestion pipeline addresses the core challenge of knowledge fragmentation by efficiently processing thousands of historical documents across multiple formats while optimizing content indexing for relevant AI responses. This parallelized approach enables the system to scale effectively, accommodating the growing year-over-year regulatory inquiry activity while maintaining consistent processing performance across large document volumes.

Chat application

The chat application provides a real-time conversation interface powered by AWS serverless architecture, enabling natural language interactions with the system. We chose to stream responses so users can begin reading the AI response while it is still being generated, and implemented this capability through WebSocket connections. Through these connections and the Claude Sonnet 4.5 model, the application delivers contextually relevant responses while maintaining conversation state in DynamoDB. The workflow operates as follows:

  1. Initiate Chat Conversation: Users start a new chat session or open an existing one through the client application.
  2. WebSocket Connection: The application uses WebSockets to establish a persistent, bi-directional connection with Amazon API Gateway.
  3. Message Submission: The application posts the user’s question through the WebSocket connection, which API Gateway propagates to the Chat Service AWS Lambda function.
  4. Query Enhancement: The Chat Service AWS Lambda function uses the Claude 3.5 Haiku model with a query expansion strategy to generate multiple variations of the user’s question.
  5. Knowledge Retrieval: The Chat Service Lambda invokes the Amazon Bedrock Knowledge Bases Retrieve API for each expanded query. The API performs vector similarity searches against the underlying OpenSearch Serverless index and returns the most relevant document chunks along with their source metadata and relevance scores.
  6. Context Assembly: The Chat Service AWS Lambda function retrieves conversation history from Amazon DynamoDB (for existing conversations, based on that specific conversation ID) and combines it with the retrieved knowledge base results and the user’s question.
  7. Response Generation: The Chat Service AWS Lambda function uses the Converse Stream API with Claude Sonnet 4.5 and a response generator prompt to produce a contextually relevant answer based on the assembled context (see the sketch following this list).
  8. User Engagement: The Chat Service AWS Lambda function streams the generated response back to the client application in Markdown format through the WebSocket connection and stores the full conversation in the conversation history table in Amazon DynamoDB.
  9. Observability: Throughout the process, the Chat Service publishes end-to-end traces to a self-hosted Langfuse instance using the OpenTelemetry (OTEL) SDK. This captures detailed telemetry data including latency metrics, token usage, prompt templates, and model responses.
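
The following is a minimal sketch of steps 7 and 8, assuming a Python (boto3) Lambda handler: it calls the Converse Stream API and relays each text delta to the client over the WebSocket connection. The model ID, prompts, and endpoint values are placeholders; retrieval (step 5) is sketched later in this post.

```python
import boto3

bedrock_runtime = boto3.client("bedrock-runtime")

# Placeholder: use the Claude Sonnet 4.5 model ID or inference profile available in your Region.
MODEL_ID = "anthropic.claude-sonnet-4-5"


def stream_answer(question: str, context: str, connection_id: str, ws_endpoint: str) -> str:
    """Step 7: generate the answer with ConverseStream; step 8: push deltas to the WebSocket client."""
    apigw = boto3.client("apigatewaymanagementapi", endpoint_url=ws_endpoint)

    response = bedrock_runtime.converse_stream(
        modelId=MODEL_ID,
        system=[{"text": "Answer using only the supplied regulatory context."}],
        messages=[{"role": "user", "content": [{"text": f"{context}\n\nQuestion: {question}"}]}],
        inferenceConfig={"maxTokens": 1024, "temperature": 0.2},
    )

    answer_parts = []
    for event in response["stream"]:
        if "contentBlockDelta" in event:
            delta = event["contentBlockDelta"]["delta"].get("text", "")
            answer_parts.append(delta)
            # Relay the partial Markdown response to the client as it is generated.
            apigw.post_to_connection(ConnectionId=connection_id, Data=delta.encode("utf-8"))
    return "".join(answer_parts)
```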

Multi-turn conversational experience

Regulatory inquiry discussions often progress through multiple exchanges as teams refine responses and reference additional data sources. To support this iterative process, the Amazon FinTech team implemented a multi-turn conversational workflow using Amazon API Gateway (WebSocket APIs), AWS Lambda, and Amazon DynamoDB, integrated with the Amazon Bedrock ConverseStream API for low-latency, context-aware dialogue. Each chat session is securely authenticated through Amazon Cognito and assigned a unique conversation ID. DynamoDB stores messages in chronological order to preserve context across sessions, so users can resume prior discussions seamlessly and maintain continuity.
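
The following is a minimal sketch of how such a conversation history table could be written and read with boto3, assuming a table keyed by conversation ID with an ISO-8601 timestamp sort key; the table and attribute names are placeholders.

```python
from datetime import datetime, timezone

import boto3
from boto3.dynamodb.conditions import Key

table = boto3.resource("dynamodb").Table("ConversationHistory")  # placeholder table name


def append_message(conversation_id: str, role: str, content: str) -> None:
    """Store one turn; the timestamp sort key keeps messages in chronological order."""
    table.put_item(
        Item={
            "conversation_id": conversation_id,  # partition key
            "created_at": datetime.now(timezone.utc).isoformat(),  # sort key
            "role": role,  # "user" or "assistant"
            "content": content,
        }
    )


def load_history(conversation_id: str, limit: int = 20) -> list:
    """Fetch the most recent turns so a resumed session keeps its context."""
    response = table.query(
        KeyConditionExpression=Key("conversation_id").eq(conversation_id),
        ScanIndexForward=False,  # newest first
        Limit=limit,
    )
    return list(reversed(response["Items"]))  # oldest-to-newest for prompt assembly
```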

When a user submits a query, the system sanitizes inputs to prevent prompt injection attacks. After sanitization, the system classifies intent and determines whether retrieval from the Amazon Bedrock knowledge base is required. This determination is made through an LLM call that classifies the user query as either conversational or knowledge-intensive. For complex, knowledge-intensive questions, the workflow employs a query expansion strategy that addresses the prevalent use of acronyms and abbreviated questions by users. This layer generates up to five query variations using Claude 3.5 Haiku, then makes parallel Retrieve API calls to the knowledge base, retrieving relevant results using OpenSearch vector similarity search. To maintain performance at scale, the workflow implements parallel processing for these retrieval calls using multi-threading. This optimization reduced retrieval latency from 10 seconds (sequential processing) to under 2 seconds, enabling responsive conversations.

The retrieved information, combined with recent conversation history, is passed to Claude Sonnet 4.5 through the ConverseStream API augmented with Amazon Bedrock Guardrails, which implement sensitive information filters to automatically detect and remove PII and financial data from both inputs and outputs. This is critical for protecting regulatory documentation. When prompt injection attempts are detected, the system responds with “Sorry, the model cannot answer that question,” maintaining secure and compliant interactions while preserving conversational fluency.
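
The following is a hedged sketch of the parallel retrieval fan-out and guardrail configuration described above, using Python’s ThreadPoolExecutor with boto3; the knowledge base and guardrail identifiers are placeholders.

```python
from concurrent.futures import ThreadPoolExecutor

import boto3

agent_runtime = boto3.client("bedrock-agent-runtime")
bedrock_runtime = boto3.client("bedrock-runtime")

KNOWLEDGE_BASE_ID = "KB_ID_PLACEHOLDER"
GUARDRAIL_ID = "GUARDRAIL_ID_PLACEHOLDER"


def retrieve_chunks(query: str) -> list:
    """One Retrieve call; the knowledge base searches its OpenSearch Serverless index."""
    response = agent_runtime.retrieve(
        knowledgeBaseId=KNOWLEDGE_BASE_ID,
        retrievalQuery={"text": query},
        retrievalConfiguration={"vectorSearchConfiguration": {"numberOfResults": 5}},
    )
    return response["retrievalResults"]


def retrieve_parallel(query_variants: list) -> list:
    """Fan out the expanded queries concurrently instead of calling Retrieve sequentially."""
    with ThreadPoolExecutor(max_workers=5) as pool:
        per_query = list(pool.map(retrieve_chunks, query_variants))
    return [chunk for results in per_query for chunk in results]


def guarded_converse_stream(model_id: str, messages: list):
    """Apply Amazon Bedrock Guardrails sensitive-information filters to inputs and outputs."""
    return bedrock_runtime.converse_stream(
        modelId=model_id,
        messages=messages,
        guardrailConfig={
            "guardrailIdentifier": GUARDRAIL_ID,
            "guardrailVersion": "1",  # placeholder version
        },
    )
```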

This architecture delivers continuity, transparency, and scalability. Users receive real-time, streaming responses with status updates throughout the retrieval and generation phases, improving engagement and reducing perceived latency. Persistent logs in DynamoDB provide an immutable audit trail for compliance review, while the serverless, event-driven design scales automatically to support concurrent sessions. Together, these capabilities enable the Amazon FinTech team to conduct complex, iterative conversations, producing contextually relevant, secure, and regulatory-compliant responses powered by Amazon Bedrock.

Observability

Observability plays a critical role in understanding and improving AI-driven workflows. To achieve complete visibility into the regulatory inquiry response system, the Chat Service AWS Lambda function integrates OpenTelemetry (OTEL) with a self-hosted Langfuse instance to capture detailed, end-to-end traces of each interaction. This setup provides engineers and applied scientists with fine-grained telemetry on how prompts are processed, knowledge is retrieved, and responses are generated, enabling continuous refinement of the system’s performance and accuracy. The decision to use OTEL over the native Langfuse SDK provides vendor-neutral flexibility, allowing telemetry data to be routed to multiple observability backends and adapted to evolving monitoring requirements.

At runtime, each stage of the Chat Service AWS Lambda function is manually instrumented using the OTEL Java SDK to record latency, token usage, model decisions, and prompt metadata following the OTEL generative AI semantic conventions. Spans are published to Langfuse in near real time, giving the team a transparent view of how the Amazon Bedrock ConverseStream API, knowledge base retrieval, and Claude Sonnet 4.5 interact within a single request. This detailed telemetry allows the team to identify performance bottlenecks, optimize prompt strategies, and enhance retrieval precision while maintaining responsible AI practices.
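
The post describes instrumentation with the OTEL Java SDK; the following Python sketch shows the same pattern in miniature, exporting OTLP spans to a self-hosted Langfuse endpoint with generative AI semantic-convention attributes. The endpoint path and credentials are assumptions, not the team’s configuration.

```python
import base64

from opentelemetry import trace
from opentelemetry.exporter.otlp.proto.http.trace_exporter import OTLPSpanExporter
from opentelemetry.sdk.resources import Resource
from opentelemetry.sdk.trace import TracerProvider
from opentelemetry.sdk.trace.export import BatchSpanProcessor

# Placeholder Langfuse OTLP endpoint and API keys.
LANGFUSE_OTLP_ENDPOINT = "https://langfuse.example.internal/api/public/otel/v1/traces"
AUTH = base64.b64encode(b"pk-placeholder:sk-placeholder").decode()

provider = TracerProvider(resource=Resource.create({"service.name": "chat-service"}))
provider.add_span_processor(
    BatchSpanProcessor(
        OTLPSpanExporter(
            endpoint=LANGFUSE_OTLP_ENDPOINT,
            headers={"Authorization": f"Basic {AUTH}"},
        )
    )
)
trace.set_tracer_provider(provider)
tracer = trace.get_tracer("chat-service")


def record_generation(model_id: str, usage: dict) -> None:
    """Record one model call as a span with generative AI semantic-convention attributes."""
    with tracer.start_as_current_span("chat.generate_response") as span:
        span.set_attribute("gen_ai.operation.name", "chat")
        span.set_attribute("gen_ai.request.model", model_id)
        span.set_attribute("gen_ai.usage.input_tokens", usage.get("inputTokens", 0))
        span.set_attribute("gen_ai.usage.output_tokens", usage.get("outputTokens", 0))
```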

This observability framework maintains trust and accountability in the system’s behavior. Engineers can correlate user actions with model outcomes, trace data lineage across multiple services, and fine-tune configurations without disrupting operations. By combining OpenTelemetry’s interoperability with Langfuse’s visualization and analytics, the Amazon FinTech team gains a scalable, extensible foundation for evaluating generative AI systems at scale, turning every interaction into actionable insight for continuous improvement.

The following screenshot illustrates an end-to-end trace captured in Langfuse, showcasing how the observability solution captures the complete workflow, from query expansion and knowledge retrieval to model prompts, responses, and latency metrics. It also highlights source document citations, offering a transparent view of how contextual information flows through the system during response generation.

End-to-end trace captured in Langfuse

Conclusion

In this post, you saw how the Amazon FinTech team built a scalable AI solution using Amazon Bedrock, designed to support regulatory inquiries by automating knowledge retrieval, conversational workflows, and response generation. By combining a document ingestion pipeline, multi-turn stateful conversations, and detailed observability via OpenTelemetry and Langfuse, the architecture empowers teams to handle regulatory inquiries in a governed, traceable, and compliant manner.

Because the entire stack is built on AWS serverless services, it offers the operational scalability, security, and elasticity required for enterprise-grade deployment. Whether you’re dealing with legal compliance, regulatory inquiries, or high-volume internal knowledge workflows, this pattern offers a practical foundation that you can tailor and extend to your business domain.

If you’re ready to modernize your knowledge-intensive processes with generative AI, explore the Amazon Bedrock documentation to discover how you can begin building your own secure, governed, and scalable AI-powered workflows.


About the authors

Balaji

Balajikumar Gopalakrishnan

Balajikumar is a Principal Engineer at Amazon Finance Technology. He has been with Amazon since 2013, solving real-world challenges through technology that directly impacts the lives of Amazon customers. Outside of work, Balaji enjoys hiking, painting, and spending time with his family. He is also a movie buff!

Biswajit

Biswajit Mohapatra

Biswajit is a Senior Data Engineer at Amazon, where he draws on 7 years of development experience to build end-to-end solutions that support compliance processes. Prior to Amazon, Biswajit spent much of his time building real-time streaming solutions for healthcare systems. When he’s not engineering solutions, Biswajit enjoys traveling and discovering local cuisines.

Pramodh

Pramodh Korukonda

Pramodh is a Senior Software Development Engineer at Amazon Finance Technology. An Amazonian since 2013, he began his journey solving problems for Amazon vendors and small businesses, and now focuses on Amazon’s Finance teams. Outside work, he enjoys cooking for loved ones and exploring local food through his travels.

Jeff

Jeff Rebacz

Jeff is a Senior Software Development Engineer at Amazon, where he has been building data and document gathering automation for tax audit processes since 2019. Prior to Amazon, Jeff worked in the industrial automation space developing a time-series database for asset monitoring. Jeff enjoys staying active through volleyball and hiking. He also has a hands-on hobby of fixing cars.

Yunfei Bai

Yunfei Bai

Yunfei is a Principal Applied AI Architect at AWS. With a background in AI/ML, data science, and analytics, Yunfei helps customers adopt AWS services to deliver business results. He designs AI/ML and data analytics solutions that overcome complex technical challenges and drive strategic objectives. Yunfei has a PhD in Electronic and Electrical Engineering. Outside of work, Yunfei enjoys reading and music.