Organizations handle vast amounts of sensitive customer information through various communication channels. Protecting Personally Identifiable Information (PII), such as social security numbers (SSNs), driver’s license numbers, and phone numbers has become increasingly critical for maintaining compliance with data privacy regulations and building customer trust. However, manually reviewing and redacting PII is time-consuming, error-prone, and scales poorly as data volumes grow.
Organizations face challenges when dealing with PII scattered across different content types – from texts to images. Traditional approaches often require separate tools and workflows for handling text and image content, leading to inconsistent redaction practices and potential security gaps. This fragmented approach not only increases operational overhead but also raises the risk of accidental PII exposure.
This post shows an automated PII detection and redaction solution using Amazon Bedrock Data Automation and Amazon Bedrock Guardrails through a use case of processing text and image content in high volumes of incoming emails and attachments. The solution features a complete email processing workflow with a React-based user interface for authorized personnel to more securely manage and review redacted email communications and attachments. We walk through the step-by-step solution implementation procedures used to deploy this solution. Finally, we discuss the solution benefits, including operational efficiency, scalability, security and compliance, and adaptability.
Solution overview
The solution provides an automated system for protecting sensitive information in business communications through three main capabilities:
- Automated PII detection and redaction for both email content and attachments using Amazon Bedrock Data Automation and Guardrails, making sure that sensitive data is consistently protected across different content types.
- More secure data management workflows where processed communications are encrypted and stored with appropriate access controls, while maintaining a complete audit trail of operations.
- Web-based interface options for authorized agents to efficiently manage redacted communications, supported by features like automated email categorization and customizable folder management.
This unified approach helps organizations maintain compliance with data privacy requirements while streamlining their communication workflows.
The following diagram outlines the solution architecture. 
The diagram illustrates the backend PII detection and redaction workflow and the frontend application user interface orchestrated by AWS Lambda and Amazon EventBridge. The process follows these steps:
- The workflow starts with the user sending an email to the incoming email server hosted on Amazon Simple Email Service (Amazon SES). This is an optional step.
- Alternatively, users can upload the emails and attachments directly into an Amazon Simple Storage Service (S3) landing bucket.
- An S3 event notification triggers the initial processing AWS Lambda function that generates a unique case ID and creates a tracking record in Amazon DynamoDB.
- Lambda orchestrates the PII detection and redaction workflow by extracting email body and attachments from the email and saving it in a raw email bucket followed by invoking Amazon Bedrock Data Automation and Guardrails for detecting and redacting PII.
- Amazon Bedrock Data Automation processes attachments to extract text from the files.
- Amazon Bedrock Guardrails detects and redacts the PII from both email body and text from attachments and stores the redacted content in another S3 bucket.
- DynamoDB tables are updated with email messages, folders metadata, and email filtering rules.
- An Amazon EventBridge Scheduler is used to run the Rules Engine Lambda on a schedule which processes new emails that have yet to be categorized into folders based on enabled email filtering rules criteria.
- The Rules Engine Lambda also communicates with DynamoDB to access the messages table and the rules table.
- Users can access the optional application user interface through Amazon API Gateway, which manages user API requests and routes requests to render the user interface through S3 static hosting. Users may choose to enable authentication for the user interface based on their security requirements. Alternatively, users can check the status of their email processing in the DynamoDB table and S3 bucket with PII redacted content.
- A Portal API Lambda fetches the case details based on user requests.
- The static assets served by API Gateway are stored in a private S3 bucket.
- Optionally, users may enable Amazon CloudWatch and AWS CloudTrail to provide visibility into the PII detection and redaction process, while using Amazon Simple Notification Service to deliver real-time alerts for any failures, facilitating immediate attention to issues.
In the following sections, we walk through the procedures for implementing this solution.
Walkthrough
The solution implementation involves infrastructure and optional portal setup.
Prerequisites
Before beginning the implementation, make sure to have the following components installed and configured.
- An AWS account
- Git
- Python 3.7 or higher
- Node v18 or higher
- NPM v9.8 or higher
- AWS CDK v2.166 or higher
- Terminal/CLI such as macOS Terminal, PowerShell or Windows Terminal, or the Linux command line. AWS CloudShell can also be used when all code is located within an AWS account
Infrastructure setup and deployment process
Verify that an existing virtual private cloud VPC that contains three private subnets with no internet access is created in your AWS account. All AWS CloudFormation stacks need to be deployed within the same AWS account.
CloudFormation stacks
The solution contains three stacks (two required, one optional) that deploys in your AWS account:
- S3Stack – Provisions the core infrastructure including S3 buckets for raw and redacted email storage with automatic lifecycle policies, a DynamoDB table for email metadata tracking with time-to-live (TTL) and global secondary indexes, and VPC security groups for more secure Lambda function access. It also creates Amazon Identity and Access Management (IAM) roles with comprehensive permissions for S3, DynamoDB, and Bedrock services, forming a more secure foundation for the entire PII detection and redaction workflow.
- ConsumerStack – Provisions the core processing infrastructure including Amazon Bedrock Data Automation projects for document text extraction and Bedrock Guardrails configured to anonymize comprehensive PII entities, along with Lambda functions for email and attachment processing with Amazon Simple Notification Service (SNS) topics for success/failure notifications. It also creates Amazon Simple Email Service (SES) receipt rules for incoming email handling when a domain is configured and S3 event notifications to trigger the email processing workflow automatically.
- PortalStack (optional) – This is only needed when users want to use a web-based user interface for managing emails. It provisions the optional web interface including a regional API Gateway, DynamoDB tables for redacted message storage, and S3 buckets for static web assets.
Amazon SES (optional)
Move directly to the Solution Deployment section that follows if Amazon SES is not being used.
The following Amazon SES Setup is optional. The code may be tested without this setup as well. Steps to test the application with or without Amazon SES is covered in the Testing section.
Set up Amazon SES with prod access and verify the domain/email identities for which the solution is to work. We also need to add the MX records in the DNS provider maintaining the domain. Please refer to the following links:
Create credentials for SMTP and save it in AWS Secrets Manager secret with name SmtpCredentials. An IAM user is created for this process.
If any other name is being used for the secret, update the context.json line secret_name with the name of the secret created.
The key for the username in the secret should be smtp_username and the key for password should be smtp_password when storing the same in AWS Secrets Manager.
Solution deployment
Run the following commands from within a terminal/CLI environment.
- Clone the repository
- The
infra/cdk.jsonfile tells the CDK Toolkit how to execute your app - Optional: Create and activate a new Python virtual environment (make sure to use python 3.12 as lambda is in CDK is configured for same. If using some other python version update CDK code to reflect the same in lambda runtime)
- Upgrade pip
- Install Python packages
- Create
context.jsonfile - Update the
context.jsonfile with the correct configuration options for the environment.
| Property Name | Default | Description | When to Create |
|---|---|---|---|
vpc_id |
“” | VPC ID where resources are deployed | VPC needs to be created prior to execution |
raw_bucket |
“” | S3 bucket storing raw messages and attachments | Created during CDK deployment |
redacted_bucket_name |
“” | S3 bucket storing redacted messages and attachments | Created during CDK deployment |
inventory_table_name |
“” | DynamoDB table name storing redacted message details | Created during CDK deployment |
resource_name_prefix |
“” | Prefix used when naming resources during the stack creation | During stack creation |
retention |
90 | Number of days for retention of the messages in the redacted and raw S3 buckets | During stack creation |
- The following properties are only required when the portal is being provisioned.
| Property Name | Default | Description |
|---|---|---|
environment |
development | The type of environment where resources are provisioned. Values are development or production |
- Use cases that require the usage of Amazon SES to manage redacted email messages need to set the following configuration variables. Otherwise, these are optional.
| Property Name | Description | Comment |
|---|---|---|
domain |
The verified domain or email name that is used for Amazon SES | This can be left blank if not setting up Amazon SES |
auto_reply_from_email |
Email address of the “from” field of the email message. Also used as the email address where emails are forwarded from the Portal application | This can be left blank if not setting up the Portal |
secret_name |
AWS Secrets Manager secret containing SMTP credentials for forward email functionality from the portal |
- Deploy Infrastructure by running the following commands from the root of the infra directory.
- Bootstrap the AWS account to use AWS CDK
- Users can now synthesize the CloudFormation template for this code. Additional environment variables before the cdk synth suppresses the warnings. The deployment process should take approximately 10 min for a first-time deployment to complete.
- Replace
<<resource_name_prefix>>with its chosen value and then run:
Testing
- Testing the application with Amazon SES
Before starting the test, make sure the Amazon SES Email Receiving rule set that was created by the <<resource_name_prefix>>-ConsumerStack stack is active. We can check by executing the below command and make sure name in the output is <<resource_name_prefix>>-rule-setaws ses describe-active-receipt-rule-set. If the name does not match or the output is blank, execute the following to activate the same:
Once we have the correct rule set active, we can test the application using Amazon SES by sending an email to the verified email or domain in Amazon SES, which automatically triggers the redaction pipeline. Progress can be tracked in the DynamoDB table <<inventory_table_name>>. The inventory table name can be found on the resources tab in the AWS CloudFormation Console for the <<resource_name_prefix>>-S3Stack stack and Logical ID EmailInventoryTable. A unique <<case_id>> is generated and used in the DynamoDB inventory table for each email being processed. Once redaction is complete, the redacted email body can be found in <<redacted_bucket_name>>/redacted/<<today_date>>/<<case_id>>/email_body/ and redacted attachments in <<redacted_bucket_name>>/redacted/<<today_date>>/<<case_id>>/attachments/.
- Testing the application without Amazon SES
As described earlier, the solution is used to redact any PII data in the email body and attachments. Therefore, to test the application, we need to provide an email file which needs to be redacted. We can do that without Amazon SES by directly uploading an email file to the raw S3 bucket. The raw bucket name can be found on the output tab in the AWS CloudFormation Console for <<resource_name_prefix>>-S3Stack stack and Export Name RawBucket. This triggers the workflow of redacting the email body and attachments by S3 event notification triggering the Lambda. For your convenience, a sample email is available in the infra/pii_redaction/sample_email directory of the repository. Below are the steps to test the application without Amazon SES using the same email file.
The above triggers the redaction of the email process. You can track the progress in the DynamoDB table <<inventory_table_name>>. A unique <<case_id>> is generated and used in the DynamoDB inventory table for each email being processed. The inventory table name can be found on the resources tab in the AWS CloudFormation Console for <<resource_name_prefix>>-S3Stack stack and Logical ID EmailInventoryTable. Once redaction is complete, the redacted email body can be found in <<redacted_bucket_name>>/redacted/<<today_date>>/<<case_id>>/email_body/ and redacted attachments in <<redacted_bucket_name>>/redacted/<<today_date>>/<<case_id>>/attachments/.
Portal setup
The installation of the portal is completely optional. This section can be skipped; check the console of the AWS account where the solution is deployed to view the resources created. The portal serves as a web interface to manage the PII-redacted emails processed by the backend AWS infrastructure, allowing users to view sanitized email content. The Portal can be used to:
- List messages: View processed emails with redacted content
- Message details: View individual email content and attachments
Portal Prerequisites: This portal requires the installation of the following software tools:
- TypeScript
- Node v18 or higher
- NPM v9.8 or higher
Infrastructure Deployment
- Synthesize the CloudFormation template for this code by going to the directory root of the solution. Now run the following command:
- Optional: Create and activate a new Python virtual environment (if the virtual environment has not been created previously):
- Users can now synthesize the CloudFormation template for this code.
- Deploy the React-based portal. Replace
<<resource_name_prefix>>with its chosen value:
The first-time deployment should take approximately 10 minutes to complete.
Environment Variables
- Create a new environment file by going to the root of the app directory and update the following variables in the
.envfile (by copying the.env.examplefile to.env) using the following command to create the.envfile using a terminal/CLI environment. - The file can be created using your preferred text editor as well.
| Environment Variable Name | Default | Description | Required |
|---|---|---|---|
VITE_APIGW |
“” | URL of the API Gateway invokes URL (including protocol) without the path (remove /portal from the value). This value can be found in the output of the PortalStack after deploying through AWS CDK. It can also be found under the Outputs tab of the PortalStack CloudFormation stack under the export name of PiiPortalApiGatewayInvokeUrl |
Yes |
VITE_BASE |
/portal | It specifies the path used to request the static files needed to render the portal | Yes |
VITE_API_PATH |
/api | It specifies the path needed to send requests to the API Gateway | Yes |
Portal deployment
Run the following commands from within a terminal/CLI environment.
- Before running any of the following commands, go to the root of the app directory to build this application for production by running the following commands:
- Install NPM packages
- Build the files
- After the build succeeds, transfer all of the files within the
dist/directory into the Amazon S3 bucket that is designated for these assets (specified in the PortalStack provisioned via CDK).- Example:
aws s3 sync dist/ s3://<<name-of-s3-bucket>> --delete<<name-of-s3-bucket>>is the S3 bucket that has been created in the<<resource-name-prefix>>-PortalStackCloudFormation stack with the Logical ID ofPrivateWebHostingAssets. This value can be obtained from the Resources tab of the CloudFormation stack in the AWS Console. This value is also output during thecdk deployprocess when the PortalStack has been successfully completed.
- Example:
Accessing the portal
Use the API Gateway invoke URL from the API Gateway that has been created during the cdk deploy process to access the portal from a web browser. This URL can be found by following these steps:
- Visit the AWS Console
- Go to API Gateway and find the API Gateway that has been created during the
cdk deployprocess. The name of the API Gateway can be found in the Resources section of the<<resource-name-prefix>>-PortalStackCloudFormation stack. - Click on the Stages link in the left-hand menu.
- Make sure that the portal stage is selected
- Find the Invoke URL and copy that value
- Enter that value in the address bar of your web browser
The portal’s user interface is now visible within the web browser. If any emails have been processed, they are listed on the home page of the portal.
Access control (optional)
For production deployment, we recommend these approaches to controlling and managing access to the Portal.
Clean up
To avoid incurring future charges, follow these steps to remove the resources created by this solution:
- Delete the contents of the S3 buckets created by the solution:
- Raw email bucket
- Redacted email bucket
- Portal static assets bucket (if portal was deployed)
- Delete or disable the Amazon SES rule step created by the solution using below cli command:
- Remove the CloudFormation stacks in the following order:
- CDK Destroy does not remove the access log Amazon S3 bucket created as part of the deployment. Users can get access to the log bucket name in the output tab of stack
<<resource_name_prefix>>-S3Stackwith export name AccessLogsBucket. Execute the below steps to delete the access log bucket:- To delete the contents of the access log bucket, follow the instructions on deleting S3 bucket
- Access to the log bucket is version-enabled and deleting the content of the bucket in the above step does not delete versioned objects in the bucket. That needs to be removed separately using below aws cli commands:
- Delete the access log Amazon S3 bucket using below aws cli command:
- If Amazon SES is configured:
- Remove the verified domain/email identities
- Delete the MX records from your DNS provider
- Delete the SMTP credentials from AWS Secrets Manager
- Delete any CloudWatch Log groups created by the Lambda functions
The VPC and its associated resources as prerequisites for this solution may not be deleted if they may be used by other applications.
Conclusion
In this post, we demonstrated how to automate the detection and redaction of PII across both text and image content using Amazon Bedrock Data Automation and Amazon Bedrock Guardrails. By centralizing and streamlining the redaction process, organizations can strengthen alignment with data privacy requirements, enhance security practices, and minimize operational overhead.
However, it is equally important to make sure that your solution is built with Amazon Bedrock Data Automation’s document processing constraints in mind. Amazon Bedrock Data Automation supports PDF, JPEG, and PNG file formats with a maximum console-processing size of 200 MB (500 MB via API), and single documents may not exceed 20 pages unless document splitting is enabled.
By using Amazon Bedrock Data Automation and Amazon Bedrock Guardrails centralized redaction capabilities, organizations can boost data privacy compliance management, cut operational overhead, and maintain stringent security across diverse workloads. This solution’s extensibility further enables integration with other AWS services, fine-tuning detection logic for more advanced PII patterns, and broadening support for additional file types or languages in the future, thereby evolving into a more robust, enterprise-scale data protection framework.
We encourage exploration of the provided GitHub repository to deploy this solution within your organization. In addition to delivering operational efficiency, scalability, security, and adaptability, the solution also provides a unified interface and robust audit trail that simplifies data governance. By refining detection rules, users can integrate additional file formats where possible and use Amazon Bedrock Data Automation and Amazon Bedrock Guardrails modular framework.
We invite you to implement this PII detection and redaction solution in the following GitHub repo to build a more secure, compliance-aligned, and highly adaptable data protection solution on Amazon Bedrock that addresses evolving business and regulatory requirements.
About the Authors
Himanshu Dixit is a Delivery Consultant at AWS Professional Services specializing in databases and analytics, bringing over 18 years of experience in technology. He is passionate for artificial intelligence, machine learning, and generative AI, leveraging these cutting-edge technologies to create innovative solutions that address real-world challenges faced by customers. Outside of work, he enjoys playing badminton, tennis, cricket, table tennis and spending time with her two daughters.
David Zhang is an Engagement Manager at AWS Professional Services, where he leads enterprise-scale AI/ML, cloud transformation initiatives for Fortune 100 customers in telecom, finance, media, and entertainment. Outside of work, he enjoys experimenting with new recipes in his kitchen, playing tenor saxophone, and capturing life’s moments through his camera.
Richard Session is a Lead User Interface Developer for AWS ProServe, bringing over 15 years of experience as a full-stack developer across marketing/advertising, enterprise technology, automotive, and ecommerce industries. With a passion for creating intuitive and engaging user experiences, he uses his extensive background to craft exceptional interfaces for AWS’s enterprise customers. When he’s not designing innovative user experiences, Richard can be found pursuing his love for coffee, spinning tracks as a DJ, or exploring new destinations around the globe.
Viyoma Sachdeva is a Principal Industry Specialist in AWS. She is specialized in AWS DevOps, containerization and IoT helping Customer’s accelerate their journey to AWS Cloud.
