AWS-Powered Next-Gen KYC: How to Build A Secure and Scalable Image Processing Pipeline with Serverless Architecture
In the digital age, verifying the identity of users is crucial for ensuring security, compliance, and trust. Know Your Customer (KYC) processes have become a vital part of various industries, including finance, healthcare, and e-commerce. These processes help organizations verify the identity of their clients, comply with regulatory requirements, and prevent fraud. However, traditional KYC methods can be cumbersome, time-consuming, and costly.
To address these challenges, I present to you a cutting-edge solution powered by AWS (Amazon Web Services). This guide outlines how to build a next-generation KYC system using AWS's serverless architecture, which provides a secure, scalable, and efficient image processing pipeline. By leveraging the power of AWS, we can automate and streamline the KYC process, reducing operational overhead while enhancing security and reliability.
Our AWS-powered KYC system employs a serverless architecture that integrates several AWS services to handle image uploads, processing, validation, and notifications seamlessly. Below is an overview of the key components and their roles within the architecture:
Lambda functions handle uploading images (upload_image), processing images (process_image), and validating results (validate_image). Lambda functions scale automatically with the volume of requests, ensuring efficient processing. Concurrency and throttling can also be managed per function.
API Gateway receives the client request and invokes the upload_image Lambda function.
The upload_image Lambda function stores the images in an S3 bucket. S3 events trigger the process_image Lambda function for processing.
The process_image Lambda function uses Amazon Rekognition to analyze the images. It detects faces, compares them against a reference image, and validates the data against predefined criteria. It then stores the results in DynamoDB and deletes the original image from S3.
The process_image Lambda function then publishes a message to an SNS topic.
The validate_image Lambda function, subscribed to the SNS topic, retrieves the message and checks the results stored in DynamoDB. Based on the validation outcome, it sends a success or failure notification to the respective SNS topic.
Remember that this is a proof-of-concept. You can find production-ready architectures here.
Before we begin, please note that this guide does not cover IAM roles and policies. You will need to add the required IAM roles and permissions where necessary.
On the S3 Console, click on "Create Bucket"
Remember that bucket names must be globally unique, so you must choose a different, unique name.
Also upload a reference image to your bucket at reference/face.jpg.
We'll come back to update the bucket's resource-based policies and to set up event notifications.
The upload_image Lambda function is responsible for handling the image upload process initiated by the API Gateway. Its main tasks include receiving the image from the API Gateway request, storing the image in an S3 bucket, and triggering the next step in the processing pipeline.
import base64
import binascii
import json

import boto3

s3_client = boto3.client('s3')


def lambda_handler(event, context):
    # Extract image data from the API Gateway proxy event
    try:
        body = json.loads(event['body'])
        image_data = body['image_data']
        user_id = body['user_id']  # Assuming user ID is passed along with the image
    except (json.JSONDecodeError, KeyError, TypeError):
        # TypeError covers a request with no body at all
        return {
            'statusCode': 400,
            'body': json.dumps('Invalid input')
        }

    # Decode the base64-encoded image data
    try:
        image_bytes = base64.b64decode(image_data)
    except binascii.Error:
        return {
            'statusCode': 400,
            'body': json.dumps('Invalid base64 image data')
        }

    # Generate a unique key for the image in S3.
    # The 'uploads/' prefix must match the prefix configured on the S3 event notification.
    image_key = f'uploads/{user_id}/{context.aws_request_id}.jpg'

    # Upload the image to the S3 bucket
    try:
        s3_client.put_object(
            Bucket='your-s3-bucket-name',
            Key=image_key,
            Body=image_bytes,
            ContentType='image/jpeg'
        )
    except Exception as e:
        print(e)
        return {
            'statusCode': 500,
            'body': json.dumps('Failed to upload image')
        }

    # Return success response
    return {
        'statusCode': 200,
        'body': json.dumps({'message': 'Image uploaded successfully', 'image_key': image_key})
    }
The process_image Lambda function is a crucial component in the AWS-powered KYC architecture. Its primary responsibilities include processing the uploaded image, analyzing it using AWS Rekognition, storing the results in DynamoDB, deleting the original image from S3, and publishing the results to an SNS topic.
import json
import random
import string
from urllib.parse import unquote_plus

import boto3

s3_client = boto3.client('s3')
rekognition_client = boto3.client('rekognition')
dynamodb = boto3.resource('dynamodb')
sns_client = boto3.client('sns')

DYNAMODB_TABLE_NAME = 'KYCResults'
SNS_TOPIC_ARN = 'arn:aws:sns:your-region:your-account-id:ImageProcessingTopic'
REFERENCE_IMAGE_BUCKET = 'your-reference-bucket'
REFERENCE_IMAGE_KEY = 'reference/face.jpg'


def random_text():
    # Placeholder value stored when verification fails
    return ''.join(random.choices(string.ascii_letters + string.digits, k=10))


def lambda_handler(event, context):
    table = dynamodb.Table(DYNAMODB_TABLE_NAME)

    # Extract information from the S3 event
    for record in event['Records']:
        bucket_name = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        object_key = unquote_plus(record['s3']['object']['key'])
        # Assuming the user_id is part of the object key path, e.g., 'uploads/{user_id}/image.jpg'
        user_id = object_key.split('/')[1]

        # Download the image from S3
        response = s3_client.get_object(Bucket=bucket_name, Key=object_key)
        image_data = response['Body'].read()

        # Detect faces in the image using AWS Rekognition
        face_detection_response = rekognition_client.detect_faces(
            Image={'Bytes': image_data},
            Attributes=['ALL']
        )

        if face_detection_response['FaceDetails']:
            # If a face is detected, set extracted_text to "expected text"
            extracted_text = "expected text"

            # Compare the detected face with the reference image
            reference_image_response = s3_client.get_object(
                Bucket=REFERENCE_IMAGE_BUCKET, Key=REFERENCE_IMAGE_KEY)
            reference_image_data = reference_image_response['Body'].read()
            comparison_response = rekognition_client.compare_faces(
                SourceImage={'Bytes': reference_image_data},
                TargetImage={'Bytes': image_data}
            )
            if not comparison_response['FaceMatches']:
                # If the face does not match the reference image, set extracted_text to random text
                extracted_text = random_text()
        else:
            # If no face is detected, set extracted_text to random text
            extracted_text = random_text()

        # Store the processing result in DynamoDB
        table.put_item(Item={
            'UserId': user_id,
            'ImageKey': object_key,
            'ExtractedText': extracted_text,
            'ProcessingStatus': 'Completed'
        })

        # Delete the original image from S3
        s3_client.delete_object(Bucket=bucket_name, Key=object_key)

        # Publish a message to the SNS topic
        message = {
            'user_id': user_id,
            'image_key': object_key,
            'extracted_text': extracted_text
        }
        sns_client.publish(
            TopicArn=SNS_TOPIC_ARN,
            Message=json.dumps(message)
        )

    return {
        'statusCode': 200,
        'body': json.dumps('Image processed and results stored successfully')
    }
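The comparison step in process_image treats any entry in FaceMatches as a match. As a rough sketch, the decision could be made explicit by inspecting the Similarity score that compare_faces reports for each match. The helper names best_similarity and is_match and the 90.0 cutoff are illustrative assumptions, not values taken from this pipeline.

```python
def best_similarity(comparison_response):
    """Return the highest Similarity score in a compare_faces response, or 0.0."""
    matches = comparison_response.get('FaceMatches', [])
    return max((m.get('Similarity', 0.0) for m in matches), default=0.0)


def is_match(comparison_response, threshold=90.0):
    # threshold is a hypothetical cutoff; tune it for your own risk tolerance
    return best_similarity(comparison_response) >= threshold


# Trimmed-down sample shaped like a compare_faces response
sample = {'FaceMatches': [{'Similarity': 97.5}, {'Similarity': 82.0}]}
print(is_match(sample))
print(is_match({'FaceMatches': []}))
```

Rekognition's compare_faces call also accepts a SimilarityThreshold parameter, so a similar cutoff can be applied on the service side instead.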
The validate_image Lambda function in the serverless KYC architecture is responsible for validating the results of image processing and determining whether the KYC verification was successful or not.
import json

import boto3

dynamodb = boto3.resource('dynamodb')
sns_client = boto3.client('sns')

DYNAMODB_TABLE_NAME = 'KYCResults'
SUCCESS_TOPIC_ARN = 'arn:aws:sns:your-region:your-account-id:ImageProcessingSuccessTopic'
FAILURE_TOPIC_ARN = 'arn:aws:sns:your-region:your-account-id:ImageProcessingFailureTopic'


def lambda_handler(event, context):
    table = dynamodb.Table(DYNAMODB_TABLE_NAME)

    for record in event['Records']:
        # Get the SNS message
        sns_message = json.loads(record['Sns']['Message'])
        user_id = sns_message['user_id']
        image_key = sns_message['image_key']
        extracted_text = sns_message['extracted_text']

        # Retrieve the processing result from DynamoDB
        response = table.get_item(Key={'ImageKey': image_key, 'UserId': user_id})
        item = response.get('Item')

        # Validate the image processing result (dummy validation logic):
        # succeed only if a stored result exists and the text matches
        if item and 'ExtractedText' in item and extracted_text == "expected text":
            validation_status = 'Success'
            topic_arn = SUCCESS_TOPIC_ARN
        else:
            validation_status = 'Failure'
            topic_arn = FAILURE_TOPIC_ARN

        # Publish the outcome to the matching topic
        sns_client.publish(
            TopicArn=topic_arn,
            Message=json.dumps({
                'user_id': user_id,
                'image_key': image_key,
                'status': validation_status
            })
        )

    return {
        'statusCode': 200,
        'body': json.dumps('Validation process completed')
    }
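To exercise validate_image without publishing to SNS, a test event can be assembled by hand. The sketch below builds only the fields the handler reads (Records, Sns, Message); real SNS events carry more metadata. The helper names make_sns_event and parse_sns_event and the sample key are hypothetical.

```python
import json


def make_sns_event(user_id, image_key, extracted_text):
    """Build a minimal SNS-style Lambda event carrying one processing result."""
    message = {
        'user_id': user_id,
        'image_key': image_key,
        'extracted_text': extracted_text,
    }
    return {'Records': [{'Sns': {'Message': json.dumps(message)}}]}


def parse_sns_event(event):
    """Decode every SNS record the same way validate_image does."""
    return [json.loads(record['Sns']['Message']) for record in event['Records']]


event = make_sns_event('user123', 'uploads/user123/example.jpg', 'expected text')
print(parse_sns_event(event))
```

The resulting event can be pasted into the Lambda console's test dialog or passed straight to lambda_handler in a local test.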
The Amazon API Gateway serves as a crucial component in this serverless KYC architecture by providing a secure, scalable, and managed entry point for clients to interact with the backend services. Here's a detailed look at what the API Gateway does in this pipeline:
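It invokes the upload_image Lambda function when a request is received at the designated endpoint.
To create the API in the API Gateway console:
Enter an API name (e.g., KYCImageUploadAPI) and click "Create API".
Create a resource with a Resource Name (e.g., upload) and Resource Path (e.g., /upload).
Select the /upload resource.
Select POST from the dropdown menu and click the checkmark.
Select the Lambda function to integrate with (e.g., upload_image).
Add image/jpeg as a binary media type and click "Save Changes".
Deploy the API to a stage (e.g., prod).
You can also use this OpenAPI definition file below: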
{
  "swagger": "2.0",
  "info": {
    "title": "KYC Image Upload API",
    "version": "1.0"
  },
  "paths": {
    "/upload": {
      "post": {
        "x-amazon-apigateway-integration": {
          "uri": "arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/arn:aws:lambda:{region}:{account-id}:function:upload_image/invocations",
          "httpMethod": "POST",
          "type": "aws_proxy"
        },
        "responses": {
          "200": {
            "description": "Image uploaded successfully"
          },
          "400": {
            "description": "Invalid input"
          },
          "500": {
            "description": "Internal server error"
          }
        },
        "parameters": [
          {
            "name": "Content-Type",
            "in": "header",
            "required": true,
            "type": "string"
          },
          {
            "name": "body",
            "in": "body",
            "required": true,
            "schema": {
              "type": "object",
              "properties": {
                "image_data": {
                  "type": "string"
                },
                "user_id": {
                  "type": "string"
                }
              },
              "required": ["image_data", "user_id"]
            }
          }
        ]
      }
    }
  }
}
Make sure you replace the {region} and {account-id} placeholders.
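Replacing the placeholders by hand is error-prone, so here is a small sketch that substitutes them and checks that the result is still valid JSON. The function name fill_placeholders and the example region and account ID are assumptions for illustration only.

```python
import json


def fill_placeholders(definition_text, region, account_id):
    """Substitute the {region} and {account-id} placeholders in the definition."""
    filled = definition_text.replace('{region}', region).replace('{account-id}', account_id)
    json.loads(filled)  # fail early if the result is not valid JSON
    return filled


# Shortened stand-in for the full definition file
template = (
    '{"uri": "arn:aws:apigateway:{region}:lambda:path/2015-03-31/functions/'
    'arn:aws:lambda:{region}:{account-id}:function:upload_image/invocations"}'
)
filled = fill_placeholders(template, 'us-east-1', '123456789012')
print(filled)
```

The filled-in text can then be imported through the console, or passed as bytes to boto3's apigateway import_rest_api call via its body parameter.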
Configure S3 to Trigger Lambda:
Event name: ImageUploaded
Event types: s3:ObjectCreated:*
Prefix: uploads/
Destination: the process_image Lambda function
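The same trigger can be expressed in code. The sketch below builds the notification configuration from the values above and, separately, applies it with boto3's put_bucket_notification_configuration. The function names and the example Lambda ARN are assumptions; only the builder runs when the snippet is executed.

```python
def build_notification_config(lambda_arn, prefix='uploads/'):
    """Build an S3 notification configuration that targets one Lambda function."""
    return {
        'LambdaFunctionConfigurations': [{
            'Id': 'ImageUploaded',
            'LambdaFunctionArn': lambda_arn,
            'Events': ['s3:ObjectCreated:*'],
            'Filter': {'Key': {'FilterRules': [{'Name': 'prefix', 'Value': prefix}]}},
        }]
    }


def apply_notification_config(bucket_name, lambda_arn):
    # Imported lazily so the builder above can be used without AWS access
    import boto3
    s3_client = boto3.client('s3')
    s3_client.put_bucket_notification_configuration(
        Bucket=bucket_name,
        NotificationConfiguration=build_notification_config(lambda_arn),
    )


# Example ARN with a made-up region and account ID
config = build_notification_config(
    'arn:aws:lambda:us-east-1:123456789012:function:process_image')
print(config)
```

S3 must also be allowed to invoke the function through a resource-based permission on the Lambda. The console adds that permission for you; this sketch does not.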
There are three SNS (Amazon Simple Notification Service) topics involved, each serving a specific function in the KYC image processing pipeline:
ImageProcessingTopic: notifies subscribers, such as the validate_image Lambda function, about the completion of image processing by the process_image Lambda function. The process_image Lambda function publishes a message to this topic once it has finished processing an image and storing the results in DynamoDB. The validate_image Lambda function is subscribed to this topic to receive notifications and trigger validation and further processing of the KYC data.
ImageProcessingSuccessTopic: the validate_image Lambda function, upon successful validation of the processed image data stored in DynamoDB, publishes a success message to this topic.
ImageProcessingFailureTopic: if the validate_image Lambda function encounters an issue during validation or if the processed data does not meet the validation criteria, it publishes a failure message to this topic.
Create the three topics: ImageProcessingTopic, ImageProcessingSuccessTopic, and ImageProcessingFailureTopic.
Configure the validate_image Lambda function to be triggered by messages published to the ImageProcessingTopic.
By using separate SNS topics for different notification purposes, the architecture maintains modularity, flexibility, and clarity in handling different types of events and outcomes during the KYC image processing workflow. It allows for targeted notifications to specific components or teams based on the nature of the event (success or failure) and facilitates effective coordination and response handling within the system.
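As a sketch of the same setup in code, the helpers below create the three topics and subscribe a Lambda function to one of them. They take an SNS client as an argument (for example boto3.client('sns')), so nothing touches AWS until you call them. The helper names are hypothetical.

```python
TOPIC_NAMES = [
    'ImageProcessingTopic',
    'ImageProcessingSuccessTopic',
    'ImageProcessingFailureTopic',
]


def create_topics(sns_client, names=TOPIC_NAMES):
    """Create each topic and return a mapping of topic name to ARN.

    create_topic is idempotent: an existing topic with the same name is returned.
    """
    return {name: sns_client.create_topic(Name=name)['TopicArn'] for name in names}


def subscribe_validator(sns_client, topic_arn, lambda_arn):
    """Subscribe a Lambda function (here, validate_image) to a topic."""
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol='lambda',
        Endpoint=lambda_arn,
    )
    return response['SubscriptionArn']
```

A typical call would be arns = create_topics(boto3.client('sns')), followed by subscribe_validator with arns['ImageProcessingTopic'] and the validate_image function ARN. SNS also needs permission to invoke the function, which this sketch leaves out.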
In this architecture, DynamoDB serves as the persistent storage layer for storing the results of image processing and KYC verification. By utilizing DynamoDB as the backend database, the architecture maintains data integrity, supports real-time processing, and enables seamless integration with other AWS services within the serverless environment.
To create the required DynamoDB table for this architecture, you can follow these steps:
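As a programmatic alternative, here is a minimal sketch of a table definition for boto3's create_table. The validate_image function reads items using both UserId and ImageKey, so the table needs a composite key on those two attributes. Which one is the partition key and which is the sort key is an assumption here, as are the on-demand billing mode and the function names.

```python
def build_table_definition(table_name='KYCResults'):
    """Build create_table arguments for the results table."""
    return {
        'TableName': table_name,
        'KeySchema': [
            {'AttributeName': 'UserId', 'KeyType': 'HASH'},     # partition key (assumed)
            {'AttributeName': 'ImageKey', 'KeyType': 'RANGE'},  # sort key (assumed)
        ],
        'AttributeDefinitions': [
            {'AttributeName': 'UserId', 'AttributeType': 'S'},
            {'AttributeName': 'ImageKey', 'AttributeType': 'S'},
        ],
        'BillingMode': 'PAY_PER_REQUEST',  # on-demand capacity, an assumption
    }


def create_table(definition):
    # Imported lazily so the builder above can be used without AWS access
    import boto3
    return boto3.client('dynamodb').create_table(**definition)


definition = build_table_definition()
print(definition['KeySchema'])
```

Only the key attributes need to be declared up front; ExtractedText and ProcessingStatus are stored as ordinary item attributes.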
Prepare the Test Environment
Ensure the reference image, reference/face.jpg, is uploaded to your S3 bucket in the correct location. This image will be used for face comparisons by the process_image function. I found a good reference image here.
Python provides a simple way to encode images in base64 using the built-in base64 module. Here is an example:
import base64
# Path to the image file
image_path = 'path/to/your/image.jpg'
# Read the image file
with open(image_path, 'rb') as image_file:
    image_data = image_file.read()
# Encode the image data in base64
encoded_image = base64.b64encode(image_data).decode('utf-8')
# Print the base64 encoded image
print(encoded_image)
Save the script as encode_image.py and run it:
python encode_image.py
Postman is a popular tool for testing APIs. It can also help you encode files in base64.
Set the request method to POST.
Replace <base64-encoded-image> in the request body with the actual base64 string.
If you prefer using the command line, you can use cURL to send the image as a base64 string in a POST request.
Use the base64 command-line tool (available on Unix-like systems) to encode an image:
base64 path/to/your/image.jpg > encoded_image.txt
With GNU coreutils, add -w 0 so the output is a single line that can be pasted into JSON.
Open the encoded_image.txt file and copy the base64 string.
Create a JSON file (e.g., payload.json) with the following content:
{
  "image_data": "<base64-encoded-image>",
  "user_id": "user123"
}
Replace <base64-encoded-image> with the actual base64 string from encoded_image.txt.
Send the request with cURL:
curl -X POST https://<api-id>.execute-api.<region>.amazonaws.com/prod/upload \
-H "Content-Type: application/json" \
-d @payload.json
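The same request can be sent from Python using only the standard library. This is a sketch: build_payload and post_image are hypothetical helper names, and the URL you pass in must be your own API endpoint. The payload uses the image_data and user_id fields that the upload_image handler reads.

```python
import base64
import json
import urllib.request


def build_payload(image_bytes, user_id):
    """Return the JSON body expected by the upload_image handler."""
    return json.dumps({
        'image_data': base64.b64encode(image_bytes).decode('utf-8'),
        'user_id': user_id,
    })


def post_image(url, image_bytes, user_id):
    """POST the payload to the API and return (status code, response text)."""
    request = urllib.request.Request(
        url,
        data=build_payload(image_bytes, user_id).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode('utf-8')


# Stand-in bytes for illustration, not a real image
payload = build_payload(b'\xff\xd8\xff', 'user123')
print(payload)
```

To send a real image, read the file in binary mode and pass its bytes to post_image along with the /upload URL of your deployed stage.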
You can also test the endpoint directly from the API Gateway console.
The DynamoDB table also persisted important metadata - as expected.
This architecture aligns with AWS Well-Architected principles, which are designed to help you build secure, high-performing, resilient, and efficient infrastructure for your applications. Let's examine how this KYC image processing pipeline adheres to each of the AWS Well-Architected Framework's five pillars:
This guide provides a robust framework for implementing a modern KYC system using AWS's serverless architecture. By leveraging services such as API Gateway, Lambda, S3, DynamoDB, Rekognition, and SNS, we can create a secure, scalable, and efficient image processing pipeline that automates and enhances the KYC verification process. This architecture not only meets the demands of contemporary digital services but also ensures regulatory compliance and user trust.
We could also improve on this architecture in different ways. For example, by using a Step Functions workflow and by adding checks for already existing images before validating the KYC process. You can follow this project or this one to learn how to implement Step Functions in AWS.
This AWS-powered next-gen KYC architecture follows the Well-Architected principles by utilizing a serverless, event-driven approach. This design ensures operational excellence, security, reliability, performance efficiency, and cost optimization. By leveraging AWS's managed services, the system is both resilient and scalable, providing a robust framework for secure and efficient KYC processing.
Happy Clouding !!!