Triggering ETL from an email via SES and Lambda
This example details how the arrival of an email at an Amazon SES controlled email address can trigger a Matillion job via AWS S3 and a Lambda function. The process extracts any text/CSV attachments from emails sent to an SES email address and loads them into the target database.
Note
The following instructions are for illustrative purposes. Any screenshots, code, or steps may change over time; refer to the third party's own documentation for the most up-to-date instructions.
Why Do This
Perhaps you'd like the ability for a business user to create a standard CSV file and simply email it to Matillion ETL, rather than having to give them access to, and train them on, S3 buckets. Or perhaps you receive CSV files by email from a supplier and would like to simply forward them to the target database.
Prerequisites
This exercise requires some items set up in your AWS account before you begin:
- A domain name for which Amazon SES receives email - see the Integration with Amazon SQS.
Basic Workflow
The basic steps and services involved in this process are:
- User sends an email to a specific AWS SES controlled email address
- AWS SES copies the email in plain text format to the S3 Landing Bucket
- S3 Trigger calls a Lambda function to scan the email for TEXT/CSV attachments and output them to S3 Process Attachment bucket
- The same Lambda function writes a message to an SQS Queue containing the parameters required to run a Matillion ETL job, including the bucket and filename written by the Lambda function (a sample message is shown after this list)
- Matillion ETL consumes the message from the SQS Queue and runs a job to load the file to the target database.
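For context, the message Matillion ETL's SQS listener consumes is a JSON document naming the group, project, environment, and job to run, with job variables used to pass the file details. A sample message might look like the following (all names here are illustrative, and the variable name must match one defined in the job):

```json
{
  "group": "Email Example",
  "project": "Email Example",
  "version": "default",
  "environment": "live",
  "job": "Email CSV Load",
  "variables": {
    "file_to_load": "attachment.csv"
  }
}
```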
The process is summarised in this diagram:
Example Steps
Note: example files are attached to this article should you wish to try this yourself.
- Configure an email address to be managed by SES as per the AWS documentation
- Configure an SES Receipt Rule to perform an S3 action (see here) to place all received emails into the first S3 bucket (S3 Landing), e.g.
- Any emails sent to this address will be dropped into the S3 bucket (mtln-email-test in this example) in plain text format. Note that you may need to amend the bucket permissions to allow the SES process write access; an example policy is shown below
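If SES cannot write to the landing bucket, a bucket policy along these lines has historically been used to grant it access. The bucket name matches the example above, and 111122223333 is a placeholder for your own AWS account ID; check the current AWS documentation, as the recommended condition keys may have changed:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowSESPuts",
      "Effect": "Allow",
      "Principal": {"Service": "ses.amazonaws.com"},
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mtln-email-test/*",
      "Condition": {
        "StringEquals": {"aws:Referer": "111122223333"}
      }
    }
  ]
}
```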
- Within the AWS Console, create a new Lambda function for Python 2.7, ideally using the s3-get-object-python Blueprint (using the Blueprint allows you to configure which bucket the trigger fires from, and sets up a number of permissions too). If you are using the template method, set these values on the next page:
- Name: Matillion_Email_Test
- Role: Create new role from template(s)
- Role Name: Matillion_Email_Role
- Bucket: <choose the bucket the emails land in>
- Event Type: Object Created (All)
- Enable Trigger: Yes
- Press "Create Function"
- Replace the code contents of the Lambda function generated by AWS with the contents of the file lambda_matillion_email_to_execution.py
- Amend the variables at the top of the Python code to the specifics of your environment (an illustrative sketch of such a function follows this list):
- S3_OUTPUT_BUCKETNAME: The bucket created for processed files to be output to
- SQS_MATILLION_QUEUE: The SQS queue Matillion ETL is listening to
- MTLN_GROUP: The Project Group Name in Matillion ETL
- MTLN_PROJECT: The Project Name in Matillion ETL
- MTLN_ENV: The name of the environment to execute against in Matillion ETL
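For reference, a minimal sketch of what such a Lambda function might look like is shown below. This is not the attached lambda_matillion_email_to_execution.py; the bucket, queue, job, and variable names are placeholders, and the job name and variable must match whatever the imported Matillion job defines:

```python
# Illustrative sketch only (Python 2.7, as used by the Lambda runtime in
# this article). The real attached file may differ.
import email
import json
import urllib

import boto3

# Amend these to the specifics of your environment (names are examples)
S3_OUTPUT_BUCKETNAME = 'mtln-email-attachments'  # processed-attachment bucket
SQS_MATILLION_QUEUE = 'MatillionQueue'           # queue Matillion ETL listens to
MTLN_GROUP = 'Email Example'                     # Matillion ETL project group
MTLN_PROJECT = 'Email Example'                   # Matillion ETL project
MTLN_ENV = 'live'                                # Matillion ETL environment

s3 = boto3.client('s3')
sqs = boto3.resource('sqs')


def lambda_handler(event, context):
    # The S3 trigger tells us which raw email SES has just written
    bucket = event['Records'][0]['s3']['bucket']['name']
    key = urllib.unquote_plus(
        event['Records'][0]['s3']['object']['key'].encode('utf-8'))

    # Fetch and parse the raw email from the landing bucket
    raw_email = s3.get_object(Bucket=bucket, Key=key)['Body'].read()
    message = email.message_from_string(raw_email)

    queue = sqs.get_queue_by_name(QueueName=SQS_MATILLION_QUEUE)

    # Copy each text/csv attachment to the output bucket, then queue a
    # message asking Matillion ETL to load it
    for part in message.walk():
        if part.get_content_type() != 'text/csv':
            continue
        filename = part.get_filename() or key + '.csv'
        s3.put_object(Bucket=S3_OUTPUT_BUCKETNAME, Key=filename,
                      Body=part.get_payload(decode=True))
        queue.send_message(MessageBody=json.dumps({
            'group': MTLN_GROUP,
            'project': MTLN_PROJECT,
            'version': 'default',
            'environment': MTLN_ENV,
            'job': 'Email CSV Load',  # hypothetical name of the imported job
            'variables': {'file_to_load': filename}
        }))
```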
- Save the function (note: you may need to amend the Permissions on your S3 Process Attachment bucket to allow the Lambda function to write to it)
- Import the appropriate JSON file into your Matillion ETL instance (based on whether you're using Redshift or Snowflake)
- Ensure your Matillion ETL instance is listening to the same SQS Queue as configured in the Python Lambda
- That's it! Test by emailing the Data_matillion_email_to_execute_example.csv file attached to this article to your SES-configured email address (or use the script sketched after this list). If you are watching Matillion ETL, you should see the imported job run.
- A uniquely named table containing the contents of the CSV should be created in the target's default schema.
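If you would rather send the test email from a script than a mail client, a minimal sketch using Python's standard library follows. The addresses and SMTP relay are placeholders for your own; the attachment is sent as text/csv so the Lambda function's content-type check picks it up:

```python
import smtplib
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText

# Placeholders: substitute your own addresses and SMTP relay
SENDER = 'sender@example.com'
RECIPIENT = 'loader@your-ses-domain.example.com'

msg = MIMEMultipart()
msg['Subject'] = 'Matillion email trigger test'
msg['From'] = SENDER
msg['To'] = RECIPIENT
msg.attach(MIMEText('Test CSV attached.'))

# Attach the example CSV with a text/csv content type
part = MIMEBase('text', 'csv')
with open('Data_matillion_email_to_execute_example.csv', 'rb') as f:
    part.set_payload(f.read())
encoders.encode_base64(part)
part.add_header('Content-Disposition', 'attachment',
                filename='Data_matillion_email_to_execute_example.csv')
msg.attach(part)

server = smtplib.SMTP('smtp.example.com')
server.sendmail(SENDER, [RECIPIENT], msg.as_string())
server.quit()
```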
Troubleshooting
Most problems you may experience with this process are related to permissions. Ensure your Lambda function has the required permissions to do the following (an example IAM policy is shown after the list):
- Read from your source S3 bucket
- Write to your target S3 bucket
- Write messages to your SQS queue
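A minimal IAM policy for the Lambda function's role might look like the following, assuming the example bucket and queue names used earlier in this article (sqs:GetQueueUrl is needed if, as in the sketch above, the function looks the queue up by name):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:GetObject",
      "Resource": "arn:aws:s3:::mtln-email-test/*"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::mtln-email-attachments/*"
    },
    {
      "Effect": "Allow",
      "Action": ["sqs:GetQueueUrl", "sqs:SendMessage"],
      "Resource": "arn:aws:sqs:us-east-1:111122223333:MatillionQueue"
    }
  ]
}
```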
Do not hesitate to Contact Support for further assistance.