Using incron to automatically copy data to S3
Overview
Although not strictly related to Matillion ETL, copying data into S3 from an on-premises server or other system is a common task. This approach is especially useful in micro-batching scenarios.
The instructions below should be enough to get you started with incron. It is a powerful tool, however, and capable of much more than this article can cover.
Installation
To begin, incron must be installed. Its daemon watches a directory for changes and runs a command when they occur. To install it on Amazon Linux, run
sudo yum install incron
Once installed, delete the /etc/incron.allow file if it exists. This file restricts which users may use the tool.
sudo rm /etc/incron.allow
Alternatively, add your user name to the file.
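As a minimal sketch, assuming the account that will own the incrontab is called ed (a hypothetical user name, borrowed from the example path used later in this article), /etc/incron.allow would list one user name per line:

```text
ed
```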
Setting up aws-cli
aws configure
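Follow the on-screen instructions, running the command as the same user that will own the incrontab, because incron runs commands as that user. Once the AWS CLI is configured, create the incrontab entry that will copy your files. Run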
incrontab -e
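This opens your incron table for editing. Add a line of the following form, where $@ expands to the watched directory and $# to the name of the file that triggered the event:
<path to local watched directory> IN_CLOSE_WRITE /usr/local/bin/aws s3 cp $@/$# s3://<bucket name>/<prefix path>/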
Example:
/home/ed/lntest IN_CLOSE_WRITE /usr/local/bin/aws s3 cp $@/$# s3://matillion/delme/
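If you want each copy recorded in the system log, one option is to point the incrontab entry at a small wrapper script instead of calling aws directly. The following is only a sketch under stated assumptions: the script name /usr/local/bin/s3-upload.sh, the function names and the s3-upload log tag are all hypothetical, and the bucket and prefix are taken from the example above. Note that inside the script "$#" is the shell's argument count, not incron's file-name wildcard.

```shell
#!/bin/bash
# Hypothetical wrapper script, e.g. saved as /usr/local/bin/s3-upload.sh
# (the name is an assumption). Assumed incrontab entry:
#   /home/ed/lntest IN_CLOSE_WRITE /usr/local/bin/s3-upload.sh $@ $#

# Build the destination URI from a bucket/prefix and a file name.
s3_dest() {
    local prefix="${1%/}"   # strip one trailing slash, if present
    local name="$2"
    printf '%s/%s\n' "$prefix" "$name"
}

# Copy one file to S3 and record the outcome in the system log.
upload() {
    local dir="$1" name="$2" dest
    dest="$(s3_dest "s3://matillion/delme" "$name")"
    if /usr/local/bin/aws s3 cp "$dir/$name" "$dest"; then
        logger -t s3-upload "copied $dir/$name to $dest"
    else
        logger -t s3-upload "FAILED to copy $dir/$name to $dest"
    fi
}

# Run only when given both arguments: watched directory and file name.
# Here "$#" is the shell's argument count.
if [ "$#" -eq 2 ]; then
    upload "$1" "$2"
fi
```

The script must be executable (chmod +x) by the user who owns the incrontab.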
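To test, add a file to the watched directory; it should appear in S3 shortly afterwards. If anything goes wrong, errors are written to the system log: /var/log/syslog on Debian-based systems, or typically /var/log/messages or /var/log/cron on Amazon Linux. If nothing is logged at all, check that the incrond service is running (for example, sudo service incrond start).
The full incron documentation is worth a read and is available here.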