Data Transfer
The Data Transfer component enables users to transfer files from a chosen source to a chosen target.
This component can use a number of common network protocols to transfer data from a variety of sources. The component copies the source file; it does not move it. Setting up this component requires selecting a source type and a target type. The component's other properties will change to reflect those choices.
Currently supported data sources include: Amazon S3, Azure Blob Storage, Box, Dropbox, FTP, Google Cloud Storage, HDFS, HTTP, HTTPS, Microsoft Exchange, Microsoft SharePoint, SFTP, and Windows Fileshare.
Currently supported targets: Amazon S3, Azure Blob Storage, Google Cloud Storage, HDFS, SFTP, and Windows Fileshare.
Note
- To ensure that access through instance credentials is managed correctly at all times, we advise that customers limit scopes (permissions) where applicable.
- FTPS is not supported through this component. We recommend using SFTP instead, or installing a tool that supports FTPS and calling it from a Bash Script component. You could also do this using cURL, which is available as standard.
- From version 1.68.9 onwards of Matillion ETL, using the unpack option on .exe archives and compressed files is no longer supported.
- When reading from or writing to Windows Fileshare, the SMB2 protocol will be preferred. SMB1 is still supported.
- When the Source Type property is set to FTP, HTTP, HTTPS, SFTP, or Windows Fileshare, users will need to set a Source URL property. This property provides a template URL, with placeholder values inside square brackets []. Replace the placeholder values with the values of your actual source URL.
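As an illustration of the cURL workaround for FTPS mentioned in the notes above, the following Python sketch assembles a cURL command line that could be pasted into a Bash Script component. The host, credentials, and paths are hypothetical placeholders. The use of explicit FTPS (an ftp:// URL combined with cURL's --ssl-reqd option) is an assumption; a server that uses implicit FTPS would typically need an ftps:// URL instead.

```python
import shlex


def build_ftps_curl_command(host, remote_path, local_path, user, password):
    """Return a cURL argument list for an explicit-FTPS download.

    All argument values are supplied by the caller; nothing here is
    specific to Matillion ETL.
    """
    url = f"ftp://{host}/{remote_path.lstrip('/')}"
    return [
        "curl",
        "--ssl-reqd",                    # refuse to continue without TLS
        "--user", f"{user}:{password}",  # hypothetical credentials
        "--output", local_path,          # where to write the downloaded file
        url,
    ]


# Hypothetical values, for illustration only.
command = build_ftps_curl_command(
    host="ftp.example.com",
    remote_path="/exports/data.csv",
    local_path="/tmp/data.csv",
    user="alice",
    password="secret",
)

# shlex.join quotes each argument so the line is safe to paste into a shell.
print(shlex.join(command))
```

The downloaded file could then be picked up from the local path by a later step in the job.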
Properties
Name
= string
A human-readable name for the component.
Source Type
= drop-down
Select the type of data source.
Unpack ZIP File
= boolean
Select Yes if the source data is a ZIP file that you wish to unpack before being transferred.
Target Type
= drop-down
Select the target type for the new file.
Target Object Name
= string
The filename of the new file.
gzip Data
= boolean
Select Yes if you wish to gzip the transferred data when it arrives at the target.
Source properties
Amazon S3
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
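A quick way to catch a template placeholder that was left in a URL by mistake is to search for bracketed text before running the job. The sketch below is only an illustration: the example URLs are hypothetical, and the check is deliberately simple (it would also flag a legitimate bracketed IPv6 host).

```python
import re

# Matches text in square brackets or angled brackets, e.g. [host] or <bucket>.
PLACEHOLDER = re.compile(r"\[[^\[\]]*\]|<[^<>]*>")


def unreplaced_placeholders(url):
    """Return every bracketed placeholder still present in the URL."""
    return PLACEHOLDER.findall(url)


# Hypothetical URLs, for illustration only.
print(unreplaced_placeholders("s3://<bucket>/reports/2024.csv"))
print(unreplaced_placeholders("s3://my-bucket/reports/2024.csv"))
```

An empty list means no placeholder text remains in the URL.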
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
Azure Blob Storage
Blob Location
= string
The URL, including full path and file name, that points to the source file that exists on Azure Blob Storage.
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
Box
Authentication
= drop-down
Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, read the Box Extract authentication guide.
File Id
= string
The ID of the Box file.
Dropbox
Authentication
= drop-down
Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, read the Dropbox Extract authentication guide.
File Type
= drop-down
Select either File or Paper.
Path
= string
The path to the Dropbox file.
File Id
= string
The ID of the Dropbox file.
Download as ZIP
= boolean
Select whether or not to download the file as a ZIP file.
Download Format
= drop-down
Select HTML or Markdown.
FTP
Set Home Directory as Root
= boolean
- Yes: URLs are relative to the user's home directory.
- No: URLs are relative to the server root.
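The difference between the two settings can be pictured as a choice of base directory for the path in the URL. The following sketch is an interpretation for illustration only, not the component's actual implementation; the function name, the home directory, and the paths are all hypothetical.

```python
import posixpath


def resolve_remote_path(url_path, home_dir, home_as_root):
    """Resolve the path part of a URL against the chosen base directory.

    home_as_root=True  -> the path is taken relative to the user's home directory
    home_as_root=False -> the path is taken relative to the server root
    """
    base = home_dir if home_as_root else "/"
    return posixpath.normpath(posixpath.join(base, url_path.lstrip("/")))


# Hypothetical home directory and path.
print(resolve_remote_path("/exports/data.csv", "/home/alice", True))
print(resolve_remote_path("/exports/data.csv", "/home/alice", False))
```

Under this reading, the same URL path points at a different file depending on the setting, so it is worth confirming which base the server account uses.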
Google Cloud Storage
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
When a user enters a forward slash character / after a folder name, a validation of the file path is triggered. This works in the same manner as the Go button.
HDFS
Source URL
= string
The URL, including full path and file name, that points to the source file.
HTTP
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
The Source URL must start with http:// or else an authentication failure message will be returned.
Source Username
= string
Your URL connection username. It is optional and will only be used if the data source requests it.
Source Password
= string
The corresponding password.
HTTPS
Perform Certificate Validation
= drop-down
Check that the SSL certificate for the host is valid before taking data.
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
The Source URL must start with https:// or else an authentication failure message will be returned.
Source Username
= string
Your URL connection username. It is optional and will only be used if the data source requests it.
Source Password
= string
The corresponding password.
Microsoft Exchange
Authentication
= drop-down
Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, read the Microsoft Exchange Query authentication guide.
Attachment Source
= drop-down
Select Event or Message.
Source ID
= string
A valid eventId or messageId, depending on the attachment source selected.
Attachment ID
= string
A valid id, which can be acquired by querying EventAttachments or MessageAttachments.
Microsoft SharePoint
URL
= string
The URL is the web address that you visit to log in to your SharePoint account.
User
= string
A valid SharePoint username to use for authentication.
Password
= string
The corresponding password. Store the password in the component, or create a managed entry for the password using Manage Passwords (recommended).
SharePoint Edition
= drop-down
Select your edition of Microsoft SharePoint.
Connection Options
= column editor
- Parameter: A JDBC parameter supported by the database driver. The available parameters are explained in the data model. Manual setup is not usually required, since sensible defaults are assumed.
- Value: A value for the given Parameter.
File Type
= drop-down
Select whether the file type is a document or attachment.
Library
= drop-down
The location where you can upload, create, update, and collaborate on files. For more information, read Introduction to libraries.
File URL
= string
The URL that points to the file.
SFTP
Set Home Directory as Root
= boolean
- Yes: URLs are relative to the user's home directory.
- No: URLs are relative to the server root.
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
The Source URL must start with sftp:// or else an authentication failure message will be returned.
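Because a wrong or missing URL prefix surfaces as an authentication failure rather than a URL error, it can help to check the scheme before running the job. A minimal sketch using Python's standard library follows; the helper name and the example URLs are hypothetical.

```python
from urllib.parse import urlsplit


def has_scheme(url, expected_scheme):
    """Return True if the URL starts with the expected scheme, e.g. 'sftp'."""
    return urlsplit(url).scheme.lower() == expected_scheme.lower()


# Hypothetical URLs, for illustration only.
print(has_scheme("sftp://files.example.com/exports/data.csv", "sftp"))
print(has_scheme("ftp://files.example.com/exports/data.csv", "sftp"))
print(has_scheme("files.example.com/exports/data.csv", "sftp"))
```

Only the first URL passes the check; the other two would need correcting before use as an SFTP Source URL.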
Source Username
= string
Your URL connection username. It is optional and will only be used if the data source requests it.
Source Password
= string
The corresponding password.
Source SFTP Key
= string
Your SFTP private key. It is optional and will only be used if the data source requests it.
This must be the complete private key, beginning with "-----BEGIN RSA PRIVATE KEY-----" and conforming to the same structure as an RSA private key.
Note
The SFTP key must not be encrypted or passphrase protected, because such keys are not supported.
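The key requirements above can be checked before pasting a key into the component. The sketch below is a rough pre-flight check, not a full key parser. It assumes the traditional OpenSSL PEM layout, in which a passphrase-protected RSA key carries a "Proc-Type: 4,ENCRYPTED" header line. The function name and the sample key bodies are hypothetical, and the bodies are not real key material.

```python
RSA_HEADER = "-----BEGIN RSA PRIVATE KEY-----"
RSA_FOOTER = "-----END RSA PRIVATE KEY-----"


def check_sftp_key(pem_text):
    """Return a list of problems found in the key text (empty if none)."""
    problems = []
    text = pem_text.strip()
    if not text.startswith(RSA_HEADER):
        problems.append("key does not begin with the RSA private key header")
    if not text.endswith(RSA_FOOTER):
        problems.append("key does not end with the RSA private key footer")
    # Traditional PEM marks passphrase-protected keys with this header line.
    if "Proc-Type: 4,ENCRYPTED" in text:
        problems.append("key is passphrase protected")
    return problems


# Placeholder bodies, for illustration only (not real keys).
plain_key = f"{RSA_HEADER}\nQUJD\n{RSA_FOOTER}\n"
protected_key = (
    f"{RSA_HEADER}\nProc-Type: 4,ENCRYPTED\n"
    f"DEK-Info: AES-128-CBC,00\n\nQUJD\n{RSA_FOOTER}\n"
)

print(check_sftp_key(plain_key))
print(check_sftp_key(protected_key))
```

A key in another format (for example, one that begins with an OpenSSH header) fails the header check and would need converting to the RSA PEM structure first.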
Windows Fileshare
Source URL
= string
The URL, including full path and file name, that points to the source file. The source URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your source URL.
Source Domain
= string
The domain that the source file is located on.
Source Username
= string
Your URL connection username. It is optional and will only be used if the data source requests it.
Source Password
= string
The corresponding password.
Target properties
Amazon S3
Target URL
= string
The URL (without file name) that points to where the new file will be created. The URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your target URL.
Access Control List Options
= drop-down
Choose from the ACL settings that Amazon provides. Leaving this property empty doesn't change the current settings. A full list can be found in the Amazon S3 documentation on canned ACLs.
Encryption
= drop-down
Decide how the files are encrypted inside the S3 bucket. This property is available when using an existing Amazon S3 location for staging.
- None: No encryption.
- SSE KMS: Encrypt the data according to a key stored on KMS. Read AWS Key Management Service (AWS KMS) to learn more.
- SSE S3: Encrypt the data according to a key stored on an S3 bucket. Read Using server-side encryption with Amazon S3-managed encryption keys (SSE-S3) to learn more.
KMS Key ID
= drop-down
The ID of the KMS encryption key you have chosen to use in the Encryption property.
Azure Blob Storage
Blob Location
= string
The full URL that points to the target file location on Azure Blob Storage.
Google Cloud Storage
Target URL
= string
The URL (without file name) that points to where the new file will be created. The URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your target URL.
HDFS
Target URL
= string
The URL (without file name) that points to where the new file will be created. The URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your target URL.
SFTP
Set Home Directory as Root
= boolean
- Yes: URLs are relative to the user's home directory.
- No: URLs are relative to the server root.
Target URL
= string
The URL (without file name) that points to where the new file will be created. The URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your target URL.
Target Username
= string
Your URL connection username. It is optional and will only be used if the target requests it.
Target Password
= string
The corresponding password.
Target SFTP Key
= string
Your SFTP private key. It is optional and will only be used if the target requests it.
This must be the complete private key, beginning with "-----BEGIN RSA PRIVATE KEY-----" and conforming to the same structure as an RSA private key.
Windows Fileshare
Target URL
= string
The URL (without file name) that points to where the new file will be created. The URL template includes placeholder values inside either square brackets [] or angled brackets <>. Replace the placeholder values with the actual values of your target URL.
Target Domain
= string
The domain that the newly created file is to be located on.
Target Username
= string
Your URL connection username. It is optional and will only be used if the target requests it.
Target Password
= string
The corresponding password.
Copying files to an Azure Premium Storage blob
When copying files to an Azure Premium Storage blob, Matillion ETL may provide the following error: "Self-suppression not permitted."
This is because, unlike standard Azure Storage, Azure Premium Storage does not support block blobs, append blobs, files, tables, or queues. Premium Storage supports only page blobs that are incrementally sized.
A page blob is a collection of 512-byte pages that are optimized for random read and write operations. Thus, all writes must be 512-byte-aligned and so any file that is not sized to a multiple of 512 will fail to write.
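The alignment rule can be checked with simple arithmetic before a transfer is attempted. The sketch below shows how to test whether a file size is a multiple of 512 bytes and how many padding bytes would be needed to reach the next multiple. The function names are hypothetical, and whether padding is acceptable depends on the file format.

```python
PAGE_SIZE = 512  # page blobs are made up of 512-byte pages


def is_page_aligned(size_bytes):
    """Return True if the size is an exact multiple of the page size."""
    return size_bytes % PAGE_SIZE == 0


def padding_needed(size_bytes):
    """Return the number of bytes needed to reach the next 512-byte boundary."""
    return (-size_bytes) % PAGE_SIZE


for size in (1000, 1024):
    print(size, is_page_aligned(size), padding_needed(size))
```

For example, a 1,000-byte file is not aligned and is 24 bytes short of the next boundary (1,024 bytes), whereas a 1,024-byte file is already aligned.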
To learn more about Azure Storage blobs, read Understanding block blobs, append blobs, and page blobs.