File Iterator
Overview
The File Iterator component lets users loop over matching files in a remote file system.
The component searches for files in a number of remote file systems, running its attached component once for each file found. Filenames and path names are mapped into environment variables, which can then be referenced from the attached component(s).
To attach the iterator to another component, use the blue output connector and link to the desired component. To detach, right-click on the attached component and click Disconnect from Iterator.
If you need to iterate more than one component, put them into a separate orchestration job or transformation job and use a Run Transformation or Run Orchestration component attached to the iterator. In this way, you can run an entire ETL flow multiple times, once for each row of variable values.
All iterator components are limited to a maximum of 5000 iterations.
Properties
Name
= string
A human-readable name for the component.
Input Data Type
= drop-down
Select the remote file system to search. Available data types include: Azure Blob Storage, Cloud Storage, FTP, HDFS, S3, SFTP, and Windows Fileshare.
Input Data URL
= drop-down
Input the URL of the folder to search, including the full path and ending with a forward slash.
Example: DataType://${jv_blobStorageAccount}/${jv_containerName}/
Special characters used in this field (e.g. in usernames and passwords) must be URL-safe. For more information, please refer to our Safe Characters documentation.
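One common way to make a username or password URL-safe is RFC 3986 percent-encoding: every byte outside the unreserved set (letters, digits, -, ., _ and ~) is written as %XX. The Java sketch below assumes percent-encoding is what this field expects, so check the Safe Characters documentation before relying on it. The class and method names, and the sftp:// URL shape, are hypothetical. It avoids java.net.URLEncoder on purpose, because that class applies form encoding and turns spaces into "+".

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: percent-encodes credentials before placing them in an Input Data URL.
// Assumes RFC 3986 percent-encoding is accepted by the target file system.
final class UrlSafe {
    private static final String HEX = "0123456789ABCDEF";

    // RFC 3986 "unreserved" characters never need encoding.
    private static boolean isUnreserved(int b) {
        return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z')
                || (b >= '0' && b <= '9')
                || b == '-' || b == '.' || b == '_' || b == '~';
    }

    // Encodes every UTF-8 byte that is not unreserved as %XX.
    static String encode(String raw) {
        StringBuilder out = new StringBuilder();
        for (byte b : raw.getBytes(StandardCharsets.UTF_8)) {
            int v = b & 0xFF; // treat the byte as unsigned
            if (isUnreserved(v)) {
                out.append((char) v);
            } else {
                out.append('%').append(HEX.charAt(v >> 4)).append(HEX.charAt(v & 0x0F));
            }
        }
        return out.toString();
    }

    // Illustrative only: builds an sftp:// URL with encoded credentials.
    static String sftpUrl(String user, String password, String host, String path) {
        return "sftp://" + encode(user) + ":" + encode(password) + "@" + host + path;
    }
}
```

For example, a password of p@ss:w/rd becomes p%40ss%3Aw%2Frd, so the @, : and / no longer break the URL structure.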
Domain
= string
Input your connection domain.
SFTP Key
= string
Input your SFTP private key. This property will only be used if the data source requests it. This property is only available when the Input Data Type is set to SFTP.
Username
= string
Input your URL connection username. This property will only be used if the data source requests it.
Password
= string
Input your URL connection password. This property will only be used if the data source requests it. Users can store passwords in the component itself, or use the secure password manager feature (recommended).
Set Home Directory as Root
= drop-down
- No: Designates that the URL path is from the server root.
- Yes: Designates that the URL path is relative to the user's home directory (default).
This property is only available when the Input Data Type is set to either FTP or SFTP.
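The difference between the two settings can be pictured as path resolution. The Java sketch below assumes a hypothetical home directory such as /home/etl_user and treats the path part of the Input Data URL as relative to it when the property is Yes, and as absolute from / when it is No. It is an illustration, not Matillion's implementation; individual FTP or SFTP servers may resolve paths differently.

```java
// Hypothetical illustration of "Set Home Directory as Root".
final class HomeRoot {
    // urlPath is the path portion of the Input Data URL, e.g. "/incoming/".
    static String resolve(String urlPath, String homeDir, boolean homeAsRoot) {
        String trimmed = urlPath.startsWith("/") ? urlPath.substring(1) : urlPath;
        if (!homeAsRoot) {
            return "/" + trimmed;          // No: path is taken from the server root
        }
        String base = homeDir.endsWith("/") ? homeDir : homeDir + "/";
        return base + trimmed;             // Yes: path is relative to the home directory
    }
}
```

Under these assumptions, /incoming/ resolves to /home/etl_user/incoming/ with Yes and to /incoming/ with No.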
Recursive
= drop-down
- No: Only search for files within the folder identified by the Input Data URL.
- Yes: Consider files in subdirectories when searching for files.
This property is only available when the Input Data Type is set to FTP, SFTP, or Windows Fileshare.
Max Recursion Depth
= integer
Set the maximum recursion depth into subdirectories. This property is only available when Recursive is set to Yes.
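Recursive and Max Recursion Depth together decide how far below the Input Data URL folder the search reaches. The Java sketch below filters a hypothetical listing of paths relative to that folder, counting one level per subfolder. Whether the component counts the starting folder as depth 0 or 1 is not stated here, so the counting convention is an assumption.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch: which relative paths a search would consider.
final class DepthFilter {
    // Subfolders between the search root and the file: "a.csv" -> 0, "x/a.csv" -> 1.
    static int depth(String relativePath) {
        int count = 0;
        for (char c : relativePath.toCharArray()) {
            if (c == '/') count++;
        }
        return count;
    }

    static List<String> candidates(List<String> relativePaths, boolean recursive, int maxDepth) {
        int limit = recursive ? maxDepth : 0; // Recursive = No: only the top folder
        return relativePaths.stream()
                .filter(p -> depth(p) <= limit)
                .collect(Collectors.toList());
    }
}
```

With the paths a.csv, 2024/b.csv and 2024/01/c.csv, Recursive = No keeps only a.csv, while Recursive = Yes with a depth of 1 also keeps 2024/b.csv.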
Ignore Hidden
= drop-down
- No: Include "hidden" files.
- Yes: Ignore "hidden" files, even if they otherwise match the Filter Regex. This is the default setting.
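"Hidden" is not defined above. On FTP, SFTP and HDFS-style systems the usual convention is a name starting with a dot, and the Java sketch below assumes that convention, also treating files inside a dot-folder as hidden. Windows Fileshare hidden attributes are not modeled. The hidden check runs before the Filter Regex, mirroring the statement that hidden files are skipped even when they match. Class and method names are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical sketch: assumes "hidden" means any path segment starting with '.'.
final class HiddenFilter {
    static boolean isHidden(String relativePath) {
        for (String segment : relativePath.split("/")) {
            if (segment.startsWith(".")) return true;
        }
        return false;
    }

    // Hidden files are dropped first, so a regex match cannot bring them back.
    static List<String> select(List<String> paths, Pattern filter, boolean ignoreHidden) {
        return paths.stream()
                .filter(p -> !(ignoreHidden && isHidden(p)))
                .filter(p -> filter.matcher(p).matches())
                .collect(Collectors.toList());
    }
}
```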
Max Iterations
= integer
Set the maximum number of iterations to perform. As mentioned earlier, this value can't exceed 5000.
Filter Regex
= string
The Java-standard regular expression used to test against each candidate file's full path.
A common pattern is a variable that represents the folder name, followed by /.* as the suffix. The forward slash restricts the search to files within that folder, and .* is the wildcard that returns all files in that folder.
Example: ${jv_folder}/.*
If the Input Data URL is DataType://${jv_blobStorageAccount}/${jv_containerName}/ and the Filter Regex contains a folder structure such as ${jv_folder}/.*, set Recursive to Yes so that the component searches the folders below the Input Data URL path.
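Because the pattern is a Java regular expression, its behavior can be checked with java.util.regex.Pattern. The sketch below assumes that ${jv_folder} is substituted as plain text before the pattern is compiled, that the candidate path is relative to the Input Data URL, and that the whole path must match (Matcher.matches(), not find()); none of these details is confirmed above. Two points follow from Java regex rules: . also matches /, so ${jv_folder}/.* matches files in subfolders too, and a . inside the folder name acts as a wildcard unless quoted. Java's \Q...\E quoting (what Pattern.quote produces) keeps the folder name literal. Class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical sketch of how a Filter Regex such as ${jv_folder}/.* selects files.
final class FilterRegexDemo {
    // Mirrors textual substitution into "${jv_folder}/.*":
    // regex metacharacters in the folder name (such as '.') stay active.
    static Pattern literalFilter(String folder) {
        return Pattern.compile(folder + "/.*");
    }

    // Same filter with the folder name quoted (\Q...\E), so it matches only itself.
    static Pattern quotedFilter(String folder) {
        return Pattern.compile(Pattern.quote(folder) + "/.*");
    }

    // Assumes the whole candidate path must match.
    static boolean accepts(Pattern filter, String candidatePath) {
        return filter.matcher(candidatePath).matches();
    }
}
```

With a folder named 2024.01, the unquoted filter also accepts 2024x01/a.csv, while the quoted filter does not.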
Concurrency
= drop-down
- Concurrent: Iterations run concurrently. This requires all "Variables to Iterate" to be defined as copied variables, so that each iteration gets its own copy of the variable isolated from the same variable being used by other concurrent executions.
- Sequential: Iterations run in sequence, waiting for each to complete before starting the next. This is the default setting.
The maximum concurrency is limited by the number of available threads (2x the number of processors on your cloud instance).
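The Java sketch below models the two modes with java.util.concurrent: a fixed thread pool sized at twice Runtime.getRuntime().availableProcessors() for Concurrent, mirroring the limit described above, and a plain loop for Sequential. Each task receives its own immutable copy of the variable values. This is the reason Concurrent mode needs copied variables: a single shared variable would hold whatever value the last iteration wrote. The class name, the file_name variable and the task function are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical model of Concurrent vs Sequential iteration; not Matillion's code.
final class IterationRunner {
    static List<String> runSequential(List<String> files, Function<Map<String, String>, String> task) {
        List<String> results = new ArrayList<>();
        for (String file : files) {
            // Each iteration finishes before the next one starts.
            results.add(task.apply(Map.of("file_name", file)));
        }
        return results;
    }

    static List<String> runConcurrent(List<String> files, Function<Map<String, String>, String> task) {
        int threads = 2 * Runtime.getRuntime().availableProcessors(); // limit described above
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String file : files) {
                // Copied variable: each task gets its own immutable map, never a shared one.
                Map<String, String> vars = Map.of("file_name", file);
                Callable<String> job = () -> task.apply(vars);
                futures.add(pool.submit(job));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // collected in submission order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```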
Variables
= columns editor
- Variable: An existing environment variable to hold the value of the selected File Attribute.
- File Attribute: For each matched file, the target variable can be populated with the Base Path, the Subfolder (useful when recursing), the Filename, or the date of when the file was Last Modified. You can export any or all of these into variables used by each iteration.
For the Last Modified attribute, the date is formatted as ISO 8601 with a UTC indicator (Z). For example, 2021-01-04T10:45:15.123Z.
Users may experience a lag in how their data warehousing platform updates the last modified date, so the date seen when Matillion ETL interacts with a file can differ from the file's actual last modified date. This behaviour is a limitation of the platform and depends on that platform's metadata.
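The sketch below shows one way the four File Attribute values could be derived for a matched file, given a hypothetical base path equal to the Input Data URL, a full file path beneath it, and a modification time. The split into Subfolder and Filename is an assumption about how those attributes are defined. For Last Modified it formats the time as ISO 8601 in UTC with millisecond precision and a trailing Z, assuming the three-digit milliseconds in the example above are always present. The class name and map keys are hypothetical.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical derivation of File Attribute values; attribute semantics are assumed.
final class FileAttributes {
    // ISO 8601, UTC, millisecond precision, 'Z' for the zero offset.
    private static final DateTimeFormatter LAST_MODIFIED =
            DateTimeFormatter.ofPattern("uuuu-MM-dd'T'HH:mm:ss.SSSX").withZone(ZoneOffset.UTC);

    static Map<String, String> of(String basePath, String fullPath, Instant lastModified) {
        if (!fullPath.startsWith(basePath)) {
            throw new IllegalArgumentException("File is not under the base path: " + fullPath);
        }
        String relative = fullPath.substring(basePath.length()); // e.g. "2024/01/a.csv"
        int slash = relative.lastIndexOf('/');
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("Base Path", basePath);
        attrs.put("Subfolder", slash < 0 ? "" : relative.substring(0, slash)); // "" at top level
        attrs.put("Filename", relative.substring(slash + 1));
        attrs.put("Last Modified", LAST_MODIFIED.format(lastModified));
        return attrs;
    }
}
```

A fixed .SSS pattern is used instead of Instant.toString(), because toString() drops the fraction entirely when the milliseconds are zero.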
Break on Failure
= drop-down
- No: Attempt to run the attached component for each iteration, regardless of success or failure. This is the default setting.
- Yes: If the attached component doesn't run successfully, fail immediately.
If a failure occurs during any iteration, the failure link is followed. This parameter controls whether it is followed immediately or after all iterations have been attempted.
This property is only available when Concurrency is set to Sequential. When set to Concurrent, all iterations will be attempted.
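For Sequential runs, the Java sketch below models the two settings. With Break on Failure set to Yes, the loop stops at the first failed iteration. With No, it attempts every iteration and reports the failure afterwards. In both cases the result records whether the failure link would be followed, along with counts comparable to the Iteration Generated and Iteration Successful exports described under Variable Exports. The class names and the boolean task are hypothetical.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of Break on Failure for Sequential iteration.
final class BreakOnFailure {
    static final class Outcome {
        final int generated;   // iterations started
        final int successful;  // iterations that succeeded
        final boolean failed;  // true if the failure link would be followed
        Outcome(int generated, int successful, boolean failed) {
            this.generated = generated;
            this.successful = successful;
            this.failed = failed;
        }
    }

    static Outcome run(List<String> files, Predicate<String> task, boolean breakOnFailure) {
        int generated = 0;
        int successful = 0;
        boolean failed = false;
        for (String file : files) {
            generated++;
            if (task.test(file)) {
                successful++;
            } else {
                failed = true;
                if (breakOnFailure) {
                    break; // Yes: stop now and follow the failure link
                }
            }
        }
        // No: every iteration was attempted; any failure is reported only now.
        return new Outcome(generated, successful, failed);
    }
}
```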
Record Values In Task History
= drop-down
Choose whether to record iteration values in the Matillion ETL Task History. The default setting is Yes.
Stop On Condition
= drop-down
Select Yes to stop the iteration based on a condition specified in the Condition property. The default setting is No.
For this property to be available, set Concurrency to Sequential.
Mode
= drop-down
Select the method of creating the condition.
- Simple: A no-code Condition UI will open, where users must specify an Input Variable, Qualifier, Comparator, and Value using drop-down menus and text fields. This is the default setting.
- Advanced: An editor will open, where users must write the condition manually.
Condition (Simple mode)
= columns editor
- Input Variable: An input variable to form a condition around.
- Qualifier:
- Is: Compares the input variable to the value using the Comparator.
- Not: Reverses the effect of the comparison, so "Equals" becomes "Not equals", "Less than" becomes "Greater than or equal to", etc.
- Comparator: Select the comparator. Available comparison operators include "Less than", "Less than or equal to", "Equal to", "Greater than or equal to", "Greater than", and "Blank".
- Value: Specify the value to be compared.
Condition (Advanced mode)
= text editor
Manually write the condition in the editor. This editor accepts conditions written in JavaScript.
Combine Conditions
= drop-down
Select And to require all defined conditions to be met, or Or to require at least one of them to be met.
This property is only available when Mode is set to Simple.
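The Simple-mode rules can be expressed as a small evaluator. The Java sketch below applies the Not qualifier by negating the comparison and then combines the rows with And or Or, as described above. Several details are assumptions: values are compared numerically when both sides parse as numbers and as strings otherwise, Blank means an empty or missing value, and a missing variable fails every other comparison. All class, field and method names are hypothetical.

```java
import java.util.List;
import java.util.Map;

// Hypothetical evaluator for Simple-mode conditions; comparison rules are assumed.
final class SimpleCondition {
    enum Comparison { LESS_THAN, LESS_THAN_OR_EQUAL, EQUAL, GREATER_THAN_OR_EQUAL, GREATER_THAN, BLANK }

    static final class Row {
        final String inputVariable;
        final boolean not;             // Qualifier: false = Is, true = Not
        final Comparison comparator;
        final String value;
        Row(String inputVariable, boolean not, Comparison comparator, String value) {
            this.inputVariable = inputVariable;
            this.not = not;
            this.comparator = comparator;
            this.value = value;
        }
    }

    // Numeric comparison when both sides are numbers, otherwise lexicographic.
    private static int compare(String left, String right) {
        try {
            return Double.compare(Double.parseDouble(left), Double.parseDouble(right));
        } catch (NumberFormatException e) {
            return left.compareTo(right);
        }
    }

    static boolean test(Row row, Map<String, String> vars) {
        String actual = vars.get(row.inputVariable);
        boolean result;
        if (row.comparator == Comparison.BLANK) {
            result = actual == null || actual.isEmpty();
        } else if (actual == null) {
            result = false; // assumed behavior for a missing variable
        } else {
            int c = compare(actual, row.value);
            switch (row.comparator) {
                case LESS_THAN: result = c < 0; break;
                case LESS_THAN_OR_EQUAL: result = c <= 0; break;
                case EQUAL: result = c == 0; break;
                case GREATER_THAN_OR_EQUAL: result = c >= 0; break;
                default: result = c > 0; break; // GREATER_THAN
            }
        }
        return row.not ? !result : result; // Not reverses the comparison
    }

    // Combine Conditions: And needs every row to hold, Or needs at least one.
    static boolean evaluate(List<Row> rows, Map<String, String> vars, boolean useAnd) {
        return useAnd
                ? rows.stream().allMatch(r -> test(r, vars))
                : rows.stream().anyMatch(r -> test(r, vars));
    }
}
```

Numeric comparison matters here: as strings, "12" sorts before "9", so file_count Is Greater than 9 would be false.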
Variable Exports
This component makes the following values available to export into variables:
Source | Description |
---|---|
Iteration Attempted | The number of iterations that this component attempts to reach (Max Iterations parameter). |
Iteration Generated | The number of iterations that have been initiated. Because an iterator can terminate after a failure (see Break on Failure), this number is the successful iterations plus any failures. |
Iteration Successful | The number of iterations successfully performed. This is the maximum iteration number, minus failures and any iterations left unattempted because the component terminated after a failure. |
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
✅ | ✅ | ✅ | ✅ | ✅ |