Data staging components
Overview
Matillion ETL comes with many components that can retrieve data from other services and load that data into a table. These components can be called data staging component, data stagers, connectors, query components or integrations, interchangeably.
Each data stager retrieves data via API calls using SQL queries that can filter the data before it reaches the cloud data warehouse. The data is loaded into a table of your choosing and by default will destroy and recreate that table before loading the data.
You can avoid the target table being destroyed and recreated by setting Recreate Target Table to Off in the Load Options property.
Query components in Matillion ETL come with a data model that describes connection options, tables, and views associated with the collection of data from that particular service. If there is no data model associated with the component, it likely uses the Matillion ETL API Query model and/or a custom API query profile.
Most data stagers tend to work in the same way: you enter relevant credentials to access the service, and opt to use either a Basic or Advanced mode for your queries.
For some data staging components, an Incremental Load Generator is available, which can be considerably easier to use than creating your own incremental loads.
Basic Query
When a Query component is set to Basic Mode, the component will come with many properties that the user can set (usually from a predefined list) and will build an SQL query from those choices. The user does not write any SQL themselves but instead can select a data source and columns that can then be filtered.
Advanced Query
When a Query component is set to Advanced Mode, a property named SQL Query is available, and users can write their own SQL queries to the service via the SQL editor, shown below.
For most Query components, available data sources and their columns are listed in the Metadata Explorer panel. Environment variables are listed in the Variables panel. Columns and variables can be brought into the SQL by double-clicking them, or you can type their names directly into the SQL.
Test your SQL by clicking Sample. This will return a sample of the queried data, up to a specified row count Limit (or 10 rows by default).
STL load errors
When an error occurs loading data through these components, you will be notified of the error within the Tasks tab for that run.
Click STL Load Errors to open a dialog with verbose error output.
Each error in the output is given as a collapsible box. Listed is its position in the data, what the offending row holds, and the reason for the error. These results are paginated and can be explored using the Next and Back buttons.
Debug Mode
If you are experiencing issues with a data staging component, we encourage you to contact Matillion support. Please provide us with a Debug Log when you contact support. This can be generated from most data staging components.
To generate a Debug Log, find the Auto Debug property at the bottom of a component and turn it On. Turning on Auto Debug activates the Debug Level property. A description of these properties is given below:
Auto Debug
= drop-down
Choose whether to automatically log debug information about your load. These logs can be found in the task history and should be included in support requests concerning the component. Turning this on will override any debugging connection options.
Debug Level
= drop-down
The level of verbosity with which your debug information is logged. Levels above 1 can log huge amounts of data and result in slower execution.
- Will log the query, the number of rows returned by it, the start of execution and the time taken, and any errors.
- Will log everything included in Level 1, plus cache queries and additional information about the request, if applicable.
- Will additionally log the body of the request and the response.
- Will additionally log transport-level communication with the data source. This includes SSL negotiation.
- Will additionally log communication with the data source, as well as additional details that may be helpful in troubleshooting problems. This includes interface commands.
Warning
Do not leave Auto Debug On when not directly in need of it, as it can consume large amounts of disk space.
Refresh Source Schema
Sometimes a source schema that Matillion ETL is using might be changed externally from Matillion ETL. An example of this is creating a new table via console commands and then attempting to find that table in a data staging component in Matillion ETL, only to find that the table does not appear. This is due to Matillion ETL using a cache of table data for many components.
To sync the Matillion ETL client with your platform and resolve this issue, right-click the component in question and select Refresh Source Schema.
Sampling data
Data staging components in Matillion ETL have a sample option, which allows you to display a sample of the data the component produces before committing to running your full orchestration or transformation job.
On the component's Sample tab, click Data to load the sample data into Matillion ETL. You can limit the number of rows retrieved from the dataset, if required.