Connectors in Matillion ETL share a common task: taking data from one system and pushing it to another. All connectors are used by placing them on the canvas of an orchestration job, although connector availability depends on your Matillion ETL version. A brief description of the different types of connector components is given below.
Types of connector
Query components are the most common type of connector. They take data from a third-party service and push it into the target cloud data warehouse table. The data is briefly staged in a storage area so it can be formatted correctly for the cloud data warehouse. These components are destructive: the target table is dropped (if it exists) and recreated each time.
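The drop-and-recreate behaviour described above can be sketched as the sequence of statements such a load would issue. This is an illustrative outline only, not Matillion's implementation; the table, columns, and stage URI below are invented.

```python
# Hypothetical sketch of a destructive (drop-and-recreate) load.
# Names and SQL dialect are illustrative, not Matillion internals.

def destructive_load_sql(table, columns, stage_uri):
    """Return the statements a drop-and-recreate load would run."""
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns.items())
    return [
        f"DROP TABLE IF EXISTS {table}",          # existing data is lost
        f"CREATE TABLE {table} ({cols})",         # fresh, empty target
        f"COPY INTO {table} FROM '{stage_uri}'",  # bulk-load staged files
    ]

statements = destructive_load_sql(
    "analytics.orders",
    {"order_id": "INT", "amount": "NUMBER(10,2)"},
    "s3://my-bucket/stage/orders/",
)
```

Because the first statement discards the existing table, any data not present in the source is lost on each run, which is why these components suit full refreshes rather than incremental history.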
Extract components are similar to Query components in that they take data from a third-party service and push it into a target cloud data warehouse table. However, unlike Query components, they do not flatten the data into table rows and columns. Instead, the data is "structured" as it appeared in the source, usually in a JSON-like format. This data commonly requires the use of the Extract Nested Data component to define how it is flattened into rows.
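To make the flattening step concrete, here is a minimal sketch of turning one nested JSON-like record into a single flat row with dotted column names. The record shape and field names are invented, and this is only one possible flattening scheme, not the Extract Nested Data component itself.

```python
# Minimal sketch of flattening a nested record into one table row.
# Record shape and field names are invented for illustration.

def flatten(record, prefix=""):
    """Flatten nested dicts into a single row of dotted column names."""
    row = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            row.update(flatten(value, name + "."))  # recurse into nesting
        else:
            row[name] = value
    return row

raw = {"id": 1, "customer": {"name": "Ada", "address": {"city": "Leeds"}}}
row = flatten(raw)
# row: {"id": 1, "customer.name": "Ada", "customer.address.city": "Leeds"}
```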
Load components take properly formatted data from storage or a database and load it into the target cloud data warehouse table.
Incremental Load components pull data from a service into the target cloud data warehouse table, but only data that has changed since the previous pull. Tables are not dropped and recreated by this process. Instead, the target table is updated when a record is detected to differ between source and target.
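One common way to implement this pattern is a high-water mark: track the largest modification timestamp seen so far, fetch only rows changed after it, and upsert them by key. The sketch below illustrates that idea with invented names; it is an assumption about the general technique, not Matillion's implementation.

```python
# Sketch of an incremental pull with a high-water mark and upsert.
# All names and the data shape are illustrative assumptions.

def incremental_merge(target, source_rows, last_sync):
    """Upsert rows modified after last_sync; return the new watermark."""
    watermark = last_sync
    for row in source_rows:
        if row["modified"] > last_sync:       # changed since last pull
            target[row["id"]] = row           # insert or update in place
            watermark = max(watermark, row["modified"])
    return watermark

target = {1: {"id": 1, "value": "old", "modified": 100}}
source = [
    {"id": 1, "value": "new", "modified": 150},       # updated record
    {"id": 2, "value": "add", "modified": 160},       # new record
    {"id": 3, "value": "stale", "modified": 90},      # unchanged, skipped
]
new_mark = incremental_merge(target, source, last_sync=100)
```

Note that nothing is dropped: existing rows survive, changed rows are overwritten, and new rows are appended, matching the behaviour described above.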
Output components perform reverse-ETL, taking table data from your data warehouse, formatting it appropriately, and then pushing it to your target service.
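The reshaping step of reverse-ETL can be sketched as mapping warehouse rows into the payloads a downstream service expects. The payload shape and field names below are hypothetical; a real Output component targets whatever schema the destination service defines.

```python
import json

# Sketch of reverse-ETL formatting: warehouse rows are reshaped into
# JSON payloads for a hypothetical contacts API. The payload shape is
# an assumption for illustration, not a real service's schema.

def rows_to_payloads(rows):
    """Map warehouse rows to JSON payloads for a downstream service."""
    return [
        json.dumps({"email": r["email"], "attributes": {"plan": r["plan"]}})
        for r in rows
    ]

rows = [{"email": "a@example.com", "plan": "pro"}]
payloads = rows_to_payloads(rows)
```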
Component properties differ between connectors. However, there are some common themes that should help the process feel familiar. Below are a few of the more common properties that illustrate the core usage of connector components.
- Data Source: The table or view that this connector is using as a data source from the source system. Available data sources are usually dictated by the source system and follow predictable structures described in that source's data model, which is available in the respective connector's documentation.
- Data Selection: Allows the user to choose which columns are taken from the chosen source.
- Data Source Filter: Allows users to choose whether a row is included in the load. Each row in the load is matched against user-defined criteria and excluded if it fails these checks.
- Limit: Sets the maximum number of rows to be loaded.
- OAuth: Selects a Managed OAuth entry for the data source system. OAuth entries are defined and stored separately, so the same entry can be reused across multiple components if wanted.
- (Staging) Location: Selects a location (usually cloud storage such as Amazon S3, Google Cloud Storage, or Azure Blob Storage) for the data to be staged while it is transformed into table data for the data warehouse.
- Target Table: The target table on the data warehouse for data to be loaded into. There are usually preceding properties that allow the user to choose which database and schema to look in for the target table. Note that while OAuth entries define the available source resources, your target resources (databases, schemas, and tables in the cloud data warehouse) depend on the currently selected environment and its configuration.
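To illustrate how a Data Source Filter discounts rows, the sketch below checks every row against a list of user-defined criteria and keeps only rows that pass all of them. The criteria format, operators, and column names are invented for illustration.

```python
# Illustrative sketch of Data Source Filter semantics: each row is
# checked against all user-defined criteria and excluded if any fail.
# The (column, operator, value) criteria format is an assumption.

OPS = {
    "=": lambda a, b: a == b,
    ">": lambda a, b: a > b,
}

def apply_filter(rows, criteria):
    """Keep only rows that satisfy every (column, op, value) criterion."""
    return [
        r for r in rows
        if all(OPS[op](r[col], val) for col, op, val in criteria)
    ]

rows = [
    {"country": "UK", "amount": 50},
    {"country": "UK", "amount": 5},
    {"country": "FR", "amount": 80},
]
kept = apply_filter(rows, [("country", "=", "UK"), ("amount", ">", 10)])
# kept: only the first row passes both checks
```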