API Extract

Overview

The API Extract component lets you create your own custom connector in Matillion ETL. It extracts data from a specified API and, depending on your cloud data warehouse, either loads the data into a table or stores it where an external table can reference it. You can then use transformation components to enrich and manage the data in permanent tables.

This component may return structured data that requires flattening. To flatten such data, we recommend the following components:

Extract Nested Data for Snowflake or Google BigQuery.
Nested Data Load for Amazon Redshift.

Using the API Extract component requires at least one configured Extract Profile. Read Manage Extract Profiles for more information on completing the Manage Extract Profile setup, including adding a new endpoint.

The Manage Extract Profiles wizard only supports sending and receiving JSON objects. Other formats, such as XML, are not supported.

Warning

An error will occur if an endpoint name is any one of the following:

datasourcelists
environments
versionlists
versions

Avoid using any of these names for an endpoint. You specify the endpoint's name on page 1 of the Configure Extract Connector wizard.
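
As a quick pre-check, the reserved names listed above can be tested before you create an endpoint. The sketch below is illustrative only: `is_reserved_endpoint_name` is a hypothetical helper, not part of Matillion ETL, and the case-insensitive comparison is an assumption, since this page does not say whether matching is case-sensitive.

```python
# Reserved endpoint names listed in the warning above.
RESERVED_ENDPOINT_NAMES = frozenset(
    {"datasourcelists", "environments", "versionlists", "versions"}
)


def is_reserved_endpoint_name(name: str) -> bool:
    """Return True if `name` matches one of the reserved endpoint names.

    Assumption: the comparison ignores case and surrounding whitespace.
    """
    return name.strip().lower() in RESERVED_ENDPOINT_NAMES


if __name__ == "__main__":
    for candidate in ("versions", "orders"):
        status = "reserved" if is_reserved_endpoint_name(candidate) else "ok"
        print(f"{candidate}: {status}")
```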

Properties

The properties are listed separately for each cloud data warehouse: Snowflake, Amazon Redshift, and Google BigQuery.

Snowflake

Name = string

A human-readable name for the component.

API = drop-down

A configured API Extract profile. Click Project → Manage API Profiles → Manage Extract Profiles to manage your Extract profiles.

Data Source = drop-down

Select the data source to extract and load.

URI Params = parameter:value

Any parameters that are configured for this endpoint in the wizard will be displayed here. URI parameters cannot be set as constants. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Query Params = parameter:value

Specify any query parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Header Params = parameter:value

Specify any header parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.
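
To illustrate how the three parameter types differ, here is a minimal sketch of how they typically map onto an HTTP request: URI parameters fill placeholders in the path, query parameters are appended after `?`, and header parameters travel as request headers. The `{name}` placeholder syntax, the `build_request` helper, and the example URL are assumptions made for illustration; they are not taken from Matillion ETL.

```python
from urllib.parse import quote, urlencode


def build_request(url_template, uri_params, query_params, header_params):
    """Return a (url, headers) pair for a request.

    Assumption: URI parameters appear in the template as {name}.
    """
    url = url_template
    for name, value in uri_params.items():
        # Percent-encode each value so it is safe inside a path segment.
        url = url.replace("{" + name + "}", quote(str(value), safe=""))
    if query_params:
        # Query parameters are appended as an encoded query string.
        url = url + "?" + urlencode(query_params)
    # Header parameters are passed through unchanged.
    return url, dict(header_params)


if __name__ == "__main__":
    url, headers = build_request(
        "https://api.example.com/customers/{customer_id}/orders",  # hypothetical
        uri_params={"customer_id": "42"},
        query_params={"status": "open", "limit": "50"},
        header_params={"Accept": "application/json"},
    )
    print(url)
    print(headers)
```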

Post Body = JSON

A request body for the POST.
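
Because the wizard only handles JSON objects, it can help to confirm that a request body parses as a JSON object before entering it in this property. The following is a minimal sketch using Python's standard `json` module; the helper name `is_json_object` is hypothetical.

```python
import json


def is_json_object(text: str) -> bool:
    """Return True if `text` parses as a JSON object (not an array or scalar)."""
    try:
        return isinstance(json.loads(text), dict)
    except json.JSONDecodeError:
        # Not valid JSON at all (for example, XML or unquoted keys).
        return False
```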

User = string

The username to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Password = string

The password used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Bearer Token = string

The API bearer token used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Bearer Token.

OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. For more information, read Manage OAuth. Only available when the Extract profile's Auth type is set to OAuth.
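
For orientation, Basic Auth and Bearer Token credentials are conventionally sent in the HTTP `Authorization` header. The sketch below shows the standard encodings (RFC 7617 for Basic, RFC 6750 for Bearer). It illustrates the general HTTP schemes and is not a description of how Matillion ETL builds its requests; the helper names are hypothetical and the credentials are placeholders.

```python
import base64


def basic_auth_header(user: str, password: str) -> dict:
    """Build an Authorization header for HTTP Basic authentication."""
    # Basic auth is the Base64 encoding of "user:password".
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}


def bearer_auth_header(token: str) -> dict:
    """Build an Authorization header for a bearer token."""
    return {"Authorization": f"Bearer {token}"}


if __name__ == "__main__":
    print(basic_auth_header("user", "pass"))
    print(bearer_auth_header("example-token"))
```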

Page Limit = integer

The maximum number of pages to stage.
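
One way to picture a page limit is as a cap on a pagination loop: fetching stops when the API runs out of pages or when the cap is reached, whichever comes first. The sketch below simulates this with an in-memory list of pages. `stage_pages` and `fetch_page` are hypothetical names, and the stopping behaviour is an assumption about how such a limit usually works, not a statement about the component's internals.

```python
from typing import Callable, List, Optional


def stage_pages(
    fetch_page: Callable[[int], Optional[list]], page_limit: int
) -> List[list]:
    """Collect pages until the source is exhausted or page_limit is reached."""
    staged: List[list] = []
    page_number = 0
    while len(staged) < page_limit:
        page = fetch_page(page_number)
        if page is None:  # no more pages available
            break
        staged.append(page)
        page_number += 1
    return staged


if __name__ == "__main__":
    pages = [[1, 2], [3, 4], [5]]  # stand-in for API responses

    def fetch(n: int) -> Optional[list]:
        return pages[n] if n < len(pages) else None

    print(stage_pages(fetch, page_limit=2))
```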

Location = filepath

Provide an Amazon S3 bucket path, Google Cloud Storage (GCS) bucket path, or Azure Blob Storage path that will be used to store the data. The data can then be referenced by an external table. A folder will be created at this location with the same name as the target table.

Integration = drop-down

(GCP only) Choose your Google Cloud Storage Integration. Integrations are required to permit Snowflake to read data from and write to a Google Cloud Storage bucket. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read our Storage Integration setup guide.

Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value [Environment Default] uses the warehouse defined in the environment. Read Overview of Warehouses to learn more.

Database = drop-down

The Snowflake database. The special value [Environment Default] uses the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.

Schema = drop-down

The Snowflake schema. The special value [Environment Default] uses the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.

Target Table = string

A name for the new table. When the job runs, this table is recreated, and any existing table of the same name is dropped.

Amazon Redshift

Name = string

A human-readable name for the component.

API = drop-down

A configured API Extract profile. Click Project → Manage API Profiles → Manage Extract Profiles to manage your Extract profiles.

Data Source = drop-down

Select the data source to extract and load.

URI Params = parameter:value

Any parameters that are configured for this endpoint in the wizard will be displayed here. URI parameters cannot be set as constants. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Query Params = parameter:value

Specify any query parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Header Params = parameter:value

Specify any header parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Post Body = JSON

A request body for the POST.

User = string

The username to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Password = string

The password used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Bearer Token = string

The API bearer token used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Bearer Token.

OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. For more information, read Manage OAuth. Only available when the Extract profile's Auth type is set to OAuth.

Page Limit = integer

The maximum number of pages to stage.

Location = filepath

Provide an Amazon S3 bucket path that will be used to store the data. The data can then be referenced by an external table. A folder will be created at this location with the same name as the target table.

Type = drop-down

External: The data will be put into your chosen S3 bucket and referenced by an external table.
Standard: The data will be staged on your chosen S3 bucket before being loaded into a table. This is the default setting.

Standard Schema = drop-down

The Amazon Redshift schema. The special value [Environment Default] uses the schema defined in the environment. Read Schemas to learn more.

External Schema = drop-down

The table's external schema. Read Getting Started with Amazon Redshift Spectrum to learn more.

Target Table = string

A name for the new table. When the job runs, this table is recreated, and any existing table of the same name is dropped.

Google BigQuery

Name = string

A human-readable name for the component.

API = drop-down

A configured API Extract profile. Click Project → Manage API Profiles → Manage Extract Profiles to manage your Extract profiles.

Data Source = drop-down

Select the data source to extract and load.

URI Params = parameter:value

Any parameters that are configured for this endpoint in the wizard will be displayed here. URI parameters cannot be set as constants. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Query Params = parameter:value

Specify any query parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Header Params = parameter:value

Specify any header parameters. Any parameters that are configured for this endpoint in the wizard will be displayed here. Constants will not appear here. You can toggle the Text Mode checkbox to navigate between grid mode and text mode.

Post Body = JSON

A request body for the POST.

User = string

The username to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Password = string

The password used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Basic Auth.

Bearer Token = string

The API bearer token used to authenticate the endpoint. Only available when the Extract profile's Auth type is set to Bearer Token.

OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. For more information, read Manage OAuth. Only available when the Extract profile's Auth type is set to OAuth.

Page Limit = integer

The maximum number of pages to stage.

Table Type = drop-down

Select whether the table is Native (the default in BigQuery) or External.

Project = drop-down

Select the Google Cloud project. The special value [Environment Default] uses the project defined in the environment. For more information, read Creating and managing projects.

Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value [Environment Default] uses the dataset defined in the environment. For more information, read Introduction to datasets.

Target Table = string

A name for the new table. When the job runs, this table is recreated, and any existing table of the same name is dropped.

New Target Table = string

A name for the new external table. Only available when the table type is External.

Cloud Storage Staging Area = Google Cloud Storage bucket

The URL and path of the target Google Cloud Storage bucket to be used for staging the queried data. Only available when the table type is Native.

Location = Google Cloud Storage bucket

The URL and path of the target Google Cloud Storage bucket. Only available when the table type is External.

Load Options = multiple drop-downs

Clean Cloud Storage Files: Destroy staged files on Google Cloud Storage after loading data. Default is On.
Cloud Storage File Prefix: Give staged file names a prefix of your choice. The default setting is an empty field.
Recreate Target Table: Choose whether the component recreates its target table before the data load. If Off, the component will use an existing table or create one if it does not exist. Default is On.
Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.

Using Variables with parameters

The API Extract component supports the use of variables with parameters.

Grid variables in Matillion ETL can be used for both the Parameter Name and the Parameter Value in the following properties:

URI Params
Query Params
Header Params

Job variables and environment variables can be used for the Parameter Value (but not Parameter Name) in the following property:

URI Params

Job variables and environment variables can be used for the Parameter Name and/or Parameter Value in the following properties:

Query Params
Header Params

You can therefore enter a job or environment variable in the format ${variableName} in place of the Parameter Value for a URI, Header, or Query parameter within API Extract.
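
The rules above can be summarised as a small lookup table. The sketch below simply encodes what this section states; the names `ALLOWED_FIELDS` and `variable_allowed`, and the labels used as keys, are hypothetical and chosen only for illustration.

```python
# Where each kind of variable may be used, as stated in this section.
# Keys: (variable kind, property). Values: parameter fields that accept it.
ALLOWED_FIELDS = {
    ("grid", "URI Params"): {"name", "value"},
    ("grid", "Query Params"): {"name", "value"},
    ("grid", "Header Params"): {"name", "value"},
    ("job_or_environment", "URI Params"): {"value"},
    ("job_or_environment", "Query Params"): {"name", "value"},
    ("job_or_environment", "Header Params"): {"name", "value"},
}


def variable_allowed(kind: str, prop: str, field: str) -> bool:
    """Return True if a variable of `kind` may be used in `field` of `prop`."""
    return field in ALLOWED_FIELDS.get((kind, prop), set())


if __name__ == "__main__":
    print(variable_allowed("job_or_environment", "URI Params", "name"))
    print(variable_allowed("grid", "URI Params", "name"))
```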

Note

At validation, variable names are not replaced with their literal values; the variable name continues to be displayed in the component. At runtime, the value the variable holds at that time is used.

If the default value for the variable is empty, the component will report a validation error.
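
The behaviour described in this note can be pictured as a substitution step applied when the job runs. The sketch below is an illustration under stated assumptions, not Matillion ETL's implementation: `resolve_variables` is a hypothetical helper that replaces each `${variableName}` with its current value and raises an error for an empty or missing value, loosely mirroring the validation error mentioned above.

```python
import re

# Matches ${variableName}; assumes names consist of word characters only.
_VARIABLE = re.compile(r"\$\{(\w+)\}")


def resolve_variables(text: str, variables: dict) -> str:
    """Replace each ${name} in `text` with its value from `variables`."""

    def substitute(match: re.Match) -> str:
        name = match.group(1)
        value = variables.get(name, "")
        if value == "":
            # Assumption: treat an empty or missing value as an error.
            raise ValueError(f"Variable '{name}' has no value")
        return str(value)

    return _VARIABLE.sub(substitute, text)


if __name__ == "__main__":
    print(resolve_variables("${region}", {"region": "eu-west-1"}))
```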

For more information about variables, please read the following:

Snowflake	Delta Lake on Databricks	Amazon Redshift	Google BigQuery	Azure Synapse Analytics
✅	❌	✅	✅	❌