API v1 - Metadata
The Matillion ETL Metadata API lets you determine where data in Matillion ETL jobs has originated from. SQL queries that update data (INSERT, UPDATE, DELETE) or that alter database structures (for example, when creating a table) are logged in a single database table, along with metadata pertinent to the orchestration or transformation job, as well as task information for the component.
This endpoint allows you to retrieve the SQL queries executed by a specific project against a specific environment. SQL queries that execute DDL or DML operations as part of a job run will be logged, as well as SELECT queries run on source systems as part of data ingestion.
This information can be used to determine the lineage of data staged by Matillion ETL and how this is transformed into new target tables in your cloud data warehouse.
Credentials on COPY SQL statements will be redacted before being logged.
Entries older than seven days are removed daily from the table.
Note
Please note the following product information:
- This endpoint can only be accessed on Matillion ETL version 1.54.7 or higher.
- This endpoint is an Enterprise Mode feature.
- This endpoint is currently only available for Matillion ETL for Snowflake, Delta Lake on Databricks, and Amazon Redshift.
Record field explanations
Below is an example JSON block returned from the server when the endpoint is called. Beneath the JSON block is an explanation of each field.
{
"taskType": "Run",
"loggedTimestamp": 1619017102402,
"startTimestamp": 1619017086243,
"endTimestamp": 1619017102383,
"urlPath": "/#test/test/default/lin_orch",
"componentName": "Rewrite Table",
"componentCanvasName": "Rewrite Table 0",
"source": [
{
"url": "matillion.com-1",
"fqn" : "db_test.schema.text.fct_table_test"
}
],
"target": [
{
"url": "matillion.com-1",
"fqn" : "db_test.schema.text.fct_table_test"
}
],
"sourceConnectionUrl": "Not Applicable",
"sourceUsername": "Not Applicable",
"environment": {
"name": "test",
"url": "matillion",
"schema": "PUBLIC",
"database": "PRODUCT",
"user": "username",
"connectionOptions": {}
},
"job": {
"name": "test",
"type": "TRANSFORMATION"
},
"sql": [
{
"query": "CREATE TABLE \"FROSTY_BORG\".\"PUBLIC\".\"TEST_NEW_2\" AS SELECT \n * \nFROM (SELECT \n \"a\" \nFROM \"FROSTY_BORG\".\"PUBLIC\".\"test\") \nORDER BY \"a\" DESC",
"dialect": "Snowflake"
}
]
},
Field name | Data type | Description |
---|---|---|
taskType | String | Describes how the component was run, using the display name in the DatabaseTaskBatchType enumeration. For example, "Run", "Schedule", "API". |
loggedTimestamp | Number | The time in milliseconds (ms) denoting when the line was logged. |
startTimestamp | Number | Value in milliseconds (ms) denoting when the component started a run. |
endTimestamp | Number | Value in milliseconds (ms) denoting when the component completed a run. |
urlPath | String | The path appended to the host. Used for accessing the orchestration or transformation job containing this component. |
componentName | String | The name of the underlying Matillion ETL component. For example, "Truncate Table". |
componentCanvasName | String | The name of the component as defined by the user in the component's Name property. |
Source | Array | Includes the source URL (url ) and fully qualified name (fqn ) of the data source, to provide lineage information about Matillion ETL orchestration and transformation jobs. |
Target | Array | Includes the target URL (url ) and fully qualified name (fqn ) of the data target, to provide lineage information about Matillion ETL orchestration and transformation jobs. |
sourceConnectionUrl | String | The connection URL configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere. |
sourceUsername | String | The username for the data connection configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere. |
environment | Object | Details of the Matillion ETL environment used to execute the job, such as which Cloud Data Warehouse (CDW) database and schema are the defaults. |
job | Object | A block of JSON data containing information about the Matillion ETL job, including name and type ("ORCHESTRATION" or "TRANSFORMATION"). |
SQL | Array | All of the SQL related to the running of the component, presented in tuples and including the SQL dialect . |
Lineage log API endpoints
API base URI
http(s)://<host>/rest/v1/group/name/<projectGroupName>/project/name/<projectName>/environment/name/<environmentName>
API endpoint and function
The Lineage API endpoint is available on standard RESTful APIs that use HTTP or HTTPS requests to GET data. The Lineage API service is accessed via the Uniform Resource Identifier (URI). All reference below will assume the API Base URI has been specified.
The available API endpoint is listed below.
Method | Path | URI | Function |
---|---|---|---|
GET | /lineage | /lineage?jobType=<type>&startTimestamp=<value>&endTimestamp=<value> |
Get a lineage log of each component in a Matillion ETL job. |
Full endpoint example:
https://host.com/rest/v1/group/name/ProjectGroupName/project/name/ProjectName/environment/name/EnvironmentName/lineage?jobType=orchestration&startTimestamp=1619017086243&endTimestamp=1619017102383
API parameters and description
Below you will find the endpoint parameters and a description of each.
Parameter | Description | Condition |
---|---|---|
jobType | Filters the type of job to be returned in the response. jobType="orchestration" returns only orchestration jobs; jobType="transformation" returns only transformation jobs. |
Optional |
startTimestamp | The earliest possible logged time for records to be returned. Takes the format of UTC time in milliseconds. | Required |
endTimestamp | The latest possible logged time for records to be returned. Takes the format of UTC time in milliseconds. | Optional |
When run, the endpoint returns a list, in ascending order, of records logged after (and including) the startTimestamp
parameter, and before (and including) the endTimestamp
parameter. If no endTimestamp
parameter is included, all records from the startTimestamp
parameter to the current time are included. The list will be filtered by job type (orchestration or transformation) if the optional jobType
parameter is included.
By default, no more than 30,000 records can be returned per call. However, this limit can be modified if required. Please contact support if you need to modify this limit.