Skip to content

API v1 - Metadata

The Matillion ETL Metadata API lets you determine where data in Matillion ETL jobs has originated from. SQL queries that update data (INSERT, UPDATE, DELETE) or that alter database structures (for example, when creating a table) are logged in a single database table, along with metadata pertinent to the orchestration or transformation job, as well as task information for the component.

This endpoint allows you to retrieve the SQL queries executed by a specific project against a specific environment. SQL queries that execute DDL or DML operations as part of a job run will be logged, as well as SELECT queries run on source systems as part of data ingestion.

This information can be used to determine the lineage of data staged by Matillion ETL and how this is transformed into new target tables in your cloud data warehouse.

Credentials on COPY SQL statements will be redacted before being logged.

Entries older than seven days are removed daily from the table.

Note

Please note the following product information:

  • This endpoint can only be accessed on Matillion ETL version 1.54.7 or higher.
  • This endpoint is an Enterprise Mode feature.
  • This endpoint is currently only available for Matillion ETL for Snowflake, Delta Lake on Databricks, and Amazon Redshift.

Record field explanations

Below is an example JSON block returned from the server when the endpoint is called. Beneath the JSON block is an explanation of each field.

{
    "taskType": "Run",
    "loggedTimestamp": 1619017102402,
    "startTimestamp": 1619017086243,
    "endTimestamp": 1619017102383,
    "urlPath": "/#test/test/default/lin_orch",
    "componentName": "Rewrite Table",
    "componentCanvasName": "Rewrite Table 0",
    "source": [
        {
            "url": "matillion.com-1",
            "fqn" : "db_test.schema.text.fct_table_test"
        }
    ],
    "target": [
        {
            "url": "matillion.com-1",
            "fqn" : "db_test.schema.text.fct_table_test"
        }
    ],
    "sourceConnectionUrl": "Not Applicable",
    "sourceUsername": "Not Applicable",
    "environment": {
        "name": "test",
        "url": "matillion",
        "schema": "PUBLIC",
        "database": "PRODUCT",
        "user": "username",
        "connectionOptions": {}
    },
    "job": {
        "name": "test",
        "type": "TRANSFORMATION"
    },
    "sql": [
        {
            "query": "CREATE TABLE \"FROSTY_BORG\".\"PUBLIC\".\"TEST_NEW_2\" AS SELECT \n  * \nFROM (SELECT \n  \"a\" \nFROM \"FROSTY_BORG\".\"PUBLIC\".\"test\") \nORDER BY \"a\" DESC",
            "dialect": "Snowflake"
        }
    ]
},
Field name Data type Description
taskType String Describes how the component was run, using the display name in the DatabaseTaskBatchType enumeration. For example, "Run", "Schedule", "API".
loggedTimestamp Number The time in milliseconds (ms) denoting when the line was logged.
startTimestamp Number Value in milliseconds (ms) denoting when the component started a run.
endTimestamp Number Value in milliseconds (ms) denoting when the component completed a run.
urlPath String The path appended to the host. Used for accessing the orchestration or transformation job containing this component.
componentName String The name of the underlying Matillion ETL component. For example, "Truncate Table".
componentCanvasName String The name of the component as defined by the user in the component's Name property.
Source Array Includes the source URL (url) and fully qualified name (fqn) of the data source, to provide lineage information about Matillion ETL orchestration and transformation jobs.
Target Array Includes the target URL (url) and fully qualified name (fqn) of the data target, to provide lineage information about Matillion ETL orchestration and transformation jobs.
sourceConnectionUrl String The connection URL configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere.
sourceUsername String The username for the data connection configured on certain components. This parameter is restricted to a select few components, and will have a value of "Not Applicable" elsewhere.
environment Object Details of the Matillion ETL environment used to execute the job, such as which Cloud Data Warehouse (CDW) database and schema are the defaults.
job Object A block of JSON data containing information about the Matillion ETL job, including name and type ("ORCHESTRATION" or "TRANSFORMATION").
SQL Array All of the SQL related to the running of the component, presented in tuples and including the SQL dialect.

Lineage log API endpoints

API base URI

http(s)://<host>/rest/v1/group/name/<projectGroupName>/project/name/<projectName>/environment/name/<environmentName>

API endpoint and function

The Lineage API endpoint is available on standard RESTful APIs that use HTTP or HTTPS requests to GET data. The Lineage API service is accessed via the Uniform Resource Identifier (URI). All reference below will assume the API Base URI has been specified.

The available API endpoint is listed below.

Method Path URI Function
GET /lineage /lineage?jobType=<type>&startTimestamp=<value>&endTimestamp=<value> Get a lineage log of each component in a Matillion ETL job.

Full endpoint example:

https://host.com/rest/v1/group/name/ProjectGroupName/project/name/ProjectName/environment/name/EnvironmentName/lineage?jobType=orchestration&startTimestamp=1619017086243&endTimestamp=1619017102383

API parameters and description

Below you will find the endpoint parameters and a description of each.

Parameter Description Condition
jobType Filters the type of job to be returned in the response. jobType="orchestration" returns only orchestration jobs; jobType="transformation" returns only transformation jobs. Optional
startTimestamp The earliest possible logged time for records to be returned. Takes the format of UTC time in milliseconds. Required
endTimestamp The latest possible logged time for records to be returned. Takes the format of UTC time in milliseconds. Optional

When run, the endpoint returns a list, in ascending order, of records logged after (and including) the startTimestamp parameter, and before (and including) the endTimestamp parameter. If no endTimestamp parameter is included, all records from the startTimestamp parameter to the current time are included. The list will be filtered by job type (orchestration or transformation) if the optional jobType parameter is included.

By default, no more than 30,000 records can be returned per call. However, this limit can be modified if required. Please contact support if you need to modify this limit.