Lineage

Editions

This feature is for customers on our Enterprise edition only. Visit Matillion pricing to learn more about each edition.

Lineage provides a clear, visual representation of data flow in Data Productivity Cloud pipelines. It gives you insight into pipeline dependencies and relationships, and helps you answer questions such as:

What was the data source?
How has the data changed through data transformations?
What is the final destination of the data?

Lineage offers several key benefits, including:

Audit and compliance: Quickly track data back to its source for governance and troubleshooting.
Impact analysis: Understand how upstream data changes affect downstream processes. For more information, read Filtering data lineage.
Faster debugging: Identify issues at the source when data isn't behaving as expected.

Note

Lineage is available for both orchestration and transformation pipelines, giving you a comprehensive view of where your data originates. Currently, only selected connectors are supported; more will be added in the future.

Accessing data lineage

Lineage is collected for each dataset used in your pipelines. To access lineage:

Log in to the Data Productivity Cloud.
In the left navigation, click the Activity icon, then select Lineage from the menu.

Lineage lists each table in your project. For each table, the list shows the table name, the cloud data warehouse infrastructure (for example, Snowflake), and the table's location in the cloud data warehouse.

Click a dataset's Table name to drill down to see a lineage graph for that dataset.

Note

If you have previously used this feature, a list of tables will be displayed. Choose one to see lineage information.
The Lineage viewer isn't a full schema view of your cloud data warehouse; it shows only those datasets that you have used in running pipelines.

Using lineage

The lineage for a dataset is visually represented on a canvas in a diagram called a lineage graph. This graph depicts the states the data passes through as it moves through a transformation pipeline. The following example illustrates what you can expect to see in a transformation pipeline:

Lineage graph

In this example, two datasets have been combined by a transformation pipeline (denoted by the T icon) to produce a target dataset. Each dataset is represented by a separate box on the canvas, and data flows from left to right, following the direction of the arrows. A lineage graph may contain multiple datasets and multiple transformations that act on those datasets.

On this canvas, you can perform the following actions:

Zoom in and out using the controls at the bottom-right.
Drag the canvas around the window with your mouse.
Drag individual boxes (datasets) around the canvas to reorganize the view (the relationships between the datasets will remain unchanged).
Click a transformation icon, T, to open a panel giving you the following details:
- Name of the pipeline.
- Name of the project containing the pipeline.
- Status of the most recent pipeline run (SUCCESS, FAILURE).
- Date and time that the most recent pipeline run started and finished.
- Approximate duration of the most recent pipeline run, in seconds.
- The name of the user who most recently ran the pipeline.
Click the pipeline name at the top of this panel to go to the Pipeline run details page in the Observability dashboard.
Click any dataset box to show information about that dataset in a panel on the right. This panel includes a Columns tab which shows the name and data type of every column in the dataset.
Click the down arrow in any of the dataset boxes to expand the box, displaying every column in the dataset.

With one or more dataset boxes expanded on the canvas, you can trace the full lineage of any individual column of data. To trace a specific column, click the column name in any of the datasets. Arrows will trace that column between all the datasets in the lineage. The following example shows a simple case: the ID column traced between two datasets, before and after a transformation.

Lineage columns
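
Conceptually, tracing a column amounts to following column-to-column links through the lineage, both back toward the source and forward toward the destination. The following Python sketch is purely illustrative and is not Matillion's implementation or API: the dataset names, the column names, and the `COLUMN_LINKS` structure are all assumptions invented for this example.

```python
# Hypothetical column-level links: (dataset, column) -> (dataset, column).
# Every name here is invented for illustration; this is not a Matillion API.
COLUMN_LINKS = [
    (("customers_raw", "ID"), ("customers_clean", "ID")),
    (("customers_raw", "NAME"), ("customers_clean", "FULL_NAME")),
    (("customers_clean", "ID"), ("customer_orders", "CUSTOMER_ID")),
]


def _follow(start, links, forward):
    """Collect columns reachable from `start` in one direction only."""
    found, frontier = set(), [start]
    while frontier:
        current = frontier.pop()
        for source, target in links:
            # Walk source -> target when going forward, target -> source otherwise.
            here, there = (source, target) if forward else (target, source)
            if here == current and there not in found:
                found.add(there)
                frontier.append(there)
    return found


def trace_column(start, links):
    """Return the clicked column plus its upstream and downstream columns."""
    return {start} | _follow(start, links, True) | _follow(start, links, False)


traced = trace_column(("customers_clean", "ID"), COLUMN_LINKS)
```

In this sketch, tracing `ID` in the hypothetical `customers_clean` dataset picks up the matching column on either side of it, while the unrelated `NAME` and `FULL_NAME` columns are left out.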

Filtering data lineage

Filtering data lineage views enhances the clarity of data flows in your pipelines. By applying filters to large datasets and pipelines, you can gain insights into the lineage without needing to view or load everything on the canvas.

What can you use lineage for?

Upstream lineage: Quickly trace the origins of your data to perform root cause analysis, and understand how the dataset you're analyzing is constructed.
Downstream lineage: Perform impact analysis to see which datasets or columns will be affected if you make a significant change.

To filter your lineage, use the drop-down menu next to Filter view located in the top-left corner of the canvas:

Filter name	Description
Default	The default view. Displays the critical path for your selected dataset, giving you the essential view of its lineage.
Upstream	This view displays relevant data and pipelines upstream of your dataset.
Downstream	This view displays relevant data and pipelines downstream of your dataset.
Complete	This view displays everything relevant to your dataset, both upstream and downstream.

Note

You can only select one filter at a time.
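
One way to picture the Upstream, Downstream, and Complete views is as traversals of a directed graph whose edges point from a source dataset to the dataset produced from it. The Python sketch below is an illustration under that assumption only; it is not how the Data Productivity Cloud computes its views, and the dataset names and the `lineage_view` function are hypothetical. The Default view is left out because the text above doesn't define how its critical path is chosen.

```python
from collections import defaultdict, deque

# Hypothetical lineage edges: (source dataset, target dataset).
EDGES = [
    ("raw_orders", "stg_orders"),
    ("raw_customers", "stg_customers"),
    ("stg_orders", "orders_enriched"),
    ("stg_customers", "orders_enriched"),
    ("orders_enriched", "sales_report"),
]


def build_adjacency(edges):
    """Index the edges by target (parents) and by source (children)."""
    parents, children = defaultdict(set), defaultdict(set)
    for source, target in edges:
        parents[target].add(source)
        children[source].add(target)
    return parents, children


def reachable(start, neighbours):
    """Breadth-first search returning every dataset reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage_view(dataset, edges, view):
    """Return the datasets shown for one filter; only one filter applies at a time."""
    parents, children = build_adjacency(edges)
    if view == "upstream":
        return reachable(dataset, parents)
    if view == "downstream":
        return reachable(dataset, children)
    if view == "complete":
        return reachable(dataset, parents) | reachable(dataset, children)
    raise ValueError(f"unknown view: {view}")


upstream = lineage_view("orders_enriched", EDGES, "upstream")
downstream = lineage_view("orders_enriched", EDGES, "downstream")
```

Here the upstream result supports root cause analysis (everything `orders_enriched` was built from), and the downstream result supports impact analysis (everything that would be affected by changing it).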

Data access control

Lineage metadata is collected and aggregated at the Matillion account level, so you have a unified view of lineage across your projects. However, to preserve data security, project-level user permissions control which users can see which metadata. Specifically, a user can see a dataset's metadata only if that dataset has been read from, or written to, by a project that the user belongs to.
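
The visibility rule can be expressed as a simple set intersection. The following Python sketch only restates the rule described above; the project names, dataset names, and the `visible_datasets` function are hypothetical and don't correspond to any Matillion API.

```python
# Hypothetical record of which projects have read from or written to each dataset.
DATASET_PROJECTS = {
    "orders_enriched": {"sales", "finance"},
    "hr_salaries": {"people_ops"},
    "sales_report": {"sales"},
}


def visible_datasets(user_projects, dataset_projects):
    """Return the datasets touched by at least one project the user belongs to."""
    return {
        dataset
        for dataset, projects in dataset_projects.items()
        if projects & user_projects  # non-empty intersection means access
    }


analyst_view = visible_datasets({"finance"}, DATASET_PROJECTS)
```

A user who belongs only to the hypothetical `finance` project sees `orders_enriched`, because that project has touched it, but not `hr_salaries` or `sales_report`.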