
Data lineage

The Data Productivity Cloud's lineage feature provides a visual representation of data flow, showing the relationships between data objects and transformations. It enriches the lineage with metadata, including data types, applied transformations, and source system details, so that users can understand each element of the data flow in context.

Let's take a look at how this works and what kind of data is stored to make it happen.

Data lineage flow

  1. Data originates from various sources like databases, SaaS platforms, and flat files. It first arrives at a temporary landing zone called the staging area.
  2. This data is then transformed using the Matillion Agent. This transformation stage typically involves cleaning the data, ensuring consistent formatting, and even deriving new data points.
  3. Crucially, the metadata repository captures information about this data's origin, any transformations applied, and its final destination. This information serves as the backbone for understanding the data's lineage.
  4. Finally, the transformed data is loaded into the data warehouse, a central repository that acts as the historical data hub for the organization.

Capturing lineage never involves accessing or storing the actual data being transported. Only metadata is recorded.
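To make the idea of metadata-only lineage concrete, here is a minimal sketch in Python. It is not the Data Productivity Cloud's actual schema or API: the `LineageRecord` class, its field names, the `upstream` helper, and the object names such as `crm.accounts` are all hypothetical and chosen only for illustration. The point is that each record describes where data came from, what was done to it, and where it went, without holding any of the rows themselves.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LineageRecord:
    """Hypothetical metadata for one hop in a data flow. Holds no row data."""
    source: str                                   # object the data came from
    destination: str                              # object the data was written to
    transformations: Tuple[str, ...] = ()         # names of the steps applied
    column_types: Dict[str, str] = field(default_factory=dict)


def upstream(records: List[LineageRecord], target: str) -> List[str]:
    """Walk the lineage backwards from target and list every upstream object."""
    found: List[str] = []
    frontier = [target]
    while frontier:
        current = frontier.pop()
        for rec in records:
            if rec.destination == current and rec.source not in found:
                found.append(rec.source)
                frontier.append(rec.source)
    return found


# Illustrative records only: source -> staging area -> data warehouse.
records = [
    LineageRecord(
        source="crm.accounts",
        destination="staging.accounts",
        column_types={"id": "INTEGER", "name": "VARCHAR"},
    ),
    LineageRecord(
        source="staging.accounts",
        destination="warehouse.dim_account",
        transformations=("trim_whitespace", "derive_region"),
    ),
]

print(upstream(records, "warehouse.dim_account"))
```

Walking the records backwards from the warehouse table recovers the staging table and the original source, which is the kind of question a lineage view answers visually.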