Skip to content

Schema drift

Overview

Schema Drift occurs when your data source changes its metadata. On the fly, fields, columns, and types can be added, removed, or changed. If you don't account for schema drift, your data flow becomes vulnerable to changes in the upstream data source.

Data Loader pipelines can easily adapt to these schema changes and propagate them to the destination, without assistance from IT or the need to code.

These "self-healing" pipelines not only save time, they significantly reduce the number of job failures and delays due to schema drift. Users can ensure that data continues to flow, uninterrupted, from source to target.


Schema Drift Support

Most pipelines include Schema Drift support. Schema Drift is what occurs when changes are made to the source data or API settings, thereby creating a difference between the source and your configuration in the pipeline. If these changes are not handled appropriately, the pipeline can fail or process incorrect information. As such, pipelines that have Schema Drift support will handle changes to the source data and continue operating. You may still need to make changes if you notice schema drift occurring. The documentation for each source will state whether that source supports schema drift or not.

Schema Drift support handles the following changes to source data:

  1. Missing columns: Missing columns are loaded as NULL.
  2. Data type changes: See Data Type changes section below for more detail.
  3. Missing tables: Missing tables are not loaded. All other tables will continue to be loaded. You will need to update your pipeline settings to remove the missing tables.
  4. Newly added tables and fields: The pipeline does not automatically add new tables/fields to the data source to your pipeline. The pipeline is not otherwise affected, and you can manually add new tables and fields to your pipeline as required.

Data Type Changes

Data Type changes are accommodated, but if these are not compatible changes for the target cloud platform, the current column will be renamed as column_name, _datetime , and the column re-purposed as the new datatype. The format of the datetime extension is mmddhhmmss , for example, 20210113110334 and will be the same for all columns in the same table in the same Pipeline execution. The new column will be NULL up to the date of change - this should be considered for downstream dependencies such as views, reports, etc.