
Optimize

Optimize the layout of Delta Lake data with the Optimize component. You can optionally optimize a subset of data or colocate data by column. If you don't specify colocation, bin-packing optimization is performed.

Bin-packing optimization is idempotent: if the operation is run twice on the same dataset, the second run has no effect. Bin-packing aims to produce data files that are evenly balanced by size on disk, but not necessarily by the number of tuples per file. In practice, however, the two measures are usually correlated.
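To illustrate what idempotency means here, the following is a toy model only. It is not Delta Lake's actual compaction algorithm: the `bin_pack` function, the 100 MB target size, and the file sizes are all assumptions made up for illustration. It merges consecutive files while the merged size stays within a target, then shows that repacking the result changes nothing.

```python
def bin_pack(file_sizes, target_size):
    """Toy next-fit compaction (hypothetical, not Delta Lake's algorithm).

    Merges consecutive files into one output file while the merged size
    stays within target_size. Returns the list of output file sizes.
    """
    packed = []
    current = 0
    for size in file_sizes:
        # Close the current output file if adding this file would overflow it.
        if current and current + size > target_size:
            packed.append(current)
            current = 0
        current += size
    if current:
        packed.append(current)
    return packed


# Hypothetical file sizes in MB.
sizes_mb = [30, 30, 30, 30, 60, 60]

first_run = bin_pack(sizes_mb, target_size=100)
second_run = bin_pack(first_run, target_size=100)

print(first_run)                # sizes after the first pass
print(second_run == first_run)  # the second pass leaves the layout unchanged
```

The model balances files by size only. Two output files of equal size can still hold different numbers of tuples if row widths vary, which is the distinction the paragraph above draws.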

Z-Ordering is not idempotent, although it aims to be an incremental operation. The time taken for Z-Ordering isn't guaranteed to decrease over multiple runs. Z-Ordering aims to produce data files that are evenly balanced by the number of tuples, but not necessarily by size on disk. The two measures are often correlated, but when they are not, the optimization tasks can take unevenly long to complete.


Properties

Name = string

A human-readable name for the component.


Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the environment setup. Selecting a catalog will determine which databases are available in the next parameter.


Database = drop-down

Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the environment setup.


Table = drop-down

The Delta Lake table to be optimized. Only one table can be selected per instance of the component.


Partition = expression editor

The partition columns to include in the optimization process, each with its related condition. Use this to optimize only a subset of the data. The default is none.


Z Order = column editor

The columns to colocate data by when Z-Ordering. This list should exclude any partition columns. The default is none, in which case bin-packing optimization is performed.
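As a rough sketch of how these properties could map onto a Delta Lake `OPTIMIZE` statement, the helper below assembles one from a table name, an optional partition condition, and optional Z-Order columns. The `OPTIMIZE ... WHERE ... ZORDER BY (...)` syntax is Delta Lake's own. The function `build_optimize_sql`, its parameters, and the sample names (`analytics`, `events`, `event_date`, `customer_id`) are hypothetical, and the SQL the component actually generates is not documented here, so treat this as an assumption.

```python
def build_optimize_sql(table, database=None, catalog=None,
                       partition_predicate=None, z_order_columns=()):
    """Build a Delta Lake OPTIMIZE statement (hypothetical helper).

    Identifiers are inserted as-is, without quoting or validation,
    so only pass trusted names.
    """
    # Qualify the table name with whichever of catalog/database are given.
    name = ".".join(part for part in (catalog, database, table) if part)
    sql = f"OPTIMIZE {name}"

    # Partition property: restrict optimization to a subset of the data.
    if partition_predicate:
        sql += f" WHERE {partition_predicate}"

    # Z Order property: colocate data by these columns.
    # With no columns, plain bin-packing is performed.
    if z_order_columns:
        sql += f" ZORDER BY ({', '.join(z_order_columns)})"

    return sql


# Bin-packing over the whole table.
print(build_optimize_sql("events", database="analytics"))

# Z-Ordering on one column, limited to recent partitions.
print(build_optimize_sql(
    "events",
    database="analytics",
    partition_predicate="event_date >= '2024-01-01'",
    z_order_columns=["customer_id"],
))
```

On Databricks, a statement like this could be run from a SQL editor or passed to `spark.sql(...)`.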

