Vacuum Table🔗

Perform a vacuum operation on a list of tables. Vacuum is a housekeeping task that physically reorganizes table data according to its sort key, and reclaims space left over from deleted rows.

Delta Lake on Databricks: For more information about the vacuum process, read Vacuum.

Amazon Redshift: Vacuum can be a very expensive operation. Depending on your use-case, vacuum may be unnecessary. To learn more, read VACUUM.

When in doubt, vacuum is almost always used at the end of an orchestration job.

Properties🔗

Delta Lake on DatabricksAmazon Redshift

Name = string

A human-readable name for the component.

Catalog = drop-down

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. Selecting a catalog will determine which databases are available in the next parameter.

Database = drop-down

Select the Delta Lake database. The special value [Environment Default] uses the database defined in the environment.

Tables to Vacuum = drop-down

Select which tables to vacuum.

Retention Period = integer

The retention threshold. The default is 7.

Retention Unit = drop-down

Select the unit of retention. Choose from Day, Hour, Week. The default is Day.

Name = string

A human-readable name for the component.

Schema = drop-down

Select the table schema. The special value [Environment Default] uses the schema defined in the environment. For more information on using multiple schemas, read Schemas.

Tables to Vacuum = dual listbox

The tables to vacuum.

Only one vacuum may be running at any one time across an entire Amazon Redshift cluster. Therefore, vacuums may fail due to concurrent workloads. This is usually harmless if the same tables will be vacuumed again on the next run of the job. If this is the case, consider joining the "Failure" link of the component to an "End Success" to prevent vacuum failure from failing the whole job.

Vacuum Options = drop-down

Reclaims disk space occupied by deleted rows in a table.

DELETE ONLY: Will not sort tables and is consequently quicker than other methods.
FULL: Is equivalent to DELETE ONLY if the target table is > 95% sorted, otherwise will perform a full sort.
None: A default vacuum operation. This is analogous to "FULL" in the current AWS implementation.
REINDEX: Analyzes interleaved sort keys and performs a FULL sort.
SORT ONLY: Sorts the table but does not reclaim disk space. Is quick at the expense of unclaimed memory.

Snowflake	Delta Lake on Databricks	Amazon Redshift	Google BigQuery	Azure Synapse Analytics
❌	✅	✅	❌	❌