Vacuum Table
Perform a vacuum operation on a list of tables. Vacuum is a housekeeping task that physically reorganizes table data according to its sort key, and reclaims space left over from deleted rows.
Delta Lake on Databricks: For more information about the vacuum process, read Vacuum.
Amazon Redshift: Vacuum can be a very expensive operation. Depending on your use-case, vacuum may be unnecessary. To learn more, read VACUUM.
When in doubt, vacuum is almost always used at the end of an orchestration job.
Properties
Name
= string
A human-readable name for the component.
Catalog
= drop-down
Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
Database
= drop-down
Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
Tables to Vacuum
= drop-down
Select which tables to vacuum.
Retention Period
= integer
The retention threshold. The default is 7.
Retention Unit
= drop-down
Select the unit of retention. Choose from Day, Hour, Week. The default is Day.
Name
= string
A human-readable name for the component.
Schema
= drop-down
Select the table schema. The special value, [Environment Default], will use the schema defined in the environment. For more information on using multiple schemas, read Schemas.
Tables to Vacuum
= dual listbox
The tables to vacuum.
Only one vacuum may be running at any one time across an entire Amazon Redshift cluster. Therefore, vacuums may fail due to concurrent workloads. This is usually harmless if the same tables will be vacuumed again on the next run of the job. If this is the case, consider joining the "Failure" link of the component to an "End Success" to prevent vacuum failure from failing the whole job.
Vacuum Options
= drop-down
Reclaims disk space occupied by deleted rows in a table.
- DELETE ONLY: Will not sort tables and is consequently quicker than other methods.
- FULL: Is equivalent to DELETE ONLY if the target table is > 95% sorted, otherwise will perform a full sort.
- None: A default vacuum operation. This is analogous to "FULL" in the current AWS implementation.
- REINDEX: Analyzes interleaved sort keys and performs a FULL sort.
- SORT ONLY: Sorts the table but does not reclaim disk space. Is quick at the expense of unclaimed memory.
Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
---|---|---|---|---|
❌ | ✅ | ✅ | ❌ | ❌ |