Vacuum

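Vacuum is an orchestration component that performs a vacuum operation on a list of tables. Vacuum is a housekeeping task. On Amazon Redshift, it physically re-sorts table data according to the table's sort key and reclaims space left over from deleted rows. On Databricks, it deletes data files that a Delta table no longer references and that are older than the retention threshold. Because it is a housekeeping step, Vacuum is usually placed at the end of an orchestration pipeline.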

For more information about the vacuum process, read:

- Databricks VACUUM documentation
- AWS VACUUM documentation

Properties

Databricks

Name = string

A human-readable name for the component.

Catalog = drop-down

Select a Databricks Unity Catalog. The special value [Environment Default] uses the catalog defined in the environment. The selected catalog determines which schemas are available in the next parameter.

Schema (Database) = drop-down

Select the Databricks schema. The special value [Environment Default] uses the schema defined in the environment. Read Create and manage schemas to learn more.

Tables to Vacuum = dual listbox

Select which tables to vacuum.

Retention Period = integer

The retention threshold, in the unit set by Retention Unit. The default is 7. Vacuum deletes only data files that the table no longer references and that are older than this threshold.

Retention Unit = drop-down

Select the unit of the Retention Period. Options are Day, Hour, or Week. The default is Day.
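The sketch below shows what a Databricks vacuum might translate to. It is a minimal sketch, not the component's actual implementation, and it makes two assumptions:

- The component issues one `VACUUM ... RETAIN n HOURS` statement per selected table.
- It converts Retention Period and Retention Unit to hours, because the Databricks SQL `RETAIN` clause is expressed in hours.

The names `RetentionUnit`, `quote_identifier`, `build_databricks_vacuum`, and the sample catalog, schema, and table names are hypothetical.

```python
from enum import Enum


class RetentionUnit(Enum):
    """Retention Unit options, valued in hours (hypothetical model)."""
    HOUR = 1
    DAY = 24
    WEEK = 24 * 7


def quote_identifier(name: str) -> str:
    """Backtick-quote a Databricks identifier, doubling any embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_databricks_vacuum(catalog: str, schema: str, table: str,
                            retention_period: int = 7,
                            retention_unit: RetentionUnit = RetentionUnit.DAY) -> str:
    """Build one VACUUM statement (hypothetical helper, not the component's code)."""
    if retention_period < 0:
        raise ValueError("retention_period must be non-negative")
    # Databricks SQL expresses the threshold as RETAIN <n> HOURS,
    # so Day and Week values are converted to hours first.
    hours = retention_period * retention_unit.value
    qualified = ".".join(quote_identifier(p) for p in (catalog, schema, table))
    return f"VACUUM {qualified} RETAIN {hours} HOURS"


# One statement per table selected in Tables to Vacuum.
for name in ["orders", "customers"]:
    print(build_databricks_vacuum("main", "sales", name))
```

With the defaults (7 days), the sketch produces `RETAIN 168 HOURS`. Delta Lake also runs a retention safety check. It rejects thresholds shorter than the table's configured deleted-file retention duration (7 days by default) unless `spark.databricks.delta.retentionDurationCheck.enabled` is set to `false`. As a result, a Retention Period below 7 days is likely to fail with default settings.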

Amazon Redshift

Name = string

A human-readable name for the component.

Schema = drop-down

Select the table schema. The special value [Environment Default] uses the schema defined in the environment. For more information on using multiple schemas, read Schemas.

<!-- param-end:[schema] -->

---

<!-- param-start:[tablesToVacuum] | warehouses: [redshift] -->
`Tables to Vacuum` = _dual listbox_

Select which tables to vacuum.

Only one vacuum can run at a time across an entire Amazon Redshift cluster, so a vacuum may fail if another vacuum is already running on the cluster. This is usually harmless if the same tables will be vacuumed again on the pipeline's next run. In that case, consider connecting the component's "Failure" link to an [End Success](/data-productivity-cloud/designer/docs/end-success/) component so that a vacuum failure doesn't fail the whole pipeline.
<!-- param-end:[tablesToVacuum] -->
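The failure handling described for Tables to Vacuum can be illustrated with a hedged Python sketch. `vacuum_tables` and its `execute` parameter are hypothetical. `execute` stands in for whatever runs a SQL statement on the cluster and raises on error. The sketch attempts every selected table and collects failures rather than stopping at the first one; the text above does not say which of these the component does.

```python
from typing import Callable, Dict, Iterable


def vacuum_tables(tables: Iterable[str],
                  execute: Callable[[str], None],
                  tolerate_failure: bool = True) -> Dict[str, str]:
    """Vacuum each table and collect failures (hypothetical helper).

    `execute` is an assumed callable that runs one SQL statement on the
    Redshift cluster and raises an exception on error, for example when
    another vacuum is already running. Table names are assumed to be
    already schema-qualified and quoted.
    """
    failed: Dict[str, str] = {}
    for table in tables:
        try:
            execute(f"VACUUM {table}")
        except Exception as exc:
            # Record the failure and carry on with the remaining tables.
            failed[table] = str(exc)
    if failed and not tolerate_failure:
        # Like leaving the "Failure" link unconnected: the error fails the pipeline.
        raise RuntimeError(f"vacuum failed for: {sorted(failed)}")
    # Like routing "Failure" to End Success: failed tables are simply
    # retried on the pipeline's next run.
    return failed


# Demo with a stand-in executor that simulates a concurrent vacuum.
def fake_execute(sql: str) -> None:
    if '"busy"' in sql:
        raise RuntimeError("simulated: another vacuum is running")


print(vacuum_tables(['"public"."orders"', '"public"."busy"'], fake_execute))
```

Passing `tolerate_failure=True` plays the role of connecting the Failure link to End Success. Passing `False` lets the error propagate.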

---

<!-- param-start:[vacuumOption] | warehouses: [redshift] -->
`Vacuum Options` = _drop-down_

Select how the component vacuums each table. Every option except SORT ONLY reclaims the disk space occupied by deleted rows:

- **None:** Runs a default vacuum operation. In the current Amazon Redshift implementation, this is equivalent to FULL.
- **FULL:** Sorts the table and reclaims disk space. If the table is already at least 95% sorted, the sort phase is skipped and the operation is equivalent to DELETE ONLY.
- **SORT ONLY:** Sorts the table without reclaiming disk space. It is faster than FULL, but the space occupied by deleted rows remains unreclaimed.
- **DELETE ONLY:** Reclaims disk space without sorting the table, so it is typically faster than FULL.
- **REINDEX:** Analyzes the distribution of values in interleaved sort key columns, then performs a FULL vacuum.
<!-- param-end:[vacuumOption] -->
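The sketch below shows how each Vacuum Options value could map onto Amazon Redshift's `VACUUM [FULL | SORT ONLY | DELETE ONLY | REINDEX] table_name` syntax. The mapping and the helper names (`VACUUM_KEYWORDS`, `quote_ident`, `build_redshift_vacuum`) are assumptions for illustration only. The component may build its SQL differently.

```python
# Keyword emitted for each Vacuum Options value (hypothetical mapping).
# "None" emits a plain VACUUM, which Amazon Redshift runs as FULL by default.
VACUUM_KEYWORDS = {
    "None": "",
    "FULL": "FULL",
    "SORT ONLY": "SORT ONLY",
    "DELETE ONLY": "DELETE ONLY",
    "REINDEX": "REINDEX",
}


def quote_ident(name: str) -> str:
    """Double-quote a Redshift identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_redshift_vacuum(schema: str, table: str, option: str = "None") -> str:
    """Return the VACUUM statement for one table (hypothetical helper)."""
    if option not in VACUUM_KEYWORDS:
        raise ValueError(f"unknown vacuum option: {option!r}")
    target = f"{quote_ident(schema)}.{quote_ident(table)}"
    # Drop the empty keyword for "None" so no double space appears.
    parts = ["VACUUM", VACUUM_KEYWORDS[option], target]
    return " ".join(p for p in parts if p)


print(build_redshift_vacuum("public", "sales", "DELETE ONLY"))
```

For example, `build_redshift_vacuum("public", "sales")` returns `'VACUUM "public"."sales"'`. Selecting DELETE ONLY yields `'VACUUM DELETE ONLY "public"."sales"'`.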

| Snowflake | Databricks | Amazon Redshift |
|---|---|---|
| ❌ | ✅ | ✅ |
