Volume to Delta Table

The Volume to Delta Table component lets you transfer data from an existing volume in Databricks into a Delta Lake table without replacing or deleting any existing data.
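The component's behavior (appending to the target and, with Force Load, optionally re-loading files) mirrors the semantics of Databricks' COPY INTO statement. As a minimal sketch, assuming hypothetical catalog, schema, table, and volume names, an equivalent load looks like:

```sql
-- Hypothetical names throughout. COPY INTO appends to the target table
-- and by default skips files that have already been loaded.
COPY INTO main.sales.orders
FROM '/Volumes/main/sales/landing/orders/'
FILEFORMAT = CSV
FORMAT_OPTIONS ('header' = 'true')
```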

Warning

If you have selected hive_metastore as your default catalog, volumes can't be created and this feature won't be available to you. This component only supports Unity Catalog.

Warning

You must use a SQL warehouse or a cluster running Databricks Runtime 13.3 LTS or above; otherwise, unexpected behavior can occur in the component.


Properties

Name = string

A human-readable name for the component.


Source = drop-down

The file location to load the data from.

Files in this location must match the specified FILEFORMAT. The accepted encryption options are TYPE = 'AWS_SSE_C' and MASTER_KEY for AWS S3.
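These encryption options correspond to the ENCRYPTION clause accepted by Databricks' COPY INTO for S3 sources. A sketch, assuming a hypothetical bucket and secret scope:

```sql
-- Hypothetical bucket and secret scope. SSE-C supplies a client-managed
-- master key alongside the source location.
COPY INTO main.sales.orders
FROM 's3://example-bucket/landing/' WITH (
  ENCRYPTION (TYPE = 'AWS_SSE_C', MASTER_KEY = secret('etl_scope', 'sse_c_key'))
)
FILEFORMAT = CSV
```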


Pattern = string

Files in the specified location will only be loaded if their names match the pattern you specify here. You can use wildcards in the pattern. Enter .* to match all files in the location.


Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Data Productivity Cloud environment setup. Selecting a catalog determines which schemas (databases) are available in the next parameter.


Note

You must have appropriate permissions and access rights to the source volume and destination Delta table.
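As a sketch of the Unity Catalog grants this typically implies (hypothetical principal and object names; the exact privileges depend on your workspace setup):

```sql
-- Hypothetical names; run by the object owner or an admin.
GRANT USE CATALOG ON CATALOG main TO `etl_user@example.com`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `etl_user@example.com`;
GRANT READ VOLUME ON VOLUME main.sales.landing TO `etl_user@example.com`;
GRANT SELECT, MODIFY ON TABLE main.sales.orders TO `etl_user@example.com`;
```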


Schema (Database) = drop-down

The Databricks schema. The special value, [Environment Default], will use the schema defined in the environment. Read Create and manage schemas to learn more.


Table = string

The name of the Delta table to load data into. If a table of this name doesn't already exist, it will be created; loaded data is appended, so existing data is not replaced or deleted.


Load Columns = dual listbox

Choose the columns to load. If you leave this parameter empty, all columns will be loaded.
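If the component implements column selection the way COPY INTO does, loading a subset corresponds to selecting columns from the source files. A sketch with hypothetical column names (this mapping is an assumption, not the documented implementation):

```sql
-- Hypothetical columns; only order_id and amount are loaded.
COPY INTO main.sales.orders
FROM (
  SELECT order_id, amount
  FROM '/Volumes/main/sales/landing/orders/'
)
FILEFORMAT = PARQUET
```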


File Type = drop-down

The format of the source files to load. Available file types are CSV, JSON, PARQUET, and AVRO.

Component properties will change to reflect the selected file type. Click one of the tabs below for properties applicable to that file type.

CSV

Header = boolean

Select Yes to use the first line of the file as column names. If not specified, the default is No.


Field Delimiter = string

Enter the delimiter character used to separate fields in the CSV file. This can be one or more single-byte or multibyte characters that separate fields in an input file. If none is specified, the default is a comma.

Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). This delimiter is limited to a maximum of 20 characters. The specified delimiter must be a valid UTF-8 character and not a random sequence of bytes.

Note

A TAB character can be specified as "\t".


Date Format = string

Manually set a date format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd.


Timestamp Format = string

Manually set a timestamp format in the CSV files to be loaded. If none is specified, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].


Encoding Type = string

The encoding type to use when decoding the CSV files. If none is specified, the default is UTF-8.


Ignore Leading Whitespace = boolean

When Yes, skips any leading whitespace. If not specified, the default is No.


Ignore Trailing Whitespace = boolean

When Yes, skips any trailing whitespace. If not specified, the default is No.


Infer Schema = boolean

If Yes, will attempt to determine the input schema automatically from the data contained in the CSV file. If not specified, the default is No.


Multi Line = boolean

If Yes, will parse records which may span multiple lines. If not specified, the default is No.


Null Value = string

Sets the string representation of a null value. If not specified, the default value is an empty string.


Empty Value = string

Sets the string representation of an empty value. If not specified, the default value is an empty string.


Recursive File Lookup = boolean

If Yes, files are loaded recursively from subdirectories, and partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.


Force Load = boolean

If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
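Most of the CSV properties above map directly onto Spark CSV reader options, which COPY INTO accepts through FORMAT_OPTIONS; Force Load corresponds to the 'force' copy option. A sketch with hypothetical names and a representative subset of the options:

```sql
COPY INTO main.sales.orders
FROM '/Volumes/main/sales/landing/orders/'
FILEFORMAT = CSV
FORMAT_OPTIONS (
  'header' = 'true',            -- Header
  'sep' = '|',                  -- Field Delimiter
  'dateFormat' = 'yyyy-MM-dd',  -- Date Format
  'encoding' = 'UTF-8',         -- Encoding Type
  'inferSchema' = 'true',       -- Infer Schema
  'multiLine' = 'false',        -- Multi Line
  'nullValue' = 'NULL'          -- Null Value
)
COPY_OPTIONS ('force' = 'true') -- Force Load: reload files even if seen before
```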

JSON

Date Format = string

Manually set a date format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd.


Timestamp Format = string

Manually set a timestamp format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].


Encoding Type = string

The encoding type to use when decoding the JSON files. If none is specified, the default is UTF-8.


Multi Line = boolean

If Yes, will parse records which may span multiple lines. If not specified, the default is No.


Primitives As String = boolean

If Yes, primitive data types are interpreted as strings in JSON files. If not specified, the default is No.


Prefers Decimals = boolean

If Yes, all floating-point values will be treated as a decimal type in JSON files. If the values don't fit in decimal, then they're inferred as doubles. If not specified, the default is No.


Allow Comments = boolean

If Yes, will allow Java/C++-style comments in JSON records. If not specified, the default is No.


Allow Unquoted Field Names = boolean

If Yes, will allow unquoted JSON field names. If not specified, the default is No.


Allow Single Quotes = boolean

If Yes, will allow single quotes in addition to double quotes in JSON records. If not specified, the default is No.


Allow Numeric Leading Zeros = boolean

If Yes, will allow leading zeros in numbers in JSON records. For example, 00019. If not specified, the default is No.


Allow Backslash Escaping Any Character = boolean

If Yes, will allow the quoting of all characters using the backslash quoting mechanism in JSON records. If not specified, the default is No.


Allow Unquoted Control Chars = boolean

If Yes, will allow JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). If not specified, the default is No.


Drop Field If All Null = boolean

If Yes, will ignore columns that contain only null values or empty arrays/structs when inferring the schema of JSON records. If not specified, the default is No.


Recursive File Lookup = boolean

If Yes, files are loaded recursively from subdirectories, and partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.


Force Load = boolean

If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
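The JSON properties likewise correspond to Spark JSON reader options passed through FORMAT_OPTIONS. A sketch with hypothetical names:

```sql
COPY INTO main.sales.events
FROM '/Volumes/main/sales/landing/events/'
FILEFORMAT = JSON
FORMAT_OPTIONS (
  'multiLine' = 'true',            -- Multi Line
  'primitivesAsString' = 'false',  -- Primitives As String
  'allowComments' = 'true',        -- Allow Comments
  'allowSingleQuotes' = 'true',    -- Allow Single Quotes
  'dropFieldIfAllNull' = 'true'    -- Drop Field If All Null
)
```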

PARQUET

Merge Schema = boolean

If Yes, will merge schema from all PARQUET part-files. If not specified, the default is No.


Recursive File Lookup = boolean

If Yes, files are loaded recursively from subdirectories, and partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.


Force Load = boolean

If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
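For PARQUET (and AVRO, which takes the same options), Merge Schema corresponds to the mergeSchema reader option. A sketch with hypothetical names:

```sql
COPY INTO main.sales.orders
FROM '/Volumes/main/sales/landing/orders/'
FILEFORMAT = PARQUET
FORMAT_OPTIONS (
  'mergeSchema' = 'true',          -- Merge Schema: reconcile schemas across part-files
  'recursiveFileLookup' = 'true'   -- Recursive File Lookup
)
```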

AVRO

Merge Schema = boolean

If Yes, will merge schema from all AVRO part-files. If not specified, the default is No.


Recursive File Lookup = boolean

If Yes, files are loaded recursively from subdirectories, and partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.


Force Load = boolean

If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.

