Volume to Delta Table
The Volume to Delta Table component lets users transfer data from a pre-existing volume in Databricks into a Delta Lake table without replacing or deleting any existing data.
Warning
If you have selected hive_metastore as your default catalog, volumes can't be used or created, and this component won't be available to you. This component only supports Unity Catalog.
Warning
You must use a SQL warehouse or a cluster running Databricks Runtime 13.3 LTS or above; otherwise, unexpected behavior can occur in the component.
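Loads of this kind behave like a Databricks COPY INTO statement: data is appended to the target Delta table, and files that have already been loaded are skipped. A minimal sketch of the equivalent SQL, assuming a hypothetical catalog main, schema sales, volume landing, and table orders (the component may generate different SQL):

```sql
-- Sketch of the load this component performs; all object names are hypothetical.
COPY INTO main.sales.orders                 -- target Delta table
FROM '/Volumes/main/sales/landing'          -- source volume (the Source property)
FILEFORMAT = CSV                            -- the File Type property
FORMAT_OPTIONS ('header' = 'true')          -- file-type-specific properties
COPY_OPTIONS ('force' = 'false')            -- 'true' corresponds to Force Load = Yes
```

COPY INTO is idempotent by default, which is why previously loaded files are skipped unless Force Load is enabled.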
Properties
Name
= string
A human-readable name for the component.
Source
= drop-down
The file location to load the data from.
Files in this location must be in the format specified by FILEFORMAT. Accepted encryption options are TYPE = 'AWS_SSE_C' and MASTER_KEY for AWS S3.
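For an S3 location, the encryption options mentioned above map onto COPY INTO's ENCRYPTION clause. A hedged sketch, with the bucket, key, and table names as placeholders:

```sql
-- Hypothetical S3 load using server-side encryption with a customer-provided key.
COPY INTO main.sales.orders
FROM 's3://example-bucket/landing' WITH (
  ENCRYPTION (TYPE = 'AWS_SSE_C', MASTER_KEY = '<base64-encoded-key>')
)
FILEFORMAT = CSV
```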
Pattern
= string
Files in the specified location will only be loaded if their names match the pattern you specify here. You can use wildcards in the pattern. Enter .* to match all files in the location.
Catalog
= drop-down
Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Data Productivity Cloud environment setup. Selecting a catalog will determine which schemas (databases) are available in the next parameter.
Warning
If you have selected hive_metastore as your default catalog, volumes can't be used or created, and this component won't be available to you. This component only supports Unity Catalog.
Note
You must have appropriate permissions and access rights to the source volume and destination Delta table.
Schema (Database)
= drop-down
The Databricks schema. The special value, [Environment Default], will use the schema defined in the environment. Read Create and manage schemas to learn more.
Table
= string
The name of the Delta table. The table will be recreated, and any existing table of the same name will be dropped.
Load Columns
= dual listbox
Choose the columns to load. If you leave this parameter empty, all columns will be loaded.
File Type
= drop-down
The format of the source files to load. Available file types are CSV, JSON, PARQUET, and AVRO.
Component properties will change to reflect the selected file type. Click one of the tabs below for properties applicable to that file type.
Header
= boolean
Select Yes to use the first line of the file as column names. If not specified, the default is No.
Field Delimiter
= string
Enter the delimiter character used to separate fields in the CSV file. This can be one or more single-byte or multibyte characters that separate fields in an input file. If none is specified, the default is a comma.
Accepted characters include common escape sequences, octal values (prefixed by \\), or hex values (prefixed by 0x). The delimiter is limited to a maximum of 20 characters and must be a valid UTF-8 character, not a random sequence of bytes.
Note
A TAB character can be specified as "\t".
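As an illustration, a tab-delimited file could be loaded by passing the escape-sequence form of the delimiter as the CSV separator option. A sketch with hypothetical object names:

```sql
-- Hypothetical load of tab-delimited files from a volume.
COPY INTO main.demo.tab_separated
FROM '/Volumes/main/demo/landing'
FILEFORMAT = CSV
FORMAT_OPTIONS ('sep' = '\t', 'header' = 'true')   -- Field Delimiter = tab
```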
Date Format
= string
Manually set a date format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd.
Timestamp Format
= string
Manually set a timestamp format in the CSV files to be loaded. If none is specified, the default is yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].
Encoding Type
= string
The encoding type to use when decoding the CSV files. If none is specified, the default is UTF-8.
Ignore Leading Whitespace
= boolean
If Yes, leading whitespace is skipped. If not specified, the default is No.
Ignore Trailing Whitespace
= boolean
If Yes, trailing whitespace is skipped. If not specified, the default is No.
Infer Schema
= boolean
If Yes, will attempt to determine the input schema automatically from the data contained in the CSV file. If not specified, the default is No.
Multi Line
= boolean
If Yes, will parse records which may span multiple lines. If not specified, the default is No.
Null Value
= string
Sets the string representation of a null value. If not specified, the default value is an empty string.
Empty Value
= string
Sets the string representation of an empty value. If not specified, the default value is an empty string.
Recursive File Lookup
= boolean
If Yes, partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.
Force Load
= boolean
If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
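Taken together, the CSV properties above correspond to Spark CSV reader options passed through COPY INTO. A sketch of how a fully specified CSV load might look (all object names are hypothetical, and the exact SQL the component generates may differ):

```sql
COPY INTO main.sales.orders
FROM '/Volumes/main/sales/landing'
FILEFORMAT = CSV
PATTERN = '*.csv'                      -- cf. the Pattern property (which takes a regex)
FORMAT_OPTIONS (
  'header' = 'true',                   -- Header
  'sep' = ',',                         -- Field Delimiter
  'dateFormat' = 'yyyy-MM-dd',         -- Date Format
  'encoding' = 'UTF-8',                -- Encoding Type
  'ignoreLeadingWhiteSpace' = 'true',  -- Ignore Leading Whitespace
  'inferSchema' = 'true',              -- Infer Schema
  'multiLine' = 'false',               -- Multi Line
  'nullValue' = ''                     -- Null Value
)
COPY_OPTIONS ('force' = 'true')        -- Force Load
```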
Date Format
= string
Manually set a date format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd.
Timestamp Format
= string
Manually set a timestamp format in the data files to be loaded. If none is specified, the default is yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX].
Encoding Type
= string
The encoding type to use when decoding the JSON files. If none is specified, the default is UTF-8.
Multi Line
= boolean
If Yes, will parse records which may span multiple lines. If not specified, the default is No.
Primitives As String
= boolean
If Yes, primitive data types are interpreted as strings in JSON files. If not specified, the default is No.
Prefers Decimals
= boolean
If Yes, all floating-point values will be treated as a decimal type in JSON files. If the values don't fit in decimal, then they're inferred as doubles. If not specified, the default is No.
Allow Comments
= boolean
If Yes, will allow Java/C++ comments in JSON records. If not specified, the default is No.
Allow Unquoted Field Names
= boolean
If Yes, will allow unquoted JSON field names. If not specified, the default is No.
Allow Single Quotes
= boolean
If Yes, will allow single quotes in addition to double quotes in JSON records. If not specified, the default is No.
Allow Numeric Leading Zeros
= boolean
If Yes, will allow leading zeros in numbers in JSON records (for example, 00019). If not specified, the default is No.
Allow Backslash Escaping Any Character
= boolean
If Yes, will allow the backslash quoting mechanism to escape any character in JSON records (for example, \'). If not specified, the default is No.
Allow Unquoted Control Chars
= boolean
If Yes, will allow JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line-feed characters). If not specified, the default is No.
Drop Field If All Null
= boolean
If Yes, will ignore columns that contain only null values or empty arrays/structs during schema inference of JSON records. If not specified, the default is No.
Recursive File Lookup
= boolean
If Yes, partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.
Force Load
= boolean
If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
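The JSON properties above likewise map onto Spark JSON reader options. A hedged sketch of a JSON load (hypothetical names; the component may generate different SQL):

```sql
COPY INTO main.events.raw_events
FROM '/Volumes/main/events/landing'
FILEFORMAT = JSON
FORMAT_OPTIONS (
  'primitivesAsString' = 'false',       -- Primitives As String
  'prefersDecimal' = 'true',            -- Prefers Decimals
  'allowComments' = 'true',             -- Allow Comments
  'allowUnquotedFieldNames' = 'false',  -- Allow Unquoted Field Names
  'allowSingleQuotes' = 'true',         -- Allow Single Quotes
  'dropFieldIfAllNull' = 'true',        -- Drop Field If All Null
  'multiLine' = 'true'                  -- Multi Line
)
COPY_OPTIONS ('force' = 'false')        -- Force Load
```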
Merge Schema
= boolean
If Yes, will merge schemas collected from all PARQUET part-files. If not specified, the default is No.
Recursive File Lookup
= boolean
If Yes, partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.
Force Load
= boolean
If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
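For Parquet, the only format-specific option above is schema merging. A minimal sketch (hypothetical names):

```sql
COPY INTO main.analytics.raw_parquet
FROM '/Volumes/main/analytics/landing'
FILEFORMAT = PARQUET
FORMAT_OPTIONS ('mergeSchema' = 'true')   -- Merge Schema
COPY_OPTIONS ('force' = 'true')           -- Force Load
```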
Merge Schema
= boolean
If Yes, will merge schemas collected from all AVRO part-files. If not specified, the default is No.
Recursive File Lookup
= boolean
If Yes, partition inference is disabled. If not specified, the default is No. To control which files are loaded, use the Pattern property instead.
Force Load
= boolean
If Yes, files are loaded regardless of whether they've been loaded before. If not specified, the default is No.
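The Avro case mirrors Parquet. A sketch with hypothetical names:

```sql
COPY INTO main.analytics.raw_avro
FROM '/Volumes/main/analytics/landing'
FILEFORMAT = AVRO
FORMAT_OPTIONS ('mergeSchema' = 'true')   -- Merge Schema
COPY_OPTIONS ('force' = 'false')          -- Force Load
```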
| Snowflake | Databricks | Amazon Redshift |
|---|---|---|
| ❌ | ✅ | ❌ |