Azure Blob Storage Load (Delta Lake)
The Azure Blob Storage Load component lets users load data into an existing table from objects stored in Azure Blob Storage.
Azure Blob Storage is used for storing large amounts of unstructured object data, such as text or binary data.
To learn more, read Blob storage.
Properties
Name
= string
A human-readable name for the component.
Storage Account
= drop-down
Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.
Blob Container
= drop-down
A Blob Storage location. The available blob containers will depend on the selected storage account.
Pattern
= string
A string that will partially match all filenames that are to be included in the load. Defaults to ".*", indicating all files within the Azure Storage location.
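For example, a candidate pattern can be tested offline before configuring the component. This is a minimal sketch assuming the property is evaluated with standard regular-expression semantics (an assumption, not documented Matillion behaviour); the filenames are hypothetical:

```python
import re

# Hypothetical object names as they might appear in the blob container.
filenames = ["sales/2024-01.csv", "sales/2024-02.csv", "logs/app.json"]

# A pattern intended to partially match only the CSV files under sales/.
pattern = re.compile(r"sales/.*\.csv")

print([name for name in filenames if pattern.search(name)])
# ['sales/2024-01.csv', 'sales/2024-02.csv']
```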
Catalog
= drop-down
Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.
Database
= drop-down
Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.
Target Table
= string
Select the table into which data will be loaded from Azure Blob Storage.
Load Columns
= dual listbox
Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load. Columns on the left will be excluded from the load.
Recursive File Lookup
= drop-down
When enabled, disables partition inference. To control which files are loaded, use the Pattern property instead.
File Type
= drop-down
Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. The properties below will change to reflect the selected file type.
Skip Header
= drop-down
(CSV only) When True, uses the first line of the file as column names. Default is False.
Field Delimiter
= string
(CSV only) Specify a delimiter to separate columns. The default is a comma (,).
A TAB character can be specified as "\t".
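These two properties behave like Spark's header and sep CSV reader options. A minimal PySpark sketch of the equivalent read; the option mapping is an assumption rather than documented Matillion internals, and the container path is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()  # on Databricks, `spark` already exists

df = (spark.read
      .option("header", True)  # Skip Header = True: first line supplies column names
      .option("sep", "\t")     # Field Delimiter: tab-separated input
      .csv("wasbs://mycontainer@myaccount.blob.core.windows.net/data/"))  # placeholder path
```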
Date Format
= string
(CSV & JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd
.
Timestamp Format
= string
(CSV & JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX]
.
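Both properties use Spark datetime patterns. A sketch of how the equivalent Spark reader options parse non-default formats into typed columns; the schema, formats, and path are illustrative assumptions:

```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType, DateType, TimestampType

spark = SparkSession.builder.getOrCreate()

# Explicit date/timestamp column types so the format options are applied.
schema = StructType([
    StructField("id", StringType()),
    StructField("order_date", DateType()),       # e.g. 15/01/2024
    StructField("updated_at", TimestampType()),  # e.g. 2024-01-15 09:30:00
])

df = (spark.read
      .schema(schema)
      .option("header", True)
      .option("dateFormat", "dd/MM/yyyy")
      .option("timestampFormat", "yyyy-MM-dd HH:mm:ss")
      .csv("wasbs://mycontainer@myaccount.blob.core.windows.net/dated/"))  # placeholder path
```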
Encoding Type
= string
(CSV & JSON only) Decodes the CSV files via the given encoding type. If none is set, the default is UTF-8
.
Mode
= drop-down
Select the mode for handling corrupted records during parsing.
- DROPMALFORMED: ignores corrupted records.
- FAILFAST: throws an exception when it meets corrupted records.
- PERMISSIVE: when a corrupted record is met, the malformed string is placed into a field configured by columnNameOfCorruptRecord, and malformed fields are set to null. This is the default setting.
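A sketch of the three modes as they behave in Spark's JSON reader, which is assumed (not Matillion-documented) to be the underlying mechanism; the path is a placeholder:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# PERMISSIVE keeps malformed records: the raw text lands in the column named
# by columnNameOfCorruptRecord, and malformed fields are set to null.
df = (spark.read
      .option("mode", "PERMISSIVE")
      .option("columnNameOfCorruptRecord", "_corrupt_record")
      .json("wasbs://mycontainer@myaccount.blob.core.windows.net/raw/"))  # placeholder path

# Inspect the records that failed to parse (assumes the input contains at
# least one malformed record, so the inferred schema includes the column).
df.filter(df["_corrupt_record"].isNotNull()).show(truncate=False)

# "DROPMALFORMED" would silently discard those rows;
# "FAILFAST" would abort the read on the first corrupted record.
```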
Ignore Leading White Space
= drop-down
(CSV only) When True, skips any leading whitespace. Default is False.
Ignore Trailing White Space
= drop-down
(CSV only) When True, skips any trailing whitespace. Default is False.
Infer Schema
= drop-down
(CSV only) When True, infers the input schema automatically from the data. Default is False.
Multi Line
= drop-down
When True, parses records that may span multiple lines. Default is False.
Null Value
= string
(CSV only) Sets the string representation of a null value. The default value is an empty string.
Empty Value
= string
(CSV only) Sets the string representation of an empty value. The default value is an empty string.
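A sketch distinguishing the two, assuming they map to Spark's nullValue and emptyValue CSV reader options (the option mapping and path are assumptions):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

df = (spark.read
      .option("header", True)
      .option("nullValue", "\\N")     # fields containing \N are read as SQL NULL
      .option("emptyValue", "EMPTY")  # quoted empty fields ("") are read as the string EMPTY
      .csv("wasbs://mycontainer@myaccount.blob.core.windows.net/raw/"))  # placeholder path
```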
Primitive as String
= drop-down
(JSON only) When True, primitive data types are inferred as strings. Default is False.
Prefers Decimal
= drop-down
(JSON only) When True, infers all floating-point values as a decimal type. If the values do not fit in decimal, then they are inferred as doubles. Default is False.
Allow Comments
= drop-down
(JSON only) When True, allows Java/C++-style comments in JSON records. Default is False.
Allow Unquoted Field Names
= drop-down
(JSON only) When True, allows unquoted JSON field names. Default is False.
Allow Single Quotes
= drop-down
(JSON only) When True, allows single quotes in addition to double quotes. Default is True.
Allow Numeric Leading Zeros
= drop-down
(JSON only) When True, allows leading zeros in numbers, e.g. 00019. Default is False.
Allow Backslash Escaping Any Character
= drop-down
(JSON only) When True, allows the quoting of all characters using the backslash quoting mechanism (\). Default is False.
Allow Unquoted Control Chars
= drop-down
(JSON only) When True, allows JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line-feed characters). Default is False.
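A sketch showing several of the leniency flags above in action; the record below would be rejected by a strict JSON parser. The flag-to-Spark-option mapping is an assumption, not documented Matillion internals:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Unquoted field name, single quotes, an inline comment, and a leading zero.
lenient_record = "{name: 'Ada', /* inline comment */ 'badge': 00019}"

df = (spark.read
      .option("allowUnquotedFieldNames", True)
      .option("allowSingleQuotes", True)
      .option("allowComments", True)
      .option("allowNumericLeadingZeros", True)
      .json(spark.sparkContext.parallelize([lenient_record])))

df.show()  # one well-formed row: badge=19, name=Ada
```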
Drop Field If All Null
= drop-down
(JSON only) When True, ignores columns of all-null values or empty arrays/structs during schema inference. Default is False.
Merge Schema
= drop-down
(AVRO, PARQUET only) When True, merges schemata from all part-files. Default is False.
Path Glob Filter
= string
An optional glob pattern used to include only files whose paths match the pattern.
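A sketch combining this with Recursive File Lookup, assuming both map to Spark's generic file-source options of the same names (the path is a placeholder):

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Read only *.parquet objects, descending into every subfolder of the prefix.
df = (spark.read
      .option("pathGlobFilter", "*.parquet")
      .option("recursiveFileLookup", True)
      .parquet("wasbs://mycontainer@myaccount.blob.core.windows.net/landing/"))  # placeholder path
```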
Force Load
= drop-down
When True, idempotency is disabled and files are loaded regardless of whether they have been loaded before. Default is False.
| Snowflake | Delta Lake on Databricks | Amazon Redshift | Google BigQuery | Azure Synapse Analytics |
|---|---|---|---|---|
| ❌ | ✅ | ❌ | ❌ | ❌ |