
Azure Blob Storage Load (Delta Lake)

The Azure Blob Storage Load component lets users load data into an existing Delta Lake table from objects stored in Azure Blob Storage.

Azure Blob Storage is used for storing large amounts of unstructured object data, such as text or binary data.

To learn more, read Blob storage.
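
Conceptually, the component reads objects from a chosen Blob Storage container and appends them to an existing Delta Lake table. The following PySpark sketch illustrates that flow only; it is not the component's internal implementation, and the wasbs:// path, column names, and table name are placeholder assumptions:

    # Illustrative sketch only, run from a Databricks notebook where a
    # SparkSession named `spark` already exists. The path, columns, and
    # table name are placeholder assumptions.
    df = (
        spark.read.format("csv")          # File Type
        .option("header", "true")         # Skip Header
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/landing/")
    )

    # Keep only the columns to be loaded (Load Columns), then append
    # to the existing Delta Lake target table.
    (df.select("id", "name")
       .write.format("delta")
       .mode("append")
       .saveAsTable("my_database.my_target_table"))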


Properties

Name = string

A human-readable name for the component.


Storage Account = drop-down

Select an Azure Blob Storage account. An Azure storage account contains all of your Azure Storage data objects: blobs, files, queues, tables, and disks. For more information, read Storage account overview.


Blob Container = drop-down

A Blob Storage location. The available blob containers will depend on the selected storage account.


Pattern = string

A string that will partially match all filenames that are to be included in the load. Defaults to .*, indicating all files within the Azure Blob Storage location. For example, a pattern of .*\.csv matches only filenames ending in .csv.


Catalog = drop-down

Select a Databricks Unity Catalog. The special value, [Environment Default], will use the catalog specified in the Matillion ETL environment setup. Selecting a catalog will determine which databases are available in the next parameter.


Database = drop-down

Select the Delta Lake database. The special value, [Environment Default], will use the database specified in the Matillion ETL environment setup.


Target Table = string

Select the table into which data will be loaded from Azure Blob Storage.


Load Columns = dual listbox

Select which of the target table's columns to load. Move columns to the right using the arrow buttons to include them in the load. Columns on the left will be excluded from the load.


Recursive File Lookup = drop-down

When enabled, disables partition inference. To control which files are loaded, use the Pattern property instead.


File Type = drop-down

Select the file type. Available types include AVRO, CSV, JSON, and PARQUET. The properties below will change to reflect the selected file type.


Skip Header = drop-down

(CSV only) When True, uses the first line of the file as the column names. Default is False.


Field Delimiter = string

(CSV only) Specify a delimiter to separate columns. The default is a comma (,).

A TAB character can be specified as "\t".


Date Format = string

(CSV & JSON only) Manually set a date format. If none is set, the default is yyyy-MM-dd.


Timestamp Format = string

(CSV & JSON only) Manually set a timestamp format. If none is set, the default is yyyy-MM-dd'T'HH:mm:ss.[SSS][XXX].


Encoding Type = string

(CSV & JSON only) Decodes the CSV files via the given encoding type. If none is set, the default is UTF-8.


Mode = drop-down

Select the mode for handling corrupted records during parsing; a sketch of the three modes follows the list.

  • DROPMALFORMED: ignores corrupted records.
  • FAILFAST: throws an exception when it encounters corrupted records.
  • PERMISSIVE: when a corrupted record is encountered, the malformed string is placed into a field configured by columnNameOfCorruptRecord, and malformed fields are set to null. This is the default setting.
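
A minimal PySpark sketch of the modes, assuming a placeholder path; note that in Spark, the corrupt-record column must be declared in an explicit schema for PERMISSIVE mode to populate it:

    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    # Explicit schema including the corrupt-record column. The column
    # names here are assumptions for illustration.
    schema = StructType([
        StructField("id", IntegerType()),
        StructField("name", StringType()),
        StructField("_corrupt_record", StringType()),
    ])

    df = (
        spark.read.format("csv")
        .schema(schema)
        .option("mode", "PERMISSIVE")  # or DROPMALFORMED / FAILFAST
        .option("columnNameOfCorruptRecord", "_corrupt_record")
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/landing/")
    )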

Ignore Leading White Space = drop-down

(CSV only) When True, skips any leading whitespaces. Default is False.


Ignore Trailing White Space = drop-down

(CSV only) When True, skips any trailing whitespaces. Default is False.


Infer Schema = drop-down

(CSV only) When True, infers the input schema automatically from the data. Default is False.


Multi Line = drop-down

When True, parses records that may span multiple lines. Default is False.


Null Value = string

(CSV only) Sets the string representation of a null value. The default value is an empty string.


Empty Value = string

(CSV only) Sets the string representation of an empty value. The default value is an empty string.
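
The whitespace, schema-inference, multi-line, and null/empty options likewise map onto Spark reader options; a sketch with placeholder values:

    # Sketch of the remaining CSV parsing options; the path and option
    # values are illustrative assumptions.
    df = (
        spark.read.format("csv")
        .option("ignoreLeadingWhiteSpace", "true")   # Ignore Leading White Space
        .option("ignoreTrailingWhiteSpace", "true")  # Ignore Trailing White Space
        .option("inferSchema", "true")               # Infer Schema
        .option("multiLine", "true")                 # Multi Line
        .option("nullValue", "NULL")                 # Null Value
        .option("emptyValue", "")                    # Empty Value
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/landing/")
    )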


Primitive as String = drop-down

(JSON only) When True, primitive data types are inferred as strings. Default is False.


Prefers Decimal = drop-down

(JSON only) When True, infers all floating-point values as a decimal type. If the values do not fit in decimal, then they are inferred as doubles. Default is False.


Allow Comments = drop-down

(JSON only) When True, allows Java/C++ style comments in JSON records. Default is False.


Allow Unquoted Field Names = drop-down

(JSON only) When True, allows unquoted JSON field names. Default is False.


Allow Single Quotes = drop-down

(JSON only) When True, allows single quotes in addition to double quotes. Default is True.


Allow Numeric Leading Zeros = drop-down

(JSON only) When True, allows leading zeros in numbers, e.g. 00019. Default is False.


Allow Backslash Escaping Any Character = drop-down

(JSON only) When True, accepts the quoting of all characters using the backslash quoting mechanism (\). Default is False.


Allow Unquoted Control Chars = drop-down

(JSON only) When True, allows JSON strings to include unquoted control characters (ASCII characters with a value less than 32, including tab and line feed characters). Default is False.


Drop Field If All Null = drop-down

(JSON only) When True, ignores columns of all-null values or empty arrays/structs during schema inference. Default is False.
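
The JSON-only options correspond to Spark's JSON reader options; a sketch with a placeholder path, enabling several of them at once purely for illustration:

    # Sketch of the JSON-only options above as Spark reader options.
    # The path is an illustrative assumption.
    df = (
        spark.read.format("json")
        .option("primitivesAsString", "true")           # Primitive as String
        .option("prefersDecimal", "true")               # Prefers Decimal
        .option("allowComments", "true")                # Allow Comments
        .option("allowUnquotedFieldNames", "true")      # Allow Unquoted Field Names
        .option("allowSingleQuotes", "true")            # Allow Single Quotes
        .option("allowNumericLeadingZeros", "true")     # Allow Numeric Leading Zeros
        .option("allowBackslashEscapingAnyCharacter", "true")
        .option("allowUnquotedControlChars", "true")    # Allow Unquoted Control Chars
        .option("dropFieldIfAllNull", "true")           # Drop Field If All Null
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/json/")
    )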


Merge Schema = drop-down

(AVRO, PARQUET only) When True, merges schemas collected from all part-files. Default is False.
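
In Spark terms, this is the mergeSchema read option; a brief sketch with a placeholder path:

    # Sketch: merge schemas collected from all part-files.
    # The path is an illustrative assumption.
    df = (
        spark.read.format("parquet")
        .option("mergeSchema", "true")  # Merge Schema
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/parquet/")
    )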


Path Glob Filter = string

An optional glob pattern used to include only files whose paths match the pattern.
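
Path Glob Filter and Recursive File Lookup correspond to Spark's generic file source options; a sketch with placeholder values:

    # Sketch combining a glob filter with recursive lookup. Values are
    # illustrative assumptions; recursiveFileLookup disables partition
    # inference, as noted above.
    df = (
        spark.read.format("parquet")
        .option("pathGlobFilter", "*.parquet")   # Path Glob Filter
        .option("recursiveFileLookup", "true")   # Recursive File Lookup
        .load("wasbs://mycontainer@myaccount.blob.core.windows.net/data/")
    )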


Force Load = drop-down

When True, idempotency is disabled and files are loaded regardless of whether they have been loaded before. Default is False.
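
This behavior mirrors the force copy option of Databricks' COPY INTO command, which is how idempotent loads are typically expressed on Delta Lake; whether the component itself issues COPY INTO is an assumption. A hedged sketch run from Python, with placeholder table and path names:

    # Hedged sketch: reload files even if they were loaded before by
    # setting 'force' = 'true'. Table and path names are assumptions.
    spark.sql("""
        COPY INTO my_database.my_target_table
        FROM 'wasbs://mycontainer@myaccount.blob.core.windows.net/landing/'
        FILEFORMAT = CSV
        FORMAT_OPTIONS ('header' = 'true')
        COPY_OPTIONS ('force' = 'true')
    """)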

