Skip to content

Google Drive Table

Overview

Create a table that references data stored in Google Drive. New to Google Drive? Read Introduction to Google Drive API.

Your Matillion ETL instance must have access to this Google Drive data. That is to say that the GCP credentials used in Matillion ETL must be from an account that has the right to at least view the data. Publicly viewable Google sheets can be accessed and used by Matillion ETL, regardless of credentials.

Should a sheet be specified that Matillion ETL does not have proper access to, this component may still successfully run, as it is merely referencing the data in a table. However, when the table itself is used (e.g. sampled), an error may then occur when the referenced data cannot be accessed.

It is essential that your GCP credentials have the Google Drive API enabled. Read Google Query Authentication Guide for details. Due to limitations of how Google BigQuery communicates with Google Drive, it may be necessary to grant additional API scope—even beyond Allow full access to all Cloud APIs. To do this, use the following command in the Google BigQuery console:

gcloud beta compute instances set-scopes <your instance> --service-account <your-account@your-project.iam.gserviceaccount.com> --scopes
https://www.googleapis.com/auth/cloud-platform
https://www.googleapis.com/auth/drive

Note

Shared links using the type=anyone or type=domain options (created in 2017 or earlier) may no longer function. These links can be amended by creating a new shared link in Google Drive. To learn more, read Access link-shared files using resource keys.


Properties

Name = string

A human-readable name for the component.


Project = drop-down

Select the Google Cloud project. The special value, [Environment Default], will use the project defined in the environment. For more information, read Creating and managing projects.


Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value, [Environment Default], will use the dataset defined in the environment. For more information, read Introduction to datasets.


New Table Name = string

A name for the table.


Table Metadata = column editor

  • Column Name: The name of the new column.
  • Data Type: Google data types.
  • Define Nested Metadata: When the Define Nested Metadata checkbox is ticked inside the Table Metadata property, a tree-structure can be defined for metadata.
  • Mode: The field mode. Default is 'NULLABLE'.
    • NULLABLE: Field allows null values.
    • REQUIRED: Field does not accept null values.
    • REPEATED: Field can accept multiple values.

Create/Replace = drop-down

  • Create: The default option, creates a new table. This will generate an error if a table with the same name already exists, but will never destroy existing data.
  • Create if not exists: This will only create a new table if a table of the same name does not already exist. It will not destroy existing data. If the schema of the existing table does not match the schema defined in this component, no attempt is made to fix or correct it, which could lead to errors later in the job if you did not expect an existing table to exist, or to have a different schema to the one defined in this component.
  • Replace: This drops any existing table of the same name, and then creates a new table. This guarantees that after the component succeeds, the table matches the schema defined in this component. However, any existing data in an existing table will be lost.

Since other database objects may depend upon this table, drop ... cascade is used, which may actually remove many other database objects.


Google Drive URL Location = string

The URL of the Google Drive file. This follows the format https://drive.google.com/open?id=(fileid).


Compression = drop-down

Whether the input file is compressed in gzip format or not compressed at all.


File Format = drop-down

  • Google Sheets
  • CSV
  • JSON (new line delimited): this requires an additional "JSON Format".

Number of Errors Allowed = integer

The maximum number of individual parsing errors that cause the whole load to fail. Values up to this will be substituted as null values. This value defaults to 0.


Ignore Unknown Values = drop-down

  • Yes: Accept rows that contain values that do not match the schema. Unknown values are ignored. Will ignore extra values at the end of a line for CSV files.
  • No: Omit any rows with invalid values.

Delimiter = string

The delimiter that separates columns. The default is a comma ,. A [TAB] character can be specified as "\ ".


CSV Quoter = string

Specifies the character to be used as the quote character when using the CSV option.


Encoding = drop-down

The encoding the data is in. This defaults to UTF-8.


Header Rows To Skip = string

The number of rows at the top of the file to ignore. Defaults to 0.


Allow Quoted Newlines = drop-down

  • Yes: Allow a CSV value to contain a newline character when the value is encased in quotation marks.
  • No: A newline character, regardless of quotations, is always considered a new row.

Allow Jagged Rows = drop-down

  • Yes: Missing values are treated as NULL but accepted.
  • No: Rows with missing data are treated as bad records.

A bad record will count toward the 'Maximum Errors' count.


Variable exports

This component makes the following values available to export into variables:

Source Description
Time taken to stage The amount of time (in seconds) taken to fetch the data from the data source and upload it to cloud storage.
Time taken to load The amount of time (in seconds) taken to execute the COPY statement to load the data into the target table from cloud storage.

Strategy

Connect to the target database and issue the query. Stream the results into objects in cloud storage. Next, create or truncate the target table and issue a COPY command to load the cloud storage objects into the table. Finally, clean up the temporary cloud storage objects.


Sharing with Service Accounts in Google Drive

If you wish to share your Google Sheets objects or other Google Drive objects with Matillion ETL without making the Sheets or other Drive objects publicly available, you can specify the service account that Matillion ETL runs under when sharing.

To do this, follow these steps:

  1. Log in to the Google Cloud Console.
  2. Click the navigation menu button, and then scroll down to and click Compute Engine.
  3. Within the Compute Engine menu, click VM instances, which is located under the heading VIRTUAL MACHINES.
  4. Click on the name of an existing VM instance.
  5. Within the VM instance details dialog, click EDIT.
  6. Scroll down to Service account and select the service account to share the Google Sheet or other Google Drive object with. Copy this account address.
  7. Navigate to Google Sheets/Google Drive, then click into a Sheet or other object, and click the Share button. Paste the service account into the text box in the Share with people and groups dialog, and then click Done.

Snowflake Delta Lake on Databricks Amazon Redshift Google BigQuery Azure Synapse Analytics