Google BigQuery

This page describes how to configure a Google BigQuery data source. With Data Loader, you can replicate and load your source data into your target destination.

Once a pipeline is configured, tables with defined primary key columns load data into the target table incrementally. If no primary key columns are defined for a table, the target table is truncated and reloaded on each run.
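As a rough illustration of the difference between the two load behaviors, here is a minimal sketch using the google-cloud-bigquery Python client. This is not Data Loader's internal implementation, and all project, table, and column names are hypothetical:

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical tables; replace with your own project.dataset.table IDs.
TARGET = "my-project.my_dataset.orders"
STAGING = "my-project.my_dataset.orders_staging"

# With a primary key (incremental load): merge staged rows into the target,
# updating matching rows and inserting new ones.
merge_sql = f"""
MERGE `{TARGET}` AS t
USING `{STAGING}` AS s
ON t.order_id = s.order_id  -- hypothetical primary key column
WHEN MATCHED THEN
  UPDATE SET amount = s.amount
WHEN NOT MATCHED THEN
  INSERT (order_id, amount) VALUES (s.order_id, s.amount)
"""
client.query(merge_sql).result()

# Without a primary key (truncate-and-load): overwrite the target each run.
job_config = bigquery.QueryJobConfig(
    destination=bigquery.TableReference.from_string(TARGET),
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.query(f"SELECT * FROM `{STAGING}`", job_config=job_config).result()
```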

Schema Drift Support: Yes. Read Schema Drift to learn more.

Return to any page of this wizard by clicking Previous.

Click X in the upper-right of the UI and then click Yes, discard to close the pipeline creation wizard.


Prerequisites

  • Read the Allowed IP addresses topic before you begin. You may not be able to connect to certain data sources without first allowing the Batch IP addresses. In these circumstances, connection tests will always fail and you will not be able to complete the pipeline.
  • You need permissions to create and manage resources in your Google Cloud Platform (GCP) account. Please contact your administrator where applicable.
  • You need access to a Google Cloud project that has one or more datasets containing at least one table.
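To confirm that last prerequisite before you start, you can list the datasets and tables visible to you with the google-cloud-bigquery Python client. A minimal sketch, assuming the project ID is a placeholder you replace with your own:

```python
from google.cloud import bigquery

# Hypothetical project ID; replace with your own Google Cloud project.
client = bigquery.Client(project="my-project")

# Print each dataset in the project and how many tables it contains.
for dataset in client.list_datasets():
    tables = list(client.list_tables(dataset.dataset_id))
    print(f"{dataset.dataset_id}: {len(tables)} table(s)")
```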

Create pipeline

  1. In Data Loader, click Add pipeline.
  2. Choose Google BigQuery from the grid of data sources.
  3. Choose Batch Loading.

Connect to Google BigQuery

Configure the Google BigQuery database connection settings, specifying the following:

  • Google BigQuery Connection: Select a connection from the drop-down menu, or click Add Connection if one doesn't exist.
  • Connection Name: Enter a unique name for the connection, and click Connect. A new browser tab opens, where Google asks you to confirm authorization using valid credentials.
  • Project ID: The Google Cloud project to use. For more information, read Creating and managing projects.
  • Dataset ID: The Google BigQuery dataset to load data into. For more information, read Introduction to datasets.
  • Advanced settings: Additional JDBC parameters or connection settings. Expand Advanced settings, choose a parameter from the drop-down menu, enter a value for it, and click Add parameter for any extra parameters you want to add. For a list of compatible connection properties, read Allowed connection properties.

If you receive the error Response too large to return, consider specifying a destination table in your job configuration. Read Error messages for more information.

You may need to add two connection options:

  1. AllowLargeResultSets: true
  2. DestinationTable: <Project ID>.<Dataset ID>.${table}

These connection options enable large result sets to be used when staging the data. Retain the ${table} suffix exactly as written; it is resolved at runtime.
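For context, the same remedy works when querying BigQuery directly: writing query results to an explicit destination table avoids the response-size limit. A minimal sketch with the google-cloud-bigquery Python client (all names here are placeholders; in Data Loader itself you only set the connection options above):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Hypothetical staging destination; Data Loader resolves ${table} to the
# name of each staged table at runtime.
dest = bigquery.TableReference.from_string("my-project.my_dataset.orders_staging")

# Directing results to a destination table sidesteps "Response too large
# to return" for oversized query results.
job_config = bigquery.QueryJobConfig(destination=dest)
client.query(
    "SELECT * FROM `my-project.my_dataset.orders`",
    job_config=job_config,
).result()  # results land in the destination table, not the response
```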

Click Continue.


Choose tables

Choose a BigQuery schema from the drop-down list.

Choose the tables you want to include in the pipeline. Use the arrow buttons to move tables into the Tables to extract and load listbox, and reorder tables by clicking and dragging. Hold SHIFT to select multiple tables.

Click Continue with X tables to move forward.


Review your data set

Choose the columns from each table to include in the pipeline. By default, Data Loader selects all columns from a table.

Click Configure on a table to open Select columns. Use the arrow buttons to move columns out of the Columns to extract and load listbox, reorder columns by clicking and dragging, and hold SHIFT to select multiple columns.

Click Done to continue.

Click Continue once you have configured each table.


Choose destination

  1. Choose an existing destination or click Add a new destination.
  2. Select a destination from Snowflake, Amazon Redshift, or Google BigQuery.

Set frequency

  • Pipeline name: A descriptive label for your pipeline. This is how the pipeline appears on the pipeline dashboard and how Data Loader refers to the pipeline.
  • Sync every: How often the pipeline syncs. Day values range from 1 to 7, hour values from 1 to 23, and minute values from 5 to 59. The value you enter also sets the delay before the first sync; for example, a pipeline set to sync every 2 hours first runs 2 hours after you create it.

Currently, you can't specify a start time.

Once you are happy with your pipeline configuration, click Create pipeline to complete the process and add the pipeline to your dashboard.