Excel

This page describes how to configure a Microsoft Excel data source. With Data Loader, you can replicate and load your source data into your target destination.

Schema Drift Support: Yes. Read Schema Drift to learn more.

To return to an earlier page of this wizard, click Previous.

To close the pipeline creation wizard, click X in the upper-right of the UI and then click Yes, discard.


Prerequisites

  • Read the Allowed IP addresses topic before you begin. You may not be able to connect to certain data sources without first allowing the Batch IP addresses. In those circumstances, connection tests will always fail and you will not be able to complete the pipeline.
  • You need permissions to create and manage resources in your database account. Contact your administrator if you do not have them.
  • You need a spreadsheet with the .xlsx file extension (see the sketch after this list for a quick local check).
  • You need working credentials to access the cloud storage that holds the spreadsheet: Amazon S3, Google Cloud Storage, or Azure Blob Storage.
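
Before you start the wizard, you can confirm locally that your file is a valid .xlsx workbook and preview its sheet names. The following is a minimal, optional sketch using openpyxl; the file name is a placeholder, and this check is independent of Data Loader:

    from openpyxl import load_workbook

    # read_only mode avoids loading the whole workbook into memory.
    wb = load_workbook("sales.xlsx", read_only=True)

    # These are the sheets you'll later see on the Configure spreadsheet page.
    print(wb.sheetnames)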

Create pipeline

  1. In Data Loader, click Add pipeline.
  2. Choose Excel from the grid of data sources.
  3. Choose Batch Loading.

Connect to Excel

Configure the Excel database connection settings, specifying the following:

  1. Select Amazon S3, Google Cloud Storage, or Azure Blob Storage.
  2. Click Add AWS/GCP/Azure credential.

Add AWS credential

AWS credentials enable you to write to an S3 bucket and Redshift. Read Cloud credentials for more information.

  • AWS credential label: A unique, descriptive name for your AWS credentials.
  • Access Key ID: An AWS access key. Read Understanding and getting your AWS credentials for more information.
  • Secret Access Key: An AWS secret access key. Read Understanding and getting your AWS credentials for more information.

Click Test and save.
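
If the test fails, you can verify the same key pair outside Data Loader. The following is a minimal sketch using boto3; the key values and bucket name are placeholders, and this check is independent of Data Loader:

    import boto3
    from botocore.exceptions import ClientError

    # Placeholders: use the same key pair you entered above.
    s3 = boto3.client(
        "s3",
        aws_access_key_id="AKIA...",
        aws_secret_access_key="...",
    )

    try:
        # A one-object listing confirms the keys can read the bucket.
        s3.list_objects_v2(Bucket="my-data-bucket", MaxKeys=1)
        print("Key pair can read the bucket.")
    except ClientError as err:
        print(f"Access problem: {err}")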


Add Google Cloud credential

Upload your Service Account File to connect to Google BigQuery. Read Cloud credentials for more information.

  • GCP credential label: A unique, descriptive name for your GCP credentials.
  • Service account file: Drag and drop a file or click to upload a file. JSON files only. Max size is 1 MB.

Click Test and save.
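
To verify the service account file outside Data Loader, you can attempt a read with the google-cloud-storage client. A minimal sketch; the file path and bucket name are placeholders:

    from google.cloud import storage

    # Placeholder: path to the same JSON file you uploaded above.
    client = storage.Client.from_service_account_json("service-account.json")

    # Listing a single blob confirms the service account can read the bucket.
    for blob in client.list_blobs("my-data-bucket", max_results=1):
        print(f"Readable: {blob.name}")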


Add Azure credential

Add your credentials to connect to Microsoft Azure. Read Cloud credentials for more information.

  • Tenant ID: A unique identifier for an Azure Active Directory instance within an Azure subscription. You can find the tenant ID under Manage tenants in the Azure portal.
  • Client ID: When you register your client application, you supply information about the application to Azure AD. Azure AD then provides a client ID (also called an application ID) that you use to associate your application with Azure AD at runtime. You can find it in the Properties of the application you created in the Azure portal.
  • Secret key: The secret used to authenticate your application. Azure Key Vault is a cloud service for securely storing and accessing secrets; you can retrieve the secret key from Key Vault. To access the required information, contact your Azure administrator.

Click Test and save.
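
To verify the three values outside Data Loader, you can authenticate with them directly. A minimal sketch using azure-identity and azure-storage-blob; the IDs, secret, and account URL are placeholders:

    from azure.identity import ClientSecretCredential
    from azure.storage.blob import BlobServiceClient

    # Placeholders: use the same values you entered above.
    credential = ClientSecretCredential(
        tenant_id="<tenant-id>",
        client_id="<client-id>",
        client_secret="<client-secret>",
    )

    service = BlobServiceClient(
        account_url="https://<account>.blob.core.windows.net",
        credential=credential,
    )

    # Listing one container confirms the credential authenticates.
    for container in service.list_containers(results_per_page=1):
        print(f"Reachable: {container.name}")
        break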


Connect to Excel continued

  • Path: The path to locate the data you wish to access. This must be a spreadsheet with the .xlsx file extension.
  • Advanced settings: Additional JDBC parameters or connection settings. Click Advanced settings, then choose a parameter from the dropdown menu and enter a value for it. Click Add parameter for each extra parameter you want to add. For a list of compatible connection properties, read Allowed connection properties.

Click Continue.


Configure spreadsheet

  • You can see all the sheets in your spreadsheet on this page. Define a range for each sheet by entering a value in the Range column.
  • You can indicate which sheets use a header row by toggling Contains header. The sketch after this list shows how these two settings affect what is read.
  • You can remove a sheet from the pipeline by selecting the - symbol alongside the sheet.
  • You can add or replace a sheet by clicking +Add another sheet.
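
Range and Contains header correspond to familiar spreadsheet-reading options. As a rough analogy only (this is not Data Loader's implementation), here is how pandas would read cells A1:C100 of a hypothetical Sales sheet with and without a header row:

    import pandas as pd

    # "Contains header" on: row 1 supplies column names, rows 2-100 are data.
    with_header = pd.read_excel(
        "sales.xlsx",
        sheet_name="Sales",
        usecols="A:C",
        header=0,
        nrows=99,
    )

    # "Contains header" off: all 100 rows are data; columns get numeric names.
    no_header = pd.read_excel(
        "sales.xlsx",
        sheet_name="Sales",
        usecols="A:C",
        header=None,
        nrows=100,
    )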

Click Continue.


Review your data set

Choose the columns from each table to include in the pipeline. By default, Data Loader selects all columns from a table.

Click Configure on a table to open Configure table. This dialog lists columns in a table and the data type of each column. Additionally, you can set a primary key and assign an incremental column state to a column.

  • Primary Key columns should represent a true PRIMARY KEY that uniquely identifies each record in a table. Composite keys work, but you must specify every column that makes up the key. Data Loader uses the primary key to prevent duplicate records, and jobs may fail or replicate data incorrectly if these rules aren't applied.
  • Make sure an Incremental column is a true change data capture (CDC) column that can identify whether there has been a change for each record in the table. This column should be a TIMESTAMP/DATE/DATETIME type or an INTEGER type representing a date key or UNIX timestamp. The sketch after this list illustrates both rules.
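
To make the two rules concrete, here is a small pandas sketch with hypothetical column names (order_id and line_no as a composite primary key, updated_at as the incremental column). It illustrates the semantics only, not how Data Loader processes data:

    import pandas as pd

    # A stand-in for one extracted table.
    df = pd.DataFrame({
        "order_id":   [1, 1, 2, 3],
        "line_no":    [1, 1, 2, 1],
        "updated_at": pd.to_datetime(
            ["2024-01-01", "2024-01-02", "2024-01-02", "2024-01-03"]),
    })

    # A composite key must list every key column, or distinct records
    # would collide as "duplicates".
    deduped = df.drop_duplicates(subset=["order_id", "line_no"], keep="last")

    # A true CDC column lets each sync pick up only rows changed since
    # the previous run.
    last_sync = pd.Timestamp("2024-01-01")
    changed = deduped[deduped["updated_at"] > last_sync]
    print(changed)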

Click Add and remove columns to modify a table before a load. Use the arrow buttons to move columns out of the Columns to extract and load listbox. Reorder columns by clicking and dragging. Select multiple columns with SHIFT.

Click Done adding and removing to continue and then click Done.

Click Continue once you have configured each table.


Choose destination

  1. Choose an existing destination or click Add a new destination.
  2. Select Snowflake, Amazon Redshift, or Google BigQuery as the destination.

Set frequency

  • Pipeline name: A descriptive label for your pipeline. This is how the pipeline appears on the pipeline dashboard and how Data Loader refers to the pipeline.
  • Sync every: The frequency at which the pipeline syncs. Day values range from 1 to 7, hour values from 1 to 23, and minute values from 5 to 59. This value also sets the delay before the first sync; for example, a pipeline set to sync every 6 hours runs for the first time 6 hours after you create it.

Currently, you can't specify a start time.

Once you are happy with your pipeline configuration, click Create pipeline to complete the process and add the pipeline to your dashboard.