
Pipeline UI

A pipeline is a collection of configuration details, including the source configuration, destination configuration, schedule, and any advanced properties needed to form the connection. The Data Loader UI guides you through setting up a data pipeline step by step.

This article walks you through the steps required to configure and manage a Batch pipeline.

We recommend that you use the region selector on the Data Loader dashboard to choose your region before you start building your pipeline.


Create pipeline

When you log in to Data Loader via Hub for the first time, a welcome page is displayed, inviting you to create your first pipeline by clicking Add pipeline on the dashboard page.

Before you build your first pipeline, select your region using the region selector at the bottom right of the page.

Note

  • Right-clicking a region in the Data Loader will take you to that region's specific URL, and any data that is saved (such as pipeline definitions and agent definitions) will be stored against that region.
  • The Data Loader dashboard region is set to US by default.

Choose source

The source database is the one containing the data you want the pipeline to extract. Data Loader supports many diverse sources, displayed as tiles on the screen. Click the source you want to connect to.

Note

Each pipeline supports only one source database.

Some sources support only Batch processing or only CDC, not both. To filter the list of sources to show only those that support Batch processing, click Batch Data under the Load data type heading at the left of the screen.

Not all sources are compatible with all destinations. To filter the list to show only the sources you may use with a specific destination, click the destination name under the Supported destinations heading at the left of the screen.


Choose data loading process

Sources may allow Batch Load Replication, Change Data Capture (CDC), or both, as options to ingest your data. If your chosen source allows both, you will be presented with a screen to select which method you want to use.

If you are creating a Batch pipeline, you will be taken to the Connect to page. If you are creating a CDC pipeline, you will first be taken to the Choose an agent to manage your pipeline page, where you will have to choose or create an agent.


Connect to source

On the Connect to page, you must provide the information needed to connect to that source. The configuration details vary between sources, so refer to the documentation for your chosen source, listed in the Sources documentation section.


Choose tables

Choose which tables from the data source to use.

Some data sources, such as spreadsheets, don't use tables and therefore have different configuration requirements, which are described in the documentation for that source.

Use the arrow buttons to move tables to the Tables to extract and load listbox and then reorder any tables with click-and-drag. Select multiple tables using the SHIFT key.

Click Continue with X tables to move forward.

You can then choose individual columns from each table to include in the pipeline. By default, Data Loader selects all columns from a table. Click Add and remove columns to change the list of columns. Use the arrow buttons to move columns out of the Columns to extract and load listbox, and then reorder columns with click-and-drag. Select multiple columns using the SHIFT key.

Additionally, you can set a primary key and assign an incremental column state to a column.
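This page does not describe how Data Loader uses an incremental column internally. As a general illustration only, incremental columns are commonly used in a "high-water mark" pattern: each run extracts only rows whose incremental value is greater than the largest value seen in the previous run. The sketch below assumes rows are plain dictionaries and that the column values are comparable (for example timestamps or increasing IDs); the function name and sample column names are hypothetical and are not part of Data Loader.

```python
from typing import Any, Optional


def incremental_extract(
    rows: list[dict[str, Any]],
    incremental_column: str,
    last_value: Optional[Any],
) -> tuple[list[dict[str, Any]], Optional[Any]]:
    """Hypothetical high-water-mark extraction, not Data Loader's implementation.

    Returns the rows newer than last_value and the new high-water mark
    to store for the next run.
    """
    if last_value is None:
        # First run: no stored mark yet, so every row is extracted.
        new_rows = list(rows)
    else:
        # Later runs: keep only rows whose incremental value has advanced.
        new_rows = [r for r in rows if r[incremental_column] > last_value]

    # The largest extracted value becomes the next run's starting point;
    # if nothing new arrived, the previous mark is kept unchanged.
    if new_rows:
        new_mark = max(r[incremental_column] for r in new_rows)
    else:
        new_mark = last_value
    return new_rows, new_mark
```

For example, with rows carrying an `updated_at` value, a first call with `last_value=None` returns every row, and a later call with the stored mark returns only rows updated since then.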

Click Done adding and removing to continue and then click Done.

Click Continue once you have configured each table.

Warning

Configuring a large number of tables in a single pipeline can cause the connection to the source to time out. The exact upper limit on the number of tables depends on many factors, such as the number of rows in each table and the network load on the source.

If you encounter this issue, split the tables across two or more separate pipelines, each with fewer tables.
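If you plan the split before building the pipelines in the UI, it can help to divide the table list into fixed-size groups, one group per pipeline. The sketch below is a hypothetical planning helper, not a Data Loader feature; because no specific table limit is documented, the group size is an assumption you would tune for your own source.

```python
def split_into_batches(tables: list[str], max_tables_per_pipeline: int) -> list[list[str]]:
    """Split a table list into consecutive groups of at most max_tables_per_pipeline.

    Each returned group is the table list for one separate pipeline.
    """
    if max_tables_per_pipeline < 1:
        raise ValueError("max_tables_per_pipeline must be at least 1")
    # Slice the list in steps of the group size; the last group may be smaller.
    return [
        tables[i:i + max_tables_per_pipeline]
        for i in range(0, len(tables), max_tables_per_pipeline)
    ]
```

For example, `split_into_batches(["orders", "customers", "products", "invoices", "payments"], 2)` produces three groups, so you would create three pipelines: two with two tables each and one with a single table. The table names here are hypothetical.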


Choose destination

Select the cloud data warehouse to which your extracted data will be sent, and enter the connection details required by that destination.

Note

You can configure multiple pipelines with the same destination. You can also replicate data from multiple sources into the same destination.


Pipeline settings

Give your pipeline a unique name, so you can identify it later.


Set pipeline frequency

You can configure schedules on the Set Pipeline Frequency page in the UI using Quartz cron expressions. Cron expressions offer precise control over when pipelines run.

To learn more, read Scheduling with Quartz cron expressions.
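A Quartz cron expression has six or seven space-separated fields, in this order: seconds, minutes, hours, day-of-month, month, day-of-week, and an optional year. Quartz does not support setting values for both day-of-month and day-of-week, so one of them is normally `?`. The sketch below is a hypothetical helper that labels each field so you can check an expression before pasting it into the UI; it only checks the field count and that at least one day field is `?`, and it does not validate the values inside each field. Data Loader performs its own validation.

```python
# Field order of a Quartz cron expression; the seventh field (year) is optional.
QUARTZ_FIELDS = ["seconds", "minutes", "hours", "day_of_month", "month", "day_of_week", "year"]


def split_quartz_cron(expression: str) -> dict[str, str]:
    """Hypothetical helper: map each field of a Quartz cron expression to its name."""
    parts = expression.split()
    if len(parts) not in (6, 7):
        raise ValueError(f"expected 6 or 7 fields, got {len(parts)}")
    # zip stops at the shorter list, so a 6-field expression has no "year" key.
    fields = dict(zip(QUARTZ_FIELDS, parts))
    # Quartz cannot use both day-of-month and day-of-week at once;
    # this sketch only checks that at least one of them is '?'.
    if "?" not in (fields["day_of_month"], fields["day_of_week"]):
        raise ValueError("one of day-of-month or day-of-week must be '?'")
    return fields
```

For example, `0 0 12 * * ?` runs every day at 12:00, and `0 15 10 ? * MON-FRI` runs at 10:15 every weekday. A five-field expression such as `0 12 * * *`, written in the Unix cron style, is rejected because Quartz expressions start with a seconds field.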


Pipeline dashboard

All pipelines, both Batch and CDC, are displayed on the Pipelines dashboard.

Pipeline dashboard

The following table describes the elements in the above illustration:

| Page component | Title | Description |
|---|---|---|
| 1 | Title header | Displays the title above the list of pipelines created. |
| 2 | Add pipeline | Click Add pipeline to create a new pipeline. |
| 3 | Search | Search for pipelines by typing a partial or complete pipeline name. |
| 4 | Filter | Filter the list of pipelines by Source (first dropdown field), Destination (second dropdown field), and pipeline Status (third dropdown field). By default, all results are shown. |
| 5 | Pipeline list | Displays the list of pipelines created by the current user. Each row provides a brief summary of an existing pipeline: the pipeline Name, the Source the data is fetched from (and whether the pipeline is Batch or CDC), the Destination where the data is loaded, and the current status of the pipeline. |
| 6 | Sort | Click the arrows in the column headers to sort the list. |
| 7 | Pipeline detail (CDC pipeline) | Click ... to open a dialog with more pipeline details, including the Agent the pipeline is associated with and the pipeline's Throughput. You can also Stop or Delete the pipeline if necessary. If the pipeline is in a streaming state, you can also use Start to restart it. |
| 8 | Pipeline detail (Batch pipeline) | Click ... to open a dialog with more pipeline details, including the Frequency you have set up, Last Sync information, and the Rows moved in the pipeline run. You can also Edit or Delete the pipeline if necessary. |
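The search and filter controls combine: a pipeline appears in the list only if its name contains the search text and it matches every filter you have set, while an unset filter matches everything. The sketch below models that behavior on a hypothetical in-memory list; it is not Data Loader code or an API, and treating the name search as case-insensitive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Pipeline:
    # Hypothetical record holding the columns shown in the pipeline list.
    name: str
    source: str
    destination: str
    status: str


def filter_pipelines(
    pipelines: list[Pipeline],
    search: str = "",
    source: Optional[str] = None,
    destination: Optional[str] = None,
    status: Optional[str] = None,
) -> list[Pipeline]:
    """Return pipelines whose name contains `search` and that match every set filter.

    A filter left as None matches all pipelines, mirroring the default of
    showing all results.
    """
    term = search.lower()  # assumption: the name search ignores case
    return [
        p
        for p in pipelines
        if term in p.name.lower()
        and (source is None or p.source == source)
        and (destination is None or p.destination == destination)
        and (status is None or p.status == status)
    ]
```

For example, searching for `orders` returns every pipeline whose name contains that text, and adding `status="Paused"` narrows the result to paused pipelines only.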

Note

When deleting and recreating a CDC pipeline, you must clear out the files that the pipeline places in your cloud storage. If you don't, the new pipeline will recognize the existing offset.dat file and will therefore skip the snapshot phase.

Batch pipeline status

A batch pipeline will have one of the following status codes:

  • Active: The pipeline is active and scheduled.
  • Paused: The pipeline is not scheduled to run.
  • Running: The pipeline is currently executing a scheduled run.
  • Setting Up: The pipeline is in a set-up phase and will move into the Running state.