Pipeline UI
The pipeline is a collection of configuration details including the source configuration, destination configuration, and any advanced properties that are needed to form the connection. The Data Loader UI provides an easy process to set up a data pipeline.
Create pipeline
When you log in to Data Loader via Hub for the first time, a welcome page is displayed, inviting you to create your first pipeline by clicking Add pipeline on the dashboard page.
Before you build your first pipeline, select your region using the region selector at the bottom right of the page.
Choose source
The source database is the one containing the data you want the pipeline to extract. Data Loader supports many diverse sources, displayed as tiles on the screen. Click the source you want to connect to.
To show only the sources that support the load type you intend to use, click CDC or Batch Data under the Load data type heading at the left of the screen.
Each pipeline can only support one source database at a time. Not all sources are compatible with all destinations. To filter the list to show only the sources you may use with a specific destination, click the destination name under the Supported destinations heading at the left of the screen.
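The two filters described above can be pictured as simple set-membership checks. The following is a minimal sketch only: `SourceTile`, `filter_sources`, and the placeholder catalogue are hypothetical names invented for illustration and are not part of any Data Loader API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceTile:
    """Hypothetical model of one source tile (not a Data Loader API)."""
    name: str
    load_types: frozenset      # subset of {"CDC", "Batch"}
    destinations: frozenset    # destinations this source is compatible with


def filter_sources(sources, load_type=None, destination=None):
    """Return the sources that match the selected filters.

    A filter left as None is treated as "not selected", which mirrors
    the unfiltered list of tiles.
    """
    matches = []
    for source in sources:
        if load_type is not None and load_type not in source.load_types:
            continue
        if destination is not None and destination not in source.destinations:
            continue
        matches.append(source)
    return matches


# Placeholder catalogue, invented purely for illustration.
CATALOGUE = [
    SourceTile("source_a", frozenset({"CDC", "Batch"}), frozenset({"dest_x", "dest_y"})),
    SourceTile("source_b", frozenset({"Batch"}), frozenset({"dest_x"})),
    SourceTile("source_c", frozenset({"CDC"}), frozenset({"dest_y"})),
]

cdc_for_dest_x = filter_sources(CATALOGUE, load_type="CDC", destination="dest_x")
print([s.name for s in cdc_for_dest_x])
```

Combining both filters narrows the list to sources that support the chosen load type and are compatible with the chosen destination.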
Choose data loading process
Sources may allow Batch Load Replication, Change Data Capture (CDC), or both, as options to ingest your data. If your chosen source allows both, you will be presented with a screen to select which method you want to use.
You will be taken to the Choose an agent to manage your pipeline page, where you will have to choose or create an agent.
Choose or create Streaming agent
The Choose an agent to manage your pipeline page shows all agents that you have configured. If you have an existing agent that has a Connected status but hasn't yet had a pipeline assigned, you can click Add pipeline next to that agent. Otherwise, you will have to create a new agent.
To create a Streaming agent, read Agent Setup UI.
When the agent is created and has a Connected status, you will see the Add pipeline button next to it in the list of agents. Click this to add a pipeline to the agent.
After you click Add pipeline, you will be taken to the Connect to page.
Connect to source
On the Connect to page, you must provide the information needed to connect to that source. The configuration details vary between sources, so refer to the documentation for your chosen source.
Choose tables
Choose which tables from the data source to use.
Data sources such as spreadsheets don't use tables, and will therefore have different configuration requirements which will be described in the documentation for the source, listed under sources.
Use the arrow buttons to move tables to the Tables to extract and load listbox and then reorder any tables with click-and-drag. Select multiple tables using the SHIFT key.
Click Continue with X tables to move forward.
You can then choose individual columns from each table to include in the pipeline. By default, Data Loader selects all columns from a table. Click Add and remove columns to change the list of columns. Use the arrow buttons to move columns out of the Columns to extract and load listbox and then reorder columns with click-and-drag. Select multiple columns using the SHIFT key.
Additionally, you can set a primary key and assign an incremental column state to a column.
Click Done adding and removing to continue and then click Done.
Click Continue once you have configured each table.
Choose destination
Select the cloud storage service your extracted data will be sent to, and enter the connection details required by that destination.
For details of the supported destinations and how to connect to them, see the Destinations category.
You can configure multiple pipelines with the same destination. You can also replicate data from multiple sources into the same destination.
Pipeline settings
Give your pipeline a unique name, so you can identify it later.
CDC Pipeline Summary
Review the selections you have made in each of the previous stages. The summary is divided into the following sections:
- Agent Details
- Source Details
- Selected Tables
- Destination Details
- Pipeline Settings
You can return to any earlier stage to make adjustments if required. If you are satisfied with your selections, click Create Pipeline to complete the process.
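As a rough mental model, the summary sections listed above can be thought of as one record that is checked before the pipeline is created. The sketch below is an assumption for illustration: the class names, field names, and the checks in `problems` are hypothetical and do not describe how Data Loader stores or validates a pipeline.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TableSelection:
    """One entry in Selected Tables (hypothetical structure)."""
    name: str
    columns: list                              # columns to extract and load
    primary_key: Optional[str] = None          # optional primary key column
    incremental_column: Optional[str] = None   # optional incremental column


@dataclass
class PipelineSummary:
    """Mirrors the five summary sections; all field names are assumed."""
    agent: str
    source: dict
    selected_tables: list
    destination: dict
    settings: dict = field(default_factory=dict)

    def problems(self):
        """Return a list of issues; an empty list means nothing was flagged."""
        issues = []
        if not self.agent:
            issues.append("no agent selected")
        if not self.selected_tables:
            issues.append("no tables selected")
        if not self.settings.get("name"):
            issues.append("pipeline name is missing")
        for table in self.selected_tables:
            if not table.columns:
                issues.append(f"{table.name}: no columns selected")
            elif table.primary_key and table.primary_key not in table.columns:
                issues.append(f"{table.name}: primary key is not a selected column")
        return issues


summary = PipelineSummary(
    agent="agent-1",                          # placeholder values throughout
    source={"type": "example-source"},
    selected_tables=[TableSelection("orders", ["id", "total"], primary_key="id")],
    destination={"type": "example-storage"},
    settings={"name": "orders-pipeline"},
)
print(summary.problems())
```

Each check corresponds to a stage you could return to from the summary page to make an adjustment.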
Pipeline dashboard
All current pipelines, both Batch and CDC, are displayed on the Pipelines dashboard.
The following table describes the elements in the above illustration:
Page component | Title | Description |
---|---|---|
1 | Title header | Displays the title with the list of pipelines created. |
2 | Add pipeline | Click Add pipeline to create a new pipeline. |
3 | Search | Search for pipelines by typing part or all of the pipeline name. |
4 | Filter | Filter the list of pipelines by Source (in the first dropdown field), Destination (in the second dropdown field) and pipeline Status (in the third dropdown field). The default is to show all results. |
5 | Pipeline list | Displays the list of pipelines created by the current user. Each row in the list provides a brief summary of an existing pipeline. The summary includes the Name of the pipeline, the Source from which the data is being fetched (and whether the pipeline is Batch or CDC), the Destination where data is being loaded, and the current Status of the pipeline. |
6 | Sort | Click the arrows in the column headers to sort the list. |
7 | Pipeline detail (CDC pipeline) | Click ... to open a dialog with more pipeline details. This includes the Agent name it's associated with, and the pipeline's Throughput. You can also Stop/Delete pipelines if necessary. If the pipeline is in a streaming state, you can also use Start to restart it. |
8 | Pipeline detail (Batch pipeline) | Click ... to open a dialog with more pipeline details. This includes the Frequency you have set up, Last Sync information about the pipeline, and Rows moved in the pipeline run. You can also Edit/Delete pipelines if necessary. |
When deleting and recreating a CDC pipeline, you must clear out the files that the pipeline places in your cloud storage. If you don't, the new pipeline will recognize the existing offset.dat file and will therefore skip the snapshot phase.
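Before recreating a CDC pipeline, it can help to check whether an old offset.dat file is still present. The sketch below works on a plain list of object keys, which you would obtain from your cloud storage provider's own listing tool. The function names and the assumption that the pipeline's files sit under a single prefix are illustrative; the real storage layout may differ.

```python
from pathlib import PurePosixPath

OFFSET_FILE = "offset.dat"  # file name taken from the note above


def leftover_offset_files(object_keys, prefix):
    """Return the keys under `prefix` whose file name is offset.dat.

    `prefix` is the assumed storage location of the deleted pipeline.
    """
    prefix = prefix.rstrip("/") + "/"
    return [
        key for key in object_keys
        if key.startswith(prefix) and PurePosixPath(key).name == OFFSET_FILE
    ]


def will_skip_snapshot(object_keys, prefix):
    """True if a leftover offset.dat would cause the snapshot phase to be skipped."""
    return bool(leftover_offset_files(object_keys, prefix))
```

For example, calling `will_skip_snapshot(keys, "pipelines/orders")` with a listing that still contains `pipelines/orders/offset.dat` returns `True`, which signals that the old files should be cleared first.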
CDC pipeline status
A CDC pipeline will have one of the following status codes:
- Unavailable: The pipeline's agent is not currently connected. This may indicate a fault in the agent's installation.
- Not running: The pipeline's agent is connected, but the pipeline is stopped and not currently running.
- Snapshotting: The pipeline is performing a snapshot to get the initial database state required for CDC to start.
- Streaming: The pipeline is streaming change records to cloud storage.
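If you track these statuses in your own scripts or reports, they map naturally onto an enumeration. This is a minimal sketch: the `CdcStatus` class and both helper functions are hypothetical, and `agent_connected` relies on the inference that every status other than Unavailable implies a connected agent.

```python
from enum import Enum


class CdcStatus(Enum):
    """The four CDC pipeline statuses, keyed by their displayed labels."""
    UNAVAILABLE = "Unavailable"
    NOT_RUNNING = "Not running"
    SNAPSHOTTING = "Snapshotting"
    STREAMING = "Streaming"


def agent_connected(status):
    """Assumed: only Unavailable means the agent is not connected."""
    return status is not CdcStatus.UNAVAILABLE


def is_moving_data(status):
    """True while the pipeline is snapshotting or streaming change records."""
    return status in (CdcStatus.SNAPSHOTTING, CdcStatus.STREAMING)


# Look up a status from the label shown in the dashboard.
status = CdcStatus("Not running")
print(status.name, agent_connected(status), is_moving_data(status))
```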