Create a Streaming agent
The Streaming agent is a component of the Data Productivity Cloud that acts as a bridge between the source database and the target cloud data lake, enabling you to run and schedule Streaming pipelines. The agent is hosted in your own infrastructure as part of a Hybrid SaaS deployment.
Once the Streaming agent is configured and started, it operates autonomously and requires little intervention. The agent continuously monitors changes in the source database, reads them from the database's low-level logs, and delivers them to the designated target data lake or storage, providing a continuous and reliable change data capture (CDC) process.
This topic explains how to create a Streaming agent and deploy it in your own infrastructure. Agents can currently run in AWS or Azure infrastructure.
Note
Each Matillion Streaming agent can run only one Streaming pipeline, so each Streaming pipeline requires its own agent installation.
Prerequisites
- A Hub account. To register, read Registration. Once you have signed up, log in to the Hub.
- An account in either Azure or AWS to host the agent.
- Access to a cloud secrets service, a secure store for authentication and connection secrets. The agent uses these secrets to authenticate with the source database and to establish a secure connection for capturing data changes.
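Before deploying the agent, it can help to confirm that the secret it will rely on is actually retrievable. The sketch below assumes AWS Secrets Manager and a secret stored as a JSON string; the `username` and `password` keys and the secret name used later are hypothetical placeholders, not names that the Data Productivity Cloud requires. The client is passed in, so the function can be exercised without AWS credentials.

```python
import json
from typing import Any, Mapping, Protocol


class SecretsClient(Protocol):
    """Anything with a Secrets Manager-style get_secret_value(SecretId=...) method."""

    def get_secret_value(self, *, SecretId: str) -> Mapping[str, Any]: ...


# Hypothetical keys: replace them with whatever your source connection actually needs.
REQUIRED_KEYS = ("username", "password")


def check_secret(client: SecretsClient, secret_id: str) -> list[str]:
    """Fetch a JSON secret and return the required keys that are missing or empty."""
    response = client.get_secret_value(SecretId=secret_id)
    # Secrets Manager returns text secrets under the "SecretString" key.
    secret = json.loads(response["SecretString"])
    return [key for key in REQUIRED_KEYS if not secret.get(key)]
```

With boto3 installed and AWS credentials configured, you could call `check_secret(boto3.client("secretsmanager"), "my-source-db-secret")`, where the secret name is a placeholder. An empty list means every required key is present and non-empty.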
Note
Your source database also requires configuration to work with Streaming pipelines. This is independent of the agent installation process. For details, see the documentation for each supported data source.
Create an agent
- Click the menu button in the top left of any Data Productivity Cloud screen, then click Manage → Agents. The Agents screen lists all existing agents, showing their Status, Platform (AWS or Azure), and Type (Data Productivity Cloud or Streaming).
- Click Add agent.
- Click Streaming.
- Complete the following properties:
- Agent name: A unique name for your new agent. Maximum 30 characters. Accepts uppercase and lowercase letters (A-Z, a-z), digits (0-9), whitespace (but not as the first character), hyphens, and underscores.
- Description: Optionally enter a brief description of the agent.
- Cloud provider: The cloud platform that the agent will be deployed to. Currently, AWS and Azure are supported.
- Deployment: The supported deployment method for the given cloud provider. Currently, Azure Container Instances (ACI) for Azure and Fargate for AWS are supported.
- Click Create agent.
This creates an agent definition in the Data Productivity Cloud, and displays the agent's parameters on the Agent details screen. The agent's status is set to Pending, which means it is not yet ready to run pipelines. The next step is to deploy the agent application into your cloud infrastructure, as described below.
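If you generate agent names in a script, you can check them against the Agent name rule above before entering them. This is a minimal sketch of that rule as a regular expression, assuming "whitespace" means the plain space character only; the Data Productivity Cloud's own validation may differ in edge cases.

```python
import re

# First character: letter, digit, underscore, or hyphen (no leading space).
# Remaining 0-29 characters may also include spaces, for a maximum length of 30.
# Assumption: "whitespace" is treated as the ASCII space character only.
_AGENT_NAME = re.compile(r"[A-Za-z0-9_-][A-Za-z0-9 _-]{0,29}")


def is_valid_agent_name(name: str) -> bool:
    """Return True if name satisfies the documented Agent name rules."""
    return _AGENT_NAME.fullmatch(name) is not None
```

For example, `is_valid_agent_name("Orders CDC agent")` is accepted, while a name starting with a space, containing a period, or longer than 30 characters is rejected.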
Set up the agent in your cloud infrastructure
After you create the agent in the Data Productivity Cloud, you must install it in your cloud infrastructure. There are several ways to do this; use whichever method suits you:
- For AWS agents:
- For Azure agents:
To complete any of these processes, you need certain details of the agent you created. To find them, locate the agent on the Agents screen, click ... next to it, and then click Agent details. You need the parameters and values in the Agent image URI, Agent environment variables, and Credentials sections when configuring the agent in your cloud infrastructure.
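When you copy these values into a deployment script or template, a quick check that nothing was left blank can catch copy-and-paste mistakes early. The sketch below groups the three sections into one hypothetical Python object; the class and field names are illustrative only, and the actual variable names and values must come from your agent's Agent details screen.

```python
from dataclasses import dataclass, field


@dataclass
class AgentDeploymentDetails:
    """Values copied from the Agent details screen (hypothetical container, not a Matillion API)."""

    image_uri: str = ""
    environment_variables: dict[str, str] = field(default_factory=dict)
    credentials: dict[str, str] = field(default_factory=dict)

    def missing_sections(self) -> list[str]:
        """Name each section that is still empty or contains a blank value."""
        missing = []
        if not self.image_uri.strip():
            missing.append("Agent image URI")
        if not self.environment_variables or not all(self.environment_variables.values()):
            missing.append("Agent environment variables")
        if not self.credentials or not all(self.credentials.values()):
            missing.append("Credentials")
        return missing
```

An empty list from `missing_sections()` only means every section has a value; it does not confirm that the values are correct for your agent.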
Check agent status
After deploying the agent in your cloud infrastructure, you should return to the Data Productivity Cloud to verify that it's correctly connected and running.
- Click the menu button in the top left of any Data Productivity Cloud screen, then click Manage → Agents.
- Locate the agent in the list and check the status:
- Pending: The agent has been created but has not yet connected to the Hub.
- Running: The agent is connected and either available to run a Streaming pipeline or already running one.
- Stopped: The agent has been stopped.
- Unknown: The agent is in an unknown state. This typically means the agent has lost connection to the Data Productivity Cloud without being stopped, for example due to networking issues.
- When the agent status shows Running, it's ready to use. You can select it in the Agent drop-down when you create a new Streaming pipeline, as long as no pipeline is already assigned to it.
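The status values and the selection rule above can be summarized in a few lines of code, for example in an internal monitoring script. This is a hypothetical model for illustration only, not a Matillion API; `assigned_pipeline` stands in for however you track the pipeline assigned to an agent.

```python
from enum import Enum
from typing import Optional


class AgentStatus(Enum):
    """Status values shown on the Agents screen."""

    PENDING = "Pending"
    RUNNING = "Running"
    STOPPED = "Stopped"
    UNKNOWN = "Unknown"


def can_assign_pipeline(status: AgentStatus, assigned_pipeline: Optional[str]) -> bool:
    """Selectable only when Running with no pipeline yet, since each agent runs one pipeline."""
    return status is AgentStatus.RUNNING and assigned_pipeline is None
```

`AgentStatus("Running")` converts the status text shown on the Agents screen into the corresponding enum member.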
Deleting agents
To delete an agent from the Data Productivity Cloud, locate the agent on the Agents screen, then click ... → Remove agent.
This action is irreversible, so be sure that you want to continue.
Deleting the agent from the Agents screen doesn't remove the underlying AWS or Azure resources. Use the AWS Management Console or Azure portal to clean up any resources that you no longer need.
Warning
Deleting a running agent may interrupt the pipeline it is currently running. Always stop the agent service before deleting the agent.
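If you script your teardown, for example with your own wrappers around the AWS or Azure tooling you used for deployment, the order of the steps matters. This sketch encodes the order recommended in the warning above; all four callables are hypothetical hooks you would supply, because this topic does not describe an API for these actions.

```python
from typing import Callable, List


def remove_agent_safely(
    get_status: Callable[[], str],
    stop_agent_service: Callable[[], None],
    remove_agent: Callable[[], None],
    clean_up_cloud_resources: Callable[[], None],
) -> List[str]:
    """Apply the removal steps in the recommended order; return a log of the steps taken."""
    log: List[str] = []
    if get_status() == "Running":
        # Stop the agent service first rather than removing a running agent.
        stop_agent_service()
        if get_status() == "Running":
            raise RuntimeError("agent is still Running; not removing it")
        log.append("stopped agent service")
    # Removal is irreversible, so it only happens once the agent is not Running.
    remove_agent()
    log.append("removed agent")
    # Removing the agent does not delete AWS or Azure resources; clean those up separately.
    clean_up_cloud_resources()
    log.append("cleaned up cloud resources")
    return log
```

Raising instead of continuing when the agent is still Running keeps the irreversible removal step from ever running against an active agent.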