Data Productivity Cloud architecture
This document provides a detailed explanation of the architecture behind the Data Productivity Cloud, focusing on key components such as Designer, scheduling, agent gateway, Git and the API gateway. Most importantly, we will illustrate how these services interact with Matillion-hosted agents and customer-hosted agents that typically define Full SaaS and Hybrid SaaS deployments, respectively.
Full SaaS architecture
In Full SaaS deployments, Matillion hosts the required agent and project metadata. The Matillion-hosted agent calls out to data warehouses and source services in the customer cloud.
Hybrid SaaS architecture
In Hybrid SaaS deployments, the agent is hosted in the customer's cloud (Amazon Web Services or Microsoft Azure) and can connect to services such as sources and data warehouses from there.
Detailed architecture diagram
In the Data Productivity Cloud, users have the option to choose between two deployment models: the Full SaaS model and the Hybrid SaaS model, each offering distinct setups for agent deployment and workflow execution.
In the Full SaaS model, Matillion provides and manages the hosted agent, relieving users of concerns related to deployment, upgrades, and monitoring. The Matillion-hosted agent interfaces directly with the Hub and the Matillion-hosted vault, so secrets can be defined directly from within your project.
In the Hybrid SaaS model, users have more control over their infrastructure as they deploy and manage their own agent within their cloud environment. They must also manage their own secret vault and secrets on their chosen cloud platform.
The Data Productivity Cloud is built on top of Git. Git offers a range of features, including version control, collaboration, branching and merging capabilities, and a distributed system architecture. Users can also integrate their own Git repository with the Designer for version control and collaboration on pipelines.
In the Data Productivity Cloud, Git operates without users needing to install or maintain local copies of the application code. Instead, all Git interactions are supported natively in the Designer application. Read our GitHub app overview.
Workflow expanded
Referring to the architecture diagram above, the following notes can be made on the high-level workflow used by the Designer and its task execution in the Data Productivity Cloud:
Authentication and secret management
- Users authenticate themselves through the Hub to access the Data Productivity Cloud.
- Secrets, including API keys and other credentials, are managed securely. This ensures that only authorized users can access sensitive data.
Pipeline design and management
- Users design and configure data pipelines using the Designer.
- The Designer integrates with the Component Information service to provide metadata and handle design-time requests.
- You can also integrate Git with the Designer for version control and collaboration on pipelines. This allows you to track changes, revert to previous versions, and work together on pipelines as a team.
Agent management and monitoring
- The Agent Manager deploys, upgrades, and monitors agents within the Data Productivity Cloud.
- It also queries connection statuses to ensure seamless operation.
Workflow orchestration and observability
- The Workflow Execution Engine orchestrates pipeline execution.
- The Data Productivity Cloud offers pipeline observability features for monitoring and performance tracking.
Task execution and scheduling
- Task requests are sent to the Agent Gateway for direct communication with customer-hosted agents.
- The scheduler coordinates pipeline executions based on schedules.
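The relationship between the scheduler and the Agent Gateway described above can be sketched roughly as follows. This is a minimal illustration, not Matillion's implementation: the `Schedule` and `AgentGateway` classes, the `run_due` function, and the idea that each agent collects its queued task requests are all assumptions made for the example.

```python
from collections import defaultdict, deque
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Deque, Dict, List


@dataclass
class Schedule:
    """Hypothetical schedule entry: run `pipeline` on `agent_id` every `interval`."""
    pipeline: str
    agent_id: str
    interval: timedelta
    next_run: datetime


class AgentGateway:
    """Hypothetical gateway that holds task requests per agent."""

    def __init__(self) -> None:
        self._queues: Dict[str, Deque[str]] = defaultdict(deque)

    def submit(self, agent_id: str, pipeline: str) -> None:
        # Queue a task request for one specific agent.
        self._queues[agent_id].append(pipeline)

    def collect(self, agent_id: str) -> List[str]:
        # Hand over everything queued for this agent and empty its queue.
        queue = self._queues[agent_id]
        tasks = list(queue)
        queue.clear()
        return tasks


def run_due(schedules: List[Schedule], gateway: AgentGateway, now: datetime) -> None:
    """Submit every pipeline whose schedule is due, then advance its next run."""
    for schedule in schedules:
        if schedule.next_run <= now:
            gateway.submit(schedule.agent_id, schedule.pipeline)
            schedule.next_run = schedule.next_run + schedule.interval


start = datetime(2024, 1, 1, 9, 0)
schedules = [
    Schedule("daily_load", "agent-a", timedelta(days=1), start),
    Schedule("hourly_sync", "agent-b", timedelta(hours=1), start + timedelta(hours=1)),
]
gateway = AgentGateway()
run_due(schedules, gateway, now=start)
```

In this sketch only `daily_load` is due at `start`, so only `agent-a` has a task waiting. Keeping one queue per agent mirrors the idea that each task request is addressed to a particular agent.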
API gateway for public exposure
- Grant controlled access to data pipelines via secure, public API endpoints.
- Empower external applications to interact with your pipeline data programmatically.
- Apply authentication and authorization mechanisms for robust API access control.
Agent communication and secret access
- Matillion-hosted agents communicate with the Hub and Hosted Vault for secure access to customer secrets.
- They also use the Connector Service to retrieve data from various sources such as Salesforce, SAP, and databases.
Customer secret management and agent responsibilities
- Customer secret vaults securely store and retrieve customer secrets.
- Customer-hosted agents are responsible for running any components in data processing pipelines.
To understand the flow of secrets in the Data Productivity Cloud, read Secret overview.
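As a rough illustration of how the deployment model determines where an agent reads secrets from, the sketch below picks a vault by model. Everything here is hypothetical: `InMemoryVault` is a stand-in for a real vault (it is not a Matillion or cloud provider API), and the `full_saas` and `hybrid_saas` labels and the `vault_for` function are names invented for the example.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class InMemoryVault:
    """Hypothetical stand-in for a secret vault; not a real vault client."""
    label: str
    secrets: Dict[str, str] = field(default_factory=dict)

    def get_secret(self, name: str) -> str:
        try:
            return self.secrets[name]
        except KeyError:
            raise KeyError(f"secret {name!r} not found in {self.label}") from None


def vault_for(deployment: str, hosted_vault: InMemoryVault,
              customer_vault: InMemoryVault) -> InMemoryVault:
    """Pick the vault an agent reads from, based on the deployment model."""
    if deployment == "full_saas":
        # Full SaaS: secrets live in the Matillion-hosted vault.
        return hosted_vault
    if deployment == "hybrid_saas":
        # Hybrid SaaS: secrets live in the customer's own cloud vault.
        return customer_vault
    raise ValueError(f"unknown deployment model: {deployment!r}")


hosted = InMemoryVault("Matillion-hosted vault", {"warehouse_password": "hosted-value"})
customer = InMemoryVault("customer vault", {"warehouse_password": "customer-value"})

full_saas_secret = vault_for("full_saas", hosted, customer).get_secret("warehouse_password")
hybrid_secret = vault_for("hybrid_saas", hosted, customer).get_secret("warehouse_password")
```

In a real Hybrid SaaS deployment the customer vault would be a service such as AWS Secrets Manager; the in-memory class only shows the selection logic.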
Comparison of deployment models
Deployment components | Full SaaS | Hybrid SaaS |
---|---|---|
Agent deployment | Matillion provides and controls hosted agent infrastructure. | Any number of Hybrid agents can be deployed to different cloud providers and regions, providing fully segregated data environments. |
Secret management | Matillion's Data Productivity Cloud hosts your secrets. | Matillion's Data Productivity Cloud references the native secret vault in the agent's deployed location or infrastructure, such as AWS Secrets Manager. |
Data security | Matillion takes care of the security of both the control plane and the data plane (agent). | Installing the Hybrid agent within a customer's cloud environment enhances data sovereignty, data residency, and data security while also enabling the use of private links for increased secure connectivity. |
Central interface | Projects and environment management, pipeline design and scheduling, and user management are all available from a central SaaS application. | Projects and environment management, pipeline design and scheduling, and user management are all available from a central SaaS application. |
Integrations | 130+ connectors instantly available. | 130+ connectors instantly available, plus the ability to upload approved third-party drivers. |
High code | Python Pushdown and Bash Pushdown available for Snowflake. | Classic Python component available, in addition to Python Pushdown and Bash Pushdown for Snowflake. |
API interface | Users can interact programmatically with the Data Productivity Cloud using the API interface. | Users can interact programmatically with the Data Productivity Cloud using the API interface. |
Scalability | Matillion can manually control agent scaling, triggered by observed usage or pre-emptive planning. | Hybrid agents can be scaled in the user-managed agent deployment to meet any required demand. |
Summary | Simplifies setup and provides freedom from infrastructure management. | Gives customers complete control over their data plane infrastructure and data security with unlimited scaling options, in addition to extra capabilities. |
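
The differences in the table can be turned into a simple decision aid. The sketch below is an illustration only, assuming that any requirement the table lists solely under Hybrid SaaS points to that model; the `Requirements` class, its field names, and the `suggest_model` function are hypothetical and not part of the Data Productivity Cloud.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Requirements:
    """Hypothetical checklist of needs that the table lists only for Hybrid SaaS."""
    classic_python_component: bool = False
    third_party_drivers: bool = False
    customer_managed_secrets: bool = False
    private_links: bool = False


# Human-readable labels for each Hybrid-only capability, in table order.
HYBRID_ONLY = {
    "third_party_drivers": "approved third-party drivers",
    "classic_python_component": "Classic Python component",
    "customer_managed_secrets": "secrets kept in the customer's own vault",
    "private_links": "private links to sources and warehouses",
}


def suggest_model(req: Requirements) -> Tuple[str, List[str]]:
    """Return a suggested deployment model and the requirements that drove it."""
    reasons = [label for attr, label in HYBRID_ONLY.items() if getattr(req, attr)]
    return ("Hybrid SaaS" if reasons else "Full SaaS", reasons)


no_special_needs = suggest_model(Requirements())
needs_drivers = suggest_model(Requirements(third_party_drivers=True, private_links=True))
```

With no Hybrid-only requirement set, the sketch falls back to Full SaaS, which matches the table's summary that Full SaaS simplifies setup and removes infrastructure management.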