Agent (GKE) Kubernetes template
It's possible to use a Kubernetes platform to manage CDC agent containers. The provided Kubernetes pod templates allow the agent image to be retrieved and deployed to GKE (Google Kubernetes Engine) or any other compatible x86 Linux Kubernetes platform.
See the official documentation on Kubernetes pods for more information.
Running the Matillion CDC agent on a Kubernetes platform is one of the more manual ways to set up a Data Loader CDC pipeline, and it places a significant knowledge burden on the user. Only users who are proficient with Kubernetes and Google Cloud should attempt it.
The template provides an installation blueprint that you may use verbatim, but you may need to modify it to suit your own needs and the rules governing your cloud infrastructure.
Prerequisites
Resources
This template is intended for users who are accustomed to setting up their own Kubernetes platforms.
- A Google Cloud account with administrative permissions to create resources, including the following roles:
  - roles/container.admin
  - roles/iam.serviceAccountAdmin
- A Google Kubernetes Engine cluster with Workload Identity enabled.
- A Google Cloud Project ID where resources will be created.
- A Google Cloud Region.
- A Google Cloud Zone.
- A Google Cloud Storage bucket where the Matillion CDC agent will land the data.
- A Google service account to be used by the pod via Workload Identity. See the documentation.
- A Google service account with access to the Secret Manager and Cloud Storage bucket.
- Google Secret Manager secrets for the following (you will need the name of each secret):
  - The platform key. The platform key is generated the first time you create an agent; all subsequent agents use the same key.
  - The database password.
- A Google custom IAM role with the following permissions:
  - orgpolicy.policy.get
  - resourcemanager.projects.get
  - secretmanager.versions.access
  - storage.buckets.get
  - storage.multipartUploads.abort
  - storage.multipartUploads.create
  - storage.multipartUploads.list
  - storage.multipartUploads.listParts
  - storage.objects.create
  - storage.objects.delete
  - storage.objects.get
  - storage.objects.list
  - storage.objects.update
- The customer private cloud network and associated firewall rules should be configured so that the agent can communicate with the Matillion CDC platform.
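The custom role above can be written out as a role definition file rather than assembled by hand in the console. The following Python sketch assumes you create the role with gcloud iam roles create ROLE_ID --project=PROJECT_ID --file=FILE, which accepts a JSON or YAML role definition with title, description, stage, and includedPermissions fields. The role title and description used here are hypothetical placeholders.

```python
import json

# Permissions required by the custom IAM role, as listed above.
CDC_AGENT_PERMISSIONS = [
    "orgpolicy.policy.get",
    "resourcemanager.projects.get",
    "secretmanager.versions.access",
    "storage.buckets.get",
    "storage.multipartUploads.abort",
    "storage.multipartUploads.create",
    "storage.multipartUploads.list",
    "storage.multipartUploads.listParts",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.update",
]


def custom_role_definition(title: str, description: str) -> dict:
    """Build a role definition in the shape read by `gcloud iam roles create --file`."""
    return {
        "title": title,
        "description": description,
        "stage": "GA",
        # De-duplicate and sort so the generated file is stable between runs.
        "includedPermissions": sorted(set(CDC_AGENT_PERMISSIONS)),
    }


if __name__ == "__main__":
    role = custom_role_definition(
        title="Matillion CDC Agent",  # hypothetical title
        description="Permissions for the Matillion CDC agent on GKE",  # hypothetical
    )
    # Save this output to a file and pass it to gcloud with --file.
    print(json.dumps(role, indent=2))
```

Generating the file from a single list keeps the role in step with the permissions documented here if the list changes.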
User Access
- Access to the Hub account and Data Loader.
- The CDC agent environment variables generated in Data Loader when you create a new agent: Agent ID, Organization ID, and Platform WebSocket Endpoint URL.
- Data Loader platform key (generated once per Data Loader account the first time you make an agent).
- Google Cloud account with the ability to create an instance on a billable account and create/grant IAM roles. You may require an administrator from your organization to either give access or perform this process with you.
Recommendations
- Create a new namespace named matillion-cdc in Kubernetes for CDC resources.
- Use a templated installation if at all possible.
- Consult your cloud/network administrator for advice on customer private cloud, Kubernetes Services, and Ingress.
- Use Workload Identity to give the agent access to the Secret Manager secrets and Cloud Storage bucket.
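The namespace and Workload Identity recommendations above can be sketched as Kubernetes objects. The following Python sketch builds a Namespace and a Kubernetes ServiceAccount carrying the iam.gke.io/gcp-service-account annotation, plus the IAM member string that is typically granted roles/iam.workloadIdentityUser on the Google service account (for example with gcloud iam service-accounts add-iam-policy-binding). The project ID, Kubernetes ServiceAccount name, and Google service account email are hypothetical placeholders.

```python
import json
import sys

NAMESPACE = "matillion-cdc"  # namespace recommended above


def namespace_manifest(name: str = NAMESPACE) -> dict:
    """Minimal Kubernetes Namespace object."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def service_account_manifest(ksa_name: str, gsa_email: str, namespace: str = NAMESPACE) -> dict:
    """Kubernetes ServiceAccount annotated so that pods using it
    authenticate as the given Google service account via Workload Identity."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


def workload_identity_member(project_id: str, ksa_name: str, namespace: str = NAMESPACE) -> str:
    """IAM member that identifies the Kubernetes ServiceAccount to Google Cloud."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


if __name__ == "__main__":
    project = "my-project"  # hypothetical project ID
    ksa = "cdc-agent"  # hypothetical Kubernetes ServiceAccount name
    gsa = f"cdc-agent@{project}.iam.gserviceaccount.com"  # hypothetical Google SA
    items = [namespace_manifest(), service_account_manifest(ksa, gsa)]
    # Manifests go to stdout (save and apply with kubectl apply -f FILE).
    print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
    # The IAM member goes to stderr so it does not mix with the manifests.
    print("Workload Identity member:", workload_identity_member(project, ksa), file=sys.stderr)
```

Keeping the namespace name in one constant means the ServiceAccount and the IAM member cannot drift apart; a mismatch between them is a common reason Workload Identity silently fails.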
Template Parameters
The template needs the following environment variables for Data Loader to recognize the agent.
Environment Variable | Description |
---|---|
ID_ORGANIZATION | This is provided to you by the Data Loader client when setting up a new agent. |
ID_AGENT | This is provided to you by the Data Loader client when setting up a new agent. |
PLATFORM_KEY_NAME | The name of the Secret Manager secret that stores your platform key, which is generated the first time you create an agent. |
PLATFORM_KEY_PROVIDER | The service that supplies your platform key. This must be google-secret-manager for the GKE Template. |
PLATFORM_WEBSOCKET_ENDPOINT | This value must be set to wss://ws-<region>.matillion-cdc-prod.matillion.com/ws, where <region> is either eu or us depending on the Data Loader region you are building the pipeline in. |
SECRET_PROVIDERS | The service that holds your database passwords. This must be google-secret-manager:1 for the GKE Template. |
GCP_PROJECTID | This is your Google Cloud Project ID. |
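The table above maps directly onto a container's env list. The following Python sketch builds those entries, fixes the two values the GKE template requires, derives the WebSocket endpoint from the eu/us region, and wraps everything in a minimal Pod manifest. The image reference, pod and container names, and Kubernetes ServiceAccount name are hypothetical placeholders, not values from the provided template; the kubernetes.io/arch and kubernetes.io/os node selectors are an assumption used to target x86 Linux nodes.

```python
import json

ALLOWED_REGIONS = {"eu", "us"}


def websocket_endpoint(region: str) -> str:
    """PLATFORM_WEBSOCKET_ENDPOINT for the Data Loader region."""
    if region not in ALLOWED_REGIONS:
        raise ValueError(f"region must be one of {sorted(ALLOWED_REGIONS)}, got {region!r}")
    return f"wss://ws-{region}.matillion-cdc-prod.matillion.com/ws"


def agent_env(org_id: str, agent_id: str, platform_key_name: str,
              region: str, project_id: str) -> list:
    """Container env entries for the template parameters in the table."""
    values = {
        "ID_ORGANIZATION": org_id,
        "ID_AGENT": agent_id,
        "PLATFORM_KEY_NAME": platform_key_name,
        "PLATFORM_KEY_PROVIDER": "google-secret-manager",  # required for GKE
        "PLATFORM_WEBSOCKET_ENDPOINT": websocket_endpoint(region),
        "SECRET_PROVIDERS": "google-secret-manager:1",  # required for GKE
        "GCP_PROJECTID": project_id,
    }
    return [{"name": name, "value": value} for name, value in values.items()]


def agent_pod(image: str, ksa_name: str, env: list, namespace: str = "matillion-cdc") -> dict:
    """Minimal Pod manifest; image and ksa_name are placeholders."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": "matillion-cdc-agent", "namespace": namespace},
        "spec": {
            "serviceAccountName": ksa_name,  # the Workload Identity ServiceAccount
            # Schedule only on x86 Linux nodes (assumption, see lead-in).
            "nodeSelector": {"kubernetes.io/arch": "amd64", "kubernetes.io/os": "linux"},
            "containers": [{"name": "cdc-agent", "image": image, "env": env}],
        },
    }


if __name__ == "__main__":
    env = agent_env(
        org_id="YOUR_ORGANIZATION_ID",  # from Data Loader
        agent_id="YOUR_AGENT_ID",  # from Data Loader
        platform_key_name="YOUR_PLATFORM_KEY_SECRET",  # Secret Manager secret name
        region="us",
        project_id="my-project",  # hypothetical project ID
    )
    pod = agent_pod(image="AGENT_IMAGE_REFERENCE", ksa_name="cdc-agent", env=env)
    print(json.dumps(pod, indent=2))
```

Validating the region up front catches a mistyped endpoint before the pod starts, rather than leaving the agent unable to reach Data Loader.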