
Agent (GKE) Kubernetes template

You can use a Kubernetes platform to manage CDC agent containers. The provided Kubernetes pod templates allow the agent image to be retrieved from GKE (Kubernetes) and deployed to any compatible x86 Linux Kubernetes platform.

See the official documentation on Kubernetes pods for more information.

Running the Matillion CDC Agent on a Kubernetes platform is one of the more manual ways to set up a Data Loader CDC pipeline, and it requires substantial prior knowledge. Attempt it only if you are proficient with both Kubernetes and Google Cloud.

:::info{title='Note'} The template provides a blueprint for installation that you may use verbatim, but you may need to modify it to suit your own needs and the rules governing your cloud infrastructure. :::


Prerequisites

Resources

This template is intended for users who are accustomed to setting up their own Kubernetes platforms.

  • A Google Cloud account with administrative permissions to create resources:
    • roles/container.admin
    • roles/iam.serviceAccountAdmin
  • A Google Kubernetes Engine cluster with Workload Identity enabled.
  • A Google Cloud Project ID where resources will be created.
  • A Google Cloud Region.
  • A Google Cloud Zone.
  • A Google Cloud Storage bucket where the Matillion CDC Agent will land the data.
  • A Google service account to be used by the Pod via Workload Identity. See the documentation.
  • A Google service account with access to Secret Manager and the Cloud Storage bucket.
  • Google Secret Manager secrets:
    • Platform key (Secret Manager) secret name. The platform key is generated when the first agent is created; all subsequent agents use the same key.
    • Database password (Secret Manager) secret name.
  • A Google custom IAM role with the following permissions:
    • orgpolicy.policy.get
    • resourcemanager.projects.get
    • secretmanager.versions.access
    • storage.buckets.get
    • storage.multipartUploads.abort
    • storage.multipartUploads.create
    • storage.multipartUploads.list
    • storage.multipartUploads.listParts
    • storage.objects.create
    • storage.objects.delete
    • storage.objects.get
    • storage.objects.list
    • storage.objects.update
  • The customer private cloud network and associated firewall rules should be configured so that the agent can communicate with the Matillion CDC platform.
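
As a rough aid for the custom IAM role prerequisite, the following Python sketch keeps the permission list from this page in one place, reports which permissions are missing from a given set, and builds a role body using the field names of the IAM Role resource (title, stage, includedPermissions). The function names and the role title are hypothetical and are not part of the Matillion template; how you obtain the list of granted permissions is left to your own tooling.

```python
import json

# Permissions copied from the custom IAM role list on this page.
REQUIRED_PERMISSIONS = frozenset({
    "orgpolicy.policy.get",
    "resourcemanager.projects.get",
    "secretmanager.versions.access",
    "storage.buckets.get",
    "storage.multipartUploads.abort",
    "storage.multipartUploads.create",
    "storage.multipartUploads.list",
    "storage.multipartUploads.listParts",
    "storage.objects.create",
    "storage.objects.delete",
    "storage.objects.get",
    "storage.objects.list",
    "storage.objects.update",
})


def missing_permissions(granted):
    """Return the required permissions that are absent from `granted`, sorted."""
    return sorted(REQUIRED_PERMISSIONS - set(granted))


def role_definition(title="Matillion CDC Agent"):
    """Build a role body; the default title is a hypothetical placeholder."""
    return {
        "title": title,
        "stage": "GA",
        "includedPermissions": sorted(REQUIRED_PERMISSIONS),
    }


if __name__ == "__main__":
    # Print a role body you could adapt, then check an example permission set.
    print(json.dumps(role_definition(), indent=2))
    print(missing_permissions(["storage.objects.get", "storage.objects.list"]))
```

Keeping the list as data makes it easier to compare against an existing role before you attach it to the agent's service account.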

User Access

  • Access to the Hub account and Data Loader.
  • CDC agent environment variables (generated in Data Loader when you create a new agent): Agent ID, Organization ID, and the Platform WebSocket Endpoint URL.
  • Data Loader platform key (generated once per Data Loader account, the first time you create an agent).
  • Google Cloud account with the ability to create an instance on a billable account and create/grant IAM roles. You may require an administrator from your organization to either give access or perform this process with you.

Recommendations

  • Create a new namespace named matillion-cdc in Kubernetes for CDC resources.
  • Use a templated installation if at all possible.
  • Consult your cloud/network administrator for advice on customer private cloud, Kubernetes Services, and Ingress.
  • Use Workload Identity to allow access to the Secret Manager secrets and the Cloud Storage bucket.
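
The namespace and Workload Identity recommendations can be sketched as Kubernetes objects. The Python below builds a Namespace manifest, a ServiceAccount manifest carrying the iam.gke.io/gcp-service-account annotation that GKE Workload Identity reads, and the IAM member string normally granted roles/iam.workloadIdentityUser on the Google service account. The Kubernetes service account name, Google service account email, and project ID used here are hypothetical placeholders, and the provided template may name or structure these resources differently.

```python
import json

NAMESPACE = "matillion-cdc"  # namespace name recommended on this page


def namespace_manifest(name=NAMESPACE):
    """Return a Namespace object as a plain dict."""
    return {"apiVersion": "v1", "kind": "Namespace", "metadata": {"name": name}}


def service_account_manifest(ksa_name, gsa_email, namespace=NAMESPACE):
    """Return a ServiceAccount annotated for GKE Workload Identity."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": ksa_name,
            "namespace": namespace,
            "annotations": {"iam.gke.io/gcp-service-account": gsa_email},
        },
    }


def workload_identity_member(project_id, ksa_name, namespace=NAMESPACE):
    """Return the IAM member that identifies the Kubernetes service account."""
    return f"serviceAccount:{project_id}.svc.id.goog[{namespace}/{ksa_name}]"


if __name__ == "__main__":
    # All three values below are hypothetical; substitute your own.
    ksa = "cdc-agent"
    gsa = "cdc-agent@my-project.iam.gserviceaccount.com"
    print(json.dumps(namespace_manifest(), indent=2))
    print(json.dumps(service_account_manifest(ksa, gsa), indent=2))
    print(workload_identity_member("my-project", ksa))
```

The manifests are emitted as JSON, which kubectl accepts alongside YAML, so the sketch needs only the Python standard library.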

Template Parameters

The template needs the following environment variables for Data Loader to recognize the agent.

| Environment Variable | Description |
| --- | --- |
| ID_ORGANIZATION | Provided by Data Loader when you set up a new agent. |
| ID_AGENT | Provided by Data Loader when you set up a new agent. |
| PLATFORM_KEY_NAME | The name of the secret that stores your platform key, which is generated the first time you create an agent. |
| PLATFORM_KEY_PROVIDER | The service that supplies your platform key. This must be google-secret-manager for the GKE template. |
| PLATFORM_WEBSOCKET_ENDPOINT | This value must be set to wss://ws-<region>.matillion-cdc-prod.matillion.com/ws, where <region> is either eu or us, depending on the Data Loader region in which you are building the pipeline. |
| SECRET_PROVIDERS | The service that holds your database passwords. This must be google-secret-manager:1 for the GKE template. |
| GCP_PROJECTID | Your Google Cloud Project ID. |
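
To make the parameters concrete, here is a minimal Python sketch that assembles these variables into the name/value list used by the env field of a Kubernetes container. It assumes the WebSocket endpoint differs between regions only in the eu or us segment of the host name, and the function names and example IDs are hypothetical. Treat it as an illustration of the values involved, not as the contents of the provided template.

```python
REGIONS = ("eu", "us")  # Data Loader regions named in the table above


def websocket_endpoint(region):
    """Build the endpoint URL, assuming only the region segment varies."""
    if region not in REGIONS:
        raise ValueError(f"region must be one of {REGIONS}, got {region!r}")
    return f"wss://ws-{region}.matillion-cdc-prod.matillion.com/ws"


def agent_env(organization_id, agent_id, platform_key_name, region, project_id):
    """Return the agent variables as a Kubernetes-style env list."""
    values = {
        "ID_ORGANIZATION": organization_id,
        "ID_AGENT": agent_id,
        "PLATFORM_KEY_NAME": platform_key_name,
        "PLATFORM_KEY_PROVIDER": "google-secret-manager",
        "PLATFORM_WEBSOCKET_ENDPOINT": websocket_endpoint(region),
        "SECRET_PROVIDERS": "google-secret-manager:1",
        "GCP_PROJECTID": project_id,
    }
    # Fail early if any value was left empty.
    empty = [name for name, value in values.items() if not value]
    if empty:
        raise ValueError(f"missing values for: {', '.join(empty)}")
    return [{"name": name, "value": value} for name, value in values.items()]


if __name__ == "__main__":
    # Example IDs are hypothetical; use the ones generated by Data Loader.
    for entry in agent_env("org-123", "agent-456", "platform-key", "eu", "my-project"):
        print(f"{entry['name']}={entry['value']}")
```

Validating the region and empty values up front surfaces a misconfigured agent before the pod is deployed, rather than when it fails to register with Data Loader.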

Download Template