
Quick guide for deploying a Streaming agent in GCP

Use this guide to add a Streaming agent in Data Loader and then deploy that agent in Google Cloud Platform (GCP) using the Agent (GCE) template. Creating and deploying an agent are required steps to set up a CDC pipeline in Data Loader.

Note

For best performance, your GCP region should be geographically close to your Hub account region.


Prerequisites

  1. Install Terraform.
  2. Export cloud credentials with access to the following:
    • A Google Cloud project and service account. Read Project and Service Account for more information.
    • A Google service account with the following administrative roles:
      • roles/container.admin
      • roles/iam.serviceAccountAdmin
    • A Google Cloud Storage bucket. Read GCP Storage Bucket for more information.
    • Google Secret Manager. Read Google Secret Manager for more information. You need:
      • The Platform Key secret value. The Platform Key is generated when you create your first agent, and all subsequent agents use the same key.
      • The secret name of your database password.
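If you prefer the command line, you can prepare part of the prerequisites above with the gcloud CLI. The following is a minimal sketch and is not part of the Matillion template. The helper names `require_tools` and `prepare_project`, the list of APIs to enable, and the placeholder project, region, and bucket names are all assumptions for illustration.

```shell
#!/bin/sh
# Sketch only: helper names and the API list are assumptions, not part of the template.

# Return non-zero (and name the missing tool) if any command is not on PATH.
require_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing required tool: $tool" >&2
      return 1
    fi
  done
}

# Enable the APIs assumed to be needed and create the landing bucket.
# Usage: prepare_project PROJECT_ID REGION BUCKET_NAME
prepare_project() {
  local project="$1" region="$2" bucket="$3"
  gcloud services enable compute.googleapis.com secretmanager.googleapis.com \
    storage.googleapis.com --project="$project" || return 1
  gcloud storage buckets create "gs://${bucket}" \
    --project="$project" --location="$region"
}
```

For example, run `gcloud auth application-default login` so that Terraform can use your Application Default Credentials. Then run `require_tools terraform gcloud && prepare_project my-project europe-west2 my-cdc-landing-bucket`, where all three arguments are placeholders.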

Create a Streaming agent in Data Loader

  1. Register and log in to the Hub.
  2. The My Accounts page lists the accounts you have already created or joined. At the bottom of the list, click Add new account. For more information, read Create an Account.

Note

Each Hub account can generate one unique platform key that your Streaming agent will use to communicate with Data Loader. With this in mind, create the Streaming agent in the account that matches the platform key you want to use. For more information, read Platform Keys.

  3. Click Load data on the What do you want to do today? page.
  4. On the Data Loader dashboard, scroll to the lower-right corner of the page and choose your region.
  5. In the Pipelines dashboard, select Agents in the left sidebar and click Add agent.
  6. Enter an Agent name and Description for your agent, then click Continue.
  7. Select GCP as your cloud provider.
  8. Choose Terraform as the service used to provision and deploy the cloud resources for the Streaming agent.
  9. The Agent setup page displays information and prerequisites you need before you can deploy your agent. On this page, note the following environment variable values:

    • ID_ORGANIZATION: Used when deploying the Streaming agent in GCP. The value is unique per agent.
    • ID_AGENT: Also used when deploying the Streaming agent. The value is unique per agent.
    • PLATFORM_WEBSOCKET_ENDPOINT: Also used when deploying the agent. The value depends on the Data Loader region (US or EU).
  10. Manage key pair: The key pair is a generated value. If you haven't generated a platform secret for your account yet, Data Loader prompts you to do so when you create a CDC pipeline. Store this value in GCP Secret Manager, where your Streaming agent can access it. For security reasons, the public/private key pair can be generated and shown only once per account, so copy and save it for future use.

  11. Once a key pair has been configured, click Return to agent list.

  12. The Agent containers page lists all agents in your Hub account, including the one you just created. Click the Agent setup icon next to the intended agent, and then follow the steps in Deploy the Streaming Agent in GCP.
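Keeping the values from the Agent setup page in shell variables can make the later Terraform step less error-prone. The following is a minimal sketch that assumes you paste the values in yourself. The placeholder values and the `require_agent_vars` helper are hypothetical and are not provided by Data Loader.

```shell
#!/bin/sh
# Values copied from the Agent setup page in Data Loader.
# The <placeholders> are hypothetical; replace them with your own values.
export ID_ORGANIZATION="${ID_ORGANIZATION:-<your-organization-id>}"
export ID_AGENT="${ID_AGENT:-<your-agent-id>}"
export PLATFORM_WEBSOCKET_ENDPOINT="${PLATFORM_WEBSOCKET_ENDPOINT:-<your-websocket-endpoint>}"

# Hypothetical helper: fail if any value is empty or still a <placeholder>.
require_agent_vars() {
  local pair name value missing=0
  for pair in "ID_ORGANIZATION=${ID_ORGANIZATION:-}" "ID_AGENT=${ID_AGENT:-}" \
      "PLATFORM_WEBSOCKET_ENDPOINT=${PLATFORM_WEBSOCKET_ENDPOINT:-}"; do
    name="${pair%%=*}"   # text before the first "="
    value="${pair#*=}"   # text after the first "="
    case "$value" in
      ""|"<"*">") echo "$name is not set" >&2; missing=1 ;;
    esac
  done
  return "$missing"
}
```

Calling `require_agent_vars` before you run Terraform gives a clear error message if you forgot to replace one of the placeholders.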

GCP Secret Manager

  1. Navigate to the Secret Manager in the Google Cloud console.
  2. On the Secret Manager page, at the top, click Create secret.
  3. On the Create secret page, enter the name of your secret.
    • For database passwords, you can choose any secret name; you will refer to this name in Data Loader.
  4. In the Secret Value section, either upload a file or enter the secret value in JSON format.
    • For the Platform Key secret, the value is the Platform Key generated when you created your first agent. All subsequent agents use the same key.
  5. In the Region section, either manually select the specific regions where your secret should be stored, or leave it blank.
  6. Click the Create secret button.

Once your secret is created, you can view it by clicking View secret value.
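You can also carry out the console steps above with the gcloud CLI. This sketch assumes the gcloud CLI is authenticated against your project. The `create_secret` helper and the secret name in the usage example are hypothetical. The `gcloud secrets create` flags `--data-file`, `--replication-policy`, and `--locations` are standard. Because the value is read from standard input, it does not end up in your shell history.

```shell
#!/bin/sh
# Sketch: store a secret in Google Secret Manager from the command line.
# create_secret is a hypothetical helper; the gcloud flags are standard.

# Usage: create_secret SECRET_NAME PROJECT_ID [LOCATION] < value-file
# "--data-file=-" tells gcloud to read the secret value from stdin.
create_secret() {
  local name="$1" project="$2" location="${3:-}"
  if [ -n "$location" ]; then
    # Store the secret only in the region you chose.
    gcloud secrets create "$name" --project="$project" --data-file=- \
      --replication-policy=user-managed --locations="$location"
  else
    # Let Google Cloud choose where to replicate the secret.
    gcloud secrets create "$name" --project="$project" --data-file=- \
      --replication-policy=automatic
  fi
}
```

For example, `create_secret matillion-platform-key my-project < platform-key.json`. To confirm the stored value, run `gcloud secrets versions access latest --secret=matillion-platform-key --project=my-project`.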



IAM Policies and permissions

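The Streaming agent's service account needs certain permissions to access Google Cloud resources. Use the Google Cloud console to create a custom role with these permissions and grant it to the service account.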

  1. On the Google Cloud console homepage, search for "IAM and Admin", then click Roles in the left navigation panel.
  2. At the top, click CREATE ROLE.
  3. Enter a title, description, and ID.
  4. Select ADD PERMISSIONS and add the following permissions:

    orgpolicy.policy.get
    resourcemanager.projects.get
    secretmanager.versions.access
    storage.buckets.get
    storage.multipartUploads.abort
    storage.multipartUploads.create
    storage.multipartUploads.list
    storage.multipartUploads.listParts
    storage.objects.create
    storage.objects.delete
    storage.objects.get
    storage.objects.list
    storage.objects.update
    
  5. Select CREATE to create the role with these permissions.

  6. Select IAM & Admin > IAM.
  7. Select ADD.
  8. Search for or paste the service account email in New principals.
  9. Select the newly created role in the drop-down menu and select SAVE.
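You can also create and grant the same custom role with the gcloud CLI. The following is a minimal sketch that assumes a project-level custom role. The `grant_agent_role` helper, the role title, and the role ID in the usage example are hypothetical. Custom role IDs may contain only letters, digits, underscores, and periods, so an ID such as `matillionCdcAgent` works where `matillion-cdc-agent` would not.

```shell
#!/bin/sh
# Permissions from the list above, one per line.
AGENT_ROLE_PERMISSIONS="
orgpolicy.policy.get
resourcemanager.projects.get
secretmanager.versions.access
storage.buckets.get
storage.multipartUploads.abort
storage.multipartUploads.create
storage.multipartUploads.list
storage.multipartUploads.listParts
storage.objects.create
storage.objects.delete
storage.objects.get
storage.objects.list
storage.objects.update
"

# Hypothetical helper. Usage: grant_agent_role PROJECT_ID ROLE_ID SERVICE_ACCOUNT_EMAIL
grant_agent_role() {
  local project="$1" role_id="$2" sa_email="$3" perms
  # --permissions expects a comma-separated list: the unquoted expansion splits
  # the list on whitespace, and paste joins the lines with commas.
  perms="$(printf '%s\n' $AGENT_ROLE_PERMISSIONS | paste -s -d, -)"
  gcloud iam roles create "$role_id" --project="$project" \
    --title="Matillion CDC agent" --permissions="$perms" --stage=GA || return 1
  # Bind the custom role to the agent's service account at project level.
  gcloud projects add-iam-policy-binding "$project" \
    --member="serviceAccount:${sa_email}" \
    --role="projects/${project}/roles/${role_id}"
}
```

For example: `grant_agent_role my-project matillionCdcAgent cdc-agent@my-project.iam.gserviceaccount.com`.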



Deploy the Streaming Agent in GCP

Follow these steps to deploy your Streaming agent using the Agent (GCE) template:

  1. Download the Terraform template.
  2. Update the template file matillion-cdc-agent.tfvars with the following details:
    • Project Id: This is the Google Cloud project ID.
    • Region: Google Cloud region where the Compute Engine instance will be deployed.
    • Zone: Google Cloud zone where the Compute Engine instance will be deployed.
    • Network_name: Google Cloud network to attach to the Compute Engine instance.
    • Instance_name: Name of the Compute Engine instance, for example matillion-cdc-agent.
    • Storage_bucket_name: Name of the Google Cloud Storage bucket where the agent will land the data.
    • Organization_id: Provided by Data Loader when you set up a new agent (the ID_ORGANIZATION value on the Agent setup page).
    • Agent_id: Provided by Data Loader when you set up a new agent (the ID_AGENT value on the Agent setup page).
    • Platform_websocket_endpoint: This value must be set to wss://ws-<region>.matillion-cdc-prod.matillion.com:443/ws where <region> is either eu or us depending on the Data Loader region you are building the pipeline in.
    • Platform_key_secret_name: Name of the Platform Key Secret stored in the Google Secret Manager.
    • Database_password_secret_name: Name of the source Database Password Secret stored in the Google Secret Manager.
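Instead of editing matillion-cdc-agent.tfvars by hand, you could generate it from shell variables. This sketch relies on an important assumption: the variable names it writes are lowercase versions of the fields listed above, so check the variables file shipped with the template before you use it. The `ws_endpoint` and `write_tfvars` helpers and the input variable names (`PROJECT_ID`, `GCP_REGION`, and so on) are hypothetical. The endpoint format is the one given above.

```shell
#!/bin/sh
# ASSUMPTION: the template's variables use the lowercase names written below.

# Build the websocket endpoint from the Data Loader region ("us" or "eu").
ws_endpoint() {
  case "$1" in
    us|eu) printf 'wss://ws-%s.matillion-cdc-prod.matillion.com:443/ws\n' "$1" ;;
    *) echo "unknown Data Loader region: $1" >&2; return 1 ;;
  esac
}

# Hypothetical helper. Usage: write_tfvars OUTPUT_FILE DATA_LOADER_REGION
# The remaining values come from shell variables; unset ones are written empty.
write_tfvars() {
  local out="$1" endpoint
  endpoint="$(ws_endpoint "$2")" || return 1
  cat >"$out" <<EOF
project_id = "${PROJECT_ID}"
region = "${GCP_REGION}"
zone = "${GCP_ZONE}"
network_name = "${NETWORK_NAME}"
instance_name = "${INSTANCE_NAME:-matillion-cdc-agent}"
storage_bucket_name = "${BUCKET_NAME}"
organization_id = "${ID_ORGANIZATION}"
agent_id = "${ID_AGENT}"
platform_websocket_endpoint = "${endpoint}"
platform_key_secret_name = "${PLATFORM_KEY_SECRET_NAME}"
database_password_secret_name = "${DB_PASSWORD_SECRET_NAME}"
EOF
}
```

For example, `write_tfvars matillion-cdc-agent.tfvars eu` writes the file with the EU endpoint. Running `terraform fmt` afterwards aligns the `=` signs.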

Then, in a terminal session in the directory containing the downloaded template files, apply the Terraform configuration with your .tfvars file.


  1. Begin the Terraform deployment process. The commands below are an example; the exact steps may differ depending on your company's setup.

  2. Initialize Terraform.

    terraform init -var-file=remote-state.tfvars
    
  3. Create and select the CDC workspace.

    terraform workspace new cdc
    
    terraform workspace select cdc
    
  4. Create CDC infrastructure.

    terraform plan -var-file=matillion-cdc-agent.tfvars
    
    terraform apply -var-file=matillion-cdc-agent.tfvars
    
  5. When you run terraform apply with the required values set in the .tfvars file, Terraform creates the Google Cloud resources needed to deploy your Streaming agent.


  6. When this process completes, return to the Google Cloud console and search for "VM instances" from the homepage. Refresh the page to display your newly created instance.

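You can wrap the Terraform commands above in a single shell function so that a failed step stops the run. The following is a minimal sketch that reuses the file names from the example commands. The `deploy_agent` helper name is hypothetical. It differs from the example in one way: it selects the `cdc` workspace if that workspace already exists, and creates it only otherwise, so you can rerun the function safely.

```shell
#!/bin/sh
# Hypothetical helper. Usage: deploy_agent [WORKSPACE]   (defaults to "cdc")
# Run it from the directory that contains the template files.
deploy_agent() {
  local workspace="${1:-cdc}"
  terraform init -var-file=remote-state.tfvars || return 1
  # Reuse the workspace if it exists; "workspace new" also switches to it.
  terraform workspace select "$workspace" \
    || terraform workspace new "$workspace" || return 1
  terraform plan -var-file=matillion-cdc-agent.tfvars || return 1
  # terraform apply shows the plan again and asks for confirmation.
  terraform apply -var-file=matillion-cdc-agent.tfvars
}
```

Chaining each step with `|| return 1` stops the function at the first failing step. Unlike `set -e`, this also works when the caller tests the function's exit status, for example in `deploy_agent || echo "deployment failed"`.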

In Data Loader, your Streaming agent's status now displays as Connected, and the Add Pipeline button becomes available. To add a pipeline to your agent, refer to CDC pipeline overview.
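You can also check the agent's VM from the command line instead of the console. This sketch assumes the example instance name matillion-cdc-agent and a zone that you supply. The helper names are hypothetical. `gcloud compute instances describe` with `--zone` and `--format` is a standard command. A RUNNING instance only means that the VM is up. The Connected status in Data Loader remains the check that the agent itself is working.

```shell
#!/bin/sh
# Hypothetical helper. Usage: agent_vm_status ZONE [INSTANCE_NAME]
# Prints the Compute Engine status, for example RUNNING.
agent_vm_status() {
  local zone="$1" name="${2:-matillion-cdc-agent}"
  gcloud compute instances describe "$name" --zone="$zone" --format='value(status)'
}

# Hypothetical helper. Usage: wait_for_running ZONE [INSTANCE_NAME] [ATTEMPTS]
# Polls every 10 seconds until the VM reports RUNNING or the attempts run out.
wait_for_running() {
  local zone="$1" name="${2:-matillion-cdc-agent}" attempts="${3:-30}" i=0 status
  while [ "$i" -lt "$attempts" ]; do
    status="$(agent_vm_status "$zone" "$name")" || return 1
    [ "$status" = "RUNNING" ] && return 0
    i=$((i + 1))
    sleep 10
  done
  return 1
}
```

For example, `wait_for_running europe-west2-a && echo "VM is running"`, where the zone is a placeholder.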