v0 examples
Warning
This article is regarding the v0 API which is currently unsupported. Visit the Matillion ETL API - v1 documentation for a more complete API solution.
Overview
These examples are designed to demonstrate the use of the Matillion ETL Rest API. For standard API documentation see here.
The endpoints used in these examples are:
/rest/v0/projects - For exporting and importing project configuration.
/rest/v0/tasks - For running and monitoring tasks.
Note: If using a Bash or Python component to call the API on that same instance, use the instance's Private IP or local host (127.0.0.1:8080 for HTTP or 127.0.0.1:8443 for HTTPS) for the instance address.
Example 1: Retrieving a list of project names and groups.
It may be useful for a user to gather a list of project names and group names for a particular Matillion ETL instance, especially as this can instruct the user toward more refined uses of the API. The structure of a given Matillion ETL instance can be returned using the following API call:
curl -X GET "http://servername-or-ip/rest/v0/projects?
As an example, suppose a Matillion ETL instance has the following group/project structure.
curl -X GET "http://<Instance IP>/rest/v0/projects?export=false"
A complete list of groups, projects, environments, versions and jobs will be returned as a JSON. However, especially for large projects, this can be unwieldy. Instead, we can return a list of projects using the following:
curl -X GET "http://<Instance IP>/rest/v0/projects?projectName=*"
For the above example, this will return the following JSON:
{ "groups": [ { "id": 2, "projects": [], "projectGroup": "Group A" }, { "id": 156, "projects": [], "projectGroup": "Group B" } ] }
Now we know that the groups are called 'Group A' and 'Group B', the Projects and Jobs belonging to Group A can then be found through:
curl -X GET "http://10.12.1.28/rest/v0/projects?groupName=Group%20A"
Or for a slightly more succinct list:
curl -X GET "http://10.12.1.28/rest/v0/projects?groupName=Group%20A&versionName=*"
A snippet of which is given below:
{ "id": 5, "name": "Alpha", "description": null, "versions": {}, "variables": {}, "environments": { "8": { "id": 8, "name": "Live", "schema": "public", "database": "public_db" } }, }, { "id": 56, "name": "Beta", "description": null, "versions": {}, "variables": {}, "environments": { "59": { "id": 59, "name": "Live", "schema": "public", "database": "public_db" } }
Example 2: Using Matillion ETL with your enterprise SCM
For some customers the best practice topology is to have multiple instances of Matillion ETL for, say Dev, Test and Live and then manage the migration using source control management system combined with a build server or script environment. This allows Matillion ETL to support DevOps scenarios where all configuration is held centrally.
In this example, we will use a project called "Marketplace Data Warehouse" in a group called "Matillion". The job is called "Load Marketplace Data" and exists in the version "BW_21-7-2016".
Part 1 - Export a project and commit to source control
Exporting all the data for a given version is simple. Identify the version to export, and ensure all the jobs are included in the result with export=true. (The URL is quoted to protect the & characters being interpreted by bash.)
curl -o <filename> -X GET -u <user>:<password> -H "Content-Type: application/json" "https://<Instance IP>/rest/v0/projects?groupName=<groupname>&projectName=<Project Name>&versionName=<Version Name>&export=true"
Note: Use the --insecure flag if the server is using self-signed certificate (HTTPS only). Otherwise, curl will raise a "Not Trusted" error. A full example with parameters set is given below
curl -o MatillionMarketplace.json -X GET -u joseph:kd9emIMame -H "Content-Type: application/json" --insecure "https://<Instance IP>/rest/v0/projects?groupName=Matillion&projectName=Marketplace%20Data%20Warehouse&versionName=BW_21-7-2016&export=true"
In this case the parameters are:
Username/Password: joseph/kd9emIMame
Group Name: Matillion
Project Name: Marketplace Data Warehouse
Version Name: BW_21-7-2016
Export Filename: MatillionMarketplace.json
If you want to output formatted Json you can use the jq command. e.g.
curl -X GET -u joseph:kd9emIMame -H "Content-Type: application/json" --insecure "https://<Instance IP>/rest/v0/projects?groupName=Matillion&projectName=Marketplace%20Data%20Warehouse&versionName=BW_21-7-2016&export=true" | jq ‘.' > MatillionMarketplace.json
Part 2 - To commit to Git
These steps can be adapted for just about any source control management system.
mkdir MatillionWorkspace
Change into the directory.
cd MatillionWorkspace
Initialise the git repository.
git init
Create a branch to match the version in Matillion ETL.
git checkout -b BW_21-7-2016
Add the files retrieved from the API.
git add *
Commit the files.
git commit -m "Matillion Marketplace initial commit"
Push changes to the repository.
git remote add origin https://matillion-admin@bitbucket.org/matillion/matillion-etl-scm.git git push origin --all
Part 3 - Checkout a project from source control and push to a new Matillion ETL instance
Here will continue from part 2, checking out the Matillion ETL project version we check in earlier.
Check out the project from SCM:
git clone https://matillion-admin@bitbucket.org/matillion/matillion-etl-scm.git
Change into the project:
cd matillion-etl-scm/
Ensure we are looking at the correct branch (or version):
git branch
To push the checked-out JSON data into a new server:
curl -X POST "http://<Instance IP>/rest/v0/projects" -H "Content-Type: application/json" --data-binary @JsonFile.json
Again, we can use the flag --insecure to avoid an error when using a server with a self-signed certificate. For example (with all parameters filled in, including a username and password):
curl -X POST -u joseph:kd9emIMame --insecure "https://<Instance IP>/rest/v0/projects" -H "Content-Type: application/json" -data-binary @/tmp/MatillionMarketplace.json
Note that in the above we give the path as well as the name of the file to be imported.
The return value shows that the operation was a success. An invalid ID is returned from this call, since this operation generates many new ID's (for the project, environment, version, jobs…) - you can see the current project structure by doing a GET request to the same URL but without the export=true option.
{"success":true,"msg":"Import successful","id":-1}
Example 3: Start a job and monitor its progress
A POST to /rest/v0/tasks will run a job. As with the export, we need to identity the version, but we also need to identify the environment to run in, and the job to run. No POST data is required.
e.g.
curl -X POST "http://<Instance IP>/rest/v0/tasks?groupName=Matillion&projectName=Marketplace%20Data%20Warehouse&versionName=BW_21-7-2016&environmentName=Live&jobName=Load%20Marketplace%20Data"
Which returns:
{"success":true,"msg":"","id":1832}
An ID has been generated for the task, which is now running (or already completed). We can check its progress:
curl -X GET http://<Instance IP>/rest/v0/tasks/1832
Which will return a JSON document to you. Here is part of it:
{ "id": 1832, "type": "API_ORCHESTRATION", … "state": "SUCCESS", "enqueuedTime": 1469353082785, "startTime": 1469353082825, "endTime": 1469353242902, "tasks": [ { "type": "VALIDATE_ORCHESTRATION", "jobName": "Load Marketplace Data", "componentName": "Start 0", "state": "SUCCESS", "rowCount": -1, "startTime": 1469353082828, "endTime": 1469353082830, "message": "" }, { ...
The above JSON file gives complete information about the job we just ran. Some things to note are:
- The task has already completed with SUCCESS
- It could also have been QUEUED or RUNNING.
- The status can also be FAILED if the job encountered a problem and did not complete.
- There are some overall fields, then an array of tasks each with their own details.
- Those timestamps are milliseconds (not seconds!) since the epoch (1970).
- If something did go wrong, check the "message" field for detailed error reporting on the task which failed.
Example 4: Use of Variables
Note: This example uses variables in Matillion ETL. For more information on understanding and setting up variables, see Using Variables.
First we have set up a very simple job that makes use of a variable. The Orchestration job (named "APIOrch") simply creates a table and runs a Transformation job (named "APITrans"). The Transformation job creates rows that count to 50000 using an 'increment' variable.
We must set our variable as an environment variable using 'Edit Environment Variables'. By default, this variable has a value of 1, meaning the Transformation job will create 50000 rows.
Now over to the API. Here we use a simple API tool to call Matillion's API. PUSH is used to run a task that is specified by including the Group Name, Project Name, Version Name, Environment Name and finally the Job Name to run, APIOrch.
curl -X PUSH "http://<Instance IP>/rest/v0/tasks?groupName=Matillion&projectName=APIShowcase&versionName=default&environmentName=Example3&jobName=APIOrch"
Which returns:
{ "success": true, "msg": "", "id": 130 }
So we know the job ran successfully and we can use the returned ID to check the state of the task using another API call:
curl -X GET "http://<Instance IP>/rest/v0/tasks/130"
This returns information about the task with ID 130, a part of which is shown below:
{ "type": "EXECUTE_COMPONENT", "jobID": 56, "jobName": "APITrans", "jobRevision": 14, "jobTimestamp": 1484749374478, "componentID": 68, "componentName": "Rewrite Table", "state": "SUCCESS", "rowCount": 50000, "startTime": 1484750708161, "endTime": 1484750709706, "message": "" }
We can see that the row count is 50000. This is because we're incrementing from 1 to 50000 using the default increment, which was set to 1.
Now we repeat the same job but instead changing the increment to 5 by adding a variable to the end of the API call. Because this variable shares the same name as the one we defined earlier, it will overwrite the default value and be used instead. here we set Increment to 5.
curl -X POST "http://<Instance IP>/rest/v0/tasks?groupName=Matillion&projectName=APIShowcase&versionName=default&environmentName=Example3&jobName=APIOrch&variables=increment=5"
We now use the returned ID as before to check the task information. A part of the returned information is shown below:
{ "type": "EXECUTE_COMPONENT", "jobID": 56, "jobName": "APITrans", "jobRevision": 16, "jobTimestamp": 1484751397642, "componentID": 68, "componentName": "Rewrite Table", "state": "SUCCESS", "rowCount": 10000, "startTime": 1484753540348, "endTime": 1484753542547, "message": "" }
Now the row count is only 10000, since the increment was set to 5 before running the task.
Multiple variables can also be passed by separating each variable with a comma such as in the following example:
curl -X POST "http://<Instance IP>/rest/v0/tasks?groupName=Matillion&projectName=APIShowcase&versionName=default&environmentName=Example3&jobName=APIOrch&variables=increment=5,startnumber=2,endnumber=46000"
Note: The variables used must exist on the Matillion instance (using Manage Environment Variables) or this will return an error.
Example 5: Promoting a Job
Promoting a job from one version to another (for example, from Test to Live) is a common task. To accomplish this, the job (in this example, 'dim_airport_setup') to be promoted must first be exported:
curl -o dim_airport_setup.json -X GET -u api-user:password --insecure "https://<InstanceAddress>/rest/v1/group/name/MY_GROUP/project/name/MY_PROJECT/version/name/Test/job/name/dim_airport_setup/export"
Now the job an be imported to the new Version:
curl -X POST -u api-user:password --insecure "https://<ServerIP>/rest/v1/group/name/MY_GROUP/project/name/MY_PROJECT/version/name/Live/job/import" -H "Content-Type: application/json" --data-binary @dim_airport_setup.json
And finally the old Job can be deleted:
curl -X POST -u api-user:password --insecure "https://<ServerIP>/rest/v1/group/name/MY_GROUP/project/name/MY_PROJECT/version/name/Test/job/name/dim_airport_setup/delete"