API Profiles Example - Jira Cloud

Overview

This page contains a worked example of the RSD format information found in API Profiles - RSDs. This example uses the Jira Cloud REST API.

Your Matillion ETL instance must be able to access the API endpoint https://yourcompany.atlassian.net over the network. You can test the connectivity using the Network Connectivity Test shared job, which you can download from Data Stagers - Support. If Matillion ETL does not have network connectivity to the API endpoint, you must get this resolved with your network administrator before continuing.

Step 1 - Find the API documentation

Atlassian publish this online guide to their Jira REST APIs.

The Atlassian documentation answers the following questions:

"How do you authenticate?" — Using a username and token, described in this article.
"How do you supply parameters?" — GET and POST methods are both supported. In this example we'll use GET parameters in the URL.
"Does the API provide data at the necessary granularity?" — The Jira Cloud API is large, and this example demonstrates querying issues, and the return format is JSON with embedded arrays, as documented here.
"Does the API use paging, and if so, how does it provide the information needed to page through the result set?" — Yes, this is an example of an 'offset' pager and is documented here. You can use the startAt, maxResults, and total parameters to extract the full set of data.

Step 2 - Load the first page

For convenience it's often best to use the Matillion ETL server itself to extract and load the first page of data.

To get started, SSH into your Matillion ETL server. It will save some typing later if you set up environment variables containing your company name and security credentials.

export mycompany=YourCompanyNameHere
export mycredentials=YourName@YourCompany:YourJiraToken

You can verify network connectivity to the API endpoint using the nc command:

nc -w 5 -v "${mycompany}.atlassian.net" 443

You should see a success message.

Connectivity success

If you see errors such as nc: getaddrinfo: Name or service not known or (tcp) timed out: Operation now in progress, you should contact your network administrator to make sure Matillion ETL has been granted network access to Jira Cloud.

If connectivity was successful, use the following command to extract the first page:

curl -k -o page1-raw.json https://${mycompany}.atlassian.net/rest/api/2/search?maxResults=100 --user "${mycredentials}"

This command will then let you check the output file:

ls -l page1-raw.json

It will be helpful to prettify the JSON so it's more easily human-readable, using this command:

jq "." < page1-raw.json > page1.json

If you run an ls command, you should now be able to see two files: the raw (page1-raw.json) and the prettified (page1.json) JSON.

If you need to copy the prettified JSON file onto your local workstation, you can do so with an scp command such as this:

scp -i YourKey.pem YourUser@YourIP:YourFolder/page1.json .

Open the page1.json file in your preferred editor to verify that the data you have extracted looks like the example from the Jira Cloud API documentation.

page1.json

Now you are ready to upload this as a JSON file into a new API Profile. Follow the steps in Manage Query Profile to do this. Generate a new RSD script as described in that article, using page1.json as the sample file, with a table named t1 and /issues as the repeating element.

Once the new RSD file has been created, remove the rsb:script sections associated with INSERTS, UPDATES, and DELETES, then click the Test button. A table called t1 should appear at bottom left. Click t1 and the data from that table should appear at bottom right.

You should now take the opportunity to refine which fields you want to see. Jira issues have a lot of fields, so you can remove any that you don't intend to use. Do this by editing the elements within the rsb:info block near the beginning of the file.

Switch the RSD to the real URI

Once you reach the step where the first page of data is appearing correctly, you need to use the actual API instead of the sample file, by editing your API Profile as described in Manage Query Profile.

In the RSD definition, replace the "uri" value with the actual API endpoint:

Changing the endpoint uri

Now that the API query will use the real endpoint, you will need to add authentication. The Atlassian API documentation indicates that Jira Cloud's token method uses HTTP basic authentication, so you can add the two necessary HTTP parameters: User and Password.

After saving the changes, test the API again and verify that it still retrieves the correct data, at the required granularity.

Step 3 - Establish Paging

The first two steps are common to all paging implementations.

Add a new input named Rows@Next to the list of columns inside the existing rsb:info block, after all the other declarations.
```
<input name="Rows@Next" desc="Identifier for the next page of results" />
```
Add the following to the rsb:script section just after the rsb:info block:
```
<rsb:set attr="EnablePaging" value="TRUE" />
```

According to the Jira Cloud API documentation, pagination is performed by the client using the startAt parameter for all pages after the first.

The documentation also warns that the consumer needs to be careful with the row counts:

Note

The actual number of items returned is an implementation detail and this can be changed over time. You can ask for more than you are given.

In other words, the actual number of rows returned may differ from the number requested, even between pages. So you need to implement a client-side record counter.

To declare the counter, add this line after the EnablePaging added earlier:

<rsb:set item="userns" attr="rowcounter" value="0" />

To make it count the records as they are returned, add the following two lines together between the jsonproviderGet and the rsb:push:

<rsb:set item="userns" attr="rowcounter" value="[userns.rowcounter|add(1)]"/>

<rsb:set attr="Rows@Next" value="[Rows@Next|add([userns.rowcounter])]" />

The URL will need to change from page to page, so instead of hard-coding it as you did previously, you must replace the fixed rsb:set attr="uri" ... with a conditional block like this:

<rsb:check attr="Rows@Next">
<rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?startAt=[_input.Rows@Next]&amp;maxResults=100" />
<rsb:else>
<rsb:set attr="uri" value="https://matillion.atlassian.net/rest/api/2/search?maxResults=100" />
<rsb:set attr="Rows@Next" value="1" />
</rsb:else>
</rsb:check>

Note that special characters in the URI must be replaced by an appropriate XML character reference. These include:

Character	Replacement
&	&
<	<
<	>
Space

Test the API inside the Manage API Profiles editor again, to make sure it's still working. It's not possible to verify the paging in this way because it only shows the first 50 records. But assuming you still see some data, you can move on to using an API Query component in an Orchestration job.

Switching to an API Query component

Add an API Query component to a new, empty orchestration job, and fill in your profile name, data source, and so on.

In the connection options you will need to add your authentication details again, by adding settings for User, Password and AuthScheme. It's fine to hard-code these for now: they will be parameterized later as part of productionization.

You may find a default limit of 100 on the component. If so, remove it, leaving that setting blank.

When you run the component, it should automatically page through all the Jira Issues, and create a database table containing the columns you chose in the API profile. Depending on your data volumes, this step may take several seconds to complete.

Step 4 - Productionize

By this step of the process you should have a fully working API profile, but using hard-coded authentication parameters.

It's bad practice to leave hard-coded authentication parameters in place, for two reasons:

It's insecure. Anyone logged into Matillion ETL can read the credentials in plain text, so they need to be properly protected.
Authentication parameters usually expire at some point, so the existing API profile has a limited shelf life.

To proceed, follow the directions in Using Parameters With API Profiles, which describes how to use parameters with API profiles.

Matillion ETL's password manager function is not available for use with API profiles, so you may choose to follow Using KMS encrypted passwords in Python, which describes how to implement your own password management functionality to protect your private API password.