Obtaining an API token and passing it to an API Query profile
Some API profiles require a refresh token to be passed as part of the header, such as "Volo API", "Hubspot API", and "Salesforce API". Some of these refresh tokens expire regularly (e.g hourly), and will need to be refreshed as part of the corresponding Matillion ETL job. This page uses an example endpoint to describe how to refresh an API token and how to pass this into an API profile.
Example API
The example API requires an authorization token to be passed through with every API call to authenticate the user. The authorization token expires every hour, so to ensure Matillion ETL is making the API call with a valid token, a separate API call must be made to obtain the token.
This is done using the API token endpoint, which requires a Bearer authorization token. This token is only accepted by the token endpoint. No other endpoint in the API, not even an API key required to query all other endpoints, will accept this token.
Some token API endpoints need authorization to generate the access token through clientId, client secret, and refresh token query parameters in the URL, such as Salesforce API.
Generate the access token
Example endpoint API token URI: https://<InstanceName>/services/oauth2/token
An example response from the example endpoint URI to generate an access token is shown below:
Note
Ensure you provide any required query parameters in the endpoint URL. In the example above, it is the Salesforce API token URI that needs query parameters such as grant_type, client_id, client_secret, and refresh_token.
Passing the access token to the endpoint URI
You can pass the access token at an API endpoint URI through the Manage Query Profiles dialog as part of generating an RSD file. By doing that, the generated RSD file will automatically fetch the access token you have provided.
- Click Project → Manage API Profiles → Manage Query Profiles to open the Manage Query Profiles dialog. Click + to add a new query profile.
- In the Add Query Profile dialog, enter "ExampleAPI" into the Profile Name field and click OK.
- Return to the Manage Query Profiles dialog and then locate and click ⚙️ for "ExampleAPI" in the API profiles list. This opens the Configure Query Profile dialog.
- In the Configure Query Profile dialog, click New Endpoint to create a new query profile using the API endpoint URI.
- In the Source details page of the Configure Query Connector wizard, you'll need the endpoint name of what you want the table to be called. You can also provide a description. Once done, click Next.
-
In the URI field, enter the URI of the API call you want to make. At this point, you can configure parameters, any required POST body, and/or a header and Authentication. This example passes a bearer access token to the endpoint URI. To do this yourself, click the Auth tab, select Bearer Token from the Type drop-down and enter the access token you generated earlier. Click Next.
Note
In the above example, the RSD will attempt to contact the real API endpoint, so it's necessary to also add authentication parameters. For the example REST API, it's the access token in the form of an authentication bearer token. Use the same access token you acquired in Generate the access token.
-
In Response Configuration, you may only want specific items such as those nested in an object of the JSON. The repeating group in this example is an array element named "records". Activate Set Repeating Element using right-click on an element of the JSON. Enable the Paging feature and select the required strategy from the drop-down (in this case, Paging is "Full Path"), then click Next.
- You'll see a sample of the data in the Data Preview tab. You can also review your configuration through the Config Review tab when you click Finish.
- Returning to the Configure Query Profile dialog, the file
Example.rsd
will now be open in editable mode.
Note
The RSD script is fully functional as-is, but still contains a hard-coded access token. It's bad practise to leave it this way, for two reasons:
- It's insecure. Owning the access token is equivalent to knowing the username and password, so it needs to be properly protected.
- Access tokens expire after a few hours, so the existing API profile has a very short shelf life.
With these factors in mind, the RSD needs to be parameterized as part of production.
Manage connection options
When setting up authentication or passing parameters in the Manage Query Profiles wizard, the parameter values will be automatically added in Manage Connection Options. You can change or add any additional parameter value of any parameter using this dialog.
Normally, after access tokens expire, the API call will stop working and you'll need to obtain another access token.
Two steps remain to produce the query profile:
- Remove the hard-coded access token from the Manage Connection Options dialog.
- Obtain the access token at runtime and supply the value to the
CustomHeader
parameter in Manage Connection Option asAuthorization: Bearer <token>
by clicking ⚙️ from the Configure Query Profile dialog.
Test the API profile again with the parameter in place to make sure it is passing. After the API profile has been parameterized, you can package the components into an orchestration job.
The next section will explain how to obtain an access token at runtime using a Python Script component, and use that component with an API Query component to extract the access token at runtime.
Obtaining an access token at runtime using Python
Obtaining an access token at runtime can be achieved in various ways. This example replaces the token access url https://InstanceName/services/oauth2/token
command shown earlier, with a Python script in a Matillion ETL Python Script component. The script is available below.
- Add the Python Script component to the canvas.
- Click into the component properties and click ... on the Script property.
- In the code editor, add the following Jython script:
import requests
import json
# The URL for all Salesforce token services
url = 'https://um2.salesforce.com/services/oauth2/token'
# Construct a Python dict containing all the headers, using the job variables
pl = {
'grant_type': 'refresh_token',
'client_id': client_id,
'client_secret': client_secret,
'refresh_token': refresh_token
}
# POST the request and parse the response
resp = requests.post(url, data=pl)
j = json.loads(resp.text)
# Save the returned value into the private job variable named access_token
context.updateVariable('access_token', j['access_token'])
The orchestration job
There are three parts to this job:
- Job variables defining all of the necessary parameters including access_token, clien_id, client_secret, and refresh_token.
- The Python Script component, which obtains a new Salesforce access token using the client id, client secret, and refresh token.
- The API Query component, which uses the access token to extract the API account data and load that data into a database table.
In the API Query component properties, click ... on Connection Options and add a new record:
- Set Parameter as
CustomHeaders
. - Set Value as
Authorization: Bearer ${access_token}
.
Once done, check the response data in the Sample tab after adding the Python Script component to obtain the access token at runtime.
Warning
Do not hard-code the client_id, client_secret, and refresh_token values anywhere, nor store them as default environment variables. Instead, implement password management to protect them—as describe in Using KMS encrypted passwords in Python.
API Query component
Once you've created an API Query profile (RDS profile) and tested that it returns data, you're ready to use that profile in an API Query component. To do this:
- Add the API Query component to the canvas.
- Ensure the Basic/Advanced Mode property is set to Basic.
- Click ... on the Authentication Method property and select Other. Click OK.
- Click ... on the Profile property and select your new API profile.
- Click ... on the Connection Options property and specify the parameters and values as in Manage connection options, above. Add any additional parameters.
- Click ... on the Data Source property and choose your endpoint.
- Click ... on the Data Selection property and select the items you wish to load from the endpoint.
- Configure the remaining properties similarly to any other Query component.
- To run the component, right-click your API Query component and click Run Component. If all properties are set up correctly, the job will run without issue. You can view details of the job run in the Tasks tab.
- While the API Query component is highlighted, click Sample and then click Data to refresh the data sample.