Manage Query Profiles
Overview
The Matillion ETL API Query lets you query a JSON or XML-based API, and loads the resulting data into a table. It is a powerful and flexible component, which deals with APIs that contain nested data structures, and requires authentication and paging. When working with tables, rows, and columns, the main step in handling the hierarchical data from APIs is to rationalize it. This is done with a Matillion "Query Profile" specification, which will need to be created.
Using the API Query component lets you load data from any accessible JSON or XML API. This may be your own API or an API which we do not yet have a component for. Through the Manage Query profile interface you will be able to manage your API profiles that help you to extract the required data from the resource.
Important Information
- Every API Query profile in Matillion ETL contains one or more RSD "files" or Query Profiles. A Matillion Query Profile allows you to convert the hierarchical XML or JSON data from an API endpoint into a table that can be used by target platforms. Authentication for an API Query Profile is stored in the Query profiles.
- All API profiles within Matillion ETL can be run using an API Query component in an Orchestration job.
- For a detailed description on authorization and authentication methods, please refer to Authentication with API profiles
- For a detailed description on parameters and passing them into API endpoint, please refer to Using parameters with API profiles.
Access Matillion Query Profiles by clicking Project, Manage API Profiles, then Manage Query profiles.
This document explains how to do the following:
- Load and save JSON response of the API
- Create an API Profile file (JSON File)
- Generate Query Profile (RSD File)
- RSD file anatomy
- Use Query Profile in the API Query component
- Pagination & strategies
Load and Save JSON response of the API
We will be using an Open API Endpoint AWS IP Range API as an example to generate an RSD file in this guide. This API is simple, universally available, and requires no authentication. The API returns a list of the IP address ranges which AWS use for services, and is returned as a JSON document in a nested array. This can be accessed from a web browser by navigating to the URL: https://ip-ranges.amazonaws.com/ip-ranges.json.
Before we start creating a new profile, ensure you download and save the response in JSON format, e.g. "ip-ranges.json".
- Use Postman (or cURL) to get a response back from the API endpoint you're trying to communicate with, and save that data as a JSON or XML file.
- Use the file to help build the first RSD of your API Query profile.
Verify that the file contains an array or repeating element at the granularity you want. Take note of the logical path to this repeating group.
Create an API profile file (JSON File)
Once you have your data saved into a local file you can create an API profile file.
First, create and upload the JSON or XML file to the interface. Generate the Query profile RSD file using the uploaded JSON or XML file. Use the generated Query profile in the API Query component to fetch the data and store it in a data warehouse. This process also allows users to upload or write RSD files.
1. Click Project , Manage API Profiles, then Manage Query Profiles. It will contain one API Profile for every API that has been configured. Click to add a new Query profile.
2. In the Add Query Profile pop-up dialog, enter "AWS_API" into the Profile Name field, and click OK.
3. In the Manage Query profiles pop-up dialog, find "AWS_API" in the API profiles list, and click . This will open the Configure Query Profile pop-up dialog.
4. In the Configure Query Profile dialog, click on New File to create a new API profile file.
5. The Create API Profile File dialog will be displayed. In the Select File Type section of the dialog, choose the type of file you want to create. Select Create Sample File to upload a JSON or XML sample file. Then, click Next.
Please Note
- You're allowed to add files in various formats such as JSON, XML, RSD or RSB.
- Select "Create Config File" if you want to upload/write an RSD or RSB file.
- Select "Create Sample File" if you want to upload a JSON or XML sample file.
6. In the Create File section of the dialog, upload the JSON file you saved earlier by clicking Browse. Then, click Next.
7. In the final step of the wizard your chosen file will be uploaded. Click Validate to validate the JSON file. If it validates with no errors, click Finish.
8. After successful validation, the "ip-range.json" file will be created in the Configure Query Profile dialog.
Generate Query Profile using JSON file
Once an API profile JSON file is created, generate a new Query profile using the created API Profile file, following the process below:
1. In the Configure Query profile dialog, click New Endpoint to generate a new "rsd" script.
2. The New Endpoint will take you to the Configure Query Connector wizard. Provide the Source details name and description. Use the JSON file you created previously. Tick the Use Sample File option, and select the file you want to use to create the RSD file.
Please Note
This dialog will take through a series of steps to produce an "rsd" script. Options you enter are used to write the basics of an rsd script. This script will be used by the API Query component to get data from the specified API and store it in your data warehouse.
3. An API will typically return a JSON or XML document in the Response tab. This return type is the one selected in the "Return Type" drop-down in the previous step. Click Next.
Please Note
- If you want to use an API endpoint URI to generate the RSD file, enter the URI of the API call you want to make. In this case it could be an AWS IP Range API URI.
- At this point, you can configure authentication (if needed) and pass any parameters to the body of the API endpoint using the "Auth" and "Param" tabs.
- You can add an Input type parameter in the Params tab, and provide a temporary value. This value will be used by the Manage Query Profiles wizard to send a 'test' request. This temporary value is replaced by the value provided in the API Query Component at runtime. In order for the Input parameter to work correctly, there are a number of required steps you must implement within the component. For more information, read API Query.
4. In the Response Configuration dialog, you may only want specific items such as those nested in an object of the JSON. The repeated element allows you to specify that object so you can define your xpath from there, rather than at the start of the document each time.
If you want your xpath to start from the beginning of your object, right-click on Set Repeating Element on the tree view fields (/prefixes), and it will appear in the "Repeating Element" field, near the top, just underneath the "Select Fields" sub-heading.
Enable the "Paging" feature and select the required Strategy from the drop-down menu. Select the field from the tree view by right-clicking on the field to populate the xpath, which points to where the JSON or XML document is and is returned by the API. For more information about the different paging strategies you can choose from, read Pagination & strategies.
Note
Make sure to enable "Paging" and select "Full Path". By enabling paging, each returned data set includes a single, top-level element "NextPageURI" which points to the URL the client should use to fetch the next page.
5.You should see a sample of the data in the Data Preview window. Click this tab to view the details you entered earlier. These details will be generated into a script, which can be edited manually. Click Finish.
6. You will return to the Configure Query Profile dialog. Toggle to Advanced Mode to open the "AWS_IP_ranges.rsd" file in edit mode.
7. Click Test to check the RSD file is working correctly. This will invoke the API and add ip-ranges to the Tables panel. Clicking on the table name will then display its contents at the bottom of the Configure API Profile dialog.
Important Information
- The generated Query profile lets you edit the file manually, add or delete any column, or change the value of any field.
- You may want to adjust the fields returned, by editing the
"attr"
elements near the top of the RSD file (lines 6 to 12 in the example above).
RSD file anatomy
<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema"> <api:info title="AWS-IP-range" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1"> <attr name="ip_prefix" xs:type="string" readonly="false" other:xPath="ip_prefix" /> <attr name="network_border_group" xs:type="string" readonly="false" other:xPath="network_border_group" /> <attr name="region" xs:type="string" readonly="false" other:xPath="region" /> <attr name="service" xs:type="string" readonly="false" other:xPath="service" /> <input name="rows@next" desc="Identifier for the next page of results. Do not set this value manually." /> </api:info> <api:set attr="BackwardsCompatibilityMode" value="true" /> <api:set attr="uri" value="ip-ranges.json" /> <api:set attr="EnablePaging" value="TRUE" /> <api:set attr="pageurlpath" value="/next_page" /> <api:set attr="RepeatElement" value="/prefixes" /> <api:script method="GET"> <api:set attr="method" value="GET"/> <api:call op="jsonproviderGet"> <api:push/> </api:call> </api:script> </api:script>
Each RSD "file" in an API profile describes how to handle one of the endpoints in the API. The main features are:
Column definitions to specify column behavior and use XPaths<api:info title="AWS-IP-range" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
Column Name, type and size (editable)
<attr name="ip_prefix" xs:type="string" readonly="false" other:xPath="ip_prefix" /> <attr name="network_border_group" xs:type="string" readonly="false" other:xPath="network_border_group" /> <attr name="region" xs:type="string" readonly="false" other:xPath="region" /> <attr name="service" xs:type="string" readonly="false" other:xPath="service" />
Column XPaths are relative to a RepeatElement that splits the JSON into rows
<api:set attr="RepeatElement" value="/prefixes" />
URI details
<api:set attr="uri" value="ip-ranges.json" />
Paging is enabled and defined
<api:set attr="pageurlpath" value="/next_page" /> <api:set attr="RepeatElement" value="/prefixes" />
Describe the REST operation (an HTTP "GET" in this example)
<api:script method="GET"> <api:set attr="method" value="GET"/>
Inform Matillion what kind of API it's dealing with
<api:call op="jsonproviderGet">
In this example, a "jsonproviderGet" is used, but there is also an equivalent "xmlproviderGet".
Please Note
Once a working example is available, it's usually best to create a second RSD file in the same profile to experiment with. Choose a different name for this, copy and paste the content from the working file, and use Test to verify that any changes made are working.
To test the above example, we have removed the column "network_border_group" and "service". Click Test to check if the RSD API profile still gives the expected result.
Use Query Profile in the API Query component
Once an API profile has been tested and confirmed to be working correctly and returning data, it's ready to use in the API Query component. This can be done as follows:
1. Create a new Orchestration job, drag the API Query component onto the job canvas, and click on the component icon to open the Properties panel.
2. In the Basic/Advanced Mode, we recommend keeping the Mode of the properties as "Basic". In "Basic" you need to choose a data source and column. Click OK.
Please Note
If you choose the "Advanced" mode you need to make specific SQL queries in the editor.
3. You will return to the component's property list in your Matillion ETL instance. Click ... next to Authentication Method. Use the drop-down menu to select the method, and click OK.
Please Note
For a detailed description on different types of authentication methods Matillion support, please refer to Authentication with API profiles.
4. Click ... and select the profile using the drop-down menu. Click OK.
Please Note
You could also manage the Query Profile or create a new profile using Manage. Please note, the profile is the name of the API Query profile created before. Example: AWS_API
.
5. Click ... next to the Connection Options property. You can specify the param attributes value and dynamically add or change any param values. Click to add any parameter, then select Other using the drop-down menu in the Connection Options dialog. Assign the values to the parameters, situated in the Value column, then click OK.
Please Note
When you configure the Connection Options property in the API Query component, its value will not be persisted.
Please Note
For a detailed description on adding parameters to the API endpoint in the API Query Component, please refer to Using parameters with API profiles.
6. Click ... next to the Data Source property, and select the valid Data Source from the drop-down menu. The Data Source is the "Endpoint Name" when generating a Query profile. For example: AWS_IP_Ranges. Click OK.
7. Click ... next to the Data Selection property. In the Data Selection dialog, select the Items or Variables you want to use from the endpoint. Click OK.
8. Configure the rest of the API Query component by providing a target table name and select a valid Staging Area. When all of the properties have been configured, the icon of the API Query component will change to the color green.
9. Finally, you can run the component by right clicking on the API Query Component and selecting Run Component. If all properties are configured correctly, the job will run. You can view the details in the Tasks Info.
10. Next, click Sample, and then click Data. It should fetch the data as expected.
Pagination & strategies
Most endpoints that return a list of entities will need to have pagination. API paging can be managed in the Response Configuration dialog of the Generating Query Profile process. For more information about configuring pagination and strategies, read Manage API Profiles Wizards.
Pass XML into an API Query Profile (generated RSD)
In order to pass the XML into a generated RSD file within the Matillion ETL Query profile wizard, you can do the following.
- Change the content type to Application/x-www-form-urlencoded.
- Replace api:set with api:setc. The setc syntax allows you to add static text without escaping special characters.
- Wrap the xml body in <![CDATA[ 'XML Here']]>. Example shown below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?> <root> <data> <![CDATA[ <?xml version="1.0" encoding="UTF-8" standalone="yes"?> <otherxml>data</otherxml> ]]> </data>