API Query Profiles
Overview
API Query Profiles are collections of endpoint configurations that define the output from a JSON or XML based API. The data is flattened before staging, then pulled into your target data warehouse.
API Query Profiles can be added and configured in the project menu: Project → Manage API Profiles → Manage Query Profiles. This dialog and its functions are explained in Manage Query Profiles and should be understood before continuing.
API Query Profiles and endpoints can be used in the API Query component by selecting the Profile property. Users will then select the desired endpoint to be used from this profile as the data source.
Unlike API Extract Profiles, API Query profiles must define how the data is flattened and so tend to be more complicated in nature. This definition takes the form of an RSD file that the user must configure. However, Matillion ETL provides a wizard to create these RSD files and we advise using this where possible. This article describes the use of that wizard. A more technical and complete description of RSD files can be found in API Query Profiles - RSD files.
Manage and configure query profiles
The Manage Query Profiles dialog exists to organize your API Query Profiles.
To set up a new profile use the + button, or the - button after selecting a profile to remove it. Existing profiles can be edited by clicking the cogwheel icon next to the profile you want.
Configuring a profile will take you to the Configure Query Profile screen. On the left, endpoints belonging to this profile are listed and selecting one will give a summary of its configuration on the right.
To add a new endpoint to the profile, click the New Endpoint button. To configure or delete an existing profile, click the cogwheel or X icon, respectively, next to the endpoint.
Endpoints are defined as RSD files. Toggling Advanced Mode on will reveal the raw RSD file contents rather than a configuration summary and they can be edited in this form. However, the explicit workings of RSD files are an expansive topic that can be found in API Query Profiles - RSD files.
Creating a new RSD file
Choose a file to upload and attach to this API Query Profile.
Users can choose to upload their own RSD and RSB files to define their endpoints. Alternatively, JSON and XML files can be uploaded to help test the existing endpoints.
Wizard - Source Details
This section of the wizard is simply for labelling your new API Query Profile. The name and description are how Matillion ETL will refer to this entry and are arbitrary.
- Name: A name for your new API Query Profile endpoint.
- Description: A description for your new endpoint.
Wizard - Endpoint Configuration
The Endpoint Configuration section takes details about your API request. Different requests will require a different amount of settings before a successful response can be returned.
Clicking Send will attempt the request as defined. - If the request is successful, the returned response will be recorded in the Response tab. - All responses (successful or otherwise) have their status code and response headers logged in the Log tab.
Endpoint URI
Select the response type for your endpoint (GET or POST) and then enter the endpoint URI into the adjacent field. This is your data source.
Auth
Authorization information is NOT stored in the API Query Profile or RSD files but IS recorded as Connection Options. We advise removing authorization details from the attached connection options after completing the wizard. Select an RSD and click the cogwheel to access attached connection options.
Many endpoints will require you to provide some authorization details to accompany your request.
If authorization is required, begin by clicking the toggle so it is green and Enabled. Then, configure your authorization according to the type required. This type depends entirely on the endpoint and it is up to the user to have authorization details at hand.
- Basic Auth
- User: A valid username to authorize with the endpoint.
- Password: A valid password for the above username.
- Bearer Token
- Token: The bearer token string. This token will need to be acquired independently by the user. Most bearer tokens will expire after some time and need to be refreshed by editing this API Profile.
- API Key Value Pair
- Key: The key used for authorization. This will need to be acquired independently by the user.
- Value: The corresponding value for the above key. This will need to be acquired independently by the user.
- Parameter Type: Specifies whether the Key Value Pair is included in the Header of the request or as Query parameters in the URI of the request.
- OAuth
- Authentication: Select the entry for your stored OAuth. OAuths can be stored as a resource in Matillion ETL and usually require independent configuration by the user with their third-party service.
Params
Note
Parameters are defined here only so they can be recognised and used by the API Query component, which allows users to define values for their parameters on the component itself. Therefore, any values given to these parameters are for testing purposes and will not be automatically used in loading data. The exception to this are "Constant" parameters.
Parameters can be added to your request by clicking the + button, or removed by selecting a parameter and clicking the - button.
Parameters in API requests are usually used to specify a particular response, such as specifying paths or querying data before it is returned. The user should be familiar with their intended data source so as to identify which, if any, parameters are required for their request.
- Parameter Name: The name of the parameter. This is not arbitrary and relates exactly to a parameter expected by the source system.
- Value: The value this parameter will be set to.
- Type:
- Query: Parameters put into the request URI after a ? symbol. These usually define a subset of data to be returned.
- URI: Otherwise known as a "path" parameter. These rely directly on your given endpoint URI and can be used to replace parts of it. Any part of the endpoint URI given in curly brackets {}, can be replaced by a URI parameter.
- For example. Specifying a URI parameter
myInstance
as127.0.0.1
can be used with the endpoint URIhttp://{myInstance}/rest/v1/
so the request URI becomeshttp://127.0.0.1/rest/v1/
- For example. Specifying a URI parameter
- Header: Parameters included in the request header, often for the purpose of authorization.
- Variable Type:
- Connection: This variable can be set through the Other connection property. See the Using Parameters section at the bottom of this article for more information.
- Input: This type of variable allows you the option of using that parameter in the Data Source Filter of the API Query component. See API Profiles - Parameters for more information.
- Constant: Specify whether this parameter is constant or not. Constant parameters retain the value they are set with in this configuration wizard and cannot be changed later in the API Query component properties or connection properties.
Body
The body is defined here for testing purposes and will not be automatically used in loading data. The API Query component properties allows users to specify the body they wish to use.
In this field you can enter a string as the body of the request. Although a body can be included with GET requests, it will not be used by the source system. As such, it is only important to include a body with POST type requests.
Response
Your request can be sent using the Send button in the wizard. This will use your endpoint, auth, params and body configuration as appropriate. If the request is successful, a response should be returned in this tab. The exception to this is in cases of a 204 No Content status code as explained in the Log section.
The Endpoint Configuration section of the wizard is designed to allow users to keep editing and retrying their configuration until they are satisfied with the response. Many configuration details are expressly for the purpose of returning a response that can be later used in the Response Configuration section of the wizard, and not for saving those details for later use. For example, users might enter authentication details or parameter values but these should be reentered when using the API Profile in its appropriate component and are often not stored after the wizard is completed.
Log
Regardless of if your request succeeds or fails, Matillion ETL will display information about the response in this tab.
This data includes the final request URI and headers, including all parameters.
Wizard - Response configuration
If your request was configured correctly, the response data structure will now have been parsed by Matillion ETL and a suggested structure is returned. The user can expand and collapse the structure tree to explore the response.
Select fields
Users can specify a single Repeating Element in the response tree by right-clicking an element and setting it. This will prompt Matillion ETL to parse the JSON as though every occurence of that element is a single row in the target table. Elements outside of the selected repeating element are discarded.
Every field has a checkbox beside it. Leaving a field checked will include it to be parsed into table data. Unchecked fields are excluded.
Paging strategies
Choose and configure a pagination strategy, if one is required. For more information, see API Profile Pagination.
- Full Path: Used when the API response in each page includes the instructions for fetching the "next" page.
- Next page URI: The field from the tree used to indicate the next page's URI. This can be selected by right-clicking a field on the tree.
- Page Size Parameter: The parameter used to indicate the page size. (Optional)
- Page Size: Number of records to return per page. (Optional)
- Relative Path: Used when the API response includes a relative path to the next page which can be suffixed to the base URI.
- Base URI: The URI constant to exclude from the relative path.
- Next page URI: Select the field from the tree (right click a field) used to indicate the next page URI.
- Page Size Parameter: The parameter used to indicate the page size. (Optional)
- Page Size: The number of records to return per page. (Optional)
- Offset-based: The simplest form of paging where the API returns a set number of records at a time.
- Limit: The number of records to return per page.
- Record Count: Select a field from the tree (right click a field) used to indicate the count of total records.
- Link Header: Where the API response includes a link header that gives the URL for the next (as well as previous, first and last) page.
- Page Size Parameter: The parameter used to indicate the page size.
- Page Size: The number of records to return per page.
- Cursor-based: Used when the response contains an element that points to the next/previous page.
- Cursor: Select the next cursor (right click a field) from the tree.
- Cursor Parameter: The name of the cursor query parameter.
- Page Size Parameter: The name of the page size query parameter. (Optional)
- Page Size: The number of records to return per page. (Optional)
- Page-based: Used when the endpoint accepts a page query param which is an integer indicating the page within the list to be returned.
- Page Size Parameter: The parameter used to indicate the page size. (Optional)
- Page Size: The number of records to return per page. (Optional)
- Page Number Parameter: The name of the page number parameter.
- Last Page: Select how the last page will be handled and the field from the tree containing the last page by clicking right on the tree view field. (Optional)
Wizard - Review
Data preview
A preview of the table data can be previewed here. This is created by flattening the JSON or XML data in the response.
Config review
A summary of the decisions made can be found here. This same summary can be seen when selecting an endpoint on the Configure Query Profile screen.
Clicking Back will return you to previous configurations screens where adjustments can be made.
Logs
Additional information on your API Query Profile creation can be found here, including troubleshooting messages if the process has not gone as expected.
The range of possible response codes (and exactly how they behave) depends on the API being connected to but below are the most common types to find in API Profiles:
CODE | STATUS | DESCRIPTION |
---|---|---|
200 | OK | A successful request that has returned a response. |
204 | No Content | A successful request that did not return a response body by design. Further details in the log may reveal information about the response, despite the lack of a response body. |
400 | Bad Request | The request was invalid (incomplete or malformed) and could not be handled by the server. |
401 | Unauthorized | Credentials are required for this endpoint and were either incorrect or missing from the request. See Authorization above. |
403 | Forbidden | A generic response for the server refusing a request, even though the request may be well formed. |
404 | Not Found | The requested resource could not be found by the server. This usually means the endpoint is incorrect. |
429 | Too Many Requests | The rate limit has been reached and the server is refusing new requests. This condition is usually lifted after a short time but is likely to be quickly reached again unless changes to request or job can manage the request rate appropriately. |
503 | Service Unavailable | The server has acknowledged the request but cannot currently handle it. This is usually a temporary issue and should be investigated with the server or host service. |
Using Your API Query profile
Once your API Query Profile has all the endpoints you wish to configure, you can leave the Manage Query Profiles screen and return to the canvas.
Including the API Query component on the canvas of an orchestration job will allow you to use your API Query Profile in its Profile property. Note that the user must define all authorization, parameters and body in this component. Values from the configuration wizard are not automatically used, although "Constant" parameters will be.
Generating an RSD File using static files
The wizard can also be pointed at static files (for example: the URI property can be pointed at an XML file sitting on a public S3, instead of expecting one to be returned from an API call). This method is recommended as it allows users to construct API calls utilising a preferred API tool. This can be particularly useful if API calls have an extensive list of parameters to enter, or a POST method is used instead of GET.
Once an RSD file is generated using a static file, additional structures can be then created and necessary parts (such as the URI) can be edited.