API Profiles - RSDs
Overview
This article explains common tags and parameters available for use in RSD files. It is recommended to not manually edit RSD files and instead to use the wizard where possible.
The RSD file describes how an API call is made and the returned data is understood. RSD format is based on that used for CData providers. We highly recommend that users new to RSD files follow the timextender documentation on the subject.
Anatomy of an RSD
For basic API Query Profiles, the following structure should be followed:
- Open with an
<api:script
tag. - Open your column set definition with
<api:info
. - Define your columns with
<attr
. - Define your input columns with
<input
. - Close your column set definition with
</api:info>
. - Set your parameters with
<api:set
. - Define your GET or POST methods with
<api:script method="<method>">
. - Close your open tags and end the script.
This will become clearer with the below sections defining the use of different tags and with comparison to the example at the bottom of this article.
Scripts and scope
The api
tags are used to contain the other definitions listed below. The scope of those definitions depends on which api
tags they belong in.
- api:script is used to enclose column definitions and attributes.
api:script method="<requestMethod>"
: can be used to specify a specific method (such as GET or POST) and enclose defintions specific to that method.api:script
: without a specified method is always used to enclose the entire schema and can be consider the "global" scope.
Set definition
- api:info: Encloses a set of column definitions for your API response.
- title: gives the title of this column definition set.
- desc: gives the description of this column definition set.
Parameters
- api:set: Set configuration parameters. These parameters have scope and will affect only the methods they are defined inside.
Attributes
Possible attribute (<attr
tag) parameters:
- name: The alphanumeric string that defines the name of the column.
- xs:type: The data type of the column. The string, int, double, datetime, and boolean types are supported.
- other: Attributes prefixed with 'other:' that provide extra information. These other properties can be operation specific. For example, other:xPath specifies the XPath to a node in a local or remote resource. The XPath can be relative to a RepeatElement.
- desc[ription]: A short description of the column.
- key: Whether the column is part of the primary key in the table.
- readonly: Whether the column can be updated. Allowed values are true and false.
- req[uired]: Whether the column must be specified in an insert. Allowed values are true and false.
- def[ault]: The default value for the column if none is specified.
- values: A comma-separated list of the valid values for the column. If specified, the engine will throw an error if the specified column value does not match one of the allowed values.
- reference[s]: The foreign key to the primary key of another table. The foreign key is specified with the following syntax: table.key. For example: "Employees.EmployeeId".
- columnsize: The maximum character length of a string or the precision of a numeric column. The precision of a numeric column is the number of digits.
- scale | decimaldigits: The scale of a decimal column. The scale is the number of digits to the right of the decimal point.
- isnullable: Indicates whether the column accepts null values. Note that this does not prevent sending or receiving a null value.
Inputs
Possible Input (<input
tag) parameters:
- name: The name of the input. An alphanumeric string that may additionally contain the following: "#" denotes that the input can have multiple values, "myprefix:*" denotes a set of inputs with the same prefix, and a value of "*" denotes arbitrary input parameters.
- desc[ription]: A short description of the input.
- xs:type: The data type of the input. The string, int, double, datetime, and boolean types are supported.
- def[ault]: The default value to be used when no input value is supplied in the script call.
- key: Whether the input is a primary key.
- req[uired]: Whether the input is required. The engine will throw an error if the required input is not supplied and there is no default value defined. Allowed values are true and false.
- values: A comma-separated list of the allowed values for the input. If specified, the engine will throw an error if the specified input does not match one of the allowed values.
- other: Attributes prefixed with "other:" that provide extra information.
- alias: The alias of the input.
Outputs
Possible Output (<output
tag) parameters:
- name: The name of the output. "myprefix:*" denotes a set of outputs with the same prefix, and a value of "*" denotes arbitrary output parameters.
- xs:type: The data type of the output. The string, int, double, datetime, and boolean types are supported.
- desc[ription]: A short description of the output.
- columnsize: The maximum character length of a string or the precision of a numeric output. The precision is the number of digits.
- other: Attributes prefixed with 'other:' that provide extra information about the output. For example, other:xPath specifies the XPath to a node in a local or remote resource. The XPath can be relative to a RepeatElement.
API settings
Possible API settings (<api:set
tag) parameters:
- URI: The URI string for the request.
- Header: Define headers for the request. See the Headers section for more information.
- EnablePaging: Enables paging to parse API responses. We recommend reading the timextender guide here for more information on RSD paging.
Paging
Pagination must be enabled with the following:
<api:set attr="EnablePaging" value="true"/>
This must be set after the <api:info
section of your RSD but before API calls are made.
API properties are given as below:
- pageoffsetparam: The parameter in the API that sets the page offset. Using this will opt to use offset-based pagination.
-
pagenumberparam: The parameter in the API that sets the page. Using this will opt to use page-based pagination.
-
pagesizeparam: The parameter in the API that sets the page size.
- pagesize: The page size requested.
For example:
<api:set attr="pageoffsetparam" value="offset" />
<api:set attr="pagesizeparam" value="pagesize" />
<api:set attr="pagesize" value="200" />
Headers
Headers are set using a set of properties that give the header name and value, as below:
<api:set attr="Header:Name#" value="HeaderName" />
<api:set attr="Header:Value#" value="HeaderValue" />
For example:
<api:set attr="Header:Name#" value="Content-Type" />
<api:set attr="Header:Value#" value="application/json" />
URL params
URL Parameters can be set from inside the wizard but can also be added manually to RSD files. This means editing the CustomURLParams property on the _connection object using the following syntax:
<rsb:set attr="_connection.CustomUrlParams" value="ParamName=ValueName" />
Multiple values can be set at once, separated by an & symbol:
<rsb:set attr="_connection.CustomUrlParams" value="ParamName1=Value1&ParamName2=Value2" />
Arrays
Items in an array can be referenced in one of two ways:
- Reference individual items by number, for example:
<attr name="tag1" xs:type="string" other:xPath="tags[0]" />
<attr name="tag2" xs:type="string" other:xPath="tags[1]" />
<attr name="tag3" xs:type="string" other:xPath="tags[2]" />
- Return the array as a comma-separated string using the
other:valueformat="aggregate"
attribute (this can then be separated and pivoted using a transformation job):
<attr name="tags" xs:type="string" other:xPath="tags" other:valueformat="aggregate" columnsize="20000" />
Special characters
When defining API Profiles (RSDs), sometimes an API response might contain a special character that causes the field to not be parsed correctly. For example, if the attribute has a period in the name, an auto-generated RSD from the Query Profile Wizard might generate an RSD that contains this definition. As shown in the example below, the attribute is "rating.2" and the value for this attribute will not be parsed correctly when defined as this:
<attr name="rating2" xs:type="integer" readonly="false" other:xPath="/json/reviews/rating.2"/>
<attr name="rating2" xs:type="integer" readonly="false" other:xPath="/json/reviews/[rating.2]" />
Worked example
This example RSD file shows what could be included in an RSD file and gives an idea of what is possible. As all API endpoints differ, not every eventuality has been covered in this section.
Line 5 to 12
Details of the location of the data in the JSON file which provide the output columns in the table to be created. These give the column names, location of the data and data type. Whether the column is a primary key or required can also be specified here:
<!-- Output columns -->
<attr name="id" xs:type="integer" other:xPath="internal_id" />
<attr name="record_type" xs:type="string" other:xPath="record_type" />
<attr name="name" xs:type="string" other:xPath="record_name" />
<attr name="value" xs:type="double" other:xPath="value" />
<attr name="date" xs:type="datetime" other:xPath="record_date" />
<attr name="address_line1" xs:type="string" other:xPath="addresses/address_line1" />
Line 14 to 20
Additional columns which are not necessarily required in the output table. The first of these is to support an input parameter, while the other is a dummy column used for paging the data from the API:
<!-- Add the other columns here -->
<!-- Dummy column for where clause -->
<input name="fromdate" xs:type="datetime" />
<!-- Paging meta column -->
<input name="Rows@Next" desc="Identifier for the next page of results"/>
Line 24 to Line 26
The URI for the request. In this example, the URI has an input parameter for fromdate
and a fixed todate
:
<!-- Set the RESTlet URI -->
<api:set attr="uri" value="https://api.dummyexample.com/fromdate=[_input.fromdate]&todate=2020-01-01" />
<api:set attr="uri" value="testdata.json" />
A test file can be created by adding a new table into the API Profile by clicking on the + on the Configure API Profiles windows. This sample JSON code can then be copied into the dialog box provided.
Line 28 and 30
The repeat elements and sub repeat elements define the granularity of the data to be accessed as well as the path of the data. A sub repeat element is required if some data is at a lower level than other required data. For example, the data may contain details of customers and each customer may have more than one address. The sub repeat element would be defined at the address level to pull through all addresses:
<!-- Set the Repeat Element -->
<api:set attr="RepeatElement" value="/feed/records" />
<api:set attr="SubRepeatElement" value="/feed/records/addresses"/>
Only one sub repeat element is allowed per RSD file. In the example above, if there was also a credit card record for customers and each customer could have more than one credit card then multiple passes will be required.
Line 32 and 33
The content-type header telling Matillion ETL that the API is a JSON API:
<!-- Set the 'content-type' header -->
<api:set attr="ContentType" value="application/json" />
Line 35 to 37
The HTTP request Headers. In this example the Authorisation is being passed through the header with a connection parameter called auth. Other Header values can be set in here if required. For further details on Authenticating API profiles please see here.
<!-- Set the 'Authorization' header -->
<api:set attr="Header:Name#" value="Authorization" />
<api:set attr="Header:Value#" value="[_connection.auth]" />
Line 39
Description of the REST Operation—in this case an HTTP GET request:
<api:script method="GET" >
Line 41 to 58
Dealing with paging, an HTTP POST request is made to request the next rows of data:
<!-- Request an HTTP POST -->
<api:set attr=method value="POST"/>
<!-- Set the JSON POST body to the required ranges -->
<api:check attr="Rows@Next">
<api:set item="userns" attr="startpoint" value="[_input.Rows@Next]" />
<api:set item="userns" attr="endpoint" value="[userns.startpoint | add(5000)]" />
<api:else>
<api:set item="userns" attr="startpoint" value="0" />
<api:set item="userns" attr="endpoint" value="5000" />
</api:else>
</api:check>
<api:set attr="data">{
"min":[userns.startpoint],
"max":[userns.endpoint],
"searchID":"customsearch_ppj_generic_transaction_ss"
}</api:set>
Line 60
API paging can be enabled and managed at the Response Configuration window of the Manage Query profile interface. See API Query Profiles for this process or Manage Query Profiles for more general information.
If paging feature is enabled, the line below will be added in the RSD generated file.
<api:set attr="EnablePaging" value="TRUE" />
Line 62
Call the API:
<api:call op="jsonproviderGet">
Line 64 to 68
Get the next rows of data:
<!-- Increment the total row counter -->
<api:set item="userns" attr="startpoint" value="[userns.startpoint | add(1)]"/>
<!-- Set Rows@Next to invoke paging -->
<api:set attr="Rows@Next" value="[userns.startpoint]" />
Example Matillion ETL RSD
The below is an example RSD for getting the task history from a Matillion ETL instance.
- Note that different columns have different xs:type types and are mapped to other:xPath elements of the same name as the column.
- The instance address uses a parameter for the instance address as denoted by
[_connection.InstanceAddress]
- Note that authorization details are not recorded in the RSD but are recorded as connection options. These can be set on the API Query component.
<api:script xmlns:api="http://apiscript.com/ns?v1" xmlns:xs="http://www.w3.org/2001/XMLSchema">
<!-- See Column Definitions to specify column behavior and use XPaths to extract column values from JSON. -->
<api:info title="EndpointExample" desc="Generated schema file." xmlns:other="http://apiscript.com/ns?v1">
<!-- You can modify the name, type, and column size here. -->
<attr name="customerID" xs:type="integer" readonly="false" other:xPath="customerID" />
<attr name="endTime" xs:type="long" readonly="false" other:xPath="endTime" />
<attr name="enqueuedTime" xs:type="long" readonly="false" other:xPath="enqueuedTime" />
<attr name="environmentID" xs:type="integer" readonly="false" other:xPath="environmentID" />
<attr name="environmentName" xs:type="string" readonly="false" other:xPath="environmentName" />
<attr name="groupName" xs:type="string" readonly="false" other:xPath="groupName" />
<attr name="hasHistoricJobs" xs:type="boolean" readonly="false" other:xPath="hasHistoricJobs" />
<attr name="id" xs:type="integer" readonly="false" other:xPath="id" />
<attr name="jobID" xs:type="integer" readonly="false" other:xPath="jobID" />
<attr name="jobName" xs:type="string" readonly="false" other:xPath="jobName" />
<attr name="jobNames" xs:type="string" readonly="false" other:xPath="jobNames" />
<attr name="message" xs:type="unknown" readonly="false" other:xPath="message" />
<attr name="originatorID" xs:type="string" readonly="false" other:xPath="originatorID" />
<attr name="projectID" xs:type="integer" readonly="false" other:xPath="projectID" />
<attr name="projectName" xs:type="string" readonly="false" other:xPath="projectName" />
<attr name="rowCount" xs:type="integer" readonly="false" other:xPath="rowCount" />
<attr name="startTime" xs:type="long" readonly="false" other:xPath="startTime" />
<attr name="state" xs:type="string" readonly="false" other:xPath="state" />
<attr name="tasks_componentID" xs:type="integer" readonly="false" other:xPath="tasks/componentID" />
<attr name="tasks_componentName" xs:type="string" readonly="false" other:xPath="tasks/componentName" />
<attr name="tasks_endTime" xs:type="long" readonly="false" other:xPath="tasks/endTime" />
<attr name="tasks_jobID" xs:type="integer" readonly="false" other:xPath="tasks/jobID" />
<attr name="tasks_jobName" xs:type="string" readonly="false" other:xPath="tasks/jobName" />
<attr name="tasks_jobRevision" xs:type="integer" readonly="false" other:xPath="tasks/jobRevision" />
<attr name="tasks_jobTimestamp" xs:type="long" readonly="false" other:xPath="tasks/jobTimestamp" />
<attr name="tasks_message" xs:type="string" readonly="false" other:xPath="tasks/message" />
<attr name="tasks_parentID" xs:type="integer" readonly="false" other:xPath="tasks/parentID" />
<attr name="tasks_rowCount" xs:type="integer" readonly="false" other:xPath="tasks/rowCount" />
<attr name="tasks_startTime" xs:type="long" readonly="false" other:xPath="tasks/startTime" />
<attr name="tasks_state" xs:type="string" readonly="false" other:xPath="tasks/state" />
<attr name="tasks_taskBatchID" xs:type="integer" readonly="false" other:xPath="tasks/taskBatchID" />
<attr name="tasks_taskID" xs:type="integer" readonly="false" other:xPath="tasks/taskID" />
<attr name="tasks_type" xs:type="string" readonly="false" other:xPath="tasks/type" />
<attr name="type" xs:type="string" readonly="false" other:xPath="type" />
<attr name="versionID" xs:type="integer" readonly="false" other:xPath="versionID" />
<attr name="versionName" xs:type="string" readonly="false" other:xPath="versionName" />
</api:info>
<api:set attr="BackwardsCompatibilityMode" value="true" />
<api:set attr="uri" value="http://[_connection.InstanceAddress]/rest/v1/groups/name/awssf/project/name/awssf/task/history" />
<!-- Column XPaths are relative to a RepeatElement that splits the JSON into rows. -->
<api:set attr="RepeatElement" value="/" />
<!-- The GET method corresponds to SELECT. Here you can override the default processing of the SELECT statement. The results of processing are pushed to the schema's output. See SELECT Execution for more information. -->
<api:script method="GET">
<api:set attr="method" value="GET"/> <!-- HTTP request type -->
<api:set attr="method" value="GET"/>
<api:call op="jsonproviderGet">
<api:push/>
</api:call>
</api:script>
</api:script>