Skip to content

API Extract Profiles

Overview

API Extract profiles are collections of endpoint configurations that define the output from a JSON or XML based API. The raw data is staged before being pulled into your target data warehouse and flattened into table data according to how you defined the API Extract Profile.

API Extract profiles can be added and configured in the project menu: Project → Manage API Profiles → Manage Extract Profiles. This dialog and its functions are explained in Manage Extract Profiles and should be understood before continuing.

API Extract profiles can be used in the API Extract component by selecting the Profile property. Users will then select the desired endpoint to be used from this profile.


Manage and Configure Extract profiles

The Manage Extract Profiles dialog exists to organize your API Extract Profiles.

To set up a new profile use the + button, or the - button after selecting a profile to remove it. Existing profiles can be edited by clicking the cogwheel icon next to the profile you want.

Configuring a profile will take you to the Configure Extract Profile screen. On the left, endpoints belonging to this profile are listed and selecting one will give a summary of its configuration on the right.

To add a new endpoint to the profile, click the New Endpoint button. To configure or delete an existing profile, click the cogwheel or X icon, respectively, next to the endpoint.

Unlike API Query Profiles, API Extract Profiles do not require custom code from the user and can be configured entirely through the wizard, the stages of which are detailed in the below sections.


Wizard - Source Details

This section of the wizard is simply for labelling your new API Extract Profile. The name and description are how Matillion ETL will refer to this entry and are arbitrary.

  • Name: A name for your new API Extract Profile endpoint.
  • Description: A description for your new endpoint.

Wizard - Endpoint Configuration

The Endpoint Configuration section takes details about your API request. Different requests will require a different amount of settings before a successful response can be returned.

Clicking Send will attempt the request as defined.

  • If the request is successful, the returned response will be recorded in the Response tab.
  • All responses (successful or otherwise) have their status code and response headers logged in the Log tab.

Endpoint URI

Select the response type for your endpoint (GET or POST) and then enter the endpoint URI into the adjacent field. This is your data source.

Auth

  • Authorization information is NOT stored in API Profiles for security reasons. This information is required to configure the wizard but is lost afterward.
  • Reconfiguring an endpoint will require reentering your authorization details.
  • The authorization type is remembered and will be used by the API Extract component to prompt the user for their authorization details.

Many endpoints will require you to provide some authorization details to accompany your request.

If authorization is required, begin by clicking the toggle so it is green and Enabled. Then, configure your authorization according to the type required. This type depends entirely on the endpoint and it is up to the user to have authorization details at hand.

  • Basic Auth
    • User: A valid username to authorize with the endpoint.
    • Password: A valid password for the above username.
  • Bearer Token
    • Token: The bearer token string. This token will need to be acquired independently by the user. Most bearer tokens will expire after some time and need to be refreshed by editing this API Profile.
  • API Key Value Pair
    • Key: The key used for authorization. This will need to be acquired independently by the user.
    • Value: The corresponding value for the above key. This will need to be acquired independently by the user.
    • Parameter Type: Specifies whether the Key Value Pair is included in the Header of the request or as Query parameters in the URI of the request.
  • OAuth
    • Authentication: Select the entry for your stored OAuth. OAuths can be stored as a resource in Matillion ETL and usually require independent configuration by the user with their third-party service.

Params

Note

Parameters are defined here only so they can be recognised and used by the API Extract component, which allows users to define values for their parameters on the component itself. Therefore, any values given to these parameters are for testing purposes and will not be automatically used in loading data. The exception to this are "Constant" parameters.

Parameters can be added to your request by clicking the + button, or removed by selecting a parameter and clicking the - button.

Parameters in API requests are usually used to specify a particular response, such as specifying paths or querying data before it is returned. The user should be familiar with their intended data source so as to identify which, if any, parameters are required for their request.

  • Parameter Name: The name of the parameter. This is not arbitrary and relates exactly to a parameter expected by the source system.
  • Value: The value this parameter will be set to.
  • Type:
    • Query: Parameters put into the request URI after a ? symbol. These usually define a subset of data to be returned.
    • URI: Otherwise known as a "path" parameter. These rely directly on your given endpoint URI and can be used to replace parts of it. Any part of the endpoint URI given in curly brackets {}, can be replaced by a URI parameter.
      • For example. Specifying a URI parameter myInstance as 127.0.0.1 can be used with the endpoint URI http://{myInstance}/rest/v1/ so the request URI becomes http://127.0.0.1/rest/v1/
    • Header: Parameters included in the request header, often for the purpose of authorization.
  • Constant: Specify whether this parameter is constant or not. Constant parameters retain the value they are set with in this configuration wizard and cannot be changed later in the API Extract component properties.

For detailed information on parameters with examples, see API Profiles - Parameters.

Body

The body is defined here for testing purposes and will not be automatically used in loading data. The API Extract component properties allows users to specify the body they wish to use.

In this field you can enter a string as the body of the request. Although a body can be included with GET requests, it will not be used by the source system. As such, it is only important to include a body with POST type requests.

Response

Your request can be sent using the Send button in the wizard. This will use your endpoint, auth, params, and body configuration as appropriate. If the request is successful, a response should be returned in this tab. The exception to this is in cases of a 204 No Content status code as explained in the Log section.

The Endpoint Configuration section of the wizard is designed to allow users to keep editing and retrying their configuration until they are satisfied with the response. Many configuration details are expressly for the purpose of returning a response that can be later used in the Response Configuration section of the wizard, and not for saving those details for later use. For example, users might enter authentication details or parameter values but these should be reentered when using the API Profile in its appropriate component and are often not stored after the wizard is completed.

Log

Regardless of if your request succeeds or fails, Matillion ETL will display information about the response in this tab. This data includes the final request URI and headers, including all parameters.

The range of possible response codes (and exactly how they behave) depends on the API being connected to but below are the most common types to find in API Profiles:

CODE STATUS DESCRIPTION
200 OK A successful request that has returned a response.
204 No Content A successful request that did not return a response body by design. Further details in the log may reveal information about the response, despite the lack of a response body.
400 Bad Request The request was invalid (incomplete or malformed) and could not be handled by the server.
401 Unauthorized Credentials are required for this endpoint and were either incorrect or missing from the request. See Authorization above.
403 Forbidden A generic response for the server refusing a request, even though the request may be well formed.
404 Not Found The requested resource could not be found by the server. This usually means the endpoint is incorrect.
429 Too Many Requests The rate limit has been reached and the server is refusing new requests. This condition is usually lifted after a short time but is likely to be quickly reached again unless changes to request or job can manage the request rate appropriately.
503 Service Unavailable The server has acknowledged the request but cannot currently handle it. This is usually a temporary issue and should be investigated with the server or host service.

Wizard - Response Configuration

If your request was configured correctly, the response data structure will now have been parsed by Matillion ETL and a suggested structure is returned. The user can expand and collapse the structure tree to explore the response.

Select fields

Users can specify a single Repeating Element in the response tree by right-clicking an element and setting it. This will prompt Matillion ETL to parse the JSON as though every occurence of that element is a single row in the target table. Elements outside of the selected repeating element are discarded.

Paging strategies

Choose and configure a pagination strategy, if one is required. For more information, see API Profile Pagination.

  • Full Path: Used when the API response in each page includes the instructions for fetching the "next" page.
    • Next page URI: The field from the tree used to indicate the next page's URI. This can be selected by right-clicking a field on the tree.
    • Page Size Parameter: The parameter used to indicate the page size. (Optional)
    • Page Size: Number of records to return per page. (Optional)
  • Relative Path: Used when the API response includes a relative path to the next page which can be suffixed to the base URI.
    • Base URI: The URI constant to exclude from the relative path.
    • Next page URI: Select the field from the tree (right click a field) used to indicate the next page URI.
    • Page Size Parameter: The parameter used to indicate the page size. (Optional)
    • Page Size: The number of records to return per page. (Optional)
  • Offset-based: The simplest form of paging where the API returns a set number of records at a time.
    • Limit: The number of records to return per page.
    • Record Count: Select a field from the tree (right click a field) used to indicate the count of total records.
  • Link Header: Where the API response includes a link header that gives the URL for the next (as well as previous, first and last) page.
    • Page Size Parameter: The parameter used to indicate the page size.
    • Page Size: The number of records to return per page.
  • Cursor-based: Used when the response contains an element that points to the next/previous page.
    • Cursor: Select the next cursor (right click a field) from the tree.
    • Cursor Parameter: The name of the cursor query parameter.
    • Page Size Parameter: The name of the page size query parameter. (Optional)
    • Page Size: The number of records to return per page. (Optional)
  • Page-based: Used when the endpoint accepts a page query param which is an integer indicating the page within the list to be returned.
    • Page Size Parameter: The parameter used to indicate the page size. (Optional)
    • Page Size: The number of records to return per page. (Optional)
    • Page Number Parameter: The name of the page number parameter.
    • Last Page: Select how the last page will be handled and the field from the tree containing the last page by clicking right on the tree view field. (Optional)

Wizard - Review

A summary of the decisions made can be found here. This same summary can be seen when selecting an endpoint on the Configure Extract Profile screen. Clicking Back will return you to previous configurations screens where adjustments can be made.


Using Your API Extract profile

Once your API Extract Profile has all the endpoints you wish to configure, you can leave the Manage Extract Profiles screen and return to the canvas.

Including the API Extract component on the canvas of an orchestration job will allow you to use your API Extract Profile in its Profile property. The user must define all authorization, parameters and body in this component. Values from the configuration wizard are not automatically used, although "Constant" parameters will be.