Skip to content

Sampling output

Many components let you sample the data output by that component. Sampling the data lets you see the data available once the component has performed its designated tasks. Sampling also confirms that the component is set up correctly before using it in a live pipeline.

Prerequisites

Before you can sample a component's output, you must first validate the pipeline. Read The pipeline canvas for more information.


Sampling data

To sample, select the component on the canvas and click the Sample data tab, then click the Sample data button on that tab. The tab will display a table of data that the component would return when the pipeline runs, up to a maximum of 1000 rows.

Click the Expand and Collapse icons in the top-right of the Sample data tab to adjust the size of the tab.

Note

Alternatively, click the Sample data button in the Properties panel to sample the data for the selected component.

Limit sample

The sample limit drop-down menu indicates how many rows will be returned. You can use this menu to limit your sample to as few as 1 row, or as many as 1000 rows, with many options in-between, such as 10, 25, and 250. The default is 25 rows.

Resize columns

To resize the columns of the sample table, hover over a column edge and when you see the resize handle, click and drag it to increase or decrease the column width.

Refresh sample

To refresh the sampled data, click the Refresh sample icon in the top right of the Sample data tab.

Export sample

To download the sampled data as a CSV file, click the Download CSV icon in the top right of the Sample data tab.


Metadata

To view the metadata about the output of a component, select the component on the canvas and click the Metadata tab at the bottom of the canvas. The Metadata tab shows the name, data type, and size of each column in the component's output.

Use the toggle to switch to text mode. You can use the metadata information shown in text mode as the basis for creating a table using the Create Table component.


Filtering sampled data

You can filter the rows displayed in the table in the Sample data tab using a filter query. Matillion sends this query to your warehouse as an SQL WHERE clause, and returns data that matches the conditions you have specified in your query.

To filter the data shown in the Sample data table:

  1. Click the Sample data tab.
  2. Click Apply filters.
  3. Enter your filter query in the text field.
  4. Click the Sample data button or press Enter.

When writing your filter query, follow these rules:

  • For Snowflake, enter column names either in UPPERCASE, e.g. ORDER_QUANTITY, or surrounded by double quotes, e.g. "order_quantity".
  • For Amazon Redshift and Databricks, enter column names without any quote marks, e.g. order_quantity.
  • Use single quotes around string values.
  • Do not use quotes around number values.
  • Enter date values in the format YYYY-MM-DD surrounded by single quotes, e.g. '2024-12-31'.

Filter query examples

The following examples show how you can create filter queries for different value types and combine query clauses. These filters are written for Snowflake, which is why the column names are in double quotes.

  • Filter for string values: "customer_surname" = 'Smith' will display rows where the customer's surname is "Smith".
  • Filter for number values: "order_quantity" > 5 will display rows where the order quantity is greater than 5.
  • Filter for date values: "order_date" < '2025-01-01' will display rows where the order date is before January 1, 2025.

You can combine query clauses using AND and OR, and search for the opposite of a condition using NOT.

  • Using AND: "customer_organization" = 'Matillion' AND "order_date" > '2025-04-01' will display rows for all orders placed by Matillion after April 1, 2025. The data displayed must meet both of these conditions.
  • Using OR: "country" = 'UK' OR "customer_organization" = 'Matillion' will display rows for all orders placed in the UK, regardless of the customer's organization, and all orders placed by Matillion, regardless of the country. The data displayed only needs to meet one of the conditions.
  • Using NOT: NOT "country" = 'UK' will display rows for all orders placed in a country other than the UK.

Operators

You can use the following operators in your query:

  • Comparison operators

    • =: equals
    • != or <>: does not equal
    • < and >: less than, greater than
    • <= and >=: less than or equal to, greater than or equal to
    • IS NULL: is a null value
    • IS NOT NULL: is a non-null value
  • Logical operators

    • AND: filter results that meet more than one condition
    • OR: filter results that meet at least one of a number of conditions
    • NOT: filter results that meet the opposite of a condition
  • Set operators

    • IN: matches a value within a list or subquery
    • NOT IN: does not match a value within a list or subquery
    • BETWEEN: is within a specified range
    • NOT BETWEEN: is not within a specified range
    • LIKE: matches a pattern (case-sensitive)
    • NOT LIKE: does not match a pattern (case-sensitive)
    • ILIKE: matches a pattern (case-insensitive)
  • Other operators:

    • ||: string concatenation

Warehouse-specific operators

Additionally, there are some operators available for use with each cloud data warehouse.

Snowflake

  • Comparison and logical operators

    • <=>: NULL-safe equals (useful for comparing NULL values)
    • RLIKE: matches a regular expression
    • NOT RLIKE: does not match a regular expression
    • NOT ILIKE: does not match a pattern (case-insensitive)
  • Bitwise operators

    • &: Bitwise AND
    • |: Bitwise OR
    • ^: Bitwise XOR
    • ~: Bitwise NOT

Amazon Redshift

  • Comparison and logical operators

    • SIMILAR TO: matches a specified pattern with SQL regular expressions
    • NOT SIMILAR TO: does not match a specified pattern with SQL regular expressions
  • Arithmetic operators

    • %: modulo

Databricks

  • Comparison and logical operators

    • RLIKE: matches a regular expression
    • NOT RLIKE: does not match a regular expression
    • DIV: integer division
  • Arithmetic operators

    • MOD or %: modulo

Enabling and disabling sampling for a project

If necessary, you can enable and disable sampling at the project level. This is useful if your data contains personal information that cannot be viewed outside your region. Only users with project admin permissions can change this setting.

To enable or disable sampling for a project:

  1. In the Your projects page, click the three dots ... next to the intended project.
  2. Click Enable sampling or Disable sampling as required.

Got feedback or spotted something we can improve?

We'd love to hear from you. Join the conversation in the Documentation forum!