Sampling output
Many components let you sample the data output by that component. Sampling the data lets you see the data available once the component has performed its designated tasks. Sampling also confirms that the component is set up correctly before using it in a live pipeline.
Prerequisites
Before you can sample a component's output, you must first validate the pipeline. Read The pipeline canvas for more information.
Sampling data
To sample, select the component on the canvas and click the Sample data tab, then click the Sample data button on that tab. The tab will display a table of data that the component would return when the pipeline runs, up to a maximum of 1000 rows.
Click the Expand and Collapse icons in the top-right of the Sample data tab to adjust the size of the tab.
Note
Alternatively, click the Sample data button in the Properties panel to sample the data for the selected component.
Limit sample
The sample limit drop-down menu indicates how many rows will be returned. You can use this menu to limit your sample to as few as 1 row, or as many as 1000 rows, with many options in-between, such as 10, 25, and 250. The default is 25 rows.
Resize columns
To resize the columns of the sample table, hover over a column edge and when you see the resize handle, click and drag it to increase or decrease the column width.
Refresh sample
To refresh the sampled data, click the Refresh sample icon in the top right of the Sample data tab.
Export sample
To download the sampled data as a CSV file, click the Download CSV icon in the top right of the Sample data tab.
Metadata
To view the metadata about the output of a component, select the component on the canvas and click the Metadata tab at the bottom of the canvas. The Metadata tab shows the name, data type, and size of each column in the component's output.
Use the toggle to switch to text mode. You can use the metadata information shown in text mode as the basis for creating a table using the Create Table component.
Filtering sampled data
You can filter the rows displayed in the table in the Sample data tab using a filter query. Matillion sends this query to your warehouse as an SQL WHERE
clause, and returns data that matches the conditions you have specified in your query.
To filter the data shown in the Sample data table:
- Click the Sample data tab.
- Click Apply filters.
- Enter your filter query in the text field.
- Click the Sample data button or press Enter.
When writing your filter query, follow these rules:
- For Snowflake, enter column names either in UPPERCASE, e.g.
ORDER_QUANTITY
, or surrounded by double quotes, e.g."order_quantity"
. - For Amazon Redshift and Databricks, enter column names without any quote marks, e.g.
order_quantity
. - Use single quotes around string values.
- Do not use quotes around number values.
- Enter date values in the format
YYYY-MM-DD
surrounded by single quotes, e.g.'2024-12-31'
.
Filter query examples
The following examples show how you can create filter queries for different value types and combine query clauses. These filters are written for Snowflake, which is why the column names are in double quotes.
- Filter for string values:
"customer_surname" = 'Smith'
will display rows where the customer's surname is "Smith". - Filter for number values:
"order_quantity" > 5
will display rows where the order quantity is greater than 5. - Filter for date values:
"order_date" < '2025-01-01'
will display rows where the order date is before January 1, 2025.
You can combine query clauses using AND
and OR
, and search for the opposite of a condition using NOT
.
- Using
AND
:"customer_organization" = 'Matillion' AND "order_date" > '2025-04-01'
will display rows for all orders placed by Matillion after April 1, 2025. The data displayed must meet both of these conditions. - Using
OR
:"country" = 'UK' OR "customer_organization" = 'Matillion'
will display rows for all orders placed in the UK, regardless of the customer's organization, and all orders placed by Matillion, regardless of the country. The data displayed only needs to meet one of the conditions. - Using
NOT
:NOT "country" = 'UK'
will display rows for all orders placed in a country other than the UK.
Operators
You can use the following operators in your query:
-
Comparison operators
=
: equals!=
or<>
: does not equal<
and>
: less than, greater than<=
and>=
: less than or equal to, greater than or equal toIS NULL
: is a null valueIS NOT NULL
: is a non-null value
-
Logical operators
AND
: filter results that meet more than one conditionOR
: filter results that meet at least one of a number of conditionsNOT
: filter results that meet the opposite of a condition
-
Set operators
IN
: matches a value within a list or subqueryNOT IN
: does not match a value within a list or subqueryBETWEEN
: is within a specified rangeNOT BETWEEN
: is not within a specified rangeLIKE
: matches a pattern (case-sensitive)NOT LIKE
: does not match a pattern (case-sensitive)ILIKE
: matches a pattern (case-insensitive)
-
Other operators:
||
: string concatenation
Warehouse-specific operators
Additionally, there are some operators available for use with each cloud data warehouse.
Snowflake
-
Comparison and logical operators
<=>
: NULL-safe equals (useful for comparingNULL
values)RLIKE
: matches a regular expressionNOT RLIKE
: does not match a regular expressionNOT ILIKE
: does not match a pattern (case-insensitive)
-
Bitwise operators
&
: Bitwise AND|
: Bitwise OR^
: Bitwise XOR~
: Bitwise NOT
Amazon Redshift
-
Comparison and logical operators
SIMILAR TO
: matches a specified pattern with SQL regular expressionsNOT SIMILAR TO
: does not match a specified pattern with SQL regular expressions
-
Arithmetic operators
%
: modulo
Databricks
-
Comparison and logical operators
RLIKE
: matches a regular expressionNOT RLIKE
: does not match a regular expressionDIV
: integer division
-
Arithmetic operators
MOD
or%
: modulo
Enabling and disabling sampling for a project
If necessary, you can enable and disable sampling at the project level. This is useful if your data contains personal information that cannot be viewed outside your region. Only users with project admin permissions can change this setting.
To enable or disable sampling for a project:
- In the Your projects page, click the three dots ... next to the intended project.
- Click Enable sampling or Disable sampling as required.
Got feedback or spotted something we can improve?
We'd love to hear from you. Join the conversation in the Documentation forum!