Instagram Extract

Overview

The Instagram Extract component calls the Instagram API to retrieve and store data to be either referenced by an external table or loaded into a table, depending on your cloud data warehouse. You can then use transformation components to enrich and manage the data in permanent tables.

Using this component may return structured data that requires flattening. For help with flattening such data, we recommend using the following components:


Properties

Snowflake

Name = string

A human-readable name for the component.


Data Source = drop-down

Select the Instagram data source from the available options.


OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, please read our Instagram Extract authentication guide.


Page Limit = integer

The maximum number of pages to stage, given as an integer.
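
As a rough illustration of what a page limit does, the sketch below walks a cursor-paginated result set and stops once a fixed number of pages has been read. The `fetch_page` function and its in-memory data are hypothetical stand-ins. They are not the component's implementation or the Instagram API.

```python
from typing import List, Optional, Tuple

# Hypothetical in-memory "API": each cursor maps to (rows, next_cursor).
FAKE_PAGES = {
    None: (["row1", "row2"], "c1"),
    "c1": (["row3", "row4"], "c2"),
    "c2": (["row5"], None),
}


def fetch_page(cursor: Optional[str]) -> Tuple[List[str], Optional[str]]:
    """Return (rows, next_cursor) for a cursor; stands in for a real API call."""
    return FAKE_PAGES[cursor]


def stage_pages(page_limit: int) -> List[str]:
    """Collect rows page by page, stopping after page_limit pages."""
    rows: List[str] = []
    cursor: Optional[str] = None
    pages_read = 0
    while pages_read < page_limit:
        page_rows, cursor = fetch_page(cursor)
        rows.extend(page_rows)
        pages_read += 1
        if cursor is None:  # the source has no further pages
            break
    return rows
```

With this fake data, a limit of 2 stages only the first four rows, while any limit of 3 or more stages all five.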


Location = filepath

Provide an Amazon S3 bucket path, Google Cloud Storage (GCS) bucket path, or Azure Blob Storage path that will be used to store the data. The data can then be referenced by an external table. A folder will be created at this location with the same name as the target table.
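
To make the folder naming concrete, here is a minimal sketch of how the staging folder path might be derived from the location and the target table name. The helper `staging_folder`, the example bucket, and the trailing-slash layout are assumptions for illustration only.

```python
def staging_folder(location: str, target_table: str) -> str:
    """Join a bucket path and a table name into a folder path (assumed layout)."""
    # Strip any trailing slash so the two parts are joined by exactly one "/".
    return f"{location.rstrip('/')}/{target_table}/"


# Hypothetical bucket path and table name.
print(staging_folder("s3://my-bucket/staging/", "ig_media"))
```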


Integration = drop-down

(GCP only) Choose your Google Cloud Storage Integration. Integrations are required to permit Snowflake to read data from and write to a Google Cloud Storage bucket. Integrations must be set up in advance of selecting them in Matillion ETL. To learn more about setting up a storage integration, read our Storage Integration setup guide.


Warehouse = drop-down

The Snowflake warehouse used to run the queries. The special value, [Environment Default], will use the warehouse defined in the environment. Read Overview of Warehouses to learn more.


Database = drop-down

The Snowflake database. The special value, [Environment Default], will use the database defined in the environment. Read Databases, Tables and Views - Overview to learn more.


Schema = drop-down

The Snowflake schema. The special value, [Environment Default], will use the schema defined in the environment. Read Database, Schema, and Share DDL to learn more.


Target Table = string

A name for the new table. When the job runs, the table is recreated, and any existing table of the same name is dropped.

Amazon Redshift

Name = string

A human-readable name for the component.


Data Source = drop-down

Select the Instagram data source from the available options.


OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, please read our Instagram Extract authentication guide.


Page Limit = integer

The maximum number of pages to stage, given as an integer.


Location = filepath

Provide an Amazon S3 bucket path that will be used to store the data. The data can then be referenced by an external table. A folder will be created at this location with the same name as the target table.


Type = drop-down

  • External: The data will be put into your chosen S3 bucket and referenced by an external table.
  • Standard: The data will be staged on your chosen S3 bucket before being loaded into a table. This is the default setting.

Standard Schema = drop-down

The Redshift schema. The special value, [Environment Default], will use the schema defined in the Matillion ETL environment.


External Schema = drop-down

The table's external schema. Read Getting Started with Amazon Redshift Spectrum to learn more.


Target Table = string

A name for the new table. When the job runs, the table is recreated, and any existing table of the same name is dropped.

Google BigQuery

Name = string

A human-readable name for the component.


Data Source = drop-down

Select the Instagram data source from the available options.


OAuth = drop-down

Select an OAuth entry to authenticate this component. An OAuth entry must be set up in advance. To learn how to create and authorize an OAuth entry, please read our Instagram Extract authentication guide.


Page Limit = integer

The maximum number of pages to stage, given as an integer.


Table Type = drop-down

Select whether the table is Native (the default in BigQuery) or External.


Project = drop-down

Select the Google Cloud project. The special value, [Environment Default], will use the project defined in the environment. For more information, read Creating and managing projects.


Dataset = drop-down

Select the Google BigQuery dataset to load data into. The special value, [Environment Default], will use the dataset defined in the environment. For more information, read Introduction to datasets.


Target Table = string

A name for the new table. When the job runs, the table is recreated, and any existing table of the same name is dropped.


New Target Table = string

A name for the new external table. Only available when the table type is External.


Cloud Storage Staging Area = Google Cloud Storage bucket

The URL and path of the target Google Cloud Storage bucket to be used for staging the queried data. Only available when the table type is Native.


Location = Google Cloud Storage bucket

The URL and path of the target Google Cloud Storage bucket. Only available when the table type is External.


Load Options = multiple drop-downs

  • Clean Cloud Storage Files: Destroy staged files on Google Cloud Storage after loading data. Default is On.
  • Cloud Storage File Prefix: Give staged file names a prefix of your choice. The default setting is an empty field.
  • Recreate Target Table: Choose whether the component recreates its target table before the data load. If Off, the component will use an existing table or create one if it does not exist. Default is On.
  • Use Grid Variable: Check this checkbox to use a grid variable. This box is unchecked by default.

Data source properties

See below for properties relevant to specific data sources.

Account insights

Instagram Business Account Id = string

Your Instagram Business Account Id.


Since = string

Provide a "Since" start time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.


Until = string

Provide an "Until" end time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.
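
For readers who prefer Unix timestamps over textual descriptions such as "yesterday" or "-7 days", the sketch below computes a "Since"/"Until" pair covering a fixed number of days. The helper name `unix_range` and the fixed example date are assumptions for illustration. The component itself only needs the resulting values typed into the two properties.

```python
from datetime import datetime, timedelta, timezone
from typing import Tuple


def unix_range(days_back: int, now: datetime) -> Tuple[int, int]:
    """Return (since, until) as Unix timestamps spanning days_back days up to now."""
    since = now - timedelta(days=days_back)
    return int(since.timestamp()), int(now.timestamp())


# Fixed, timezone-aware example date so the result is reproducible.
now = datetime(2024, 1, 8, tzinfo=timezone.utc)
since, until = unix_range(7, now)
print(since, until)  # 1704067200 1704672000
```

Using a timezone-aware `datetime` avoids the result depending on the local timezone of the machine that runs the script.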


Period = drop-down

Specify an aggregation period. Select from "day", "day_28", "lifetime", or "week".


Audience insights

Instagram Business Account Id = string

Your Instagram Business Account Id.


Media Id = string

The unique identifier for the media object.


Comments

Media Id = string

The unique identifier for the media object.


User Id = string

The unique identifier for the user.


Hashtag Id = string

The unique identifier for the hashtag.


User Id = string

The unique identifier for the user.


Hashtag Query = string

Provide a hashtag string. The hashtag symbol # is NOT required. Users should query one hashtag at a time. Users can query a maximum of 30 unique hashtags within a 7-day period.
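
The sketch below shows one way to prepare hashtag queries on the client side: strip the optional "#" and keep track of how many unique hashtags have been queried within a 7-day window. The class `HashtagBudget` and its bookkeeping are hypothetical. In particular, it assumes a hashtag counts against the limit for 7 days from its first query, which should be checked against the Instagram API documentation.

```python
from datetime import datetime, timedelta
from typing import Dict


def normalize_hashtag(raw: str) -> str:
    """Trim whitespace and drop a leading '#', which the query does not need."""
    return raw.strip().lstrip("#")


class HashtagBudget:
    """Track unique hashtags queried within a rolling window (assumed semantics)."""

    def __init__(self, max_unique: int = 30, window: timedelta = timedelta(days=7)):
        self.max_unique = max_unique
        self.window = window
        self._first_queried: Dict[str, datetime] = {}

    def try_query(self, raw: str, now: datetime) -> bool:
        """Return True if the hashtag may be queried at time now, recording it if new."""
        tag = normalize_hashtag(raw)
        # Forget hashtags whose first query has aged out of the window.
        self._first_queried = {
            t: ts for t, ts in self._first_queried.items() if now - ts < self.window
        }
        if tag in self._first_queried:
            return True  # repeat queries do not use up more of the budget
        if len(self._first_queried) >= self.max_unique:
            return False  # budget exhausted for this window
        self._first_queried[tag] = now
        return True
```

A caller would check `try_query` before running the component for a new hashtag and skip or postpone the run when it returns `False`.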


User Id = string

The unique identifier for the user.


Hashtag Id = string

The unique identifier for the hashtag.


Media

Instagram Business Account Id = string

Your Instagram Business Account Id.


Media insights

Since = string

Provide a "Since" start time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.


Until = string

Provide an "Until" end time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.


Media Id = string

The unique identifier for the media object.


Online followers

Instagram Business Account Id = string

Your Instagram Business Account Id.


Since = string

Provide a "Since" start time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.


Until = string

Provide an "Until" end time parameter. This value can be either a Unix timestamp or any English textual datetime description that can be parsed by the PHP function "strtotime".

The "Since" and "Until" parameters are inclusive, meaning that if a user's range includes a day that has not yet ended (i.e., today), subsequent queries throughout the day may return increased values. If users do not include the "Since" and "Until" parameters, the API will default to a 2-day range: yesterday through today.


Replies

Comment Id = string

The unique identifier for the comment.


Stories

Instagram Business Account Id = string

Your Instagram Business Account Id.


Story insights

Media Id = string

The unique identifier for the media object.


Video insights

Media Id = string

The unique identifier for the media object.

