
Document AI Predict

Public preview

The Document AI Predict transformation component extracts data from documents. It invokes Snowflake's Document AI PREDICT function, letting you call your Document AI models from within a Data Productivity Cloud pipeline.


Prerequisites

  • The files you want to process must already be in a Snowflake stage.
  • You must have already created and configured a Document AI model build in Snowflake. For more information, read Set up the required objects and privileges.
  • The Document AI Predict component requires a Snowflake table that holds the relative path and presigned URL of each file. Use this example query to populate your table:
select
    relative_path,
    GET_PRESIGNED_URL(@<stage_name>, relative_path) presigned_url
from directory(@<stage_name>);
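
One way to turn the query above into the table the component reads from is to wrap it in a CREATE TABLE ... AS statement. This is a minimal sketch under stated assumptions: the table name doc_ai_input is hypothetical, and the stage is assumed to have a directory table enabled, which DIRECTORY(@<stage_name>) requires.

```sql
-- Refresh the directory table so recently staged files are listed.
ALTER STAGE <stage_name> REFRESH;

-- doc_ai_input is a hypothetical table name; substitute your own.
CREATE OR REPLACE TABLE doc_ai_input AS
SELECT
    relative_path,
    GET_PRESIGNED_URL(@<stage_name>, relative_path) AS presigned_url
FROM DIRECTORY(@<stage_name>);
```

Presigned URLs expire (GET_PRESIGNED_URL defaults to 3600 seconds), so it is worth rebuilding this table shortly before the pipeline runs rather than reusing an old copy.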

Note

The final step in the Snowflake tutorial, Create a document processing pipeline, isn't required in the Data Productivity Cloud.


Properties

Database = drop-down

The Snowflake source database. The special value [Environment Default] uses the database defined in the environment. For more information, read Databases, Tables and Views - Overview.


Schema = drop-down

The Snowflake source schema. The special value [Environment Default] uses the schema defined in the environment. For more information, read Database, Schema, and Share DDL.


Model Build Name = string

The build name of the Document AI model. Read Prepare a Document AI model build to learn more.


Model Build Version = string

Optional. The version of the model build to use. If not set, the latest version is used.


URL Column = drop-down

The source column containing the presigned URLs of the staged files the model should act on.

A presigned URL gives time-limited access to a staged file without requiring the caller to authenticate or sign in.

As part of the Document AI API, provide a presigned URL to the document you want to run the model against. Document AI then uses the presigned URL to fetch the intended document.
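
For orientation, the underlying Snowflake call looks roughly like the following when written by hand. This is a sketch, not the exact SQL the component generates: doc_ai_input is a hypothetical table containing a presigned_url column, <model_build_name> stands for the Model Build Name property, and the literal 1 is a placeholder for a Model Build Version.

```sql
-- Hand-written equivalent of a prediction call (illustrative only).
-- The second argument is optional; omit it to use the latest version.
SELECT
    <model_build_name>!PREDICT(presigned_url, 1) AS prediction
FROM doc_ai_input;  -- hypothetical table name
```

The column passed as the first argument plays the role of the URL Column property.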


Include Input Columns = Boolean

  • Yes: Outputs both the input columns (including the source URL column) and the prediction columns. This is the default setting.
  • No: Outputs only the new prediction columns.
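
The difference between the two settings can be pictured with two hand-written queries. These are illustrative assumptions only: doc_ai_input is a hypothetical input table, the prediction is shown as a single column for simplicity, and how the component actually splits the prediction into columns is not documented here.

```sql
-- Roughly what "Yes" corresponds to: input columns plus the prediction.
SELECT
    t.*,
    <model_build_name>!PREDICT(t.presigned_url) AS prediction
FROM doc_ai_input AS t;  -- hypothetical table name

-- Roughly what "No" corresponds to: the prediction only.
SELECT
    <model_build_name>!PREDICT(t.presigned_url) AS prediction
FROM doc_ai_input AS t;
```

Keeping the input columns is usually the more convenient choice when later components need relative_path to tie each prediction back to its file.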
