Document AI Predict
Public preview
Editions
Production use of this feature is available for specific editions only. Contact our sales team for more information.
The Document AI Predict transformation component extracts data from documents. It invokes the Snowflake Document AI PREDICT method, letting you call your Document AI models from within a Data Productivity Cloud pipeline.
Prerequisites
- Files you wish to process must be in your Snowflake stage.
- You must have already created and configured a Document AI model build in Snowflake. For more information, read Set up the required objects and privileges.
- The Document AI Predict component requires that your file's relative path and presigned URL be available in a Snowflake table. Use this example query to populate your table:
SELECT
    relative_path,
    GET_PRESIGNED_URL(@<stage_name>, relative_path) AS presigned_url
FROM DIRECTORY(@<stage_name>);
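One way to persist the result of the query above is a CREATE TABLE ... AS SELECT statement. The following is a minimal sketch, not a required setup: the table name doc_urls is hypothetical, <stage_name> remains a placeholder for your own stage, and it assumes the stage has a directory table enabled, which the DIRECTORY table function needs.

```sql
-- Refresh the directory table so newly staged files are listed.
-- Assumes the stage was created with DIRECTORY = (ENABLE = TRUE).
ALTER STAGE <stage_name> REFRESH;

-- doc_urls is a hypothetical table name; choose your own.
CREATE OR REPLACE TABLE doc_urls AS
SELECT
    relative_path,
    GET_PRESIGNED_URL(@<stage_name>, relative_path) AS presigned_url
FROM DIRECTORY(@<stage_name>);
```

Presigned URLs expire after a limited time, so a table built this way may need to be repopulated shortly before each pipeline run.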
Note
The final step in the Snowflake tutorial, Create a document processing pipeline, isn't required in the Data Productivity Cloud.
Properties
Database
= drop-down
The Snowflake source database. The special value, [Environment Default], will use the database defined in the environment. For more information, read Databases, Tables and Views - Overview.
Schema
= drop-down
The Snowflake source schema. The special value, [Environment Default], will use the schema defined in the environment. For more information, read Database, Schema, and Share DDL.
Model Build Name
= string
The build name of the Document AI model. Read Prepare a Document AI model build to learn more.
Model Build Version
= string
Optionally specify the version of the model to use. If not set, this parameter will default to the latest version.
URL Column
= drop-down
The source column containing the presigned URLs of the staged files the model should act on.
A presigned URL grants temporary access to a staged file without requiring the caller to authenticate or sign in.
The Document AI API takes a presigned URL for each document you want to run the model against, and Document AI uses that URL to fetch the document.
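For orientation, the call this component wraps resembles a direct Snowflake query like the one below. This is a rough sketch rather than the SQL the component actually generates: doc_urls is a hypothetical table holding the presigned URLs, <model_build_name> is a placeholder for the value of Model Build Name, and the version number 1 is purely illustrative.

```sql
-- doc_urls is a hypothetical table with relative_path and presigned_url columns.
-- <model_build_name> is a placeholder; 1 is an illustrative model build version.
SELECT
    relative_path,
    <model_build_name>!PREDICT(presigned_url, 1) AS prediction
FROM doc_urls;
```

In Snowflake, the version argument to PREDICT is optional, which mirrors the optional Model Build Version property.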
Include Input Columns
= Boolean
- Yes: Outputs both your source URL columns and the prediction columns. This is the default setting.
- No: Outputs only the new prediction columns.
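The difference between the two settings can be pictured as two query shapes. This is only an illustrative sketch under assumptions: doc_urls is a hypothetical input table, <model_build_name> is a placeholder, and the output column name prediction is invented here, so the columns the component actually produces may be named differently.

```sql
-- Include Input Columns = Yes (assumed shape): input columns pass through
-- alongside the prediction output.
SELECT
    relative_path,
    presigned_url,
    <model_build_name>!PREDICT(presigned_url) AS prediction
FROM doc_urls;

-- Include Input Columns = No (assumed shape): prediction output only.
SELECT
    <model_build_name>!PREDICT(presigned_url) AS prediction
FROM doc_urls;
```

Keeping the input columns is useful when downstream components need to tie each prediction back to the file it came from.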
Snowflake | Databricks | Amazon Redshift |
---|---|---|
✅ | ❌ | ❌ |