This plugin provides two sets of actions, each available in synchronous and asynchronous modes: one extracts text only, and the other analyzes the document structure (such as forms and tables) in addition to extracting text.
0️⃣a : AUTOMATED CONFIGURATION FOR SYNC & ASYNC
=============================================
If you do not have AWS S3 configured yet, the configuration steps can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleS3&param_BucketName=BucketNameOfYourChoice&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSS3Plugin.yaml
You will find the required parameters values used to configure your AWS S3 plugin, for which "AWS S3 DROPZONE & SQS UTILITIES" is suggested, in the "OUTPUT" tab of the created stack.
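The deployment link above carries the stack name, the bucket name and the template location as query parameters, so BucketNameOfYourChoice has to be replaced with your own bucket name before opening it. A minimal sketch of how such a link can be assembled is shown here; the helper name `quick_create_url` and the bucket name `my-app-uploads` are illustrative assumptions, not part of the plugin.

```python
from urllib.parse import urlencode

# Base of the CloudFormation "create stack" review page used by the links in this document.
CONSOLE_BASE = "https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review"
S3_TEMPLATE_URL = (
    "https://bubble-resources.s3.amazonaws.com/deployment-assets/"
    "CloudFormation-AWSS3Plugin.yaml"
)


def quick_create_url(stack_name, template_url, **template_params):
    """Build a deployment link; template parameters are passed as param_<Name>=<value>."""
    query = {"stackName": stack_name}
    for name, value in template_params.items():
        query[f"param_{name}"] = value
    query["templateURL"] = template_url
    # Keep ':' and '/' readable so the template URL stays recognizable in the link.
    return f"{CONSOLE_BASE}?{urlencode(query, safe=':/')}"


# Hypothetical bucket name, for illustration only.
print(quick_create_url("BubbleS3", S3_TEMPLATE_URL, BucketName="my-app-uploads"))
```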
The steps from 0) to 3) b) of START & GET EXTRACT TEXT (ASYNC) & START & GET ANALYZE DOCUMENT (ASYNC) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextract&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractAsync.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0️⃣b : AUTOMATED CONFIGURATION FOR SYNC ONLY
=============================================
The steps from 0) to 1) of EXTRACT TEXT (SYNC) & ANALYZE DOCUMENT (SYNC) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextractSyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractSyncOnly.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
1️⃣ : AWS TEXTRACT - OCR TEXT & DATA (FRONT-END)
===========================================
📋 ELEMENT DESCRIPTION
--------------------------------
The AWS TEXTRACT - OCR TEXT & DATA element provides the EXTRACT TEXT and ANALYZE DOCUMENT actions to extract text and analyze documents for relationships between detected text. The front-end element suits applications where reactivity is desired, such as (but not limited to) mobile applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ The steps from 0) to 1) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextractSyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractSyncOnly.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign-up for AWS TEXTRACT by following this link:
https://console.aws.amazon.com/textract/home?p=txt&cp=bn&ad=c
1) Create your AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET, then add to the credentials the AWS TEXTRACT READ-ONLY policy:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
If you intend to use AWS S3 URI (s3://) with your own BUCKET, attach the AWS S3 READ ONLY policy to the considered BUCKET.
2) Register on plugins.wiseable.io. Create a new Credential which associates your BUBBLE APP URL, AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET.
The registration service will generate your PUBLIC ACCESS KEY. This key serves as a secure proxy for your real API key. It allows your application to communicate with the service without exposing your real API key. Since this PUBLIC ACCESS KEY is explicitly tied to your registered BUBBLE APP URL, it can only be used from that domain, ensuring that even if the key is publicly visible, it remains safe and cannot be misused by unauthorized sources.
3) In the Plugin Settings, enter the following:
- PUBLIC ACCESS KEY (generated from plugins.wiseable.io)
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
4) Add the AWS TEXTRACT - OCR TEXT & DATA (FRONT-END) element to the page. Select the RESULT DATA TYPE as "RESULT (TEXTRACT - OCR TEXT & DATA)".
5) Integrate the logic into your application using the following element's states and actions:
FIELDS:
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXTRACT - OCR TEXT & DATA)".
EVENTS:
- SUCCESS: Event triggered upon success
- ERROR: Event triggered upon error
EXPOSED STATES:
Use any element able to show/process the data of interest (such as a Group with a Text field) stored within the result of the following states:
- RESULTS: Populated upon SUCCESS event. Returns a list of Blocks. Each Block includes the text (words, lines), the bounding box of the element, the confidence value, the polygon coordinates in which the text is contained, and the relationships between the detected items.
- ERROR MESSAGE: Populated upon ERROR event.
- IS PROCESSING: Set to true when processing is in progress, false otherwise.
- REQUESTED ACTION: The latest requested action.
ELEMENT ACTIONS - TRIGGERED IN WORKFLOW:
- EXTRACT TEXT (SYNC) (FRONT-END): Extract text from a document.
Inputs Fields:
- IMAGE: Image from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), an HTTPS file URL (https://server/file.ext) or an AWS S3 URI (s3://bucket/image.jpg). For both Protocol-relative and HTTPS URLs, the file must be accessible through the HTTPS protocol.
- ANALYZE DOCUMENT (SYNC) (FRONT-END): Analyze document to extract text, forms, and tables.
Inputs Fields:
- IMAGE: Image from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), an HTTPS file URL (https://server/file.ext) or an AWS S3 URI (s3://bucket/image.jpg). For both Protocol-relative and HTTPS URLs, the file must be accessible through the HTTPS protocol.
- TABLES ANALYSIS: Set to yes to extract tables and the cells in a table. For example, when a table with four cells is present on a form, Amazon Textract detects the table and its four cells.
- FORMS ANALYSIS: Set to yes to detect selection elements such as option buttons (radio buttons) and check boxes on a document page. Selection elements can be detected in form data and in tables.
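The IMAGE field accepts three URL forms besides a direct upload. As an illustration of how they differ, here is a small sketch that classifies a value and splits an S3 URI into its bucket and key. The function name `classify_image_source` and the returned dictionary layout are assumptions made for this example; the plugin's internal handling is not documented here.

```python
from urllib.parse import urlparse


def classify_image_source(source):
    """Classify an IMAGE value as an HTTPS URL or an AWS S3 URI."""
    if source.startswith("//"):
        # Protocol-relative URL: the file must be reachable over HTTPS.
        return {"kind": "https", "url": "https:" + source}
    parsed = urlparse(source)
    if parsed.scheme == "https":
        return {"kind": "https", "url": source}
    if parsed.scheme == "s3":
        key = parsed.path.lstrip("/")
        if not parsed.netloc or not key:
            raise ValueError(f"S3 URI must look like s3://bucket/key: {source!r}")
        return {"kind": "s3", "bucket": parsed.netloc, "key": key}
    raise ValueError(f"Unsupported IMAGE source: {source!r}")


print(classify_image_source("//server/file.ext"))
print(classify_image_source("s3://bucket/image.jpg"))
```

A plain http:// URL is rejected in this sketch because the file must be accessible through HTTPS.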
2️⃣ : EXTRACT TEXT (SYNC) (BACK-END)
====================
📋 ACTION DESCRIPTION
--------------------------------
EXTRACT TEXT from a JPEG, PNG, TIFF image or PDF (single-page) file to return the text (words, lines), positions and relationships between the elements.
Operates in synchronous request mode, useful for small files and time-sensitive applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ The steps from 0) to 1) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextractSyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractSyncOnly.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign-up for AWS TEXTRACT:
https://console.aws.amazon.com/textract/home?p=txt&cp=bn&ad=c
1) Create your AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET, then add to the credentials the AWS TEXTRACT READ-ONLY policy:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
If you intend to use AWS S3 URI (s3://) with your own BUCKET, attach the AWS S3 READ ONLY policy to the considered BUCKET.
2) In the Plugin Settings, enter the following:
- AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
3) Set up the action "EXTRACT TEXT (SYNC) (BACK-END)" in the workflow.
Inputs Fields:
- IMAGE: JPEG, PNG image or PDF (single-page) from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), an HTTPS file URL (https://server/file.ext) or an AWS S3 URI (s3://bucket/image.jpg). For both Protocol-relative and HTTPS URLs, the file must be accessible through the HTTPS protocol.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXTRACT - OCR TEXT & DATA)".
Output Fields:
- RESULTS: Returns a list of Blocks. Each Block includes the text (words, lines), the confidence value, and the relationships between the detected items.
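To make the structure of the returned Blocks more concrete, the sketch below reads lines and their child words from a small hand-written sample. It assumes each entry of RESULTS mirrors the AWS Textract Block object (fields `Id`, `BlockType`, `Text`, `Confidence`, `Relationships`); the field names exposed inside Bubble may differ, and the sample data is invented for illustration.

```python
# Hand-written sample in the shape of AWS Textract Block objects (illustrative only).
SAMPLE_BLOCKS = [
    {"Id": "p1", "BlockType": "PAGE",
     "Relationships": [{"Type": "CHILD", "Ids": ["l1"]}]},
    {"Id": "l1", "BlockType": "LINE", "Text": "Invoice 42", "Confidence": 99.1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1", "w2"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Invoice", "Confidence": 99.5},
    {"Id": "w2", "BlockType": "WORD", "Text": "42", "Confidence": 98.7},
]


def extract_lines(blocks, min_confidence=0.0):
    """Return the text of every LINE block at or above a confidence threshold (0-100)."""
    return [
        block["Text"]
        for block in blocks
        if block.get("BlockType") == "LINE"
        and block.get("Confidence", 0.0) >= min_confidence
    ]


def words_of_line(line, blocks_by_id):
    """Follow the CHILD relationship of a LINE block to its WORD blocks."""
    child_ids = [
        block_id
        for relationship in line.get("Relationships", [])
        if relationship["Type"] == "CHILD"
        for block_id in relationship["Ids"]
    ]
    return [
        blocks_by_id[block_id]["Text"]
        for block_id in child_ids
        if blocks_by_id[block_id]["BlockType"] == "WORD"
    ]


blocks_by_id = {block["Id"]: block for block in SAMPLE_BLOCKS}
print(extract_lines(SAMPLE_BLOCKS))
print(words_of_line(blocks_by_id["l1"], blocks_by_id))
```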
3️⃣ : ANALYZE DOCUMENT (SYNC) (BACK-END)
==========================
📋 ACTION DESCRIPTION
--------------------------------
ANALYZE DOCUMENT in a JPEG, PNG, TIFF image or PDF (single-page) to return the structure (forms, tables), text and values (words, lines, selection elements), positions and relationships between the elements.
Operates in synchronous request mode, useful for small files and time-sensitive applications.
🔧 STEP-BY-STEP SETUP
------------------------------
ℹ️ The steps from 0) to 1) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextractSyncOnly&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractSyncOnly.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign-up for AWS TEXTRACT:
https://console.aws.amazon.com/textract/home?p=txt&cp=bn&ad=c
1) Create your AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET, then add to the credentials the AWS TEXTRACT READ-ONLY policy:
https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys
If you intend to use AWS S3 URI (s3://) with your own BUCKET, attach the AWS S3 READ ONLY policy to the considered BUCKET.
2) In the Plugin Settings, enter the following:
- AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
3) Set up the action "ANALYZE DOCUMENT (SYNC) (BACK-END)" in the workflow.
Inputs Fields:
- IMAGE URL: JPEG, PNG image or PDF (single-page) from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), an HTTPS file URL (https://server/file.ext) or an AWS S3 URI (s3://bucket/image.jpg). For both Protocol-relative and HTTPS URLs, the file must be accessible through the HTTPS protocol.
- TABLES ANALYSIS: Set to yes to extract tables and the cells in a table. For example, when a table with four cells is present on a form, Amazon Textract detects the table and its four cells.
- FORMS ANALYSIS: Set to yes to detect selection elements such as option buttons (radio buttons) and check boxes on a document page. Selection elements can be detected in form data and in tables. For example, when a table on a form contains check boxes, Amazon Textract detects the check boxes in the table cells.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXTRACT - OCR TEXT & DATA)".
Output Fields:
- RESULTS: Returns a list of Blocks. Each Block includes the text (words, lines), the tables, forms and cells values, the confidence value, and the relationships between the detected items.
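When TABLES ANALYSIS is enabled, a table is described by a TABLE block whose children are CELL blocks, each carrying a row and column position. The sketch below rebuilds a four-cell table as a list of rows. It assumes the Blocks follow the AWS Textract Block layout (`RowIndex` and `ColumnIndex` starting at 1, `CHILD` relationships); the sample data and the function names are invented for illustration.

```python
# Hand-written sample: one table with two rows and two columns (four cells).
SAMPLE_BLOCKS = [
    {"Id": "t1", "BlockType": "TABLE",
     "Relationships": [{"Type": "CHILD", "Ids": ["c1", "c2", "c3", "c4"]}]},
    {"Id": "c1", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "c2", "BlockType": "CELL", "RowIndex": 1, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w2"]}]},
    {"Id": "c3", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 1,
     "Relationships": [{"Type": "CHILD", "Ids": ["w3"]}]},
    {"Id": "c4", "BlockType": "CELL", "RowIndex": 2, "ColumnIndex": 2,
     "Relationships": [{"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Item"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Qty"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Pen"},
    {"Id": "w4", "BlockType": "WORD", "Text": "2"},
]


def child_ids(block):
    """Ids referenced by the CHILD relationships of a block."""
    return [
        block_id
        for relationship in block.get("Relationships", [])
        if relationship["Type"] == "CHILD"
        for block_id in relationship["Ids"]
    ]


def table_to_rows(table, blocks_by_id):
    """Rebuild a TABLE block as a list of rows, each row a list of cell texts."""
    cells = [blocks_by_id[i] for i in child_ids(table)
             if blocks_by_id[i]["BlockType"] == "CELL"]
    if not cells:
        return []
    row_count = max(cell["RowIndex"] for cell in cells)
    column_count = max(cell["ColumnIndex"] for cell in cells)
    grid = [["" for _ in range(column_count)] for _ in range(row_count)]
    for cell in cells:
        words = [blocks_by_id[i]["Text"] for i in child_ids(cell)
                 if blocks_by_id[i]["BlockType"] == "WORD"]
        # Textract indexes rows and columns from 1.
        grid[cell["RowIndex"] - 1][cell["ColumnIndex"] - 1] = " ".join(words)
    return grid


blocks_by_id = {block["Id"]: block for block in SAMPLE_BLOCKS}
print(table_to_rows(blocks_by_id["t1"], blocks_by_id))
```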
4️⃣ : START & GET EXTRACT TEXT (ASYNC)
================================
📋 ACTION DESCRIPTION
--------------------------------
EXTRACT TEXT from a JPEG, PNG, PDF file to return the text (words, lines), positions and relationships between the elements.
Operates in asynchronous request mode, useful for large files and time-insensitive applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ The steps from 0) to 3) b) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextract&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractAsync.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign-up for AWS TEXTRACT:
https://console.aws.amazon.com/textract/home?p=txt&cp=bn&ad=c
1) Configure AWS TEXTRACT FOR ASYNCHRONOUS OPERATION by following ALL the instructions:
https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html
Write down your:
- ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION
- NOTIFICATION ROLE ARN
- SNS TOPIC ARN
- QUEUE URL
2) In the Plugin Settings, enter the following:
- AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
3) Set up an action in your workflow that returns the BUCKET and KEY of the file to analyze.
a) If you do not already have such action, install the plugin "AWS S3 & SQS UTILITIES"
b) Create an AWS S3 BUCKET that will be used to store the file to analyze:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html
c) Set up the "PUT FILE TO S3" action in the workflow.
Inputs Fields:
- FILE URL TO STORE: The file URL from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), or an HTTPS file URL (https://server/file.ext). The file must be accessible through the HTTPS protocol.
- AWS S3 BUCKET NAME: AWS S3 Bucket Name to which the file will be saved.
- AWS S3 FILE NAME: Path & Name of the file to put to AWS S3. The format must be [path/]filename.ext.
Example 1: path1/path2/filename.ext.
Example 2: filename.ext if the file is at the root of the bucket.
4) Set up the "START EXTRACT TEXT JOB (ASYNC)" action in the workflow.
Inputs Fields:
- AWS S3 BUCKET NAME: AWS S3 bucket name from which the input file will be read.
- AWS S3 FILE NAME: Path & Name of the JPEG, PNG, PDF file to get from AWS S3. The format must be [path/]filename.ext.
Example 1: path1/path2/filename.ext.
Example 2: filename.ext if the file is at the root of the bucket.
- NOTIFICATION ROLE ARN: ARN of an IAM role giving AWS TEXTRACT publishing permissions to the AWS SNS topic.
- SNS TOPIC ARN: AWS SNS topic ARN to which AWS TEXTRACT posts the completion status.
Output Fields:
- JOBID: ID of the Job, to be reused in the "GET JOB STATUS FROM SQS" and "GET EXTRACT TEXT RESULTS (ASYNC)" actions.
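The inputs of this step correspond closely to the parameters of the AWS Textract StartDocumentTextDetection operation. The sketch below assembles such a request as a plain dictionary so the mapping is visible. The function name, the ARNs and the bucket name are placeholders invented for this example, and how the plugin builds its own request is an assumption.

```python
def build_start_text_detection_request(bucket, file_name, role_arn, sns_topic_arn):
    """Assemble StartDocumentTextDetection parameters from the step inputs."""
    # AWS S3 FILE NAME must follow [path/]filename.ext: no leading or trailing slash.
    if not file_name or file_name.startswith("/") or file_name.endswith("/"):
        raise ValueError(f"Expected [path/]filename.ext, got {file_name!r}")
    return {
        "DocumentLocation": {"S3Object": {"Bucket": bucket, "Name": file_name}},
        "NotificationChannel": {"RoleArn": role_arn, "SNSTopicArn": sns_topic_arn},
    }


# Placeholder values, for illustration only.
request = build_start_text_detection_request(
    bucket="my-app-uploads",
    file_name="path1/path2/filename.ext",
    role_arn="arn:aws:iam::123456789012:role/ExampleTextractRole",
    sns_topic_arn="arn:aws:sns:us-east-1:123456789012:ExampleTextractTopic",
)
print(request)

# With boto3 installed and credentials configured, the request could be sent as:
#   import boto3
#   job_id = boto3.client("textract").start_document_text_detection(**request)["JobId"]
```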
5) Install the plugin "AWS S3 & SQS UTILITIES"
Set up the action "GET JOB STATUS FROM SQS" in a recurring workflow ('Do every x seconds'), to poll the job completion status on a regular basis.
Configure this recurring workflow to execute the next step once the job status is SUCCEEDED, using 'Only When' Event Condition, to retrieve the results.
Inputs Fields:
- QUEUE URL: URL of AWS SQS you set up at step 1, used to poll for AWS TEXTRACT job status messages.
- JOBID: ID of the job to poll, returned by "START EXTRACT TEXT JOB (ASYNC)" action.
Output Fields:
- JOB STATUS: Valid values are SUCCEEDED, POLLING, IN_PROGRESS, PARTIAL_SUCCESS and FAILED or ERROR, with error or failure messages being appended to the status.
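The recurring workflow described in this step is a polling loop: keep checking while the status is POLLING or IN_PROGRESS, continue once it is SUCCEEDED, and stop on FAILED or ERROR. The sketch below expresses that logic outside Bubble. The `get_status` callable is a hypothetical stand-in for the "GET JOB STATUS FROM SQS" action, and returning PARTIAL_SUCCESS to the caller rather than treating it as a failure is a design choice of this example.

```python
import time


def wait_for_job(get_status, interval_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Poll until the job reaches a final status, then return that status."""
    for _ in range(max_attempts):
        status = get_status()
        if status in ("SUCCEEDED", "PARTIAL_SUCCESS"):
            return status
        # Error or failure messages are appended to the status, so match on the prefix.
        if status.startswith(("FAILED", "ERROR")):
            raise RuntimeError(f"Textract job did not complete: {status}")
        # POLLING or IN_PROGRESS: wait before the next check.
        sleep(interval_seconds)
    raise TimeoutError(f"Job still running after {max_attempts} checks")


# Simulated status sequence, for illustration only.
simulated = iter(["POLLING", "IN_PROGRESS", "SUCCEEDED"])
print(wait_for_job(lambda: next(simulated), sleep=lambda seconds: None))
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.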
6) Set up the action "GET EXTRACT TEXT RESULTS (ASYNC)" in the workflow.
Inputs Fields:
- JOBID: ID of the job to poll, returned by "START EXTRACT TEXT JOB (ASYNC)" action.
- MAX RESULTS: Maximum number of results per paginated call to AWS. The largest value you can specify is 1000; any greater value will return 1000 results. The default value is 1000. This plugin auto-paginates the AWS response based on this parameter.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXTRACT - OCR TEXT & DATA)".
Output Fields:
- RESULTS: Returns a list of Blocks. Each Block includes the text (words, lines), the confidence value, and the relationships between the detected items.
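The auto-pagination mentioned for MAX RESULTS can be pictured as follows: AWS returns at most MAX RESULTS blocks per call plus a NextToken while more pages remain, and the pages are concatenated into one list. This is a sketch of that idea only, not the plugin's actual code. `fetch_page` is a hypothetical callable standing in for the AWS GetDocumentTextDetection call, and the fake pages are invented sample data.

```python
def fetch_all_blocks(fetch_page, job_id, max_results=1000):
    """Collect the Blocks of every page of an asynchronous job result."""
    # AWS caps the page size at 1000; larger values behave like 1000.
    page_size = max(1, min(max_results, 1000))
    blocks = []
    next_token = None
    while True:
        page = fetch_page(job_id, page_size, next_token)
        blocks.extend(page.get("Blocks", []))
        next_token = page.get("NextToken")
        if not next_token:
            return blocks


# Fake two-page response keyed by the token used to request each page.
FAKE_PAGES = {
    None: {"Blocks": [{"Id": "a"}, {"Id": "b"}], "NextToken": "token-1"},
    "token-1": {"Blocks": [{"Id": "c"}]},
}


def fake_fetch_page(job_id, page_size, next_token):
    """Stand-in for the AWS call; a real wrapper would omit NextToken when it is None."""
    return FAKE_PAGES[next_token]


print(fetch_all_blocks(fake_fetch_page, "example-job-id"))
```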
5️⃣ : START & GET ANALYZE DOCUMENT (ASYNC)
========================================
📋 ACTION DESCRIPTION
--------------------------------
ANALYZE DOCUMENT in a JPEG, PNG, PDF file stored in AWS S3 to return the structure (forms, tables), text and values (words, lines, selection elements), positions and relationships between the elements.
Operates in asynchronous request mode, useful for large files and time-insensitive applications.
🔧 STEP-BY-STEP SETUP
--------------------------------
ℹ️ The steps from 0) to 3) b) can be automatically performed by using this deployment template:
https://console.aws.amazon.com/cloudformation/home?#/stacks/create/review?stackName=BubbleTextract&templateURL=https://bubble-resources.s3.amazonaws.com/deployment-assets/CloudFormation-AWSTextractAsync.yaml
You will find the required parameters values used across the plugin in the "OUTPUT" tab of the created stack.
0) Sign-up for AWS TEXTRACT:
https://console.aws.amazon.com/textract/home?p=txt&cp=bn&ad=c
1) Configure AWS TEXTRACT FOR ASYNCHRONOUS OPERATION by following ALL the instructions:
https://docs.aws.amazon.com/textract/latest/dg/api-async-roles.html
Write down your:
- ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION
- NOTIFICATION ROLE ARN
- SNS TOPIC ARN
- QUEUE URL
2) In the Plugin Settings, enter the following:
- AWS TEXTRACT ACCESS KEY & ACCESS KEY SECRET
- AWS SERVICE ENDPOINT REGION (if not provided, default endpoint is "us-east-1").
3) Set up an action in your workflow that returns the BUCKET and KEY of the file to analyze.
a) If you do not already have such action, install the plugin "AWS S3 & SQS UTILITIES"
b) Create an AWS S3 BUCKET that will be used to store the file to analyze:
https://docs.aws.amazon.com/AmazonS3/latest/userguide/creating-bucket.html
c) Set up the "PUT FILE TO S3" action in the workflow.
Inputs Fields:
- FILE URL TO STORE: The file URL from the Bubble.io uploader, or a Protocol-relative URL (//server/file.ext), or an HTTPS file URL (https://server/file.ext). The file must be accessible through the HTTPS protocol.
- AWS S3 BUCKET NAME: AWS S3 Bucket Name to which the file will be saved.
- AWS S3 FILE NAME: Path & Name of the file to put to AWS S3. The format must be [path/]filename.ext.
Example 1: path1/path2/filename.ext.
Example 2: filename.ext if the file is at the root of the bucket.
4) Set up the "START ANALYZE DOCUMENT JOB (ASYNC)" action in the workflow.
Inputs Fields:
- AWS S3 BUCKET NAME: AWS S3 bucket name from which the input file will be read.
- AWS S3 FILE NAME: Path & Name of the JPEG, PNG, PDF file to get from AWS S3. The format must be [path/]filename.ext.
Example 1: path1/path2/filename.ext.
Example 2: filename.ext if the file is at the root of the bucket.
- TABLES ANALYSIS: Set to yes to extract tables and the cells.
- FORMS ANALYSIS: Set to yes to extract forms data.
- QUERIES & RESULT ALIAS: Each Alias - Query pair contains the question you want to ask (the query Text) and the Alias with which the result of the query will be associated. Example: InvoiceNo = What is the invoice number? will associate the invoice number with the InvoiceNo Alias, which is searchable in the response.
- NOTIFICATION ROLE ARN: The ARN of an IAM role giving AWS TEXTRACT publishing permissions to the Amazon SNS topic.
- SNS TOPIC ARN: The AWS SNS topic ARN to which AWS TEXTRACT posts the completion status.
Output Fields:
- JOBID: ID of the Job, to be reused in the "GET JOB STATUS FROM SQS" and "GET ANALYZE DOCUMENT RESULTS (ASYNC)" actions.
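The TABLES ANALYSIS, FORMS ANALYSIS and QUERIES & RESULT ALIAS inputs correspond to the FeatureTypes and QueriesConfig parameters of the AWS Textract StartDocumentAnalysis operation. The sketch below shows one way that mapping could look. The function name and the use of a dictionary for the Alias - Query pairs are assumptions of this example; the exact input format expected by the plugin is the one shown in its action fields.

```python
def build_analysis_options(tables, forms, alias_queries=None):
    """Map the yes/no inputs and Alias - Query pairs to StartDocumentAnalysis options."""
    feature_types = []
    if tables:
        feature_types.append("TABLES")
    if forms:
        feature_types.append("FORMS")
    options = {"FeatureTypes": feature_types}
    if alias_queries:
        feature_types.append("QUERIES")
        options["QueriesConfig"] = {
            "Queries": [
                {"Alias": alias, "Text": question}
                for alias, question in alias_queries.items()
            ]
        }
    return options


print(build_analysis_options(
    tables=True,
    forms=False,
    alias_queries={"InvoiceNo": "What is the invoice number?"},
))
```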
5) Install the plugin "AWS S3 & SQS UTILITIES"
Set up the action "GET JOB STATUS FROM SQS" in a recurring workflow ('Do every x seconds'), to poll the AWS TEXTRACT job status messages with the JOBID on a regular basis.
Configure this recurring workflow to execute the next step once the job status is SUCCEEDED, using 'Only When' Event Condition, to retrieve the results.
Inputs Fields:
- QUEUE URL: URL of AWS SQS you set up at step 1, used to poll for AWS TEXTRACT job status messages.
- JOBID: ID of the job to poll, returned by "START ANALYZE DOCUMENT JOB (ASYNC)" action.
Output Fields:
- JOB STATUS: Valid values are SUCCEEDED, POLLING, IN_PROGRESS, PARTIAL_SUCCESS and FAILED or ERROR, with error or failure messages being appended to the status.
6) Set up the action "GET ANALYZE DOCUMENT RESULTS (ASYNC)" in the workflow.
Inputs Fields:
- JOBID: ID of the job to poll, returned by "START ANALYZE DOCUMENT JOB (ASYNC)" action.
- MAX RESULTS: Maximum number of results per paginated call to AWS. The largest value you can specify is 1000; any greater value will return 1000 results. The default value is 1000. This plugin auto-paginates the AWS response based on this parameter.
- RESULT DATA TYPE: Returned type, must always be set to "RESULT (TEXTRACT - OCR TEXT & DATA)".
Output Fields:
- RESULTS: Returns a list of Blocks. Each Block includes the text (words, lines), the tables, forms and cells values, the confidence value, and the relationships between the detected items.
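When FORMS ANALYSIS is enabled, form fields come back as pairs of KEY_VALUE_SET blocks linked through relationships. The sketch below turns them into a simple key/value dictionary, including a check box. It assumes the Blocks follow the AWS Textract Block layout (`EntityTypes`, `VALUE` and `CHILD` relationships, `SelectionStatus`); the sample data and function names are invented for illustration.

```python
# Hand-written sample: one text field and one check box (illustrative only).
SAMPLE_BLOCKS = [
    {"Id": "k1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v1"]},
                       {"Type": "CHILD", "Ids": ["w1"]}]},
    {"Id": "v1", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["w2", "w3"]}]},
    {"Id": "k2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["KEY"],
     "Relationships": [{"Type": "VALUE", "Ids": ["v2"]},
                       {"Type": "CHILD", "Ids": ["w4"]}]},
    {"Id": "v2", "BlockType": "KEY_VALUE_SET", "EntityTypes": ["VALUE"],
     "Relationships": [{"Type": "CHILD", "Ids": ["s1"]}]},
    {"Id": "w1", "BlockType": "WORD", "Text": "Name:"},
    {"Id": "w2", "BlockType": "WORD", "Text": "Jane"},
    {"Id": "w3", "BlockType": "WORD", "Text": "Doe"},
    {"Id": "w4", "BlockType": "WORD", "Text": "Subscribed"},
    {"Id": "s1", "BlockType": "SELECTION_ELEMENT", "SelectionStatus": "SELECTED"},
]


def related_ids(block, relationship_type):
    """Ids referenced by the relationships of the given type."""
    return [
        block_id
        for relationship in block.get("Relationships", [])
        if relationship["Type"] == relationship_type
        for block_id in relationship["Ids"]
    ]


def block_text(block, blocks_by_id):
    """Join the words (or selection statuses) found under a block."""
    parts = []
    for child in (blocks_by_id[i] for i in related_ids(block, "CHILD")):
        if child["BlockType"] == "WORD":
            parts.append(child["Text"])
        elif child["BlockType"] == "SELECTION_ELEMENT":
            parts.append(child["SelectionStatus"])
    return " ".join(parts)


def form_fields(blocks):
    """Return {key text: value text} for every KEY block in the list."""
    blocks_by_id = {block["Id"]: block for block in blocks}
    fields = {}
    for block in blocks:
        if block["BlockType"] == "KEY_VALUE_SET" and "KEY" in block.get("EntityTypes", []):
            values = [block_text(blocks_by_id[i], blocks_by_id)
                      for i in related_ids(block, "VALUE")]
            fields[block_text(block, blocks_by_id)] = " ".join(values)
    return fields


print(form_fields(SAMPLE_BLOCKS))
```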
🔍IMPLEMENTATION EXAMPLE
======================
Feel free to browse the app editor in the Service URL for an implementation example.
ℹ️ ADDITIONAL INFORMATION
======================
> Lines & Words objects details:
https://docs.aws.amazon.com/textract/latest/dg/how-it-works-lines-words.html
> Forms objects details:
https://docs.aws.amazon.com/textract/latest/dg/how-it-works-kvp.html
> Tables objects details:
https://docs.aws.amazon.com/textract/latest/dg/how-it-works-tables.html
> Selection Elements objects details:
https://docs.aws.amazon.com/textract/latest/dg/how-it-works-selectables.html
> AWS TEXTRACT service limits:
https://docs.aws.amazon.com/textract/latest/dg/limits.html
> AWS services availability per region:
https://aws.amazon.com/about-aws/global-infrastructure/regional-product-services/
> AWS Service endpoints list:
https://docs.aws.amazon.com/general/latest/gr/rande.html
⚠️TROUBLESHOOTING
================
Any plugin-related error will be posted to the Logs tab, "Server logs" section of your App Editor.
Make sure "Plugin server side logging" and "Plugin client side logging" are selected in "Show Advanced".
For front-end actions, you can also open your browser's developer console (F12 or Ctrl+Shift+I in most browsers) to view detailed error messages and logs.
Always check the ERROR MESSAGE state of the element and implement error handling using the ERROR event to provide a better user experience.
> Server Logs Details:
https://manual.bubble.io/core-resources/bubbles-interface/logs-tab#server-logs
⚡PERFORMANCE CONSIDERATIONS
===========================
GENERAL
-------------
For back-end actions, the maximum processing duration is capped at 30 seconds as per Bubble.io design. This time limitation does not apply to front-end actions.
⏱️ BACK-END ACTION START DELAY
-----------------------------------------------
Each time a server-side action is called, Bubble initializes a small virtual machine to execute the action. If the same action is called shortly after, the caching mechanism kicks in, resulting in faster execution on subsequent calls.
A useful workaround is to fire a dummy execution at page load, which pre-warms the Bubble engine for the next few minutes, reducing the impact of cold starts for your users.
FRONT-END VS BACK-END PROCESSING
----------------------------------------------------
The front-end element is designed to support and optimize multiple image formats and will automatically compress images to adhere to AWS requirements. The back-end action doesn't perform this optimization, so be careful with file size and format when using it.
❓QUESTIONS?
===========
Contact us at
[email protected] for any feature request or support question.