Hi Everyone,
I'm using Appian Document Extraction and need to process multiple invoice documents uploaded in a claim.
What is the recommended approach to:
Pass multiple invoices for extraction
Extract data from each invoice
Store the results in a List of Record Type/CDT
Can a list of document IDs be passed directly to Document Extraction, or should each document be processed individually (using a loop/subprocess)?
Since you will probably want to handle extraction issues per document, I would recommend implementing a small subprocess that manages extraction and issue handling for a single document. Then start this process once for each document.
You can create a process that extracts data from a single document, reconciles the data, and writes it to records. Then call it through an asynchronous Start Process node for each invoice, using MNI (Multiple Node Instances) configuration. You can follow this tutorial to build the process for this feature.
Extracting data from a single document takes around 5 minutes. Since I need to process a list of invoices, using a loop causes the total processing time to increase significantly.
Are there any best practices to reduce extraction time or process multiple documents more efficiently? Is parallel processing possible, or is there a better approach than looping through each document one by one?
I'd recommend going with AI Document Center here. It's built for exactly this: centralized intake, parallel extraction, reconciliation, and model versioning, all out of the box.
For your case, just ingest the invoices with the claim ID as metadata, let Doc Center extract in parallel, and query the results back as records. No MNI loop is needed, unlike the AI Skill, which runs per document. I'd only fall back to the AI Skill if extraction has to happen inline inside a specific process flow.
https://www.youtube.com/watch?v=hbpFtbxl1pA&t=1s
https://docs.appian.com/suite/help/26.5/aidc-3.1/aidc-landing.html
If you call the extraction process once per document, the invoices are processed in parallel. So rather than looping, use MNI on the Start Process node: a separate instance runs for each document, so the documents are processed asynchronously and in parallel.
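To make the loop-versus-parallel difference concrete, here is a minimal sketch in plain Python. It is not Appian code and does not use any Appian API. `extract_invoice`, `safe_extract`, and `extract_all` are hypothetical names, and the 0.1 second sleep is only a stand-in for the real extraction wait. It shows the same pattern as the MNI setup: one independent unit of work per document, with failures handled per document so one bad invoice does not stop the rest.

```python
from concurrent.futures import ThreadPoolExecutor
import time


def extract_invoice(doc_id: int) -> dict:
    """Hypothetical stand-in for the single-document extraction subprocess."""
    time.sleep(0.1)  # placeholder for the real extraction wait
    if doc_id < 0:
        # Simulated extraction failure for an invalid document
        raise ValueError(f"cannot extract document {doc_id}")
    return {"docId": doc_id, "status": "EXTRACTED"}


def safe_extract(doc_id: int) -> dict:
    """Isolate failures so one bad document does not affect the others."""
    try:
        return extract_invoice(doc_id)
    except Exception as exc:
        return {"docId": doc_id, "status": "FAILED", "error": str(exc)}


def extract_all(doc_ids: list) -> list:
    """Fan out one worker per document, then collect results in input order."""
    if not doc_ids:
        return []
    with ThreadPoolExecutor(max_workers=len(doc_ids)) as pool:
        return list(pool.map(safe_extract, doc_ids))


if __name__ == "__main__":
    for row in extract_all([101, -1, 103]):
        print(row)
```

With a sequential loop the total wait grows with the number of documents, while with the fan-out above it stays close to the time of the slowest single document. In Appian, the per-document unit would be the subprocess started through the MNI-configured Start Process node, and the collected rows would correspond to the records written by each instance.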