Hello,
Recently I was given a requirement to extract certain data from PDF documents. The PDF documents should be retrieved from a common mailbox, and a few data fields within the documents then have to be extracted automatically and stored in a database table. The whole process should be automated, with no manual activity apart from reconciliations, if any.
After going through the documentation, I came across the built-in application IDP and the basic Document Extraction smart services, both of which can perform the data extraction task. As IDP requires some manual activity, I consider the document extraction services the better option for this requirement.
Can I use Appian RPA to automate the steps required in using IDP to achieve this functionality? Is this a better option than using the smart services? Please suggest the best way of achieving this requirement.
Hello Deepak,
To simplify this a bit: RPA is meant to execute tasks in the system the way you configured them in advance, not to develop or customize features or functionality. RPA, IDP, or a plain Appian process needs to be set up first, so I would not recommend using RPA to train or customize the IDP setup.
It is also not quite true that IDP requires a manual action every time. You have to train the IDP AI on the first few dozen or few hundred documents, but after that, if the document structure stays basically the same, the recognition rate increases and no further manual steps should be required if the process is set up properly.
Kind regards,
Richard
Thanks for the response Richard.
As far as I understand, for IDP to extract details from a document, the document has to be manually uploaded to the system. I meant automating these steps using Appian RPA, so that the document gets uploaded to the system for data extraction.
I assume the core logic behind IDP and document extraction smart services is the same. Please correct me if I am wrong.
Apart from this, there are other activities which need to be performed after data extraction, such as extracting certain PDF pages, merging PDFs, emailing PDFs, etc.
If the core extraction logic behind IDP and the document extraction smart services is the same, and the above activities can be performed directly at the process level, would it be wise to stick with the document extraction smart services instead of using IDP?
The functionality is NOT the same. Document extraction just reads the text content from a PDF. IDP does OCR, classification, and field-by-field extraction.
The idea of IDP is to embed it into your own process. Check the documentation on how to do that.
Hello Deepak,
I would not exclude this approach, as I am not fully aware of how RPA can deal with documents, but overall I don't see a reason for doing it that way. It is not the same, but the smart service you discovered is part of the IDP functionality. ;) You cannot use this function properly without customizing IDP properly. Quote: "The Appian Document Extraction page walks you through how to use document extraction functionality together in a process model." (docs.appian.com/.../Start_Doc_Extraction_Smart_Service.html, docs.appian.com/.../Appian_Doc_Extraction.html)
The additional actions can be done quite simply in the system itself; there are plugins for that.
But I would focus on IDP first and think about RPA etc. in the next step. One by one. Perhaps RPA is not the way to automate it.
Hi Stefan, perhaps I missed the plugin, but are you talking about this: https://docs.appian.com/suite/help/22.1/Start_Doc_Extraction_Smart_Service.html ?
Yeah, good question. I meant this one
https://community.appian.com/b/appmarket/posts/pdf-tools
I was able to perform the other activities using existing smart services such as PDF Tools, the document management functionality, and the send email service.
My concern is whether to use IDP or the Document Extraction smart service for data extraction. All I need to extract is 6 fields from the PDF.
I was able to perform the extraction using the Start Doc Extraction smart service. Is IDP much more efficient, and a better way to implement the functionality, than the Appian smart service?
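To make the "no manual activity apart from reconciliations" part concrete, here is a minimal sketch in Python of how the routing after extraction could look, whichever extraction option is chosen. Everything in it is an assumption for illustration: the six field names, the `ExtractedField` type, and the `MIN_CONFIDENCE` threshold are hypothetical and are not Appian or IDP APIs. A document is stored automatically only if all six fields are present and confident enough; otherwise it is sent to reconciliation with the list of fields to review.

```python
from dataclasses import dataclass

# Hypothetical field names: the thread only says that six fields are needed.
REQUIRED_FIELDS = (
    "invoice_number",
    "invoice_date",
    "vendor",
    "currency",
    "net_amount",
    "total_amount",
)
MIN_CONFIDENCE = 0.8  # assumed threshold, to be tuned per document type


@dataclass
class ExtractedField:
    value: str
    confidence: float  # 0.0 to 1.0, as reported by whichever extraction is used


def route(extracted: dict[str, ExtractedField]) -> tuple[str, list[str]]:
    """Return ("store", []) or ("reconcile", [names of fields needing review])."""
    needs_review = [
        name
        for name in REQUIRED_FIELDS
        if name not in extracted                        # field was not found at all
        or not extracted[name].value.strip()            # field is empty
        or extracted[name].confidence < MIN_CONFIDENCE  # field is uncertain
    ]
    return ("reconcile", needs_review) if needs_review else ("store", [])
```

In a process model this decision would sit in a gateway after the extraction step: the "store" branch writes to the database table, and the "reconcile" branch creates a user task showing only the fields that need review.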
IDP is a whole solution, while the smart service is only a smart service. And that smart service is meant to be used as part of IDP only. I do not recommend using it in isolation.
It's not only about efficiency of implementation. What about long-term product strategy? Maintenance costs for the live application, etc.?
IDP is "the big solution", maintained and supported by Appian. There is an AI included, etc. The plugins "just" provide Java-based logic. Plugins are essentially third-party functionality extensions which have to be maintained by the provider/creator.
So basically it's your decision. There are pro and contra arguments on both sides. Personally, I usually prefer Appian functionality whenever possible, as Appian controls how it fits into their overall strategy.
I think we should first clarify: what do you mean by "Document Extraction smart service"?
I used the below smart service.
Stefan is absolutely right. This IS the function (aka smart service) to call the IDP functionality. Smart services are nothing other than a type of node, which often also provides a! functions to call this functionality outside of a process context. Quote: "Smart services are flow activities that integrate specialized business services, like sending e-mails or writing data to a database" - or calling IDP functions. https://docs.appian.com/suite/help/22.1/Smart_Services.html
So, can I conclude that, as a best practice, IDP should be used as a sub-process instead of the Doc Extraction smart service?
https://docs.appian.com/suite/help/22.1/idp-1.3/using-idp-in-a-subprocess.html
"Know the rules, break the rules" - but let's say yes. I see no need to call the a! function, as a process gives you the chance to execute the content extraction asynchronously.