PDF Read OCR

Hi All,

I have a requirement to read the content of PDFs containing tables and paragraphs, both structured and unstructured, and store it in database tables without any manual intervention.

I have used Appian's built-in document extraction feature, but it is not reading the documents accurately: some fields are extracted and others are not.

https://docs.appian.com/suite/help/23.2/evaluate-doc-extraction.html

We would like to stick with Appian, as it keeps the documents within Appian, rather than using Google.

Any suggestions would be helpful.

Thanks


  • Hi Stefan,

    We have upgraded to 23.2 and provided the training documents in the AI Training skills, but nothing appears in the "ExtractedData" output of the extraction. When the process reaches the reconcile task, the View form shows that only 30% of the document was scanned and half the tables were missed.

    What I have seen is that the first time, we have to intervene manually in the Reconcile Task to map the unreadable fields. After that, if the same set of documents is passed in, it can read a few key-value pairs but still struggles with tables.

    Our business requirement is that the customer sends us a PDF via email to a process model, which reads the PDF (tabular and non-tabular content alike) and stores it in the DB without any manual intervention from the customer.

    I guess this new AI Training is meant for simple documents, like invoices, that have straightforward key-value pairs.

    Our document is a complex report containing a complex table whose header names include special characters like 

    Is there any other way to read the PDF?

  • Certified Lead Developer
    in reply to bihitakdass

    As I already tried to explain, the training for extraction ONLY happens in the reconciliation step. You will need to repeat that at least a few times with various documents, so that the machine learning model understands what you want it to do.

    Then, you add a data validation step after the extraction to decide whether the extraction was good or needs a manual reconciliation.
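
    As a rough illustration of what such a validation step could check, here is a minimal sketch in Python. It is not Appian expression code; in Appian the same logic would probably live in an expression rule that feeds a gateway in the process model. The field names, the "table_rows" key, and the threshold are hypothetical placeholders, not names from the extraction output.

```python
from dataclasses import dataclass, field

# Hypothetical names: replace with the fields your report actually contains.
REQUIRED_FIELDS = ("report_id", "report_date", "customer_name")
# Assumed minimum number of table rows for the extraction to count as usable.
MIN_TABLE_ROWS = 1


@dataclass
class ValidationResult:
    needs_reconciliation: bool
    problems: list = field(default_factory=list)


def validate_extraction(extracted: dict) -> ValidationResult:
    """Decide whether extracted data is complete enough to store directly."""
    problems = []

    # Every required key-value pair must be present and non-blank.
    for name in REQUIRED_FIELDS:
        value = extracted.get(name)
        if value is None or str(value).strip() == "":
            problems.append(f"missing field: {name}")

    # The table must have produced at least MIN_TABLE_ROWS rows.
    rows = extracted.get("table_rows") or []
    if len(rows) < MIN_TABLE_ROWS:
        problems.append("no table rows extracted")

    # Any problem at all routes the document to manual reconciliation.
    return ValidationResult(needs_reconciliation=bool(problems), problems=problems)
```

    If the result says reconciliation is needed, the process would route the document to the reconcile task; otherwise it would write the data straight to the database.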

    And if you feel that the OOTB extraction is not powerful enough, I suggest contacting Appian to discuss your specific use case.