PDF Read OCR

bihitakdass

A Score Level 1

over 2 years ago

Hi All,

I have a requirement to read the content of PDFs that contain both structured content (tables) and unstructured content (paragraphs), and to store it in database tables without any manual intervention.

I have used Appian's built-in document extraction feature, but it is not reading the documents accurately: some fields are extracted, but others are not.

https://docs.appian.com/suite/help/23.2/evaluate-doc-extraction.html

We would like to stay with Appian's own extraction because it keeps the documents in Appian rather than relying on Google.

Any suggestions would be helpful.

Thanks

Stefan Helzle
Certified Lead Developer
over 2 years ago

How many training cycles did you do?

bihitakdass
A Score Level 1
over 2 years ago in reply to Stefan Helzle

What do you mean by a training cycle? Could you please explain? So far I have passed the same PDF through the workflow 10 times, and it is still unable to read the fields.

Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to bihitakdass

In my blog post I describe how training works: appian.rocks/.../

bihitakdass
A Score Level 1
over 2 years ago in reply to Stefan Helzle

Hi Stefan,
I cannot see the option described in the step "In the Build view, click NEW > AI Skill." Is this a licensing issue?

Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to bihitakdass

This needs to be enabled by Appian.

bihitakdass
A Score Level 1
over 2 years ago in reply to Stefan Helzle

We are currently on 23.1. I guess this feature is only available from the next version, 23.2?

Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to bihitakdass

Yes

bihitakdass
A Score Level 1
over 2 years ago in reply to Stefan Helzle

Hi Stefan,

We have upgraded to 23.2 and provided the training documents to the AI skill, but the output of the extraction from the document ("ExtractedData") is still empty. When the process reaches the reconciliation step, the view form shows that only about 30% of the document was scanned and half of the tables were missed.

What I have observed is that the first time, we have to intervene manually in the Reconcile task to map the unreadable fields. After that, when the same set of documents is passed through again, it is able to read a few key-value pairs, but it still struggles with the tables.

Our business requirement is that customers send PDFs by email to a process model, which reads both the tabular and the non-tabular content and stores it in the database without any manual intervention.

I suspect this new AI training is meant for simple documents such as invoices, which contain straightforward key-value pairs.

Our document is a complex report containing complex tables whose header names include special characters like

Is there any other way to read PDFs?

Stefan Helzle
Certified Lead Developer
over 2 years ago in reply to bihitakdass

As I already tried to explain, training for extraction ONLY happens in the reconciliation step. You will need to repeat that at least a few times with different documents so that the machine learning model learns what you want it to do.

Then, add a data validation step after the extraction to decide whether the result is good enough or needs manual reconciliation.

And if you feel that the out-of-the-box (OOTB) extraction is not powerful enough, I suggest contacting Appian to discuss your specific use case.
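To make the "data validation step after the extraction" more concrete, here is a minimal sketch in Python (not Appian SAIL) of the kind of rule such a step could apply before deciding between storing the data and routing the document to reconciliation. Everything here is a hypothetical assumption for illustration: the `ExtractionResult` shape, the field and table names, and the minimum row count. In Appian, the equivalent check would be written against the actual extraction output and used in the process model to choose the next path.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

# Hypothetical shape of an extraction result: key-value fields plus named
# tables, where each table is a list of rows (column name -> cell text).
@dataclass
class ExtractionResult:
    fields: Dict[str, Optional[str]]
    tables: Dict[str, List[Dict[str, Optional[str]]]] = field(default_factory=dict)

# Hypothetical validation rules for one document type.
REQUIRED_FIELDS = ["report_id", "report_date", "customer_name"]
REQUIRED_TABLES = {"line_items": 1}  # table name -> minimum number of rows


def validation_errors(result: ExtractionResult) -> List[str]:
    """Return a list of problems; an empty list means the extraction looks usable."""
    errors = []
    # Every required key-value field must be present and non-blank.
    for name in REQUIRED_FIELDS:
        value = result.fields.get(name)
        if value is None or not value.strip():
            errors.append(f"missing field: {name}")
    # Every required table must have enough rows, and no row may be entirely empty.
    for table, min_rows in REQUIRED_TABLES.items():
        rows = result.tables.get(table, [])
        if len(rows) < min_rows:
            errors.append(f"table {table}: {len(rows)} rows, expected at least {min_rows}")
        for i, row in enumerate(rows):
            if all(not (cell or "").strip() for cell in row.values()):
                errors.append(f"table {table}: row {i} is empty")
    return errors


def next_step(result: ExtractionResult) -> str:
    """Route to manual reconciliation only when validation finds problems."""
    return "reconcile" if validation_errors(result) else "store"
```

For example, a result with all three fields filled and at least one non-empty `line_items` row goes to `"store"`, while one with a blank `report_date` or no table rows goes to `"reconcile"`. The point of the design is that reconciliation (and with it, training) happens only for documents that fail the checks, so fully automatic processing improves as the model learns.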