Hello all, We have a requirement to extract data from driving license and passport images. The catch is that users may photograph the document with a phone and send it to us, so we convert the images to PDF. We are not able to extract data from these PDFs because they are semi-structured or unstructured documents. Please help if any kind of extraction is possible for such documents. Thanks in advance.
What is the issue that you are facing? Did you train the forms, or in this case the images?
We are trying to extract data from the PDFs, but because the documents come in different formats, the extraction is not consistent or accurate. It also gives different results when we test the same document repeatedly.
I am not able to understand what you mean by different formats. Are you referring to different document versions? The fields are the same in all documents, right? If yes, try training on the document up to about 10 times. If it still cannot identify the fields, then you probably need to check with Appian.
There are different formats of licenses and passports all over the world, with different field names, and sometimes there are no field labels at all. Our use case is broad and has to cover all of these.
OK. Then it is a little difficult to extract the info, since there are multiple versions of the documents. You probably need to check with Appian for this use case.
You can try integrating with a third-party service such as AWS Textract Analyze ID. This model is trained to extract entities from ID documents with better accuracy. A reference utility is available for this. Appian's out-of-the-box (OTB) extraction may be less accurate here because the inputs are images converted to documents.
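To make the Textract suggestion a bit more concrete, here is a minimal sketch in Python, assuming the `boto3` SDK is installed and AWS credentials are configured. The helper names `analyze_id_image` and `extract_id_fields`, the default region, and the 80% confidence threshold are illustrative assumptions, not part of Appian or of the reference utility mentioned above. It sends the image bytes to the `analyze_id` operation and flattens the response into a simple field-to-value dictionary.

```python
from typing import Any, Dict


def analyze_id_image(image_bytes: bytes, region_name: str = "us-east-1") -> Dict[str, Any]:
    """Send one ID image (for example JPEG or PNG bytes) to Textract AnalyzeID.

    The region default is an assumption; use whichever region your account uses.
    """
    # Imported lazily so the parsing helper below can be used without boto3.
    import boto3

    client = boto3.client("textract", region_name=region_name)
    return client.analyze_id(DocumentPages=[{"Bytes": image_bytes}])


def extract_id_fields(response: Dict[str, Any], min_confidence: float = 80.0) -> Dict[str, str]:
    """Flatten an AnalyzeID response into {field name: detected text}.

    Fields with an empty value or a confidence below min_confidence are skipped.
    If the response holds several documents, later ones overwrite earlier ones.
    """
    fields: Dict[str, str] = {}
    for document in response.get("IdentityDocuments", []):
        for field in document.get("IdentityDocumentFields", []):
            name = field.get("Type", {}).get("Text", "")
            detection = field.get("ValueDetection", {})
            value = detection.get("Text", "")
            confidence = detection.get("Confidence", 0.0)
            if name and value and confidence >= min_confidence:
                fields[name] = value
    return fields
```

A call would look like `extract_id_fields(analyze_id_image(open("license.jpg", "rb").read()))`, where `license.jpg` is a hypothetical file name. Textract returns normalized field names such as `FIRST_NAME`, `DOCUMENT_NUMBER` and `EXPIRATION_DATE`, which helps when the labels printed on the documents differ. As far as I know, Analyze ID is documented mainly for US driver's licenses and US passports, so accuracy on documents from other countries should be verified before relying on it.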