Google OCR retrieve the JSON result file

Question

Hi, I was trying out the Google Cloud Document OCR connected system and its services. I am able to perform the OCR operation on a PDF and get the JSON file. However, when I use the Format OCR Results service to bring the data from the JSON into Appian, it returns only the result for the first page and not the other pages. Has anyone faced the same issue?

Ben Edwards · Accepted Answer

Hi there, my team recently ran into the same issue. I reported it to Appian through a support case and was told that the engineering team is investigating an update.
In the meantime we developed a pretty decent workaround, though it involves more work. We add a PDF to our Google Cloud Storage bucket. Then we use an integration that performs the 'Start OCR' operation via the Google OCR connected system that Appian provides. After that, we use an integration that performs the 'Get Signed URL' operation via the Google Storage connected system that Appian provides, which gives us a URL for the results file that the 'Start OCR' operation creates in the Storage bucket. Using that URL, we perform a GET request to fetch the contents of the results file and parse it ourselves; the file includes more than just the first page of results. It's more work than just using the 'Format OCR Results' operation, but it's working for a solution we've developed. The key is that the 'Start OCR' operation creates a results file in your Storage bucket, and you should be able to retrieve it.
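
For anyone who wants to see what the last two steps (GET the file, then parse every page) might look like outside Appian, here is a minimal Python sketch. It is not the code from the workaround above. It assumes the results file has the layout that Cloud Vision's asynchronous file OCR writes: a top-level "responses" array with one entry per page, each carrying "fullTextAnnotation.text" and "context.pageNumber". Check your own results file before relying on that. The function names `fetch_results` and `text_by_page` are hypothetical.

```python
import json
import urllib.request


def fetch_results(signed_url: str) -> dict:
    """GET the results file from a signed URL and decode it as JSON."""
    with urllib.request.urlopen(signed_url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def text_by_page(results: dict) -> dict:
    """Map page number -> extracted text for every page in the results file.

    Assumed layout (not confirmed by the thread): one entry per page under
    "responses", with the text in "fullTextAnnotation" and the page number
    in "context".
    """
    pages = {}
    for index, response in enumerate(results.get("responses", []), start=1):
        # Fall back to the position in the list if no page number is present.
        page_number = response.get("context", {}).get("pageNumber", index)
        pages[page_number] = response.get("fullTextAnnotation", {}).get("text", "")
    return pages
```

You would call `text_by_page(fetch_results(url))` with the URL returned by the 'Get Signed URL' operation. Looping over every entry in "responses" is the part that matters here, since the complaint is that only the first page comes back.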
