Extracting text from a PDF paragraph by paragraph

Question

Hi,
I want to extract text from a PDF page by page and paragraph by paragraph, including the different headings and their related content.
Then I want to store the results in a database table with two columns: one for the heading and one for the content.
I initially used the getpdftext function and then the extract function to pull out the desired content, but I did not get the expected result because the heading words are also present inside the content.
Can anybody please help me out?

Gabriele Camilli · Answer

Hi, this approach depends heavily on the document format.
First, can you use IDP? It is probably easier than working with the PDF plugin.
Second, if the only problem is that the heading appears in the content, you can just remove the heading with substitute(local!text, localHeading, "")
Third, if neither of the above is applicable for some reason, I think I will need to see a sample of the text that getpdftext returns. Is that possible?
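
To illustrate the second suggestion, here is a rough, language-neutral sketch in Python (not Appian expression code) of how the extracted text could be split into heading/content pairs once the headings are known. The function name split_by_headings, the sample text, and the heading list are all made up for illustration. It also assumes each heading's first occurrence in the text is the real heading, which will not hold if the same words appear earlier in the body.

```python
def split_by_headings(text, headings):
    """Return (heading, content) pairs from already-extracted PDF text.

    Assumption: the first occurrence of each heading marks its section start.
    """
    # Find where each known heading starts; ignore headings that are absent.
    positions = []
    for heading in headings:
        index = text.find(heading)
        if index != -1:
            positions.append((index, heading))
    positions.sort()

    rows = []
    for i, (start, heading) in enumerate(positions):
        # Content runs from the end of this heading to the start of the next.
        content_start = start + len(heading)
        if i + 1 < len(positions):
            content_end = positions[i + 1][0]
        else:
            content_end = len(text)
        rows.append((heading, text[content_start:content_end].strip()))
    return rows


# Hypothetical sample standing in for the text returned by the PDF extraction.
sample = (
    "Introduction\nThis report covers sales.\n"
    "Results\nSales grew.\nMore detail here."
)
rows = split_by_headings(sample, ["Introduction", "Results"])
for heading, content in rows:
    print(repr(heading), "->", repr(content))
```

Each pair maps directly onto one row of the two-column table from the question, with the heading text already removed from the content. The same slicing logic could be rebuilt in an expression rule, but that is left open here since it depends on what the extracted text actually looks like.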