How can IDP extract line items from a PDF table that spans two pages?

Certified Senior Developer

I have a line item grid in a PDF document that continues from the first page onto the second, but IDP extracts only the rows on the first page. The line items on the second page are not extracted.

What should be done in order to get this right?

    Certified Associate Developer
    1. Configure IDP to handle multi-page documents: IDP may be set up to process only the first page of a document. Check the IDP configuration settings, or ask your system administrator, to confirm that every page of the PDF is processed.

    2. Check the line item extraction logic: Review the rules you have defined for extracting line items and make sure they capture rows on the second and later pages as well as the first. You may need to adjust the rule parameters or create separate rules for line items on different pages.

    3. Validate the PDF structure: Verify that the layout of the line item grid is consistent across pages. A continuation page can differ from the first page (for example, the column headers may not be repeated), and that can cause extraction to miss rows. If the layout varies, adjust your extraction rules to handle each layout, or pre-process the PDF so the structure is consistent.

    4. Split the multi-page PDF into individual pages: If the steps above do not resolve the issue, split the PDF into single-page documents and run IDP on each page separately. Then consolidate the extracted line items from all pages in your application or process.
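
The consolidation in step 4 could look roughly like the sketch below. It is not Appian or IDP code: it assumes each page's extraction result has already been converted to a list of dictionaries, and the function name `consolidate_line_items`, the field names, and the `page` key are all hypothetical. It merges the pages in order and skips any row that only repeats the table header on a continuation page.

```python
from typing import Any, Dict, List, Optional

# Hypothetical shape: one dict per extracted row, e.g. {"description": ..., "qty": ...}
LineItem = Dict[str, Any]


def consolidate_line_items(
    pages: List[List[LineItem]],
    header: Optional[LineItem] = None,
) -> List[LineItem]:
    """Merge per-page line items into a single list, preserving page order.

    If `header` is given, rows identical to it are dropped. This covers the
    case where the column headers are extracted as a row on a continuation page.
    """
    merged: List[LineItem] = []
    for page_number, items in enumerate(pages, start=1):
        for item in items:
            if header is not None and item == header:
                continue  # repeated header row, not a real line item
            # Copy the row and record which page it came from.
            merged.append({**item, "page": page_number})
    return merged


# Example: page 2 starts with a repeated header row.
page_1 = [{"description": "Widget", "qty": "2"}, {"description": "Gadget", "qty": "1"}]
page_2 = [{"description": "Description", "qty": "Qty"}, {"description": "Gizmo", "qty": "5"}]
all_items = consolidate_line_items(
    [page_1, page_2],
    header={"description": "Description", "qty": "Qty"},
)
```

Keeping the source page on each row makes it easier to trace a missing or duplicated line item back to the page it came from.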