How to Extract Data from Microsoft Publisher (.PUB) Files in Appian?

Certified Senior Developer

Has anyone worked with Microsoft Publisher (.PUB) files in Appian?


Our requirement is to allow users to upload .pub files and extract specific data from them for further processing. AI Capabilities are not enabled.

Is there any Appian-native approach, plugin, or integration that supports reading or extracting content from .pub files? Or is conversion to another format (e.g. PDF) generally required before processing?



  • Since Appian doesn't natively understand .PUB files, you can:

    1. Upload the .PUB file to Appian.
    2. Send it to a custom API (Java, .NET, or Python).
    3. The API extracts the text/content.
    4. The API returns JSON to Appian.

    Example response:

    {
      "customerName": "John Smith",
      "invoiceNumber": "INV-1234",
      "amount": "$500"
    }

    Appian then processes the JSON.
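
A minimal sketch of step 3 in Python, assuming the custom API has already obtained plain text from the .PUB file by some means. The label strings ("Customer Name:", "Invoice Number:", "Amount:") and the function names are hypothetical; the real patterns depend on how your Publisher template is laid out.

```python
import json
import re

# Hypothetical label patterns; adjust to the labels used in the actual template.
FIELD_PATTERNS = {
    "customerName": re.compile(r"Customer Name:\s*(.+)"),
    "invoiceNumber": re.compile(r"Invoice Number:\s*(\S+)"),
    "amount": re.compile(r"Amount:\s*(\S+)"),
}


def extract_fields(text: str) -> dict:
    """Return the first match for each field, or None when a label is missing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


def to_json_response(text: str) -> str:
    """Serialize the extracted fields as the JSON body returned to Appian."""
    return json.dumps(extract_fields(text))
```

Returning None for a missing label, rather than raising, lets the Appian side decide how to handle incomplete documents.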


    Alternatively, convert to a format that keeps structured text instead of relying on OCR:

    PUB → DOCX → Extract Text → Appian

    or

    PUB → HTML → Parse Content → Appian

    This can preserve more structure than a PDF in some cases.
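
A rough sketch of the PUB → HTML → Parse Content route, using only the Python standard library for the parsing step. The conversion command assumes LibreOffice is installed on the host and that its Publisher import handles your files; that is an assumption to verify, not something confirmed in this thread. The names `TextExtractor`, `html_to_lines`, and `build_convert_command` are illustrative. This still runs outside Appian.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_lines(html: str) -> list:
    """Return the non-empty text fragments of an HTML document, in order."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks


def build_convert_command(pub_path: str, out_dir: str) -> list:
    # Assumption: `soffice` is on PATH and can open the .pub file.
    return ["soffice", "--headless", "--convert-to", "html",
            "--outdir", out_dir, pub_path]
```

The command list can be passed to `subprocess.run(..., check=True)`; the resulting HTML file is then read and handed to `html_to_lines`.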

  • 0
    Certified Senior Developer
    in reply to mehwishy973927

    Thanks for the suggestions.

    We explored the Python-based approach and were able to extract the data; however, it requires hosting, so we are evaluating alternatives.

    From what I understand, the DOCX/HTML approach would still require a conversion step, as Appian does not natively handle PUB files.

    We are currently exploring Appian-based options (such as plugins) to handle the extraction as much as possible within the platform.