How To Extract Data from Microsoft Publisher (.PUB) Files in Appian?

ravitejap370818

Certified Senior Developer

1 month ago

Has anyone worked with Microsoft Publisher (.PUB) files in Appian?

Our requirement is to allow users to upload .pub files and extract specific data from them for further processing. AI Capabilities are not enabled.

Is there an Appian-native approach, plug-in, or integration that supports reading or extracting content from .pub files? Or is conversion to another format (e.g., PDF) generally required before processing?

Top Replies

Stefan Helzle
Certified Lead Developer
1 month ago

I have never heard of such a use case. If you can convert the files to PDF first, you can do the extraction with OOTB functions.
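
A minimal sketch of that PDF conversion step, assuming LibreOffice is installed on some machine outside Appian and that its Publisher import filter can open the files in question (neither is confirmed in this thread). The function names and the `soffice` default are illustrative, not part of any Appian feature:

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pub_path, out_dir, soffice="soffice"):
    """Build the LibreOffice command line that converts a .pub file to PDF."""
    return [
        soffice,
        "--headless",            # run without a GUI
        "--convert-to", "pdf",   # target format
        "--outdir", str(out_dir),
        str(pub_path),
    ]


def convert_pub_to_pdf(pub_path, out_dir, soffice="soffice"):
    """Run the conversion and return the path where the PDF is expected."""
    # Assumption: the soffice binary is on PATH of the converting host.
    if shutil.which(soffice) is None:
        raise FileNotFoundError(f"{soffice} not found on PATH")
    subprocess.run(
        build_convert_command(pub_path, out_dir, soffice),
        check=True,
        timeout=120,
    )
    return Path(out_dir) / (Path(pub_path).stem + ".pdf")
```

The resulting PDF would then be uploaded to Appian in place of the .pub file. Conversion quality depends on the layout, so it is worth checking a few real files by hand.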

mehwishy973927
1 month ago
Since Appian doesn't natively understand .PUB files, you can:

1. Upload the .PUB file to Appian.
2. Send it to a custom API (Java, .NET, Python).
3. The API extracts the text/content.
4. The API returns JSON to Appian.

Example response:

{ "customerName": "John Smith", "invoiceNumber": "INV-1234", "amount": "$500" }

Appian then processes the JSON.
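
As a rough sketch of the last API step, here is one way the service could turn already-extracted text into the JSON shape shown above. The label patterns ("Customer:", "Invoice No:", "Amount:") are hypothetical; the real ones depend on how your .pub files are laid out. How the text gets out of the .pub file is not covered here.

```python
import json
import re

# Hypothetical label patterns; adjust them to the actual document layout.
FIELD_PATTERNS = {
    "customerName": re.compile(r"Customer:\s*(.+)"),
    "invoiceNumber": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*(\S+)"),
    "amount": re.compile(r"Amount:\s*(\S+)"),
}


def extract_fields(text):
    """Return a dict with one entry per field; None when a label is missing."""
    result = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        result[name] = match.group(1).strip() if match else None
    return result


def to_json_response(text):
    """Serialize the extracted fields as the JSON body returned to Appian."""
    return json.dumps(extract_fields(text))
```

Returning `None` (JSON `null`) for a missing field lets the Appian side distinguish "not found" from an empty value instead of failing the whole call.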

Or convert to a structured text format rather than relying on OCR:

PUB → DOCX → Extract Text → Appian

or

PUB → HTML → Parse Content → Appian

This can preserve more structure than a PDF in some cases.
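
For the HTML route, the "Parse Content" step could look roughly like this, using only the Python standard library. It assumes the .pub file has already been converted to HTML by some other tool; the class and function names are illustrative.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_lines(html):
    """Return the visible text of an HTML document as a list of strings."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each chunk corresponds to one text run between tags, so headings and paragraphs stay separate, which is the structure a flat PDF text dump tends to lose.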

ravitejap370818
Certified Senior Developer
1 month ago in reply to mehwishy973927

Thanks for the suggestions.

We explored the Python-based approach and were able to extract the data; however, it requires hosting, so we are evaluating alternatives.

From what I understand, the DOCX/HTML approach would still require a conversion step, since Appian does not natively handle PUB files.

We are currently exploring Appian-based options (such as plugins) to handle the extraction as much as possible within the platform.