Has anyone worked with Microsoft Publisher (.PUB) files in Appian?
Our requirement is to allow users to upload .pub files and extract specific data from them for further processing. AI capabilities are not enabled. Is there any Appian-native approach, plugin, or integration that supports reading or extracting content from .pub files? Or is conversion to another format (e.g. PDF) generally required before processing?
I have never heard of such a use case. If you can convert the files to PDF first, you can do the extraction with OOTB functions.
Since Appian doesn't natively understand .PUB files, you can send the .PUB file to an external service that extracts the data and returns it as JSON.
Example response:
{ "customerName": "John Smith", "invoiceNumber": "INV-1234", "amount": "$500" }
Appian then processes the JSON.
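For what it's worth, here is a minimal Python sketch of what a service returning JSON like the example above might do. It assumes LibreOffice is installed on the host with `soffice` on the PATH (LibreOffice can open Publisher files), and that the text has already been pulled out of the converted file. The function names, the regexes, and the labels ("Customer Name:", "Invoice Number:", "Amount:") are hypothetical; adjust them to whatever your documents actually contain.

```python
import re
import subprocess
from pathlib import Path


def build_convert_command(pub_path: str, out_dir: str, target: str = "pdf") -> list[str]:
    """Build a LibreOffice headless conversion command (assumes `soffice` is on PATH)."""
    return ["soffice", "--headless", "--convert-to", target, "--outdir", out_dir, pub_path]


def convert_pub(pub_path: str, out_dir: str, target: str = "pdf") -> Path:
    """Run the conversion and return the path where the output is expected."""
    subprocess.run(build_convert_command(pub_path, out_dir, target), check=True)
    return Path(out_dir) / (Path(pub_path).stem + "." + target)


# Hypothetical field labels, one regex per key in the JSON response.
FIELD_PATTERNS = {
    "customerName": re.compile(r"Customer Name:\s*(.+)"),
    "invoiceNumber": re.compile(r"Invoice Number:\s*(\S+)"),
    "amount": re.compile(r"Amount:\s*(\S+)"),
}


def extract_fields(text: str) -> dict:
    """Return only the fields whose label was found in the extracted text."""
    result = {}
    for key, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        if match:
            result[key] = match.group(1).strip()
    return result
```

The dictionary returned by `extract_fields` can be serialized with `json.dumps` and returned to Appian through an integration object. Getting the text out of the converted file is deliberately left out here, since that depends on the target format you pick.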
Or try extracting structured text rather than relying on OCR:
PUB → DOCX → Extract Text → Appian
or
PUB → HTML → Parse Content → Appian
This can preserve more structure than a PDF in some cases.
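As a rough illustration of the "Parse Content" step for the HTML route, here is a small sketch that uses only the Python standard library. It assumes your converter of choice has already produced an HTML file from the .PUB; the class and function names are made up for this example.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep non-empty text that is not inside a skipped element.
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def html_to_lines(html: str) -> list[str]:
    """Return the visible text of an HTML document as a list of chunks."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

Each chunk roughly corresponds to one text run in the page, so labels and values that sit in separate text boxes in Publisher will likely come out as separate entries and may need to be paired up afterwards.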
Thanks for the suggestions. We explored the Python-based approach and were able to extract the data; however, it requires hosting, so we are evaluating alternatives.
We are currently exploring Appian-based options (such as plugins) to handle the extraction as much as possible within the platform.