How To Extract Data from Microsoft Publisher (.PUB) Files in Appian?

RE: How To Extract Data from Microsoft Publisher (.PUB) Files in Appian?

ravitejap370818 — Tue, 09 Jun 2026 06:17:08 GMT

Thanks for the suggestions.

We explored the Python-based approach and were able to extract the data; however, it requires hosting, so we are evaluating alternatives.

From what I understand, the DOCX/HTML approach would still require a conversion step, since Appian does not natively handle PUB files.

We are currently exploring Appian-based options (such as plugins) to keep as much of the extraction as possible within the platform.

mehwishy973927 — Tue, 09 Jun 2026 03:09:14 GMT

Since Appian doesn't natively understand .PUB files, you can route them through an external service:

1. Upload the .PUB file to Appian.
2. Send it to a custom API (Java, .NET, or Python).
3. The API extracts the text/content.
4. The API returns JSON to Appian.

Example response:

{
  "customerName": "John Smith",
  "invoiceNumber": "INV-1234",
  "amount": "$500"
}

Appian then processes the JSON.
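
As a rough illustration of the last two steps, here is a minimal Python sketch of how the API side might turn already-extracted text into the JSON shown above. The labels ("Customer:", "Invoice:", "Amount:") and the function name build_response are hypothetical; a real Publisher layout would need its own patterns, and the text extraction itself is assumed to have happened earlier.

```python
import json
import re

# Hypothetical label patterns; adjust them to the actual document layout.
FIELD_PATTERNS = {
    "customerName": re.compile(r"Customer:\s*(.+)"),
    "invoiceNumber": re.compile(r"Invoice:\s*(\S+)"),
    "amount": re.compile(r"Amount:\s*(\S+)"),
}


def build_response(extracted_text: str) -> str:
    """Map labelled lines in the extracted text to a JSON string."""
    result = {}
    for field, pattern in FIELD_PATTERNS.items():
        match = pattern.search(extracted_text)
        # Fields that are not found come back as null so the caller can spot gaps.
        result[field] = match.group(1).strip() if match else None
    return json.dumps(result)


if __name__ == "__main__":
    sample = "Customer: John Smith\nInvoice: INV-1234\nAmount: $500\n"
    print(build_response(sample))
```

On the Appian side, the returned string could then be parsed with a!fromJson() when handling the integration result.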

Alternatively, convert the file to a structured text format rather than relying on OCR:

PUB → DOCX → Extract Text → Appian

PUB → HTML → Parse Content → Appian

This can preserve more structure than a PDF in some cases.
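
For the "Parse Content" step of the HTML route, a small sketch using only Python's standard library might look like the following. It assumes the PUB file has already been converted to HTML by some other tool; the class name TextCollector and the function html_to_lines are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect visible text chunks, skipping <script> and <style> content."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside script/style blocks.
        if text and not self._skip_depth:
            self.chunks.append(text)


def html_to_lines(html: str) -> list[str]:
    """Return the text chunks of an HTML document in document order."""
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.chunks
```

The resulting list of strings can be sent to Appian as JSON or matched against field labels first.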

Stefan Helzle — Mon, 08 Jun 2026 06:06:24 GMT

I have never heard of such a use case. If you can turn the files into PDFs first, you can do the extraction with out-of-the-box (OOTB) functions.
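
One possible way to do the PDF conversion outside Appian is a headless LibreOffice call, sketched below in Python. This assumes LibreOffice is installed on the conversion host, that its Publisher import handles your files acceptably (worth verifying on real samples), and that the soffice binary is on the PATH. The function names are hypothetical.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(pub_path: str, out_dir: str, soffice: str = "soffice") -> list[str]:
    """Build the headless LibreOffice command that converts one file to PDF."""
    return [soffice, "--headless", "--convert-to", "pdf", "--outdir", out_dir, pub_path]


def convert_pub_to_pdf(pub_path: str, out_dir: str, timeout: int = 120) -> Path:
    """Run the conversion and return the expected path of the resulting PDF."""
    # Assumption: LibreOffice is installed and 'soffice' is on the PATH.
    if shutil.which("soffice") is None:
        raise RuntimeError("soffice not found; install LibreOffice or adjust the path")
    subprocess.run(build_convert_command(pub_path, out_dir), check=True, timeout=timeout)
    # LibreOffice writes <original name>.pdf into the output directory.
    return Path(out_dir) / (Path(pub_path).stem + ".pdf")
```

The resulting PDF can then be uploaded to Appian and handled like any other PDF document.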