Hi everyone,
I’m working with procurement documents (PDFs) that contain hyperlinks (e.g. links to official tender platforms). When I use the Get PDF Text smart service (from PDF Tools), the visible text comes through, but the embedded hyperlinks (annotations/URI actions) are not returned.
I also tested with AI Skills (Document Extraction), but it seems to behave the same way — as if Appian is only sending the extracted text to the AI model rather than the full PDF object. As a result, the links are lost there as well.
Has anyone found a way in Appian to extract these hidden links directly from PDFs?
Is there any existing plug-in or connected system that supports reading hyperlink annotations?
Or would the only option be to build a custom plug-in with something like Apache PDFBox?
Thanks in advance for your insights!
I think the functions you tried only work with the human-visible text, which makes sense for most use cases. Yours is quite specific, so you will probably need an external service or a custom plug-in.
That’s what I was suspecting as well. Thanks for confirming it! I’ve started working on a custom plug-in to handle PDF annotations/URI actions directly (since the standard functions only return visible text), but I wanted to double-check with the community before going too far.
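For anyone taking the same plug-in route, the core PDFBox logic could look roughly like the following sketch. It assumes PDFBox 3.x is on the classpath (3.x uses `Loader.loadPDF`; 2.x used `PDDocument.load`). It walks each page's annotations and keeps only link annotations whose action is a URI action, skipping internal GoTo links. The class name `PdfLinkExtractor` is hypothetical, and the Appian smart-service wiring (inputs/outputs, reading the document from Appian content) is not shown.

```java
import java.io.File;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

import org.apache.pdfbox.Loader;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.interactive.action.PDAction;
import org.apache.pdfbox.pdmodel.interactive.action.PDActionURI;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotation;
import org.apache.pdfbox.pdmodel.interactive.annotation.PDAnnotationLink;

// Hypothetical helper: collects hyperlink targets from link annotations (PDFBox 3.x assumed).
public final class PdfLinkExtractor {

    /** Returns the URI of every link annotation with a URI action, in page order. */
    public static List<String> extractUris(PDDocument doc) throws IOException {
        List<String> uris = new ArrayList<>();
        for (PDPage page : doc.getPages()) {
            for (PDAnnotation annotation : page.getAnnotations()) {
                // Only /Link annotations carry clickable hyperlinks
                if (annotation instanceof PDAnnotationLink) {
                    PDAction action = ((PDAnnotationLink) annotation).getAction();
                    // Skip internal links (GoTo destinations) and other action types
                    if (action instanceof PDActionURI) {
                        String uri = ((PDActionURI) action).getURI();
                        if (uri != null) {
                            uris.add(uri);
                        }
                    }
                }
            }
        }
        return uris;
    }

    public static void main(String[] args) throws IOException {
        // Loader.loadPDF(File) is the PDFBox 3.x entry point
        try (PDDocument doc = Loader.loadPDF(new File(args[0]))) {
            for (String uri : extractUris(doc)) {
                System.out.println(uri);
            }
        }
    }
}
```

Run it as `java PdfLinkExtractor some-tender.pdf` (the file name is just an example) with PDFBox on the classpath; it prints one URI per line. In a plug-in, `extractUris` would be called from the smart service or function and its list returned to the process.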
It’s a bit of a pity that the document itself isn’t passed to the AI Skills — if it were, extracting the hyperlinks would be immediate (I’ve tried with ChatGPT and it returns them right away).