I have a requirement to convert .doc and .docx files to PDF. I found we can do it using the "PDF from Single Docx (with fonts)" smart service.
This smart service works well for converting .docx to PDF, but it also runs on .doc files, with inconsistent behavior.
For some .doc documents the service works fine, but for others it returns the error "org.docx4j.openpackaging.exceptions.Docx4JException: This file seems to be a binary doc/ppt/xls, not an encrypted OLE2 file containing a doc/pptx/xlsx"
Has anyone else encountered the same?
I'd suggest (though I know it won't be much help) that ".doc" is by now an extremely outdated legacy format, being Microsoft's old, locked, proprietary "Word" format from before they switched to the open XML standard we see in ".docx" files. This matters because, if you're dealing with many such legacy files, there may be different flavors, binary structures, and encodings that no single plug-in is prepared to handle in full. Can you share a bit more about the source of these files? Is there a standard set, or are you trying to handle anything users upload in the wild? Is there any chance these could be passed through Word to re-save under the newer format first?
These documents come from external systems via integration, so we have no control over them and cannot pass them through Word to re-save in the new format.
Ultimately, you might be stuck with strictly enforcing which types of documents you accept.
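If you do go the enforcement route, one option is to check the file's leading bytes rather than trusting the extension. The error text suggests the failing files are true binary (OLE2) Word documents, while the ".doc" files that convert may be ZIP-based .docx content saved with the wrong extension, though that is a guess from the message alone. Below is a minimal sketch in Python to show the idea; the function names are made up for illustration, and inside Appian the same check would have to live in a custom plug-in or an upstream integration step.

```python
# Hypothetical helper: classify a document by its magic bytes, not its extension.
# Well-known signatures: OLE2 compound files (legacy .doc/.xls/.ppt) start with
# D0 CF 11 E0 A1 B1 1A E1; ZIP containers (.docx and other OOXML files) start
# with "PK\x03\x04".

OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
ZIP_MAGIC = b"PK\x03\x04"


def sniff_container(data: bytes) -> str:
    """Return 'ole2', 'zip', or 'unknown' based on the leading bytes."""
    if data.startswith(OLE2_MAGIC):
        return "ole2"   # binary Office file (or an encrypted OOXML wrapper)
    if data.startswith(ZIP_MAGIC):
        return "zip"    # possibly .docx; any other ZIP archive also matches
    return "unknown"


def sniff_file(path: str) -> str:
    """Read only the first 8 bytes of a file and classify it."""
    with open(path, "rb") as handle:
        return sniff_container(handle.read(8))


def is_probably_docx(data: bytes) -> bool:
    """Cheap gate to run before handing a file to a docx-only converter."""
    return sniff_container(data) == "zip"
```

Files that come back as "ole2" or "unknown" could then be rejected or routed to a different converter before the smart service is called. Note that this only identifies the container: a ZIP match does not prove the file is a valid .docx.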