DOCX to HTML

Overview

The DOCX to HTML smart service converts DOCX documents into HTML files. It can either create a new HTML document or overwrite an existing one, and it can apply the font files used in the source DOCX document during conversion.

Notes

  • The source template may need to be adjusted or simplified to get the desired HTML output from the DOCX conversion.
  • Supported language: English (United States).

Limitations

  • This smart service only supports DOCX documents created with Microsoft Word or WPS Office.

Key Features & Functionality

Parameters

Inputs

  • Source Document (document): The source DOC or DOCX document to convert.
  • Create New Document (boolean): Set to True to create a new output HTML document; set to False to overwrite an existing document.
  • New Document Name (text): Name of the output HTML document.
  • New Document Desc (text): Description of the output HTML document.
  • Save In Folder (folder): The folder in which to save the output HTML document.
  • Font Documents (document): The font documents used in the source DOC or DOCX document.
  • Existing Document (document): The existing document to overwrite with the new HTML output. Mandatory if 'Create New Document' is set to False.
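The inputs above switch between two modes depending on 'Create New Document'. The Java sketch below illustrates those rules as a validation check. It is hypothetical: the class and method names are invented for illustration, documents and folders are modelled as nullable Long IDs, and only the rule that 'Existing Document' is mandatory when 'Create New Document' is False comes from this page. The requirements for a source document, and for a name and folder in create mode, are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of the DOCX to HTML input rules.
 * Documents and folders are modelled as nullable Long IDs (an assumption);
 * this is not the plugin's actual code.
 */
public class DocxToHtmlInputCheck {

    /** Returns validation messages; an empty list means the inputs are consistent. */
    public static List<String> validate(Long sourceDocument,
                                        boolean createNewDocument,
                                        String newDocumentName,
                                        Long saveInFolder,
                                        Long existingDocument) {
        List<String> errors = new ArrayList<>();
        // Assumed rule: a source DOC/DOCX document is always required.
        if (sourceDocument == null) {
            errors.add("Source Document is required.");
        }
        if (createNewDocument) {
            // Assumed rules: creating a new document needs a name and a target folder.
            if (newDocumentName == null || newDocumentName.trim().isEmpty()) {
                errors.add("New Document Name is required when Create New Document is True.");
            }
            if (saveInFolder == null) {
                errors.add("Save In Folder is required when Create New Document is True.");
            }
        } else if (existingDocument == null) {
            // Documented rule: Existing Document is mandatory when Create New Document is False.
            errors.add("Existing Document is required when Create New Document is False.");
        }
        return errors;
    }

    public static void main(String[] args) {
        // Overwrite mode without an existing document: one message is reported.
        System.out.println(validate(101L, false, null, null, null));
        // Create mode with a name and a folder: no messages.
        System.out.println(validate(101L, true, "report.html", 7L, null));
    }
}
```

Collecting all messages in a list, rather than failing on the first problem, lets a process designer see every missing input at once.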

Output

  • New Document Created (document): The generated HTML document.

Note:


The smart service requires the Apache Xalan and Xalan Serializer libraries (version 2.7.2) as dependencies. The OWASP dependency check reports vulnerabilities for both JARs; however, the Sonatype OSS Index lists no vulnerabilities for these versions. Because the smart service cannot work properly without these libraries, both JARs are included.


Reference Links:
https://ossindex.sonatype.org/component/pkg:maven/xalan/serializer@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

https://ossindex.sonatype.org/component/pkg:maven/xalan/xalan@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

Anonymous
  • Hi Santosh,

    Thanks for your response. Attaching the Word doc + fonts, which I assume are correct (but as I mentioned in my first message, in the smart service we are using a whole folder of fonts, which works fine in 'Configure PDF from DOCX (with Fonts)').

    times_fonts.zip
    template (html_test2).docx

  • Hi,

    Thanks for reaching out. We understand the issue. If possible, please share a sample DOCX file along with the font file; it will help us replicate the issue. Thanks in advance.

  • Hi,

    I tried this plugin recently but always end up with either an empty HTML file or the error 'Exception exporting package'.

    The error log (attached) suggests an issue with the fonts. I am using the same font documents (.ttf) as in the 'Configure PDF from DOCX (with Fonts)' plugin, which works fine.

    Can you please suggest what I am doing wrong?

    Thanks

    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006274.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006275.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006276.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006279.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006283.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006296.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006300.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006322.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006352.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006386.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006388.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006409.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006419.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006437.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006438.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006439.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006444.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006448.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006452.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006453.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006493.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006534.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006548.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006552.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006581.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006582.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006596.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: package read;  elapsed time: 4 ms
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1610279883964
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: 
    
     Populating font mappings.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: Consolas already mapped.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: .. but checking again, since physical fonts have changed.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: Calibri already mapped.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: .. but checking again, since physical fonts have changed.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsExtendedPart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsCorePart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1610279883982
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.preprocess.FieldsCombiner.process()
    INFO: starting
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.ListsToContentControls.process()
    INFO: No NumberingDefinitionsPart, skipping
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.XmlUtils.transform()
    INFO: Using org.apache.xalan.transformer.TransformerImpl
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.XsltCommonFunctions.logInfo()
    INFO: /pkg:package
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.HtmlCssHelper.createCssForStyles()
    WARNING: ! null rPr for character style DefaultParagraphFont