DOCX to HTML

Overview

The DOCX to HTML smart service converts DOCX documents into HTML files. It can either create a new HTML document or overwrite an existing one. It also accepts the font files used in the source DOCX document as an input.

Notes

  • The DOCX template may need to be adjusted or simplified to achieve the desired HTML output.
  • Supported Language - English (United States).

Limitations

  • This smart service supports DOCX documents created using MS Word and WPS Office only.

Key Features & Functionality

Parameters

Inputs

  • Source Document (document): The source DOC or DOCX document to be converted.
  • Create New Document (boolean): Set to True to create a new output HTML document.
  • New Document Name (text): Name of the output HTML document.
  • New Document Desc (text): Description of the output HTML document.
  • Save In Folder (folder): The folder in which the output HTML document is saved.
  • Font Documents (document): The font documents used in the source DOC or DOCX document.
  • Existing Document (document): The existing document to be overwritten with the new HTML document. Mandatory if ‘Create New Document’ is set to False.

Output

  • New Document Created (document): The generated HTML document.
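
The relationship between ‘Create New Document’ and ‘Existing Document’ described in the inputs can be sketched as a small validation routine. This is an illustrative Python sketch only, not the plugin's code: the function name and argument names are hypothetical, and the check on the new document's name and folder is an assumption rather than documented behavior.

```python
from typing import Optional


def resolve_output_target(create_new_document: bool,
                          new_document_name: Optional[str] = None,
                          save_in_folder: Optional[str] = None,
                          existing_document: Optional[str] = None) -> str:
    """Return "create" or "overwrite" based on the documented inputs.

    Hypothetical helper: the argument names mirror the parameter list,
    but the plugin's real validation logic is not published here.
    """
    if create_new_document:
        # Assumption: a new document needs at least a name and a target folder.
        if not new_document_name or not save_in_folder:
            raise ValueError("New Document Name and Save In Folder are expected "
                             "when Create New Document is True")
        return "create"
    # Documented rule: Existing Document is mandatory when
    # Create New Document is False.
    if not existing_document:
        raise ValueError("Existing Document is mandatory when "
                         "Create New Document is False")
    return "overwrite"
```

For example, `resolve_output_target(False)` raises a `ValueError`, because no existing document was supplied to overwrite.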

Note:

The smart service depends on the Apache Xalan and Xalan Serializer libraries. The OWASP dependency check reports vulnerabilities for both JARs, but we checked the Sonatype OSS Index and no vulnerabilities are listed for these versions. Since these libraries are required for the smart service to work properly, we have included the JARs.


Reference Links:
https://ossindex.sonatype.org/component/pkg:maven/xalan/serializer@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

https://ossindex.sonatype.org/component/pkg:maven/xalan/xalan@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

  • Hi,

    I tried this plugin recently, but I always end up with either an empty HTML file or the error 'Exception exporting package'.

    The error log (attached) suggests an issue with fonts. I am using the same font documents (.ttf) as in the 'Configure PDF from DOCX (with Fonts)' plugin, which works fine.

    Can you please suggest what I am doing wrong?

    Thanks

    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006274.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006275.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006276.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006279.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006283.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006296.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006300.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006322.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006352.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006386.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006388.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006409.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006419.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006437.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006438.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006439.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006444.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006448.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006452.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006453.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006493.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006534.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006548.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006552.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006581.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006582.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.PhysicalFonts.getPhysicalFont()
    WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006596.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: package read;  elapsed time: 4 ms
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1610279883964
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: 
    
     Populating font mappings.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: Consolas already mapped.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: .. but checking again, since physical fonts have changed.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: Calibri already mapped.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: .. but checking again, since physical fonts have changed.
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsExtendedPart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsCorePart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1610279883982
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.preprocess.FieldsCombiner.process()
    INFO: starting
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.ListsToContentControls.process()
    INFO: No NumberingDefinitionsPart, skipping
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.XmlUtils.transform()
    INFO: Using org.apache.xalan.transformer.TransformerImpl
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.XsltCommonFunctions.logInfo()
    INFO: /pkg:package
    2021-01-10 11:58:03 [Appian Work Item - 656 - execution01 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.HtmlCssHelper.createCssForStyles()
    WARNING: ! null rPr for character style DefaultParagraphFont
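
A log like the one above is easier to read once the repeated warnings are grouped. The following is a minimal Python sketch that counts the "Aborting" font warnings per file extension. The regular expression and the function name are illustrative assumptions based only on the line format shown in this log; the sample lines are copied from it.

```python
import re
from collections import Counter

# Matches warnings of the form seen in the log and captures the file extension.
ABORT_PATTERN = re.compile(r"WARNING: Aborting: file:\S+\.(\w+) \(")


def count_aborted_fonts(log_text: str) -> Counter:
    """Count 'Aborting' font warnings per (lower-cased) file extension."""
    return Counter(ext.lower() for ext in ABORT_PATTERN.findall(log_text))


# Three warning lines taken from the log above.
sample = """\
WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006274.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006279.ttf (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
WARNING: Aborting: file:/usr/local/appian/ae/_admin/accdocs2/1810/10006283.fon (can't get EmbedFontInfo[] .. try deleting fop-fonts.cache?)
"""

print(count_aborted_fonts(sample))
```

Grouping by extension shows at a glance which kinds of font files were skipped; it does not by itself explain why the conversion failed.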

  • Hi,

    Thanks for reaching out. We understand your issue. If possible, please share the sample DOCX file along with the font file. This will help us replicate the issue. Thanks in advance.

  • Hi, this screenshot shows the state of the process model variables. Please check the output tab of the smart service node and store the error occurred, error message, and new document created outputs there. Then click Process Properties -> Variables and share a screenshot of that page.

  • Hi @Raghu,

    Thanks for your response. 

    I'm using Appian cloud version 21.3.

    I'm not getting any error or any values in the variables.

    Please find the screenshots below.

    Please let me know if you need any other info from my end.

  • Hi @Mushahid,

    Attaching the converted file from an execution on our end. We are able to execute successfully on our end. Please share the values of the output variables errormessage and errorstatus after execution. Also, let us know the Appian version and whether you are using the cloud or self-managed (on-premise) version of Appian.

    test.zip

  • docToHtmlScreenshot.zip

    Thanks, Raghu, for the response.

    Please find the screenshots of the configuration.

  • Hi Mushahid,

    Please share the configuration you have made on the node and a sample DOCX file so that we can reproduce the issue. We will check and share an update.

  • I have followed all the instructions; however, I'm not able to generate a new HTML file, and I'm not getting any errors :(

  • Hi,

    We can now reproduce the issue. We will fix it and publish the update to the App Market as soon as possible.
    Thanks for reporting the issue.

  • Hi Santosh,

    I don't know how it is possible, but I always end up with the same error. Can you please check the attached screenshots of the smart service configuration to see if there are any issues?

    docxToHtml.zip

  • Hi,

    We tested the sample files you provided. We believe you are passing the font folder as the input to the font parameter of the smart service. Please try providing the fonts as an array of documents instead. When we tested with an array of fonts and the DOCX you provided, the output was generated as expected.

  • Hi Santosh,

    Thanks for your response. Attaching the Word doc and fonts, which I assume are correct (but as I mentioned in my first message, in the smart service we are using the whole folder of fonts, which works fine in 'Configure PDF from DOCX (with Fonts)').

    times_fonts.zip
    template (html_test2).docx
