DOCX to HTML

Overview

The DOCX to HTML smart service converts DOCX documents into HTML files. It can either create a new HTML document or overwrite an existing one. It can also include the font files used in the source DOCX document.

Notes

  • The source template may need to be adjusted or simplified to achieve the desired DOCX-to-HTML conversion result.
  • Supported language: English (United States).

Limitations

  • This smart service supports DOCX documents created using MS Word and WPS Office only.

Key Features & Functionality

Parameters

Inputs

  • Source Document (document): The source DOC or DOCX document to be converted.
  • Create New Document (boolean): Set to True to create a new output HTML document.
  • New Document Name (text): Name of the output HTML document.
  • New Document Desc (text): Description of the output HTML document.
  • Save In Folder (folder): The folder in which the output HTML document is to be saved.
  • Font Documents (document): The font documents used in the source DOC or DOCX document.
  • Existing Document (document): The existing document to be overwritten with the new HTML document. Mandatory if ‘Create New Document’ is set to False.

Output

  • New Document Created (document): The generated HTML document.
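
The input rules above can be sketched as a small pre-check. This is only an illustration, not the plug-in's own code: the class name DocxToHtmlInputCheck and the validate method are hypothetical, documents and folders are represented by nullable Long IDs purely for the example, and the requirement that a name and folder be present when creating a new document is an assumption. Only the Existing Document rule is stated in the parameter list.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical pre-check for the inputs listed above; not part of the plug-in. */
final class DocxToHtmlInputCheck {

    /**
     * Returns a list of problems; an empty list means the inputs look consistent.
     * Documents and folders are modelled as nullable Long IDs for illustration only.
     */
    static List<String> validate(Long sourceDocument,
                                 boolean createNewDocument,
                                 String newDocumentName,
                                 Long saveInFolder,
                                 Long existingDocument) {
        List<String> problems = new ArrayList<>();
        if (sourceDocument == null) {
            problems.add("Source Document is required");
        }
        if (createNewDocument) {
            // Assumption: a new document needs a name and a target folder.
            if (newDocumentName == null || newDocumentName.trim().isEmpty()) {
                problems.add("New Document Name is required when Create New Document is true");
            }
            if (saveInFolder == null) {
                problems.add("Save In Folder is required when Create New Document is true");
            }
        } else if (existingDocument == null) {
            // Stated in the parameter list: mandatory when Create New Document is false.
            problems.add("Existing Document is required when Create New Document is false");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Creating a new document with a name and folder: no problems reported.
        System.out.println(validate(101L, true, "report.html", 202L, null));
        // Overwrite mode without an existing document: one problem reported.
        System.out.println(validate(101L, false, null, null, null));
    }
}
```

Running a check like this before the smart service node makes a missing Existing Document visible as a readable message rather than a failed conversion.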

Note:


The smart service depends on the Apache Xalan and Xalan Serializer libraries. The OWASP dependency check reports vulnerabilities for both JARs, but the Sonatype OSS Index lists no vulnerabilities for these versions. Because these libraries are required for the smart service to work properly, we have included these JARs.


Reference Links:
https://ossindex.sonatype.org/component/pkg:maven/xalan/serializer@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

https://ossindex.sonatype.org/component/pkg:maven/xalan/xalan@2.7.2?utm_source=dependency-check&utm_medium=integration&utm_content=8.2.1

Anonymous
  • Attached is the Word document I am trying to convert to HTML.

    Simple Word Document.docx

    We are also using Appian Cloud 22.1

  • I am in the same situation. Even the most basic Word document produces an "Exception exporting package" error, with the following warnings in the log file:

    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.contenttype.ContentTypeManager.createPackage()
    INFO: Detected WordProcessingML package 
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: Instantiated package of type org.docx4j.openpackaging.packages.WordprocessingMLPackage
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.io3.Load3.get()
    INFO: package read;  elapsed time: 4 ms
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.getDefaultFont()
    INFO: rPrDefault/rFonts referenced Arial
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1655247414308
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: 
    
     Populating font mappings.
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    ERROR: Font Courier Newnot found in font table!
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    ERROR: Font Symbolnot found in font table!
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: Arial already mapped.
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    INFO: .. but checking again, since physical fonts have changed.
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    ERROR: Font Wingdingsnot found in font table!
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    ERROR: Font Avenirnot found in font table!
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.BestMatchingMapper.populateFontMappings()
    ERROR: Font Wingdings 2not found in font table!
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsExtendedPart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsExtendedPart
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.DocPropsCorePart.unmarshal()
    INFO: unmarshalling org.docx4j.openpackaging.parts.DocPropsCorePart
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.getDefaultFont()
    INFO: rPrDefault/rFonts referenced Arial
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.FontTablePart.processEmbeddings()
    INFO: Writing temp embedded fonts 1655247414333
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.preprocess.FieldsCombiner.process()
    INFO: starting
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.getDefaultFont()
    INFO: rPrDefault/rFonts referenced Arial
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 4 based on w:numStyleLink METableBullets
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 9 based on w:numStyleLink MENoIndent
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 11 based on w:numStyleLink MEBasic
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 13 based on w:numStyleLink MENumber
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 19 based on w:numStyleLink MELegal
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 21 based on w:numStyleLink MENumber
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.WordprocessingML.NumberingDefinitionsPart.resolveLinkedAbstractNum()
    INFO: Updated abstract list def 23 based on w:numStyleLink Legal
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.XmlUtils.transform()
    INFO: Using org.apache.xalan.transformer.TransformerImpl
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.convert.out.common.XsltCommonFunctions.logInfo()
    INFO: /pkg:package
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.XsltHTMLFunctions.setSpanAttr()
    WARNING: Can't set @class; No style node for: DefaultParagraphFont
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.openpackaging.parts.ThemePart.getFontFromTheme()
    INFO: Empty typeface in font for MINOR_EAST_ASIA
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.fonts.RunFontSelector.fontSelector()
    INFO: theme font for lang org.docx4j.wml.CTLanguage@15e5517a is null, but we don't have that
    2022-06-14 22:56:54 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] org.docx4j.convert.out.html.XsltHTMLFunctions.setSpanAttr()
    WARNING: Can't set @class; No style node for: DefaultParagraphFont
    2022-06-14 22:56:54,410 [Appian Work Item - 1853735 - execution02 : UnattendedJavaActivityRequest] ERROR com.vuram.plugins.docxtohtml.DocxToHtmlConverter - Exception exporting package

  • Hi All,

    Did anyone have luck implementing this Plugin?

    I keep getting an empty file (and no file is saved in the folder), regardless of whether I provide a font file or not.

    Many thanks in advance.

  • v1.0.2 Release Notes
    • Security patch updated
  • Hi Mushahid, the compatibility of this plug-in is limited to version 20.3. We suspect this is the cause of the issue in higher cloud environments, and we are checking with Appian on it. We have added this to our backlog and are working to extend compatibility to higher versions. We will share updates on this.

  • Hi Raghu,

    Did you find anything?

  • Thanks Mushahid, we got the necessary inputs. We will analyze them and get back to you soon.

  • Hi Raghu,

    Please find the attached.

  • Hi, this screenshot shows the state of the process model variables. Please check the Output tab of the smart service node and store the error occurred, error message, and new document created outputs there. Then click Process Properties -> Variables and share a screenshot of that page with us.