KB-1708 Errors related to file size thrown when using the Export to Excel/CSV smart service

Symptoms

When performing an export using the Export Data Store Entity to Excel Smart Service and providing a "Document To Update" that is an existing Excel file, one of the following errors might be encountered:

The decompressed version is 100x larger than compressed version. Unable to update file <FILE_NAME>. The file size of the 'Document to Update' is at least 100 times larger when uncompressed, which constitutes a security risk.
The decompressed version has a zip entry that is >4GB. Unable to update file <FILE_NAME>. The file size of the uncompressed 'Document to Update' is too large.

Cause

  1. Excel .xlsx files are compressed .zip archives. During the export process, the given Excel file must be uncompressed before it can be read. As part of this operation, the file is checked against a heuristic designed to detect zip bombs. An archive is considered a zip bomb if the uncompressed archive is at least 100 times larger than the compressed archive. In simple terms, the more that data is repeated in a file, the more it can be compressed. For example, an Excel file with the same value in a vast number of cells, or the same style repeated over and over, may take up a considerable amount of space when uncompressed but very little space when compressed. The more often this occurs in the file, the more likely the file is to be over 100 times larger when uncompressed.
  2. When a .xlsx file is uncompressed from its .zip archive form, it becomes substantially larger than it appears on the originating filesystem. To restrict the amount of memory consumed during decompression, the process is terminated as soon as one of the child files within the zip archive is detected to be larger than 4GB. This can occur with large .xlsx documents (40MB and greater) that either contain a very large amount of data, or whose uncompressed/compressed size ratio is not high enough to trigger #1 but that still contain enough redundant data, styles, or other metadata for a child file to inflate to the 4GB threshold.
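
The two conditions above can be approximated offline before a file is passed to the smart service. The following is a minimal Python sketch, not Appian's implementation. It assumes the ratio is computed over the whole archive using the sizes recorded in the zip directory, whereas the product's actual check may be applied differently (for example, per entry). The function names and the constants RATIO_LIMIT and ENTRY_LIMIT are hypothetical and taken only from the figures quoted in the error messages.

```python
import zipfile

# Assumed thresholds, taken from the figures quoted in the error messages.
RATIO_LIMIT = 100            # uncompressed size >= 100x compressed size
ENTRY_LIMIT = 4 * 1024 ** 3  # a single zip entry larger than 4GB


def measure_xlsx(path):
    """Return (compressed_total, uncompressed_total, largest_entry) in bytes.

    Sizes are read from the zip directory, so nothing is decompressed.
    """
    with zipfile.ZipFile(path) as zf:
        infos = zf.infolist()
    compressed = sum(i.compress_size for i in infos)
    uncompressed = sum(i.file_size for i in infos)
    largest = max((i.file_size for i in infos), default=0)
    return compressed, uncompressed, largest


def likely_rejections(compressed, uncompressed, largest):
    """Return the list of conditions the file appears to violate."""
    reasons = []
    if compressed > 0 and uncompressed >= RATIO_LIMIT * compressed:
        reasons.append("ratio")
    if largest > ENTRY_LIMIT:
        reasons.append("entry_size")
    return reasons
```

Calling likely_rejections(*measure_xlsx("report.xlsx")) on a hypothetical report.xlsx returns an empty list when neither condition is met. A result of ["ratio"] corresponds to the first error, and "entry_size" corresponds to the second.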

Action

Ensure that the Excel file is not a zip bomb. Inspect the Excel file for duplicated data, styles, or other metadata that may contribute to this issue and re-save the file before using it with this smart service. If a file has all formatting removed and the error persists, then the file is too large to be updated.
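
To decide where to look for duplicated data or styles, it can help to see which parts of the workbook inflate the most. The sketch below is an illustration only and assumes the file is a standard .xlsx package. In such packages, styles are normally stored in xl/styles.xml, shared text values in xl/sharedStrings.xml, and sheet data under xl/worksheets/. The function name largest_parts is hypothetical.

```python
import zipfile


def largest_parts(path, top=5):
    """Return up to `top` (name, compressed, uncompressed) tuples,
    ordered by uncompressed size, largest first."""
    with zipfile.ZipFile(path) as zf:
        rows = [(i.filename, i.compress_size, i.file_size) for i in zf.infolist()]
    return sorted(rows, key=lambda r: r[2], reverse=True)[:top]


def print_report(path, top=5):
    """Print one line per part: uncompressed size, compressed size, name."""
    for name, compressed, uncompressed in largest_parts(path, top):
        print(f"{uncompressed:>14,d} {compressed:>14,d}  {name}")
```

If xl/styles.xml is at the top of the report, repeated formatting is the likely contributor. If a worksheet part is at the top, repeated or excessive cell data is more likely.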

Affected Versions

This article applies to Appian 18.2 and later.

Last Reviewed: April 2018
