KB-2038 Issues writing to MySQL database when emojis or unicode characters are used

Symptoms

When attempting to write text containing emojis and certain other unicode characters to the MySQL business database from a Write to Data Store Entity smart service, the following error is logged:

ERROR org.hibernate.util.JDBCExceptionReporter - Incorrect string value: '<EXAMPLE_STRING>' for column '<EXAMPLE_COLUMN>' at row 1
ERROR com.appiancorp.type.external.writetodatastore.WriteToDataStoreCore - Error when trying to write to the data store. com.appiancorp.suiteapi.process.exceptions.SmartServiceException: userMsg[error.duringWrite.userMsg=An error occurred while trying to write to the entity <entity> [id=<uuid>, type=<type> (id=<id>)] (data store: <data store>). Details: org.hibernate.exception.GenericJDBCException: could not insert: [<entity>]: java.sql.SQLException: Incorrect string value: '<value>' for column '<column>' at row <rownum> Data: TypedValue[it=<id>,v={<value>}

Cause

This is caused by the default character set configuration of the MySQL database not supporting the full set of unicode characters.

Action

Work with a DBA to update the default collation on the database and on existing tables and columns. The following steps show one possible solution to this for MySQL databases:

  1. Take a backup of the database, including all database configurations and settings.
  2. Check the current default character set at different levels as well as the current default collation setting for the MySQL database by running the following commands:
    select *
    from performance_schema.global_variables
    where variable_name like '%character%'
    or variable_name like '%collation%';
  3. Confirm that the default collation should in fact be changed, and that changing it will not break any intended functionality.
  4. Run the following commands to manually set the default server level character set and collation:
    set global character_set_server = 'utf8mb4';
    set global collation_server = 'utf8mb4_unicode_ci';

    Any new tables created will now inherit the default character set and collation. However, any tables that are already created with the previous character set and collation will not automatically switch their character set to the new one.

  5. To identify which tables have columns that are using a character set, run the following query:
    select table_name, column_name, ordinal_position, column_default, is_nullable, data_type, character_maximum_length, character_set_name, collation_name, column_type
    from information_schema.columns
    where character_set_name is not null and table_schema = '<schema_name>'
    order by table_name, ordinal_position;
  6. When changing a character set, it is possible to have incompatibilities between the old and the new set. Make sure that the intended character set change will not cause new problems.

    The following is an example of an incompatibility that could result from changing the character set. The character set latin1 only takes one byte per character. The maximum index key length in MySQL is 767 bytes. So a column with a latin1 character set can have an index as long as the column limits to less than 767 characters. But utf8mb4 can take up to 4 bytes per character, which means that an index can only be set on a field that is a maximum of 767/4=191 characters. The column length can be more than 191 characters but the index key length cannot be more than 191 characters. To work around this, three possible options are to:
    1. Reduce the length of the column on which there is an index to 191 characters only. Drop the index, change the character set, then recreate the index.
    2. Keep the column length as-is, but recreate the index with a maximum key length of 191 characters only. This will work if the index is not unique. It will not work if the index is unique.
    3. Do nothing about the problem if changing the collation is more costly than enabling storage of emojis.

    For more information, see the MySQL documentation for column character sets and collation.

  7. For each <table_name> that should have the collation changed, run:
    alter table <table_name> convert to character set utf8mb4;

Affected Versions

This article applies to all versions of Appian.

Last Reviewed: December 2019

Related
Recommended