KB-2096 Kafka brokers out of sync in a high availability environment

This issue has been resolved in an Appian hotfix/new Appian version. Please apply the latest hotfix to your Appian installation or upgrade to the latest version of Appian.

Symptoms

Running <APPIAN_HOME>/services/bin/status.bat (.sh) -p <password> -c in a high availability environment shows that one of the Kafka brokers is not in sync:

Kafka Broker Connectivity ----------------------------------------------------
example1.com:9092 Reachable ISR: All replicas in sync
example2.com:9092 Reachable controller ISR: Replicas not in sync
not in sync: __consumer_offsets-0
not in sync: __consumer_offsets-12
not in sync: __consumer_offsets-15
not in sync: __consumer_offsets-18
not in sync: __consumer_offsets-21
not in sync: serviceManager.transaction.download-stats-0
not in sync: serviceManager.transaction.execution01-0
not in sync: serviceManager.transaction.execution02-0
not in sync: serviceManager.transaction.portal-0
example3.com:9092 Reachable ISR: All replicas in sync
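
When several brokers or many partitions are involved, it can help to check this output with a script. The following is a minimal sketch, assuming the status output has exactly the line shapes shown above: one "host:port Reachable [controller] ISR: ..." line per broker, followed by zero or more "not in sync: <partition>" lines. The function name parse_broker_status and the layout of its result are hypothetical and are not part of Appian.

```python
import re

# Matches a broker line as it appears in this article, for example:
#   example2.com:9092 Reachable controller ISR: Replicas not in sync
# Assumption: the reachability state is a single word such as "Reachable".
BROKER_LINE = re.compile(
    r"^(?P<broker>\S+:\d+)\s+(?P<state>\S+)\s+"
    r"(?P<controller>controller\s+)?ISR:\s*(?P<isr>.*)$"
)

# Matches a partition line such as: not in sync: __consumer_offsets-0
PARTITION_LINE = re.compile(r"^not in sync:\s*(?P<partition>\S+)$")


def parse_broker_status(text):
    """Return one dict per broker, in the order the brokers appear."""
    brokers = []
    for raw in text.splitlines():
        line = raw.strip()
        match = BROKER_LINE.match(line)
        if match:
            brokers.append({
                "broker": match.group("broker"),
                "controller": match.group("controller") is not None,
                "in_sync": match.group("isr") == "All replicas in sync",
                "not_in_sync": [],
            })
            continue
        match = PARTITION_LINE.match(line)
        if match and brokers:
            # Partition lines belong to the most recent broker line.
            brokers[-1]["not_in_sync"].append(match.group("partition"))
    return brokers
```

Passing the captured output of the status script to parse_broker_status gives a list in which the entry with "controller" set to True is the Controller, and any entry with "in_sync" set to False carries its affected partitions in "not_in_sync". Lines that do not match either pattern, such as the section header, are ignored.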

Cause

This issue has been addressed via AN-141918 in the following hotfixes/versions:

Action

Apply the latest hotfix to your Appian installation or upgrade to the latest version of Appian.

Workaround

Perform a rolling restart of the Kafka brokers on the site, restarting them one at a time. The Kafka brokers are listed in the <APPIAN_HOME>/conf/appian-topology.xml file.

  1. Make sure all ZooKeeper servers configured in appian-topology.xml are running. To start ZooKeeper, run the following on each machine hosting ZooKeeper:
    1. <APPIAN_HOME>/services/bin/start.sh -p <password> -s zookeeper
  2. On one machine, restart the Kafka broker using the commands below:
    1. <APPIAN_HOME>/services/bin/stop.sh -p <password> -s kafka
    2. <APPIAN_HOME>/services/bin/start.sh -p <password> -s kafka

Repeat the above steps on each of the servers hosting Kafka, restarting the Replicas first and the Controller last. The Controller is the broker marked "controller" in the output of the status script shown in the Symptoms section.
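
The restart order described above can be sketched as a small planning helper. This is an illustration only, assuming the broker host names and the Controller have already been read from appian-topology.xml and the status output. The function name rolling_restart_plan is hypothetical, and the sketch only builds the command strings from this article; it does not run them, and the <password> placeholder is left for the operator to fill in.

```python
def rolling_restart_plan(brokers, controller, appian_home="<APPIAN_HOME>"):
    """Return (host, [stop_command, start_command]) pairs.

    Replicas keep their given order and the Controller is moved to the end.
    """
    if controller not in brokers:
        raise ValueError(f"controller {controller!r} is not in the broker list")

    # Replicas first, Controller last.
    order = [host for host in brokers if host != controller] + [controller]

    script_dir = f"{appian_home}/services/bin"
    plan = []
    for host in order:
        plan.append((host, [
            f"{script_dir}/stop.sh -p <password> -s kafka",
            f"{script_dir}/start.sh -p <password> -s kafka",
        ]))
    return plan


if __name__ == "__main__":
    # Host names taken from the example output in the Symptoms section.
    hosts = ["example1.com", "example2.com", "example3.com"]
    for host, commands in rolling_restart_plan(hosts, "example2.com"):
        print(host)
        for command in commands:
            print("  " + command)
```

Each pair of commands is meant to be run on the named host before moving to the next one, so that only one broker is down at a time.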

Affected Versions

This article applies to Appian 19.3 and earlier.

Last Reviewed: March 2020