KB-2355 One or more Zookeeper node(s) fail to start after applying hotfix

Symptoms

After applying a hotfix released on or after September 18, 2025 that upgrades Zookeeper from 3.9.3 to 3.9.4 (found in <APPIAN_HOME>/services/) on a High Availability (HA) site, Zookeeper fails to start on one or two nodes.

  • On the affected node, the zookeeper-XXX.log file (located in <APPIAN_HOME>/logs/service-manager/zookeeper) indicates that the node is unable to start up due to a corrupted or inconsistent state.
    YYYY-MM-DD HH:MM:SS.mmm [myid:0] - ERROR [main:o.a.z.s.q.QuorumPeerMain@114] - Unexpected exception, exiting abnormally
    java.lang.IllegalStateException: Committed proposal cached out of order: 0x5000000xxx is not the next proposal of 0x5000000xxx
    at org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:341)
    at org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:313)
    at org.apache.zookeeper.server.ZKDatabase.access$000(ZKDatabase.java:73)
    at org.apache.zookeeper.server.ZKDatabase$1.onTxnLoaded(ZKDatabase.java:278)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:360)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:290)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1149)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1135)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:229)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
    YYYY-MM-DD HH:MM:SS.mmm [myid:0] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 1
  • For Appian on Kubernetes, the affected Zookeeper pod(s) are stuck in CrashLoopBackOff.
  • For legacy self-managed Appian, starting the engines shows the following message on the affected node(s).
    Waiting for Zookeeper to be available
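
Before remediating, it can help to confirm which node(s) actually hit the error. A minimal sketch that searches the Zookeeper logs for the fatal exception shown above; the default installation root is an assumption and should be replaced with your own:

```shell
#!/usr/bin/env bash
# Search this node's Zookeeper logs for the fatal startup exception.
# APPIAN_HOME defaults to /opt/appian here as an assumption; set it to
# your actual installation root.
APPIAN_HOME="${APPIAN_HOME:-/opt/appian}"
LOG_DIR="$APPIAN_HOME/logs/service-manager/zookeeper"

# List any log files containing the out-of-order proposal error.
grep -l "Committed proposal cached out of order" "$LOG_DIR"/zookeeper-*.log 2>/dev/null \
  || echo "No corrupted-state errors found in $LOG_DIR"
```

Run this on each node of the HA site; only the node(s) whose logs match need remediation.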

Cause

These errors are caused by the upgrade from Zookeeper 3.9.3 to 3.9.4. The newer version validates the transaction log more strictly at startup, which surfaces a pre-existing product bug that had gone undetected in the earlier version.

Action

Appian on Kubernetes

  1. Patch the StatefulSet so the impacted Zookeeper pod starts with a sleep command instead of Zookeeper.
    • kubectl -n <namespace> patch statefulset <ZOOKEEPER STS> --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}]'
  2. Restart the affected Zookeeper pod.
    • kubectl -n <namespace> delete pod <ZOOKEEPER POD>
  3. Exec into the Zookeeper pod.
    • kubectl -n <namespace> exec -it <ZOOKEEPER POD> -- bash
  4. Delete the contents of zookeeper/data on the affected node.
    • rm -rf zookeeper/data/*
  5. Remove the sleep command.  
    • kubectl -n <namespace> patch statefulset <ZOOKEEPER STS> --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]' 
  6. Restart the Zookeeper pod.
    • kubectl -n <namespace> delete pod <ZOOKEEPER POD>
  7. If there is another affected pod, repeat the steps above to remediate.
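
The steps above can be collected into a single sketch. This is not an official remediation script; the namespace, StatefulSet, and pod names are placeholders supplied by the caller:

```shell
#!/usr/bin/env bash
# Sketch of the Kubernetes remediation steps above for one affected pod.
# Arguments: <namespace> <zookeeper statefulset> <affected zookeeper pod>
remediate_zk_pod() {
  local ns="$1" sts="$2" pod="$3"

  # 1. Patch the StatefulSet so the pod starts with sleep instead of Zookeeper.
  kubectl -n "$ns" patch statefulset "$sts" --type='json' \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}]'

  # 2. Restart the affected pod so it picks up the patched command.
  kubectl -n "$ns" delete pod "$pod"

  # 3-4. Delete the corrupted data inside the now-idle pod. A shell is
  # needed so the glob expands inside the container.
  kubectl -n "$ns" exec "$pod" -- bash -c 'rm -rf zookeeper/data/*'

  # 5. Remove the sleep command to restore the original startup behavior.
  kubectl -n "$ns" patch statefulset "$sts" --type='json' \
    -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'

  # 6. Restart the pod so Zookeeper starts and resyncs from the quorum.
  kubectl -n "$ns" delete pod "$pod"
}

# Example (placeholder names): remediate_zk_pod my-namespace zookeeper zookeeper-1
```

Wait for each pod restart to complete before continuing, and remediate one affected pod at a time so the quorum is never reduced further than necessary.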

Legacy Self-Managed Appian

  1. Stop all Appian components gracefully.
  2. Delete the contents of Zookeeper's data directory, only on the affected node(s).
    • rm -rf <APPIAN_HOME>/services/data/zookeeper/*
  3. Start all Appian components.
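
Because step 2 deletes data recursively, it is worth guarding the path before running it. A minimal sketch of that step as a function; the installation root is caller-supplied, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of step 2 above: clear Zookeeper's data directory on an affected
# node. Run only after all Appian components are stopped.
# Argument: <APPIAN_HOME> (the Appian installation root)
clear_zk_data() {
  local data_dir="$1/services/data/zookeeper"

  # Refuse to delete anything if the path does not exist (e.g. a typo).
  if [ ! -d "$data_dir" ]; then
    echo "ERROR: $data_dir does not exist; nothing deleted" >&2
    return 1
  fi

  # Remove the directory's contents, keeping the directory itself,
  # matching the rm command in the step above.
  rm -rf "$data_dir"/*
  echo "Cleared $data_dir"
}
```

On restart, the cleared node rejoins the quorum and resynchronizes its data from the remaining healthy node(s).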

Workaround

Restore the site to the previous version.

Affected Versions

This article applies to all High Availability (HA) Appian sites that apply a hotfix released on or after September 18, 2025.

Last Reviewed: November 2025
