KB-2355 One or more Zookeeper node(s) fail to start after applying hotfix

Symptoms

After applying a hotfix released on or after September 18, 2025 that upgrades Zookeeper from 3.9.3 to 3.9.4 (found in <APPIAN_HOME>/services/) on a High Availability (HA) site, Zookeeper fails to start on one or two nodes.

  • On the affected node, the zookeeper-XXX.log file (located in <APPIAN_HOME>/logs/service-manager/zookeeper) indicates that the node is unable to start up due to a corrupted or inconsistent state.
    YYYY-MM-DD HH:MM:SS.mmm [myid:0] - ERROR [main:o.a.z.s.q.QuorumPeerMain@114] - Unexpected exception, exiting abnormally
    java.lang.IllegalStateException: Committed proposal cached out of order: 0x5000000xxx is not the next proposal of 0x5000000xxx
    at org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:341)
    at org.apache.zookeeper.server.ZKDatabase.addCommittedProposal(ZKDatabase.java:313)
    at org.apache.zookeeper.server.ZKDatabase.access$000(ZKDatabase.java:73)
    at org.apache.zookeeper.server.ZKDatabase$1.onTxnLoaded(ZKDatabase.java:278)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.fastForwardFromEdits(FileTxnSnapLog.java:360)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.lambda$restore$0(FileTxnSnapLog.java:267)
    at org.apache.zookeeper.server.persistence.FileTxnSnapLog.restore(FileTxnSnapLog.java:312)
    at org.apache.zookeeper.server.ZKDatabase.loadDataBase(ZKDatabase.java:290)
    at org.apache.zookeeper.server.quorum.QuorumPeer.loadDataBase(QuorumPeer.java:1149)
    at org.apache.zookeeper.server.quorum.QuorumPeer.start(QuorumPeer.java:1135)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:229)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:137)
    at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:91)
    YYYY-MM-DD HH:MM:SS.mmm [myid:0] - ERROR [main:o.a.z.u.ServiceUtils@48] - Exiting JVM with code 1
  • For Appian on Kubernetes, the affected Zookeeper pod(s) are stuck in CrashLoopBackOff.
  • For legacy self-managed Appian, starting the engines shows the following message on the affected node(s).
    Waiting for Zookeeper to be available
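
Before remediating, it can help to confirm which node(s) actually hit the error. A minimal sketch that searches the Zookeeper logs for the fatal exception shown above; the default installation root is an assumption and should be replaced with your own:

```shell
#!/usr/bin/env bash
# Search this node's Zookeeper logs for the fatal startup exception.
# APPIAN_HOME defaults to /opt/appian here as an assumption; set it to
# your actual installation root.
APPIAN_HOME="${APPIAN_HOME:-/opt/appian}"
LOG_DIR="$APPIAN_HOME/logs/service-manager/zookeeper"

# List any log files containing the out-of-order proposal error.
grep -l "Committed proposal cached out of order" "$LOG_DIR"/zookeeper-*.log 2>/dev/null \
  || echo "No corrupted-state errors found in $LOG_DIR"
```

Run this on each node of the HA site; only the node(s) whose logs match need remediation.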

Cause

These errors are caused by the upgrade from Zookeeper 3.9.3 to 3.9.4. The newer version validates the transaction log more strictly at startup, which surfaces a pre-existing product bug that had gone undetected in the earlier version.

Action

Appian on Kubernetes

  1. Patch the StatefulSet so the impacted Zookeeper pod starts with a sleep command instead of Zookeeper.
    • kubectl -n <namespace> patch statefulset <ZOOKEEPER STS> --type='json' -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}]'
  2. Restart the affected Zookeeper pod.
    • kubectl -n <namespace> delete pod <ZOOKEEPER POD>
  3. Exec into the Zookeeper pod.
    • kubectl -n <namespace> exec -it <ZOOKEEPER POD> -- bash
  4. Delete the contents of zookeeper/data on the affected node.
    • rm -rf zookeeper/data/*
  5. Remove the sleep command.  
    • kubectl -n <namespace> patch statefulset <ZOOKEEPER STS> --type='json' -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]' 
  6. Restart the Zookeeper pod.
    • kubectl -n <namespace> delete pod <ZOOKEEPER POD>
  7. If there is another affected pod, repeat the steps above to remediate.
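
The steps above can be collected into a single sketch. This is not an official remediation script; the namespace, StatefulSet, and pod names are placeholders supplied by the caller:

```shell
#!/usr/bin/env bash
# Sketch of the Kubernetes remediation steps above for one affected pod.
# Arguments: <namespace> <zookeeper statefulset> <affected zookeeper pod>
remediate_zk_pod() {
  local ns="$1" sts="$2" pod="$3"

  # 1. Patch the StatefulSet so the pod starts with sleep instead of Zookeeper.
  kubectl -n "$ns" patch statefulset "$sts" --type='json' \
    -p='[{"op": "add", "path": "/spec/template/spec/containers/0/command", "value": ["sleep", "infinity"]}]'

  # 2. Restart the affected pod so it picks up the patched command.
  kubectl -n "$ns" delete pod "$pod"

  # 3-4. Delete the corrupted data inside the now-idle pod. A shell is
  # needed so the glob expands inside the container.
  kubectl -n "$ns" exec "$pod" -- bash -c 'rm -rf zookeeper/data/*'

  # 5. Remove the sleep command to restore the original startup behavior.
  kubectl -n "$ns" patch statefulset "$sts" --type='json' \
    -p='[{"op": "remove", "path": "/spec/template/spec/containers/0/command"}]'

  # 6. Restart the pod so Zookeeper starts and resyncs from the quorum.
  kubectl -n "$ns" delete pod "$pod"
}

# Example (placeholder names): remediate_zk_pod my-namespace zookeeper zookeeper-1
```

Wait for each pod restart to complete before continuing, and remediate one affected pod at a time so the quorum is never reduced further than necessary.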

Legacy Self-Managed Appian

  1. Stop all Appian components gracefully.
  2. Delete the contents of Zookeeper's data directory, only on the affected node(s).
    • rm -rf <APPIAN_HOME>/services/data/zookeeper/*
  3. Start all Appian components.
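
Because step 2 deletes data recursively, it is worth guarding the path before running it. A minimal sketch of that step as a function; the installation root is caller-supplied, and the function name is hypothetical:

```shell
#!/usr/bin/env bash
# Sketch of step 2 above: clear Zookeeper's data directory on an affected
# node. Run only after all Appian components are stopped.
# Argument: <APPIAN_HOME> (the Appian installation root)
clear_zk_data() {
  local data_dir="$1/services/data/zookeeper"

  # Refuse to delete anything if the path does not exist (e.g. a typo).
  if [ ! -d "$data_dir" ]; then
    echo "ERROR: $data_dir does not exist; nothing deleted" >&2
    return 1
  fi

  # Remove the directory's contents, keeping the directory itself,
  # matching the rm command in the step above.
  rm -rf "$data_dir"/*
  echo "Cleared $data_dir"
}
```

On restart, the cleared node rejoins the quorum and resynchronizes its data from the remaining healthy node(s).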

Workaround

Restore the site to the previous version.

Affected Versions

This article applies to all High Availability (HA) Appian sites that apply a hotfix released on or after September 18, 2025.

Last Reviewed: November 2025
