DRAFT KB-XXXX Application server unresponsive following startup in an HA/Distributed environment running multiple application servers

Symptoms

After the search servers and application servers are restarted in an HA/Distributed environment with multiple application servers, a subset of users receive an HTTP Status 404 - Not Found error when attempting to access the environment.

In the logs of at least one application server, the following error is displayed during that application server's startup:

YYYY-MM-DD HH:MM:SS SEVERE [localhost-startStop-X] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [com.appiancorp.common.startup.WaitForStatefulComponentsListener]
 java.lang.reflect.UndeclaredThrowableException

... 26 more
 Caused by: MasterNotDiscoveredException[null]

No further application server logging occurs after the startup.

On the same application server node where the above error is logged, the following message is printed repeatedly in search-server.log, located in the <APPIAN_HOME>/logs/search-server directory:

[WARN ][org.elasticsearch.discovery.zen.ZenDiscovery] [unknown] not enough master nodes discovered during pinging...

These warnings are followed by the messages below:

[DEBUG][org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction] [unknown] no known master node, scheduling a retry...
[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [unknown] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [none] to [RED]. Current cluster information: [status=RED]...
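To check whether a given search server log shows these symptoms, the messages above can be matched programmatically. The following is a minimal sketch, not an Appian-provided tool: the function names are hypothetical, and the patterns are taken only from the log excerpts quoted in this article, so they may need adjusting for other log formats.

```python
import re
from collections import Counter

# Patterns copied from the log excerpts in this article (assumed stable).
SYMPTOM_PATTERNS = {
    "no_master_quorum": re.compile(r"not enough master nodes discovered during pinging"),
    "no_known_master": re.compile(r"no known master node, scheduling a retry"),
    "cluster_red": re.compile(r"Status changed from \[\w+\] to \[RED\]"),
}


def count_symptoms(lines):
    """Count how many log lines match each symptom pattern."""
    counts = Counter()
    for line in lines:
        for name, pattern in SYMPTOM_PATTERNS.items():
            if pattern.search(line):
                counts[name] += 1
    return counts


def matches_article_symptoms(lines):
    """Return True when the pinging warning and a follow-up message both appear."""
    counts = count_symptoms(lines)
    has_follow_up = counts["no_known_master"] > 0 or counts["cluster_red"] > 0
    return counts["no_master_quorum"] > 0 and has_follow_up
```

The functions accept any iterable of lines, so an open file handle for the search server log can be passed directly.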

Cause

This behavior occurs because the Appian Web application fails to deploy into Tomcat. The deployment fails when, during application server startup, the search server on the same application server node is not in a completely healthy state. A search server falls into this unhealthy state when it is started 4 minutes prior to any other search server node in the cluster. The search server that was started early is running, but its status is unhealthy because the cluster cannot elect a master node with only a single search server running.

This issue has been reported to the Appian Product Team. The reference number is AN-136158.
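The election failure described above can be illustrated with a simple majority calculation. This sketch assumes the cluster requires a strict majority of master-eligible search servers before electing a master, which is the commonly recommended value for the Elasticsearch zen discovery setting `discovery.zen.minimum_master_nodes`; the value Appian actually configures is not stated in this article, and the function names are hypothetical.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that forms a strict majority (assumed quorum rule)."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(running_nodes: int, master_eligible_nodes: int) -> bool:
    """True when enough search servers are running to reach the assumed quorum."""
    return running_nodes >= majority_quorum(master_eligible_nodes)


# One search server running alone in a three-node cluster cannot elect a master.
print(can_elect_master(running_nodes=1, master_eligible_nodes=3))  # False
# Once a second search server joins, the majority is reached.
print(can_elect_master(running_nodes=2, master_eligible_nodes=3))  # True
```

Under this assumption, a search server that starts well ahead of its peers stays without a master until enough other nodes join.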

Action

The search server cluster stabilizes automatically once all search servers have joined the cluster. At that point, restart the unresponsive application server to resolve the issue.
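One way to judge whether the cluster has stabilized before restarting the application server is to read the most recent health change from the search server log. The sketch below is illustrative only: it reuses the "Status changed from [...] to [...]" message format shown in the Symptoms section, and it assumes that a healthy cluster is reported as GREEN in the same format, which this article does not confirm. The function names are hypothetical.

```python
import re

# Message format taken from the HealthStatusLoggingListener line in this article.
STATUS_CHANGE = re.compile(r"Status changed from \[(\w+)\] to \[(\w+)\]")


def latest_cluster_status(lines):
    """Return the most recent status the cluster changed to, or None if unseen."""
    status = None
    for line in lines:
        match = STATUS_CHANGE.search(line)
        if match:
            status = match.group(2)
    return status


def ready_for_app_server_restart(lines, healthy_statuses=("GREEN",)):
    """True when the last logged status is one of the assumed healthy values."""
    return latest_cluster_status(lines) in healthy_statuses
```

If a different status should count as stabilized in a given environment, pass it through the `healthy_statuses` parameter rather than changing the pattern.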

Affected Versions

This article applies to all versions of Appian.

Last Reviewed: December 2019