DRAFT KB-XXXX Application server unresponsive following startup in an HA/Distributed environment running multiple application servers

Symptoms

After the search servers and application servers are restarted in an HA/Distributed environment with multiple application servers, a subset of users receives an HTTP Status 404 - Not Found error when attempting to access the environment.

In the tomcat-stdOut.log for at least one of the application servers, the following error is logged during application server startup:

YYYY-MM-DD HH:MM:SS SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [com.appiancorp.common.startup.WaitForStatefulComponentsListener]
 java.lang.reflect.UndeclaredThrowableException

... 26 more
 Caused by: MasterNotDiscoveredException[null]

The affected application server logs nothing further after startup.
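The check for this symptom can be sketched as a small log scan. This is a minimal sketch, not an Appian tool: the function name `find_failed_startup` is hypothetical, and the matching relies only on the strings visible in the excerpt above, assuming the real log keeps the same wording.

```python
# Strings copied from the log excerpt; the surrounding line format is assumed.
LISTENER = "com.appiancorp.common.startup.WaitForStatefulComponentsListener"
CAUSE = "MasterNotDiscoveredException"


def find_failed_startup(lines):
    """Return True if a SEVERE listener failure is later followed by
    a MasterNotDiscoveredException line (hypothetical helper)."""
    saw_listener_failure = False
    for line in lines:
        if "SEVERE" in line and LISTENER in line:
            # The listener failed while the web application was deploying.
            saw_listener_failure = True
        elif saw_listener_failure and CAUSE in line:
            # The root cause matches the one described in this article.
            return True
    return False
```

To use it, open the tomcat-stdOut.log of the suspect application server and pass the file object to `find_failed_startup`; the location of that log is not stated here, so it is left to the caller.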

On the same application server node where the above error is logged, the following message is printed repeatedly in the search-server.log, located in the <APPIAN_HOME>/logs/search-server directory:

[WARN ][org.elasticsearch.discovery.zen.ZenDiscovery] [unknown] not enough master nodes discovered during pinging...

It is followed by these messages:

[DEBUG][org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction] [unknown] no known master node, scheduling a retry...
[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [unknown] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [none] to [RED]. Current cluster information: [status=RED]...
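The search server messages above can be summarized with a short scan of the log. This is a rough sketch under stated assumptions: `summarize_search_log` is a hypothetical helper, and the regular expression assumes the "Status changed from [X] to [Y]" wording shown above is stable.

```python
import re

# Wording copied from the messages above; assumed stable across log lines.
WARNING = "not enough master nodes discovered during pinging"
STATUS_CHANGE = re.compile(r"Status changed from \[(\w+)\] to \[(\w+)\]")


def summarize_search_log(lines):
    """Count master-discovery warnings and return the most recently
    reported cluster status, or None if no status change was logged."""
    warnings = 0
    last_status = None
    for line in lines:
        if WARNING in line:
            warnings += 1
        match = STATUS_CHANGE.search(line)
        if match:
            # Group 2 is the status the cluster changed to.
            last_status = match.group(2)
    return warnings, last_status
```

A growing warning count together with a last status of RED would be consistent with the symptoms described in this article.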

Cause

This behavior occurs because the Appian web application fails to deploy into Tomcat. The deployment fails during application server startup when the search server on the same node is running but not fully healthy. The search server enters this unhealthy state when it is started at least 4 minutes before any other search server node in the cluster, because the search server cluster cannot elect a master node with only a single search server running. This issue has been reported to the Appian Product Team under reference number AN-136158.
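The election failure can be illustrated with a majority-quorum rule. This is an assumption made for illustration: Elasticsearch's Zen discovery has a `discovery.zen.minimum_master_nodes` setting, and a majority of master-eligible nodes is the commonly recommended value, but this article does not state what Appian configures. The function names are hypothetical.

```python
def majority_quorum(master_eligible_nodes: int) -> int:
    """Smallest node count that forms a strict majority (assumed rule)."""
    return master_eligible_nodes // 2 + 1


def can_elect_master(running_nodes: int, master_eligible_nodes: int) -> bool:
    """True if enough nodes are running to elect a master under the
    assumed majority-quorum rule."""
    return running_nodes >= majority_quorum(master_eligible_nodes)
```

Under this assumption, a lone search server in a cluster of two or three master-eligible nodes can never reach quorum, which matches the repeated "not enough master nodes discovered" warning.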

Action

The search server cluster stabilizes automatically once all search servers have joined the cluster. After that, restart the unresponsive application server to resolve the issue.
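The wait-then-restart sequence can be sketched as a polling loop. This is only a sketch: `wait_until_stable` and its parameters are hypothetical, the caller must supply `get_status` (how the status is obtained is not specified here), and treating only GREEN as stable is an assumption. The restart itself is left to the operator.

```python
import time


def wait_until_stable(get_status, healthy=("GREEN",), timeout_s=600.0,
                      poll_s=10.0, sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until it returns a healthy value.

    Returns True once the cluster reports a healthy status, or False
    if timeout_s elapses first. All names and defaults are assumptions.
    """
    deadline = clock() + timeout_s
    while True:
        if get_status() in healthy:
            return True
        if clock() >= deadline:
            return False
        # Wait before asking again to avoid hammering the status source.
        sleep(poll_s)
```

Only after this returns True would the unresponsive application server be restarted; if it returns False, the search servers likely have not all joined the cluster yet.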

Affected Versions

This article applies to all versions of Appian.

Last Reviewed: December 2019