KB-2041 Application server unresponsive following startup in a high availability and/or distributed environment with multiple application servers

This issue has been resolved in an Appian hotfix/new Appian version. Please apply the latest hotfix to your Appian installation or upgrade to the latest version of Appian.

Symptoms

Following a restart of the search servers and application servers in a high availability (HA) and/or distributed environment with multiple application servers, a subset of users receive an HTTP Status 404 - Not Found error when attempting to access the environment.

In the tomcat-stdOut.log for at least one of the application servers, the following error appears during application server startup:

YYYY-MM-DD HH:MM:SS SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [com.appiancorp.common.startup.WaitForStatefulComponentsListener]
 java.lang.reflect.UndeclaredThrowableException
...
 Caused by: MasterNotDiscoveredException[null]

The application server logs also show that the application server has started, but no further application server logging occurs after startup.

On the same application server node where the above error is logged, the following message is printed repeatedly in the search-server.log located in the <APPIAN_HOME>/logs/search-server directory:

[WARN ][org.elasticsearch.discovery.zen.ZenDiscovery] [unknown] not enough master nodes discovered during pinging...

Followed by:

[DEBUG][org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction] [unknown] no known master node, scheduling a retry...
[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [unknown] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [none] to [RED]. Current cluster information: [status=RED]...
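
As a quick way to check whether a node matches the symptoms above, the following is a minimal sketch in Python that looks for the quoted messages in the text of both logs. The function name matches_symptoms and the marker constants are hypothetical and only illustrate the check; the marker strings are copied from the log excerpts in this article, and the sketch assumes the log contents have already been read into strings.

```python
# Substrings copied from the log excerpts quoted in this article.
TOMCAT_MARKERS = (
    "WaitForStatefulComponentsListener",
    "MasterNotDiscoveredException",
)
SEARCH_MARKER = "not enough master nodes discovered during pinging"


def matches_symptoms(tomcat_log_text, search_log_text):
    """Return True if both logs contain the messages described above.

    Hypothetical helper: it only does substring matching and does not
    check timestamps or the order of the messages.
    """
    tomcat_hit = all(marker in tomcat_log_text for marker in TOMCAT_MARKERS)
    search_hit = SEARCH_MARKER in search_log_text
    return tomcat_hit and search_hit
```

A match from this check is only an indication; the log excerpts themselves remain the reference for diagnosing the issue.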

Cause

The Appian web application fails to deploy into Tomcat when the search server hosted on the same server is running but not completely healthy. The search server enters this unhealthy state when it is started at least 4 minutes before any other search server node in the cluster: with only a single search server running, the search server cluster cannot elect a primary node. This issue has been addressed via AN-146092 in the following hotfixes/versions:

Action

Upgrade to the latest version of Appian.

Workaround

The search server cluster stabilizes automatically once all search servers have joined the cluster; at that point, restarting the unresponsive application server resolves this issue. Before restarting, confirm the status of the search server cluster in the search-server.log located in the <APPIAN_HOME>/logs/search-server directory. A healthy cluster is indicated by the status changing to GREEN:

[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [Node <node_subdomain>:XXXX] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [RED] to [GREEN]. Current cluster information: [status=GREEN], [timed out=false], [nodes=X], [data nodes=X], [active primary shards=X], [active shards=X], [relocating shards=X], [initializing shards=X], [unassigned shards=X]
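
The status check described above can be sketched in Python as follows. This is an illustration under stated assumptions, not an Appian-provided tool: the function name latest_cluster_status is hypothetical, and the pattern assumes the HealthStatusLoggingListener lines have the same shape as the examples quoted in this article.

```python
import re

# Pattern based on the "Status changed from [X] to [Y]" text in the
# HealthStatusLoggingListener lines quoted in this article (assumed format).
STATUS_CHANGE = re.compile(r"Status changed from \[(\w+)\] to \[(\w+)\]")


def latest_cluster_status(log_text):
    """Return the most recent cluster status found in the log text, or None.

    Hypothetical helper: scans every line, keeps the target status of the
    last health change it sees, and ignores unrelated log lines.
    """
    latest = None
    for line in log_text.splitlines():
        if "HealthStatusLoggingListener" not in line:
            continue
        match = STATUS_CHANGE.search(line)
        if match:
            latest = match.group(2)
    return latest
```

To use it, read the contents of the search-server.log into a string and pass it to the function; a result of GREEN suggests the cluster has stabilized and the unresponsive application server can be restarted.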

Affected Versions

This article applies to all versions of Appian.

Last Reviewed: June 2020