DRAFT KB-XXXX Application server unresponsive following startup in a high availability and/or distributed environment running multiple application servers

Symptoms

After the search servers and application servers are restarted in a high availability (HA) and/or distributed environment with multiple application servers, a subset of users receive an HTTP Status 404 - Not Found error when attempting to access the environment.

In the tomcat-stdOut.log of at least one of the application servers, the following error is logged during the application server's startup:

YYYY-MM-DD HH:MM:SS SEVERE [localhost-startStop-1] org.apache.catalina.core.StandardContext.listenerStart Exception sending context initialized event to listener instance of class [com.appiancorp.common.startup.WaitForStatefulComponentsListener]
 java.lang.reflect.UndeclaredThrowableException
... 26 more
 Caused by: MasterNotDiscoveredException[null]

After this failed startup, the application server writes no further log entries.

On the same application server node where the above error is logged, the following message is printed repeatedly in the search-server.log located in the <APPIAN_HOME>/logs/search server directory:

[WARN ][org.elasticsearch.discovery.zen.ZenDiscovery] [unknown] not enough master nodes discovered during pinging...

It is followed by these messages:

[DEBUG][org.elasticsearch.action.admin.cluster.health.TransportClusterHealthAction] [unknown] no known master node, scheduling a retry...
[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [unknown] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [none] to [RED]. Current cluster information: [status=RED]...
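
When a search-server.log is large, it can help to count how often these messages appear rather than reading the file by eye. The following is a minimal sketch, not an Appian tool: the function names and dictionary keys are hypothetical, and the marker strings are copied from the excerpts above, so they may need adjusting if your log format differs.

```python
# Substrings copied from the log excerpts in this article.
# The dictionary keys are hypothetical labels chosen for this sketch.
SYMPTOM_MARKERS = {
    "not_enough_masters": "not enough master nodes discovered during pinging",
    "no_known_master": "no known master node, scheduling a retry",
    "cluster_red": "to [RED]",
}


def count_symptoms(log_text):
    """Count the lines of log_text that contain each symptom marker."""
    counts = {name: 0 for name in SYMPTOM_MARKERS}
    for line in log_text.splitlines():
        for name, marker in SYMPTOM_MARKERS.items():
            if marker in line:
                counts[name] += 1
    return counts


def all_symptoms_present(counts):
    """Rough heuristic (an assumption): every marker appeared at least once."""
    return all(count > 0 for count in counts.values())
```

To use it, read the contents of the search-server.log into a string and pass that string to count_symptoms. A result where every count is above zero is consistent with the symptoms described here, but it does not by itself confirm the cause.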

Cause

The application server fails to start when the search server hosted on the same server is running but not completely healthy. The search server enters this unhealthy state when it is started at least 4 minutes before any other search server node in the cluster, because the search server cluster cannot elect a master node with only a single search server running. This issue has been reported to the Appian Product Team under reference number AN-136158.

Action

Cloud

Open a case with Appian Support and note in the case description that you are experiencing behavior in line with this article.

On-Premise

The search server cluster stabilizes automatically once all search servers have joined the cluster. After that, restarting the unresponsive application server resolves this issue. You can confirm the status of the search server cluster in the search-server.log located in the <APPIAN_HOME>/logs/search server directory. A healthy cluster is indicated by the following line in this log:

[INFO ][com.appian.es.logging.HealthStatusLoggingListener] [Node <node_subdomain>:XXXX] Cluster health changed for [cluster name=appian-search-cluster]. Status changed from [RED] to [GREEN]. Current cluster information: [status=GREEN], [timed out=false], [nodes=X], [data nodes=X], [active primary shards=X], [active shards=X], [relocating shards=X], [initializing shards=X], [unassigned shards=X]
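
Because the log can contain several health transitions, the most recent one is what matters before restarting the application server. The sketch below is an illustration only, assuming every transition is logged with the "Status changed from [...] to [...]" wording shown above. The function name latest_cluster_status is hypothetical and is not part of Appian.

```python
import re

# Matches the transition wording shown in the log excerpts of this article,
# e.g. "Status changed from [RED] to [GREEN]".
HEALTH_CHANGE = re.compile(r"Status changed from \[(\w+)\] to \[(\w+)\]")


def latest_cluster_status(log_lines):
    """Return the status named in the last health transition, or None.

    log_lines is any iterable of strings, such as an open file object.
    """
    status = None
    for line in log_lines:
        match = HEALTH_CHANGE.search(line)
        if match:
            # Keep overwriting so the final value reflects the last transition.
            status = match.group(2)
    return status
```

To use it, open the search-server.log and pass the file object to latest_cluster_status. A return value of GREEN suggests the cluster has stabilized and the unresponsive application server can be restarted. None means no transition line was found.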

Affected Versions

This article applies to all versions of Appian.

Last Reviewed: December 2019