Docker data-server container does not start properly

A Score Level 2

When I start the docker containers, all the containers change to "healthy" once they are up and running, except for the data-server container. This is preventing the web-application container from starting successfully. I've attached the logs of the data-server container. There is no error thrown by the data-server immediately. It sits idle for about 15 minutes. Then the status changes to "unhealthy". Any ideas how to troubleshoot this?

In the screenshot I started the rdbms, search-server, service-manager, and data-server containers in that order.

Appian: 19.3

$ docker-compose up data-server
Starting base ... done
service-manager is up-to-date
Starting data-server ... done
Attaching to data-server
data-server        | Waiting for these containers to start: service-manager
data-server        | Currently waiting for service-manager
data-server        | service-manager is up!
data-server        | Executing start command: /usr/local/appian/ae/data-server/bin/start.sh -f
data-server        | 0 [main] INFO com.appian.data.TopologyFactory  - Loading topology /usr/local/appian/ae/data-server/conf/appian-topology.xml
data-server        | 451 [main] INFO com.appian.data.server.config.CustomPropertiesConfiguration  - Configuration file custom.properties was loaded successfully. (Reload check = 10000 ms)
data-server        | 478 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/data'
data-server        | 484 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/data'
data-server        | 486 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/node'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/node'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/logs'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/logs'
data-server        | 490 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/data-metrics'
data-server        | 490 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/data-metrics'
data-server        | 495 [main] INFO com.appian.data.server.Watchdog  - Initializing log4j (processName=watchdog, logsDir=/usr/local/appian/ae/data-server/logs) from /usr/local/appian/ae/data-server/conf/log4j.properties
data-server        | {"level":"INFO","time":"2019-10-14T22:15:11.567Z","logger":"watchdog","caller":"Watchdog.java:1007","thread":"main","msg":"Starting Data Server. Build-Version[0.148.4], Build-Date[2019-08-16], Target-Version[2019-04-21]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.099Z","logger":"watchdog","caller":"Logging.java:53","thread":"main","msg":"Total memory: 81264640, Max memory: 119537664, Free memory: 61261104, Num processors: 2"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.101Z","logger":"watchdog","caller":"Logging.java:56","thread":"main","msg":"JVM version: 1.8.0_222, JVM arguments: -Duser.timezone=GMT -Dds.home=/usr/local/appian/ae/data-server -Xms64m -Xmx128m, JVM current dir: /usr/local/appian/ae"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.238Z","logger":"watchdog","caller":"VertxServer.java:105","thread":"vert.x-eventloop-thread-3","msg":"Listening on data-server:5400..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.241Z","logger":"watchdog","caller":"VertxServer.java:105","thread":"vert.x-eventloop-thread-0","msg":"Listening on data-server:6400..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.246Z","logger":"watchdog","caller":"MigrationEngine.java:122","thread":"main","msg":"Version on disk (2019-04-21) is verified. No migration required..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.255Z","logger":"watchdog","caller":"TopicManager.java:181","thread":"main","msg":"Topic ads_transaction_effects already exists"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.255Z","logger":"watchdog","caller":"TopicManager.java:281","thread":"main","msg":"Ensuring topic "ads_transaction_effects" has the correct configuration"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.379Z","logger":"watchdog","caller":"TopicManager.java:298","thread":"main","msg":"Topic "ads_transaction_effects" has the correct configuration"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.379Z","logger":"watchdog","caller":"TopicManager.java:168","thread":"main","msg":"Active brokers: [engines:9092 (id: 0 rack: null)]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.383Z","logger":"watchdog","caller":"TopicManager.java:342","thread":"main","msg":"Topic ads_transaction_effects has desired assigned replicas: [engines:9092 (id: 0 rack: null)]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.445Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'appender'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.503Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'hs'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.508Z","logger":"watchdog","caller":"AppenderStarter.java:39","thread":"process-starter-0","msg":"Starting appender engine (port=5405)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.510Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-0","msg":"Starting Q process: Config{script=appender.q, hostAndPort=data-server:5405, args=[]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.517Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'rts:0'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.523Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:38","thread":"process-starter-1","msg":"Starting HS engine (port=5403)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.524Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-1","msg":"Starting Q process: Config{script=hs.q, hostAndPort=data-server:5403, args=[]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.593Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:40","thread":"process-starter-2","msg":"Starting RTS-0 engine (port=5451)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.595Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-2","msg":"Starting Q process: Config{script=rts.q, hostAndPort=data-server:5451, args=[-rts, 0]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.166Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:40","thread":"process-starter-1","msg":"Started HS engine (port=5403)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.171Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:74","thread":"process-starter-1","msg":"Starting HS gateway (port=5401)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.172Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-1","msg":"Starting Go process: Config{executable=ads-hs-linux, hostAndPort=data-server:5401, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.host, data-server, -gw.port, 5401, -gw.metrics.port, 6401, -hs.peers, data-server:5401, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -q.port, 5403]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.249Z","logger":"watchdog","caller":"AppenderStarter.java:41","thread":"process-starter-0","msg":"Started appender engine (port=5405)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.251Z","logger":"watchdog","caller":"AppenderStarter.java:76","thread":"process-starter-0","msg":"Starting appender gateway (port=5404)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.252Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-0","msg":"Starting Go process: Config{executable=ads-appender-linux, hostAndPort=data-server:5404, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.port, 5404, -gw.host, data-server, -gw.metrics.port, 6404, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -engine.port, 5405, -hs.engine.port, 5403, -hs.gw.address, data-server:5401, -rts.gw.addresses, data-server:5450]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.259Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:42","thread":"process-starter-2","msg":"Started RTS-0 engine (port=5451)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.263Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:77","thread":"process-starter-2","msg":"Starting RTS-0 gateway (port=5450)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.264Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-2","msg":"Starting Go process: Config{executable=ads-rts-linux, hostAndPort=data-server:5450, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.host, data-server, -gw.port, 5450, -gw.metrics.port, 6450, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -q.port, 5451, -rts, 0]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:16.446Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:76","thread":"process-starter-1","msg":"Started HS gateway (port=5401)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:16.559Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:79","thread":"process-starter-2","msg":"Started RTS-0 gateway (port=5450)."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.490Z","logger":"watchdog","caller":"AppenderStarter.java:78","thread":"process-starter-0","msg":"Started appender gateway (port=5404)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.496Z","logger":"watchdog","caller":"Watchdog.java:451","thread":"heal-executor","msg":"Healing is enabled. To disable, set property "wd.healInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a non-positive number ..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.519Z","logger":"watchdog","caller":"ResourceMetricsCollection.java:78","thread":"resource-metrics-collector","msg":"Resource metrics capturing is enabled. To disable, set property "wd.resourceMetricsInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a non-positive number ..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.521Z","logger":"watchdog","caller":"BranchCleanup.java:67","thread":"branch-cleaner","msg":"Branch cleanup is enabled. To disable, set property "bg.branchCleanupInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a negative number..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.522Z","logger":"watchdog","caller":"Watchdog.java:252","thread":"main","msg":"Ready in 5422 ms"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.523Z","logger":"watchdog","caller":"LogRotation.java:57","thread":"log-rotator","msg":"LogRotation is enabled. To disable, set property "wd.log.rotation.interval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a negative number"}
data-server        | {"level":"INFO","time":"2019-10-14T22:30:36.937Z","logger":"watchdog","caller":"BranchCleanup.java:39","thread":"branch-cleaner","msg":"Starting branch cleanup..."}
kdata-server        | Oct 14, 2019 10:34:39 PM io.vertx.core.impl.BlockedThreadChecker
data-server        | WARNING: Thread Thread[vert.x-eventloop-thread-2,5,main] has been blocked for 3428 ms, time limit is 2000 ms
kdata-server        | {"level":"INFO","time":"2019-10-14T22:35:12.618Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"heal-executor","msg":"Killing data server component 'appender'"}
data-server        | Oct 14, 2019 10:35:47 PM io.vertx.core.impl.BlockedThreadChecker
data-server        | WARNING: Thread Thread[vert.x-eventloop-thread-3,5,main] has been blocked for 2961 ms, time limit is 2000 ms
data-server        | {"level":"ERROR","time":"2019-10-14T22:36:24.862Z","logger":"watchdog","caller":"Watchdog.java:437","thread":"heal-executor","msg":"Failed to heal components... Cause: "}
data-server        | java.lang.IllegalStateException: Unable to kill data-server component using command: [/usr/local/appian/ae/data-server/lib/scripts/kill-component.sh, appender]
data-server        | 	at com.appian.data.server.ScriptUtils.killComponent(ScriptUtils.java:73)
data-server        | 	at com.appian.data.server.Watchdog.heal(Watchdog.java:463)
data-server        | 	at com.appian.data.server.Watchdog.access$200(Watchdog.java:99)
data-server        | 	at com.appian.data.server.Watchdog$1.onInterval(Watchdog.java:432)
data-server        | 	at com.appian.data.server.IntervalTask.safeRun(IntervalTask.java:76)
data-server        | 	at com.appian.data.server.IntervalTask.run(IntervalTask.java:62)
data-server        | 	at java.lang.Thread.run(Thread.java:748)
data-server        | Caused by: java.util.concurrent.TimeoutException: Timed out waiting for java.lang.UNIXProcess@647bd949 to finish, timeout: 15000 milliseconds, executed command [/usr/local/appian/ae/data-server/lib/scripts/kill-component.sh, appender]
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.newTimeoutException(ProcessExecutor.java:1079)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.waitFor(ProcessExecutor.java:1042)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:860)
data-server        | 	at com.appian.data.server.ScriptUtils.killComponent(ScriptUtils.java:71)
data-server        | 	... 6 more
data-server        | {"level":"ERROR","time":"2019-10-14T22:36:50.643Z","logger":"watchdog","caller":"ResourceMetricsCollection.java:63","thread":"resource-metrics-collector","msg":"Failed to retrieve resource metrics... Cause: "}
data-server        | java.lang.IllegalStateException: Unable to capture resource metrics using [ps, -o, pid,%cpu,%mem,cpu,rss, -p, 106, -p, 88, -p, 73, -p, 79, -p, 96, -p, 10, -p, 107]
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.getProcessMetrics(ResourceMetricsCollection.java:136)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.captureResourceMetrics(ResourceMetricsCollection.java:88)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.onInterval(ResourceMetricsCollection.java:61)
data-server        | 	at com.appian.data.server.IntervalTask.safeRun(IntervalTask.java:76)
data-server        | 	at com.appian.data.server.IntervalTask.run(IntervalTask.java:62)
data-server        | 	at java.lang.Thread.run(Thread.java:748)
data-server        | Caused by: java.util.concurrent.TimeoutException: Timed out waiting for java.lang.UNIXProcess@7dcd453a to finish, timeout: 15000 milliseconds, executed command [ps, -o, pid,%cpu,%mem,cpu,rss, -p, 106, -p, 88, -p, 73, -p, 79, -p, 96, -p, 10, -p, 107], output was 0 bytes:
data-server        | 
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.newTimeoutException(ProcessExecutor.java:1079)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.waitFor(ProcessExecutor.java:1042)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:860)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.getProcessMetrics(ResourceMetricsCollection.java:132)
data-server        | 	... 5 more

Note: I can start the web-application container even though the data-server container status never changes to "healthy". The web-application thinks the data-server has completed its startup script. Timeouts immediately occur.

Parents Reply Children
No Data

 Discussion posts and replies are publicly visible