Docker data-server container does not start properly

When I start the docker containers, all the containers change to "healthy" once they are up and running, except for the data-server container. This is preventing the web-application container from starting successfully. I've attached the logs of the data-server container. There is no error thrown by the data-server immediately. It sits idle for about 15 minutes. Then the status changes to "unhealthy". Any ideas how to troubleshoot this?

In the screenshot I started the rdbms, search-server, service-manager, and data-server containers in that order.

Appian: 19.3

$ docker-compose up data-server
Starting base ... done
service-manager is up-to-date
Starting data-server ... done
Attaching to data-server
data-server        | Waiting for these containers to start: service-manager
data-server        | Currently waiting for service-manager
data-server        | service-manager is up!
data-server        | Executing start command: /usr/local/appian/ae/data-server/bin/start.sh -f
data-server        | 0 [main] INFO com.appian.data.TopologyFactory  - Loading topology /usr/local/appian/ae/data-server/conf/appian-topology.xml
data-server        | 451 [main] INFO com.appian.data.server.config.CustomPropertiesConfiguration  - Configuration file custom.properties was loaded successfully. (Reload check = 10000 ms)
data-server        | 478 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/data'
data-server        | 484 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/data'
data-server        | 486 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/node'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/node'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/logs'
data-server        | 488 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/logs'
data-server        | 490 [main] INFO com.appian.data.server.Watchdog  - Creating directory '/usr/local/appian/ae/data-server/data-metrics'
data-server        | 490 [main] INFO com.appian.data.server.Watchdog  - Checking permissions on directory '/usr/local/appian/ae/data-server/data-metrics'
data-server        | 495 [main] INFO com.appian.data.server.Watchdog  - Initializing log4j (processName=watchdog, logsDir=/usr/local/appian/ae/data-server/logs) from /usr/local/appian/ae/data-server/conf/log4j.properties
data-server        | {"level":"INFO","time":"2019-10-14T22:15:11.567Z","logger":"watchdog","caller":"Watchdog.java:1007","thread":"main","msg":"Starting Data Server. Build-Version[0.148.4], Build-Date[2019-08-16], Target-Version[2019-04-21]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.099Z","logger":"watchdog","caller":"Logging.java:53","thread":"main","msg":"Total memory: 81264640, Max memory: 119537664, Free memory: 61261104, Num processors: 2"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.101Z","logger":"watchdog","caller":"Logging.java:56","thread":"main","msg":"JVM version: 1.8.0_222, JVM arguments: -Duser.timezone=GMT -Dds.home=/usr/local/appian/ae/data-server -Xms64m -Xmx128m, JVM current dir: /usr/local/appian/ae"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.238Z","logger":"watchdog","caller":"VertxServer.java:105","thread":"vert.x-eventloop-thread-3","msg":"Listening on data-server:5400..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:12.241Z","logger":"watchdog","caller":"VertxServer.java:105","thread":"vert.x-eventloop-thread-0","msg":"Listening on data-server:6400..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.246Z","logger":"watchdog","caller":"MigrationEngine.java:122","thread":"main","msg":"Version on disk (2019-04-21) is verified. No migration required..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.255Z","logger":"watchdog","caller":"TopicManager.java:181","thread":"main","msg":"Topic ads_transaction_effects already exists"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.255Z","logger":"watchdog","caller":"TopicManager.java:281","thread":"main","msg":"Ensuring topic "ads_transaction_effects" has the correct configuration"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.379Z","logger":"watchdog","caller":"TopicManager.java:298","thread":"main","msg":"Topic "ads_transaction_effects" has the correct configuration"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.379Z","logger":"watchdog","caller":"TopicManager.java:168","thread":"main","msg":"Active brokers: [engines:9092 (id: 0 rack: null)]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.383Z","logger":"watchdog","caller":"TopicManager.java:342","thread":"main","msg":"Topic ads_transaction_effects has desired assigned replicas: [engines:9092 (id: 0 rack: null)]"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.445Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'appender'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.503Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'hs'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.508Z","logger":"watchdog","caller":"AppenderStarter.java:39","thread":"process-starter-0","msg":"Starting appender engine (port=5405)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.510Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-0","msg":"Starting Q process: Config{script=appender.q, hostAndPort=data-server:5405, args=[]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.517Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"main","msg":"Killing data server component 'rts:0'"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.523Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:38","thread":"process-starter-1","msg":"Starting HS engine (port=5403)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.524Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-1","msg":"Starting Q process: Config{script=hs.q, hostAndPort=data-server:5403, args=[]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.593Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:40","thread":"process-starter-2","msg":"Starting RTS-0 engine (port=5451)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:13.595Z","logger":"watchdog","caller":"QStarter.java:53","thread":"process-starter-2","msg":"Starting Q process: Config{script=rts.q, hostAndPort=data-server:5451, args=[-rts, 0]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.166Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:40","thread":"process-starter-1","msg":"Started HS engine (port=5403)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.171Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:74","thread":"process-starter-1","msg":"Starting HS gateway (port=5401)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.172Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-1","msg":"Starting Go process: Config{executable=ads-hs-linux, hostAndPort=data-server:5401, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.host, data-server, -gw.port, 5401, -gw.metrics.port, 6401, -hs.peers, data-server:5401, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -q.port, 5403]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.249Z","logger":"watchdog","caller":"AppenderStarter.java:41","thread":"process-starter-0","msg":"Started appender engine (port=5405)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.251Z","logger":"watchdog","caller":"AppenderStarter.java:76","thread":"process-starter-0","msg":"Starting appender gateway (port=5404)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.252Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-0","msg":"Starting Go process: Config{executable=ads-appender-linux, hostAndPort=data-server:5404, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.port, 5404, -gw.host, data-server, -gw.metrics.port, 6404, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -engine.port, 5405, -hs.engine.port, 5403, -hs.gw.address, data-server:5401, -rts.gw.addresses, data-server:5450]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.259Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:42","thread":"process-starter-2","msg":"Started RTS-0 engine (port=5451)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.263Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:77","thread":"process-starter-2","msg":"Starting RTS-0 gateway (port=5450)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:15.264Z","logger":"watchdog","caller":"GoStarter.java:63","thread":"process-starter-2","msg":"Starting Go process: Config{executable=ads-rts-linux, hostAndPort=data-server:5450, args=[-ds.home, /usr/local/appian/ae/data-server, -custom.properties, /usr/local/appian/ae/data-server/conf/custom.properties, -gw.host, data-server, -gw.port, 5450, -gw.metrics.port, 6450, -kfk.brokers, engines:9092, -kfk.topic, ads_transaction_effects, -q.port, 5451, -rts, 0]}"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:16.446Z","logger":"watchdog","caller":"HistoricalStoreStarter.java:76","thread":"process-starter-1","msg":"Started HS gateway (port=5401)"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:16.559Z","logger":"watchdog","caller":"RealtimeStoreStarter.java:79","thread":"process-starter-2","msg":"Started RTS-0 gateway (port=5450)."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.490Z","logger":"watchdog","caller":"AppenderStarter.java:78","thread":"process-starter-0","msg":"Started appender gateway (port=5404)..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.496Z","logger":"watchdog","caller":"Watchdog.java:451","thread":"heal-executor","msg":"Healing is enabled. To disable, set property "wd.healInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a non-positive number ..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.519Z","logger":"watchdog","caller":"ResourceMetricsCollection.java:78","thread":"resource-metrics-collector","msg":"Resource metrics capturing is enabled. To disable, set property "wd.resourceMetricsInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a non-positive number ..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.521Z","logger":"watchdog","caller":"BranchCleanup.java:67","thread":"branch-cleaner","msg":"Branch cleanup is enabled. To disable, set property "bg.branchCleanupInterval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a negative number..."}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.522Z","logger":"watchdog","caller":"Watchdog.java:252","thread":"main","msg":"Ready in 5422 ms"}
data-server        | {"level":"INFO","time":"2019-10-14T22:15:17.523Z","logger":"watchdog","caller":"LogRotation.java:57","thread":"log-rotator","msg":"LogRotation is enabled. To disable, set property "wd.log.rotation.interval.seconds" in file "/usr/local/appian/ae/data-server/conf/custom.properties" to a negative number"}
data-server        | {"level":"INFO","time":"2019-10-14T22:30:36.937Z","logger":"watchdog","caller":"BranchCleanup.java:39","thread":"branch-cleaner","msg":"Starting branch cleanup..."}
kdata-server        | Oct 14, 2019 10:34:39 PM io.vertx.core.impl.BlockedThreadChecker
data-server        | WARNING: Thread Thread[vert.x-eventloop-thread-2,5,main] has been blocked for 3428 ms, time limit is 2000 ms
kdata-server        | {"level":"INFO","time":"2019-10-14T22:35:12.618Z","logger":"watchdog","caller":"ScriptUtils.java:64","thread":"heal-executor","msg":"Killing data server component 'appender'"}
data-server        | Oct 14, 2019 10:35:47 PM io.vertx.core.impl.BlockedThreadChecker
data-server        | WARNING: Thread Thread[vert.x-eventloop-thread-3,5,main] has been blocked for 2961 ms, time limit is 2000 ms
data-server        | {"level":"ERROR","time":"2019-10-14T22:36:24.862Z","logger":"watchdog","caller":"Watchdog.java:437","thread":"heal-executor","msg":"Failed to heal components... Cause: "}
data-server        | java.lang.IllegalStateException: Unable to kill data-server component using command: [/usr/local/appian/ae/data-server/lib/scripts/kill-component.sh, appender]
data-server        | 	at com.appian.data.server.ScriptUtils.killComponent(ScriptUtils.java:73)
data-server        | 	at com.appian.data.server.Watchdog.heal(Watchdog.java:463)
data-server        | 	at com.appian.data.server.Watchdog.access$200(Watchdog.java:99)
data-server        | 	at com.appian.data.server.Watchdog$1.onInterval(Watchdog.java:432)
data-server        | 	at com.appian.data.server.IntervalTask.safeRun(IntervalTask.java:76)
data-server        | 	at com.appian.data.server.IntervalTask.run(IntervalTask.java:62)
data-server        | 	at java.lang.Thread.run(Thread.java:748)
data-server        | Caused by: java.util.concurrent.TimeoutException: Timed out waiting for java.lang.UNIXProcess@647bd949 to finish, timeout: 15000 milliseconds, executed command [/usr/local/appian/ae/data-server/lib/scripts/kill-component.sh, appender]
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.newTimeoutException(ProcessExecutor.java:1079)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.waitFor(ProcessExecutor.java:1042)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:860)
data-server        | 	at com.appian.data.server.ScriptUtils.killComponent(ScriptUtils.java:71)
data-server        | 	... 6 more
data-server        | {"level":"ERROR","time":"2019-10-14T22:36:50.643Z","logger":"watchdog","caller":"ResourceMetricsCollection.java:63","thread":"resource-metrics-collector","msg":"Failed to retrieve resource metrics... Cause: "}
data-server        | java.lang.IllegalStateException: Unable to capture resource metrics using [ps, -o, pid,%cpu,%mem,cpu,rss, -p, 106, -p, 88, -p, 73, -p, 79, -p, 96, -p, 10, -p, 107]
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.getProcessMetrics(ResourceMetricsCollection.java:136)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.captureResourceMetrics(ResourceMetricsCollection.java:88)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.onInterval(ResourceMetricsCollection.java:61)
data-server        | 	at com.appian.data.server.IntervalTask.safeRun(IntervalTask.java:76)
data-server        | 	at com.appian.data.server.IntervalTask.run(IntervalTask.java:62)
data-server        | 	at java.lang.Thread.run(Thread.java:748)
data-server        | Caused by: java.util.concurrent.TimeoutException: Timed out waiting for java.lang.UNIXProcess@7dcd453a to finish, timeout: 15000 milliseconds, executed command [ps, -o, pid,%cpu,%mem,cpu,rss, -p, 106, -p, 88, -p, 73, -p, 79, -p, 96, -p, 10, -p, 107], output was 0 bytes:
data-server        | 
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.newTimeoutException(ProcessExecutor.java:1079)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.waitFor(ProcessExecutor.java:1042)
data-server        | 	at org.zeroturnaround.exec.ProcessExecutor.execute(ProcessExecutor.java:860)
data-server        | 	at com.appian.data.server.ResourceMetricsCollection.getProcessMetrics(ResourceMetricsCollection.java:132)
data-server        | 	... 5 more

Note: I can start the web-application container even though the data-server container status never changes to "healthy". The web-application thinks the data-server has completed its startup script. Timeouts immediately occur.

  Discussion posts and replies are publicly visible

Parents
  • 0
    Certified Lead Developer

    I'm still seeing the issue you described even with the increased docker memory. Any Idea on anything else you might have changed?

    My install is using Appian Docker Tools 19.2-1 and setupLinux64_appian-19.3.81.0.bin (hotfix C).

    On a sidenote, I did notice inconsistencies with the example hostname for data-server. In some places it was data-server.example.com and in others it was dataserver.example.com. I made them all uniform using my personal hostname. 

Reply
  • 0
    Certified Lead Developer

    I'm still seeing the issue you described even with the increased docker memory. Any Idea on anything else you might have changed?

    My install is using Appian Docker Tools 19.2-1 and setupLinux64_appian-19.3.81.0.bin (hotfix C).

    On a sidenote, I did notice inconsistencies with the example hostname for data-server. In some places it was data-server.example.com and in others it was dataserver.example.com. I made them all uniform using my personal hostname. 

Children
No Data