process-design engine failure - Exception appending KRequest

Hi,

 

We have an issue with the process-design engine, which fails frequently during Appian's automatic checkpoint. Appian detects the failure and tries to restart the engine, and the Linux process for the engine is killed during that restart. After the restart, Zookeeper fails to start and we are not able to connect to Appian. To get everything back, I need to start Zookeeper and process-design manually.

We are running Appian version 17.3, Hotfix A.

The logs show the errors below.

2018-02-07 23:18:18,736 [KomodoEventBus-5331] INFO  com.appian.komodo.engine.checkpoint.AutoCheckpointer - Queueing checkpoint for process-design because it has received over 30000 transactions since last checkpoint

...

2018-02-07 23:18:29,418 [KomodoEventBus-5369] INFO com.appian.komodo.log.KafkaLogCleaner - Updating committed offset for serviceManager.transaction.process-design-0 from 7842666 to 7902775
2018-02-07 23:18:29,426 [KomodoEventBus-5369] INFO com.appian.komodo.log.KafkaLogCleaner - Marking all transaction log entries for process-design that are at least 17 hours old for deletion.
2018-02-07 23:54:55,317 [Disk I/O process-design-179-1] ERROR com.appian.komodo.log.KafkaTransactionLogService - Exception appending KRequest{getSyncMode=SYNCHRONOUS, getSize=304, getDecodedKRequest=Optional[DecodedKRequest{apiVersion=9, writeCall=true, interfaceName=PROCESS, functionName=listenForMessages, transactionId=0}], content=ReadOnlyByteBuf(ridx: 304, widx: 304, cap: 304/304, unwrapped: CapacityTrackingByteBuf(SlicedAbstractByteBuf(ridx: 0, widx: 304, cap: 304/304, unwrapped: PooledSlicedByteBuf(ridx: 312, widx: 312, cap: 312/312, unwrapped: PooledUnsafeDirectByteBuf(ridx: 312, widx: 312, cap: 384)))))} with 3944508

java.util.concurrent.TimeoutException: Failed to send all messages within 71s

...

Suppressed: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for serviceManager.transaction.process-design-0 due to 15026 ms has passed since last append
Suppressed: org.apache.kafka.common.errors.TimeoutException: Expiring 1 record(s) for serviceManager.transaction.process-design-0 due to 15026 ms has passed since batch creation plus linger time
2018-02-07 23:54:55,326 [Disk I/O process-design-179-1] ERROR com.appian.komodo.translog.TransactionLogHandler - Unable to append transaction 3944508 to the transaction log for process-design
com.appian.komodo.translog.TransactionAppendException: Transaction 3944508 could not be appended to the transaction log for process-design

....

2018-02-07 23:55:32,595 [KomodoEventBus-5452] WARN com.appian.komodo.engine.connection.KProcessManager - Waited 10s for K engine process-design to stop. Continuing to wait for 20s more.
2018-02-07 23:55:42,702 [KomodoEventBus-5452] WARN com.appian.komodo.engine.connection.KProcessManager - Waited 20s for K engine process-design to stop. Continuing to wait for 10s more.
2018-02-07 23:55:52,829 [KomodoEventBus-5452] INFO com.appian.komodo.engine.connection.KProcessManager - Attempting to kill K engine process using pid 25449.
2018-02-07 23:55:52,870 [KomodoEventBus-5452] INFO com.appian.komodo.engine.connection.KProcessManager - K process-design stopped
2018-02-07 23:55:52,870 [KomodoEventBus-5452] INFO com.appian.komodo.engine.connection.NettyKConnection - Closing connection to K process-design
2018-02-07 23:55:52,873 [KomodoEventBus-5452] INFO com.appian.komodo.engine.connection.NettyKConnection - Connection to K process-design closed
2018-02-07 23:55:52,882 [KomodoEventBus-5452] INFO com.appian.komodo.engine.KEngineShutdownHandler - Posting EngineClosedEvent for process-design
2018-02-07 23:55:52,886 [KomodoEventBus-5442] INFO com.appian.komodo.engine.EngineComponentCloseable - process-design closed EngineClosedEvent{getCreatedInstant=2018-02-07T23:55:52.882Z, getEngineId=process-design, kProcessFailedToStop=false}, stop requested false

...

2018-02-07 23:55:56,010 [KomodoEventBus-5442] INFO com.appian.komodo.engine.EngineComponentService - process-design posting event EngineComponentStoppedEvent{getCreatedInstant=2018-02-07T23:55:56.010Z, getEngineId=process-design, forRestart=true}
2018-02-07 23:55:56,013 [KomodoEventBus-5439] INFO com.appian.komodo.engine.status.EngineStatusMonitor - process-design is now DOWN (due to EngineComponentStoppedEvent{getCreatedInstant=2018-02-07T23:55:56.010Z, getEngineId=process-design, forRestart=true})
2018-02-07 23:55:56,018 [KomodoEventBus-5455] INFO com.appian.komodo.engine.checkpoint.CheckpointWatcher - Stopping monitoring /home/appian_user/appian/services/data/temporary/process-design for checkpoint files.
2018-02-07 23:55:56,020 [KomodoEventBus-5442] INFO com.appian.komodo.engine.status.EngineStatusMonitor - process-design is now STARTING (due to StartEngineRequestEvent{createdInstant=2018-02-07T23:55:56.020Z, engineId=process-design})
2018-02-07 23:55:56,020 [KomodoEventBus-5455] INFO com.appian.komodo.engine.EngineComponentService - Building EngineComponent for process-design with context RestartContext{mode=REPLICA, epoch=null, restart=true, attempt=0}
2018-02-07 23:55:56,021 [KomodoEventBus-5455] INFO com.appian.komodo.engine.status.EngineStatusMonitor - process-design is now REPLICA
2018-02-07 23:55:57,106 [KomodoEventBus-5455] INFO com.appian.komodo.log.KafkaUtils - Finding active Kafka bootstrap servers.
2018-02-07 23:55:57,107 [KomodoEventBus-5455] WARN com.appian.komodo.log.KafkaUtils - No brokers are registered in Zookeeper, falling back to brokers listed in the topology.
2018-02-07 23:55:57,109 [KomodoEventBus-5455] INFO com.appian.komodo.log.KafkaUtils - Cannot connect to broker localhost:9092
2018-02-07 23:55:57,610 [KomodoEventBus-5455] WARN com.appian.komodo.log.KafkaUtils - No brokers are registered in Zookeeper, falling back to brokers listed in the topology.
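
In case it helps anyone line up the sequence of events, here is a minimal Python sketch for pulling only the WARN and ERROR entries out of a log like the one above. It assumes every entry starts with "timestamp [thread] LEVEL logger - message", as in the excerpt; the function names (parse_line, problem_events) are made up for illustration and are not part of Appian.

```python
import re
from datetime import datetime

# Assumed entry layout, taken from the excerpt above:
#   2018-02-07 23:55:32,595 [thread name] LEVEL logger.Class - message
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}) "
    r"\[(?P<thread>[^\]]+)\]\s+(?P<level>[A-Z]+)\s+(?P<logger>\S+) - (?P<msg>.*)$"
)


def parse_line(line):
    """Return a dict for a log entry, or None for continuation lines
    (stack traces, "Suppressed: ..." lines, blank lines, "...")."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["ts"] = datetime.strptime(entry["ts"], "%Y-%m-%d %H:%M:%S,%f")
    return entry


def problem_events(lines, levels=("WARN", "ERROR")):
    """Collect (timestamp, level, short logger name, message) tuples
    for entries whose level is in `levels`, in file order."""
    events = []
    for line in lines:
        entry = parse_line(line)
        if entry is not None and entry["level"] in levels:
            short_logger = entry["logger"].rsplit(".", 1)[-1]
            events.append((entry["ts"], entry["level"], short_logger, entry["msg"]))
    return events
```

To use it, open the log file (the path depends on the installation) and pass the file object to problem_events; the returned tuples can then be printed or compared to see how long each step took.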

 

Thanks in advance for your help.

 

Regards,
