You are currently reviewing an older revision of this page.

KB-XXXX How to understand checkpointing in service_manager.log

Purpose

Illustrate and explain the process of a single engine checkpointing, and how the associated output appears in the <APPIAN_HOME>/logs/service_manager.log.

Context

The Appian Engines use a process called "checkpointing" to persist data stored in the in-memory engines to disk. This is triggered either manually or automatically.

Instructions

Note: The remainder of the document will use the following notation:

  • <engine id>: This is the name of the engine associated with the checkpoint. E.g. groups, execution00.
    <path to temporary>: This is the temporary folder where files related to an in-progress checkpoint are stored. This is typically <APPIAN_HOME>/services/data/temporary.However, this can be changed in<APPIAN_HOME>/services/conf/service_manager.conf. Changing this file is not supported unless advised to by Appian Support.
  • <path to archived>: This is the archived folder where historical kdbs are stored.This is typically <APPIAN_HOME>/services/data/archived. However, this can be changed in <APPIAN_HOME>/services/conf/service_manager.conf.Changing this file is not supported unless advised to by Appian Support. This directly can be cleaned and maintained using the cleanupArchives script.
  • <path to engine gateway>: This is the directory that stores the live kdb that an engine will boot from if it is shutdown. This file should never be modified unless advised to by Appian Support. The exact path varies from engine to engine. E.g.<APPIAN_HOME>/server/process/exec/00 is associated with execution00.

Phase1:

When a checkpoint is first requested or triggered, it is placed into a queue within the Service Manager component. Checkpoints occur serially, in the orderthat they are placed into the queue. In the logs, this looks like:

2020-01-23 06:07:06,297 [KomodoEventBus-3139] INFO  com.appian.komodo.engine.status.CheckpointingStateProvider - Checkpoint <checkpoint id> requested for: <engine id>

Service Manager will also log output indicating the type of checkpoint that was requested, e.g. automatic or user triggered.

Automatic:

2020-01-2211:47:53,682 [CheckpointManager-<engine id>-0] INFO  com.appian.komodo.engine.checkpoint.CheckpointManager - HandlingCheckpointRequestedEventAutoCheckpointRequestedEvent{createdInstant=2020-01-22T11:47:53.681Z,engineId=<engine id>>,
 requestId=b6049c4c-af87-4594-88c2-354c1c6c632b,engineShuttingDown=false, checkpointContext=null}
 2020-01-2211:47:53,689 [CheckpointManager-groups-0] INFO com.appian.komodo.engine.checkpoint.CheckpointManager - QueueingCheckpointRequestedEventAutoCheckpointRequestedEvent{createdInstant=2020-01-22T11:47:53.681Z,engineId=<engine id>, requestId
=6c2e9683-7149-4003-b810-a4e9f075d792,engineShuttingDown=false,checkpointContext=CheckpointContext{checkpointId=6c2e9683-7149-4003-b810-a4e9f075d792,minTxId=259326, kdbFileIndex=436,retentionMetadata=serviceManager.transaction.<engine id>-0@51882
2,initiatingHost=dev194, executionHost=dev194}}

User Requested (using the checkpoint script):

2020-01-23 06:07:06,298[CheckpointManager-<engine id-0] INFO com.appian.komodo.engine.checkpoint.CheckpointManager - HandlingCheckpointRequestedEventUserCheckpointRequestedEvent{createdInstant=2020-01-23T06:07:06.296Z,engineId=<engine id>, requestId=b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc,engineShuttingDown=false, checkpointContext=null}
2020-01-23 06:07:06,307[CheckpointManager-<engine id-0] INFO com.appian.komodo.engine.checkpoint.CheckpointManager - QueueingCheckpointRequestedEventUserCheckpointRequestedEvent{createdInstant=2020-01-23T06:07:06.296Z,engineId=<engine id>, requestId=b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc,engineShuttingDown=false,checkpointContext=CheckpointContext{checkpointId=b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc,minTxId=316957, kdbFileIndex=437,retentionMetadata=serviceManager.transaction.<engine id>-0@656290,initiatingHost=dev194, executionHost=dev194}}

As can be seen, in both of the above outputs, the CheckpointRequestedEvent is handled by putting it on the checkpointing queue, to be processed.

Phase 2:

Service Manager will dequeue a checkpoint request in order to process it. This is associated with:

2020-01-23 06:07:06,307 [KomodoEventBus-3115]INFO  com.appian.komodo.engine.status.CheckpointingStateProvider -Checkpoint b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc dequeued for: <engineid>
2020-01-23 06:07:06,307 [KomodoEventBus-3142] INFO com.appian.komodo.engine.checkpoint.CheckpointWatcher - Waiting for checkpointb2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc for <engine id>
2020-01-2306:07:06,322 [KomodoEventBus-3115] INFO com.appian.komodo.engine.status.CheckpointingStateProvider - Checkpointb2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc started for <engine id>

After this output, Service Manager will be actively checkpointing the specific engine that the request was associated with. While an engine is being checkpointed, a temporary "writing" kdb is created in the temporary folder associated with the engine experiencing a checkpoint.

Phase 3:

Service Manager will complete the checkpoint operation and perform some operations on the file system:

  • The "writing" kdb from Phase 2 will be renamed to <kdb name>.written
  • The ".written" kdb will be moved into the gw1 folder associated with the engine that underwent a checkpoint. This is the new live kdb.

These appear in the logs as:

2020-01-2306:07:16,828 [checkpoint-watcher-<engine id>-172-1] INFO com.appian.komodo.engine.checkpoint.LocalCheckpointWatcherTask - Notified of Kimage created for checkpoint b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc file<path to temporary>/temporary/<engine id>/<engineid>-b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc.written at2020-01-23T06:07:16.828Z
2020-01-23 06:07:16,828[checkpoint-watcher-<engine id>-172-1] INFO com.appian.komodo.engine.checkpoint.SingleGatewayCheckpointCopier - Moving KCheckpoint to next live: <path to temporary>/temporary/<engineid>/<engine id>-b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc.written to/<APPIAN_HOME>/server/<path to engine gateway>/gw1/<kdbname>.kdb
2020-01-23 06:07:19,065 [checkpoint-watcher-<engineid>-172-1] INFO com.appian.komodo.engine.checkpoint.LocalCheckpointWatcherTask - Finishedcheckpoint b2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc for <engine id>
2020-01-2306:07:19,066 [KomodoEventBus-3136] INFO com.appian.komodo.engine.status.CheckpointingStateProvider - Checkpointb2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc completed for <engine id>
2020-01-2306:07:19,066 [CheckpointManager-<engine id>-0] INFO com.appian.komodo.engine.checkpoint.CheckpointManager - Checkpointb2d6c4d0-8be9-42fd-aeaa-4e35a63da3dc execution complete

Phase 4:

Once the checkpoint is complete Service Manager will perform cleanup of the prior "live" kdb, and will also instruct the Kafka component to mark anytransactions that were persisted to disk in the checkpoint as ready for cleanup. In the logs, this is associated with:

2020-01-23 06:07:19,067[checkpointArchiver-<path to archived>/archived/<engine id>-171-1]INFO  com.appian.komodo.engine.checkpoint.SingleGatewayCheckpointCopier -Archiving kdb image from <APPIAN_HOME>/server/<path to enginegateway>/gw1/pe436.kdb to <path to archived>/archived/<engineid>/pe436.kdb
...
2020-01-23 06:07:19,089 [KomodoEventBus-3128]INFO  com.appian.komodo.log.KafkaLogCleaner - Updating committed offsetfor serviceManager.transaction.<engine id>-0 from 630299 to 638385
2020-01-2306:07:19,091 [KomodoEventBus-3128] INFO com.appian.komodo.log.KafkaLogCleaner - Marking all transaction log entriesfor <engine id> that are an earlier offset than 638385 for deletion.

Affected Versions

This article applies to Appian 17.3 and later. The output was captured on Appian 19.4 and may be slightly different on older / newer versions of Appian.

Last Reviewed: January 2020