Hello,

I have noticed cases where executing CHECKENGINE displays some engines with a status of ERROR and the description "GW has restarted <#> times". However, the system is responsive, and all automatic and manual checkpoints have succeeded going back 2-3 days.

What does the description "GW has restarted # times" mean exactly? What is its impact on the system?

Regards,
Thanos

OriginalPostID-146877

  • As mentioned above, an ERROR is a message indicating that something went wrong (an invalid transaction, a failure to append a transaction, etc.) and forced the engine to restart in order to recover. It doesn't mean there is an issue at the moment: there was one, but the engine recovered from it successfully.

    ERROR messages are cleared after an engine restart.

    To investigate the issue, do the following:

    1. Review the checkengine output to determine the time when the issue occurred. From your output this happened on:

    20 days 23 hrs ago (2015-04-07 17:26:38.048 GMT)

    for Process-Design

    2. Review the gw_*.log to confirm that the restart happened and to see whether it reveals the root cause. Your logs do show the restart, but they don't explain the root cause:

    2015-04-07 17:26:39 [PD1] INFO .a.gw "State transition from [STANDALONE] to [DISCONNECTED]"
    2015-04-07 17:27:02 [PD1] INFO .a.gw.ssd "Connected to server"
    2015-04-07 17:27:02 [PD1] INFO .a.gw "State transition from [DISCONNECTED] to [ACTIVE JOIN]"
    2015-04-07 17:27:03 [PD1] INFO .a.gw.swj "No other gateways configured, switching to [STANDALONE]"
    2015-04-07 17:27:03 [PD1] INFO .a.gw "State transition from [ACTIVE JOIN] to [STANDALONE]"

    3. Review the corresponding db_*.log, using the same timestamp that checkengine provided, to find the root cause. From your log, the root cause is:

    2015-04-07 17:26:37 [PD1] {pd606.kdb 16576} (Default) ERROR .a.s.loader "Failed to append transaction to DB ("pd606.kdb: The handle is invalid."), will delay one second and retry once."
    2015-04-07 17:26:38 [PD1] {pd606.kdb 16576} (Default) ERROR .a.s.loader "Retry failed: "pd606.kdb: The handle is invalid.""

    4. A "The handle is invalid" message causes an engine restart: the engine could not write the transaction to disk because the *.kdb file was locked by an external source.

    5. To prevent the issue, make sure your backup tool, antivirus, or any other scanning tool doesn't lock files that Appian needs. Nothing in the <APPIAN_HOME> directory can be locked at any time; otherwise this error will occur.
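
The log review in steps 2 and 3 can be sketched as a small script. This is only a rough illustration, not an Appian tool: it assumes log lines begin with a `YYYY-MM-DD HH:MM:SS` timestamp followed somewhere by a level keyword, as in the excerpts above. The names `parse_line` and `errors_near` and the 60-second window are hypothetical choices.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp plus the first level keyword on the line.
# The layout is inferred from the log excerpts in this thread (an assumption),
# not taken from Appian documentation.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r".*?\b(?P<level>INFO|WARN|ERROR)\b (?P<rest>.*)$"
)


def parse_line(line):
    """Return (timestamp, level, rest), or None if the line does not match."""
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    ts = datetime.strptime(m.group("ts"), "%Y-%m-%d %H:%M:%S")
    return ts, m.group("level"), m.group("rest")


def errors_near(lines, when, window_seconds=60):
    """Collect ERROR entries logged within window_seconds of `when`."""
    window = timedelta(seconds=window_seconds)
    hits = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip lines that do not follow the assumed layout
        ts, level, rest = parsed
        if level == "ERROR" and abs(ts - when) <= window:
            hits.append((ts, rest))
    return hits


if __name__ == "__main__":
    # Two lines copied from the excerpts above, standing in for a db_*.log file.
    sample = [
        '2015-04-07 17:26:38 [PD1] {pd606.kdb 16576} (Default) ERROR '
        '.a.s.loader "Retry failed: "pd606.kdb: The handle is invalid.""',
        '2015-04-07 17:26:39 [PD1] INFO .a.gw '
        '"State transition from [STANDALONE] to [DISCONNECTED]"',
    ]
    # Time reported by checkengine, truncated to whole seconds.
    restart = datetime(2015, 4, 7, 17, 26, 38)
    for ts, message in errors_near(sample, restart):
        print(ts, message)
```

To run it against real logs, pass the lines of the relevant db_*.log file to `errors_near` together with the time that checkengine reported. The location of the log files depends on your installation.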