Monitor your Self-Managed Appian Environment

This guide provides information on monitoring your self-managed Appian environments only. All environments on Appian Cloud are fully monitored by the Appian Support team. Usage and resource metrics can be viewed on the My Appian site.

For monitoring applications deployed to Appian, see Application Monitoring.

CPU

While engines occasionally peak at 100% CPU usage, this is not a sign of problems within the system. Sustained high CPU usage during unexpected times should be monitored, with exceptions for the time during startup, shutdown, or checkpointing (all three of these are CPU-intensive activities).

It is recommended to measure the average CPU utilization over the last 10-15 minutes rather than snapshots. This helps to smooth out short periods of high CPU usage and more accurately portray the demand over the period.

On Linux systems, this is better known as the load average. The load average may be larger than the number of available CPU cores, which indicates that work is being queued and additional resources are required. Windows systems do not natively provide a load average, so verify that your monitoring tool is capable of calculating it.
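As a minimal sketch of the comparison described above, the following Python snippet normalizes a load average by the number of CPU cores. The function names are illustrative and not part of Appian; os.getloadavg() is only available on Unix-like systems, so the helper returns None elsewhere.

```python
import os
from typing import Optional


def load_per_core(load_average: float, cores: int) -> float:
    """Return the load average divided by the number of CPU cores."""
    if cores <= 0:
        raise ValueError("cores must be positive")
    return load_average / cores


def is_queueing(load_average: float, cores: int) -> bool:
    """True when the load average is larger than the core count."""
    return load_per_core(load_average, cores) > 1.0


def fifteen_minute_load() -> Optional[float]:
    """Return the 15-minute load average, or None where the OS lacks one."""
    if not hasattr(os, "getloadavg"):
        return None  # for example, on Windows
    return os.getloadavg()[2]
```

A monitoring job could call is_queueing(fifteen_minute_load(), os.cpu_count() or 1) on Linux and raise an alert when it returns True for several checks in a row.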

See also: Performance Monitoring Logs.

Engine Servers

RAM

Because the Appian Engines are in-memory processes, running out of physical RAM degrades performance. This may generally be addressed by either adding RAM or by splitting engine services across multiple servers, but an important characteristic to monitor is the overall RAM usage for a server as a whole. Any sustained utilization of 70% or more should be investigated. Each 5% increase in RAM usage over 70% should be monitored as well.
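The thresholds above can be sketched as a small function that maps a RAM utilization reading to an escalation level. This is an illustration only: the function name and the idea of numbered levels are assumptions, and the caller is expected to pass a sustained (averaged) reading rather than a single snapshot.

```python
def ram_alert_level(used_percent: float,
                    threshold: float = 70.0,
                    step: float = 5.0) -> int:
    """Return 0 below the threshold, 1 at the threshold,
    and one more level for each further full step above it."""
    if used_percent < threshold:
        return 0
    return 1 + int((used_percent - threshold) // step)
```

With the defaults, a reading of 72% maps to level 1, 75% to level 2, and 86% to level 4.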

It is important to act on engine memory growth early because, once memory is allocated, it cannot be released. All process models should have explicit data management configured to delete or archive processes as soon as possible (7 days or less is recommended).

See also Managing Process Archives.

Disk Space

A frequent cause of running out of disk space is misconfiguration of the data management script (cleanup.bat or cleanup.sh), which should be scheduled to run nightly. Large or numerous log files can also consume significant disk space and should be monitored and removed (also using the cleanup script). Any utilization of 50% or more of the available disk space should be investigated. Nightly growth of 10% in disk space usage should trigger an alert.
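A minimal sketch of these two disk checks follows. It assumes that nightly growth is measured in percentage points of total capacity between two consecutive nightly readings; the text does not say how growth is measured, so adjust this to your own definition. The function names and alert strings are illustrative.

```python
import shutil
from typing import List, Optional


def disk_used_percent(path: str) -> float:
    """Return the used share of the file system containing path, in percent."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def disk_alerts(used_percent: float,
                previous_used_percent: Optional[float] = None,
                utilization_limit: float = 50.0,
                nightly_growth_limit: float = 10.0) -> List[str]:
    """Compare tonight's reading (and optionally last night's) to the limits."""
    alerts = []
    if used_percent >= utilization_limit:
        alerts.append("investigate: utilization")
    # Assumption: growth is the difference in percentage points.
    if (previous_used_percent is not None
            and used_percent - previous_used_percent >= nightly_growth_limit):
        alerts.append("alert: nightly growth")
    return alerts
```

A nightly job could store the value returned by disk_used_percent for the Appian installation directory and pass it as previous_used_percent on the next run.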

System Event Log Files

The important log files to monitor at the engine level are the rollback and write-to-disk logs; these indicate inconsistencies in data, which can cause the database to fail. The rollback procedure is triggered by a failed write transaction. The engine then logs an error and attempts to restart the engine.

  • Rollback logs are found in the <APPIAN_HOME>/logs directory, and follow the naming convention <APP_ID><GW_ID>_YYYYMMDDHHMMSS_rollback.l (such as PD1_20091008155649_rollback.l).
  • Transaction log errors indicate a problem with starting, restarting, or stopping an engine. The naming convention used for these logs is YYYYMMDDHHMMSS_transaction_log_error.log.

Scripts to detect any of the aforementioned logs are generally specific to the site and monitoring tool. Upon detection of a rollback, transaction log error, or write-to-disk error log, the convert_l_to_text script, located in the <APPIAN_HOME>/server/_scripts/diagnostic directory, should be run, and a support case should be opened with the converted file attached to help investigate the error.
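Because detection scripts are site-specific, the following is only a sketch of one possible approach in Python. It matches file names against the rollback and transaction log error naming conventions listed above; the function names are hypothetical, and the patterns assume the conventions hold exactly as written.

```python
import re
from pathlib import Path
from typing import List, Optional, Tuple

# <APP_ID><GW_ID>_YYYYMMDDHHMMSS_rollback.l
ROLLBACK = re.compile(r"^.+_\d{14}_rollback\.l$")
# YYYYMMDDHHMMSS_transaction_log_error.log
TRANSACTION_ERROR = re.compile(r"^\d{14}_transaction_log_error\.log$")


def classify(filename: str) -> Optional[str]:
    """Return the kind of engine error log, or None for any other file."""
    if ROLLBACK.match(filename):
        return "rollback"
    if TRANSACTION_ERROR.match(filename):
        return "transaction_log_error"
    return None


def find_engine_error_logs(log_dir: str) -> List[Tuple[str, str]]:
    """List (kind, file name) pairs for error logs found in log_dir."""
    found = []
    for path in sorted(Path(log_dir).iterdir()):
        kind = classify(path.name)
        if kind is not None:
            found.append((kind, path.name))
    return found
```

A monitoring tool could run find_engine_error_logs against the logs directory on a schedule and raise an alert whenever the returned list is not empty.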

Write to Disk Errors

The generation of a write-to-disk log file indicates that a write-transaction failed; the in-memory transaction may have succeeded, but the corresponding write of the transaction to the .KDB file on disk could not be completed. For such errors, a log file is generated that matches the regular expression .*_[0-9]{14}_write_to_disk_failure\.l, where the first wildcard .* represents the engine with the failure in the form <APP_ID><GW_ID> and the 14 digits represent the date and time in GMT in the form YYYYMMDDHHMMSS. For example, PA00021_20130925184917_write_to_disk_failure.l.
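The regular expression above can be applied as follows to recover the engine identifier and the failure time from a file name. This is a sketch with a hypothetical function name; named groups are added to the pattern for readability, and the timestamp is interpreted as GMT as stated above.

```python
import re
from datetime import datetime, timezone
from typing import Optional, Tuple

WRITE_TO_DISK = re.compile(
    r"^(?P<engine>.*)_(?P<stamp>[0-9]{14})_write_to_disk_failure\.l$"
)


def parse_write_to_disk_failure(filename: str) -> Optional[Tuple[str, datetime]]:
    """Return (engine id, failure time in GMT), or None if the name does not match."""
    match = WRITE_TO_DISK.match(filename)
    if match is None:
        return None
    try:
        when = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    except ValueError:
        return None  # 14 digits that do not form a valid date and time
    return match.group("engine"), when.replace(tzinfo=timezone.utc)
```

For the example file name above, the function returns the engine identifier PA00021 and a timestamp of 2013-09-25 18:49:17 GMT.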

The most common reasons for this type of failure are:

  • The server does not have enough disk space: When your server is low on disk space, this error typically occurs during engine-server checkpointing. Make sure that you have properly configured your cleanup job. If there is no disk space to write new transactions to a .KDB file, the engine-server gateway continuously restarts.

  • Engine server files are locked by another process: Any process running on the host server (such as virus-scanning software or a back-up job) that temporarily locks an engine-server's .KDB file on the file system can prevent updates from being written to the file.

  • Insufficient permissions: The user accounts that are running the application must have sufficient file-system authority to write to the directories where the engine-server files are located.

Write-to-Disk Error as Seen by the Status Script

The <APPIAN_HOME>/services/bin/status.sh(.bat) script lists an ERROR status instead of an Okay Status when such an error is encountered.

For example:

Process-Exec2   X/X   ERROR   Gateway restart X time
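A monitoring script could scan the status output for such lines. The sketch below assumes that each engine is reported on one line, with the engine name in the first whitespace-separated column and the status as a separate token, as in the example above. The full output format of the status script is not shown here, so treat this layout as an assumption.

```python
from typing import List


def engines_in_error(status_output: str) -> List[str]:
    """Return the first column of every line that carries an ERROR status token."""
    failing = []
    for line in status_output.splitlines():
        fields = line.split()
        # Require a name column plus at least a status column.
        if len(fields) >= 2 and "ERROR" in fields[1:]:
            failing.append(fields[0])
    return failing
```

The captured output of the status script can be passed to engines_in_error as a single string; an empty result means no engine reported an ERROR status.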

Appian Process Count

On a standard single-server installation of Appian, fifteen k.exe processes are started, corresponding to the fifteen data services. Additional processes are added for each additional paired execution engine and analytics engine you configure.

Monitoring systems should verify that all Appian processes are running. This verification should run every thirty minutes to allow for planned maintenance (restarting services reduces the number of processes, while checkpointing engines causes the number to increase); if the number is incorrect for two consecutive tests, this indicates a failure that should be investigated.
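The rule of two consecutive failed tests can be sketched as a small stateful check. The class name is hypothetical, and the observed process count is assumed to come from your monitoring tool; the expected count depends on how many engines you have configured.

```python
class ProcessCountCheck:
    """Flag a failure only after the count is wrong in consecutive tests."""

    def __init__(self, expected: int, failures_to_alert: int = 2) -> None:
        self.expected = expected
        self.failures_to_alert = failures_to_alert
        self.consecutive_failures = 0

    def record(self, observed: int) -> bool:
        """Record one test result and return True when an alert is due."""
        if observed == self.expected:
            self.consecutive_failures = 0  # a correct count resets the streak
        else:
            self.consecutive_failures += 1
        return self.consecutive_failures >= self.failures_to_alert
```

Calling record once per test interval means a single mismatch during a restart or checkpoint does not alert, while a mismatch that persists into the next test does.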

When running Appian as Windows services, an additional Java process named appiansvc.exe is listed in the Task Manager.

If the application server or search server is hosted on the same machine, a Java process will appear for each. We recommend monitoring these processes as well.

Manual Checkup

In addition to the system monitoring protocol described above, Appian provides maintenance scripts, such as status.sh (.bat), for manual checks.

  • If checkpointing fails on any engine, it is important to contact Appian support immediately, and not to shut down the engine servers.

See also: Maintenance Scripts.

Application Servers

Java Heap Space

Working memory allocated to the application server is known as the Java Heap Space. This is where the application server utilizes most of its memory so it’s important that it’s large enough to handle the user and system load. A heap space that is too small will result in performance problems.

The heap space should be large enough that it does not need to exceed 70% utilization during a typical working day. This should take into account peak hours and end of period operations.

Common causes for high heap utilization are:

  • Memory intensive plug-ins
  • Memory leaks
  • Running large portal reports
  • Application import/export
  • Large database queries

Disk Space

Other than local caches and installation files, the disk space used by application servers typically won't fluctuate much. Most of the disk space managed by application servers is taken up by Appian's application data, which represents documents uploaded by users or generated by processes. Processes should make sure documents are explicitly cleaned up when they're no longer needed.

See also Delete Document Smart Service.

Application Server Log Files

A high frequency of ERROR messages in the logs may indicate problems in the deployment. These may be due to improperly configured process models, failure to connect to external resources, invalid data entry, or one of a number of other issues that can impact stability.

Based on the system-monitoring tools available, you might need to create a custom script to check the log files.

The default location for these log files is in <APPIAN_HOME>/log/*.log. If running JBoss, the default logs for the service are located in <APP_SERVER_HOME>/server/default/log/boot.log and ./server.log.
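If a custom script is needed, one simple approach is to count log lines by severity and track the share of ERROR entries over time. The sketch below assumes that each log line contains its level as an uppercase word (ERROR, WARN, INFO, or DEBUG); the actual line layout depends on your logging configuration, and the function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

LEVEL = re.compile(r"\b(ERROR|WARN|INFO|DEBUG)\b")


def count_levels(lines: Iterable[str]) -> Counter:
    """Count log lines by the first severity word found on each line."""
    counts = Counter()
    for line in lines:
        match = LEVEL.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def error_share(lines: Iterable[str]) -> float:
    """Return the fraction of leveled lines that are ERROR (0.0 if none)."""
    counts = count_levels(lines)
    total = sum(counts.values())
    return counts["ERROR"] / total if total else 0.0
```

An open log file can be passed directly as lines. What counts as a high frequency is site-specific, so compare the result against a baseline taken during normal operation.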

Search Servers

Search Server Metrics Log

The Search Server Metrics log file (<APPIAN_HOME>/log/data-metrics/search_server.csv) provides information on the search server component of the Appian architecture.

There are several metrics in that log file that are especially pertinent to Appian server administrators and may require corrective action:

  1. Add more disk space if Store Disk Size approaches the amount of disk space available to the disk partition mounting the search server data directory (<APPIAN_HOME>/_admin/search-local/search-server/data).
  2. Upgrade to faster disk drives (ideally SSDs) if Disk Throttling Time is increasing between log writes (5-minute intervals). This is an indication of slow disk I/O. A faster disk drive is required to keep up with the volume of data the site is producing.
  3. Increase the memory allocation to the JVM running the search server if
    • Field Data Evictions or Field Data Breaker Triggers is greater than 0.
    • Used Heap Percentage is approaching 100.
    • The sum of Filter Cache Memory Size (MB), ID Cache Memory Size (MB), Field Data Memory Size (MB), and Segment Memory Size (MB) makes up a large percentage of the Used Heap Space (MB).
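Some of the checks above can be sketched against the CSV file as follows. The column headers are assumed to match the metric names used in this section, which may differ from the actual file; the 90% heap limit is an arbitrary stand-in for "approaching 100"; and the function name and finding strings are hypothetical.

```python
import csv
import io
from typing import List


def search_server_findings(csv_text: str, heap_limit: float = 90.0) -> List[str]:
    """Evaluate the latest row (and the one before it) of the metrics log."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return []
    latest = rows[-1]
    findings = []
    if (float(latest["Field Data Evictions"]) > 0
            or float(latest["Field Data Breaker Triggers"]) > 0):
        findings.append("field data pressure")
    if float(latest["Used Heap Percentage"]) >= heap_limit:
        findings.append("heap nearly full")
    # Compare with the previous log write to see whether throttling is rising.
    if len(rows) >= 2 and (float(latest["Disk Throttling Time"])
                           > float(rows[-2]["Disk Throttling Time"])):
        findings.append("disk throttling increasing")
    return findings
```

The first two findings point to increasing the memory allocated to the search server JVM, and the third points to slow disk I/O.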


Web Servers

Web Server Log Files

High frequency of 404 or 503 error codes (amongst other HTTP error codes) in the access log, non-startup/shutdown errors in the error log, or errors in the mod_jk log may indicate problems in the deployment. These may be due to misconfiguration of the product, failure to connect to resources, or other issues regarding stability. Detecting this type of problem may require the creation of a custom script or use of a log analyzer. The default location for Apache 2.0 log files is in <APACHE_HOME>/logs/*.log, but may vary according to the web server used.
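As an example of such a custom script, the sketch below tallies HTTP status codes from access log lines. It assumes the access log uses the Apache Common Log Format (or the combined format), where the status code follows the quoted request line; a custom LogFormat would need a different pattern. The function names are illustrative.

```python
import re
from collections import Counter
from typing import Iterable

# The status code is the three-digit field right after the quoted request.
STATUS = re.compile(r'"[^"]*" (\d{3}) ')


def status_counts(lines: Iterable[str]) -> Counter:
    """Count access log entries by HTTP status code."""
    counts = Counter()
    for line in lines:
        match = STATUS.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


def share_of(counts: Counter, *codes: str) -> float:
    """Return the fraction of all counted requests with one of the given codes."""
    total = sum(counts.values())
    return sum(counts[code] for code in codes) / total if total else 0.0
```

For example, share_of(status_counts(log_lines), "404", "503") gives the fraction of requests that ended in either error code, which can be compared against a baseline from normal operation.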

Performance Monitoring Log Files

Server Performance Logging: The performance of each Appian engine and application server is automatically monitored and logged in comma-separated value (CSV) files to simplify spreadsheet analysis.

We recommend using the Appian Health Check at least once a month in production to monitor these logs and identify potential risks.

See also: Performance Monitoring Log Files.

Monitoring Tools

There are many monitoring tools available, each with their own features and limitations. The following is a list of tools that have capabilities to monitor different parts of the Appian software stack. This is not an exhaustive list and should only be considered as a starting point in understanding the options available to monitor Appian.

We recommend reviewing the monitoring tools already available internally and performing your own research using the Server Monitoring Checklist to determine whether they have all the capabilities needed to monitor your Appian environments.

Important notes:

  • End-user experience monitoring tools such as Dynatrace User Experience Management must not be used with Appian. These tools inject JavaScript code into server responses, which can prevent some Appian features from working correctly.
  • Appian recommends against the use of bytecode analysis tools for monitoring and analyzing Appian. The Appian Java code is already tuned and profiled before each release and monitoring a live instance may cause performance problems. Use of such tools should be limited to custom plug-ins in development and during performance testing, but not in production.

Appian Cloud customers should refer to Monitoring Cloud.

Server Monitoring Checklist

Ensure you're fully prepared to monitor your server's performance by downloading our comprehensive Server Monitoring Checklist. This checklist covers everything from CPU utilization to external component monitoring, making it an indispensable resource for system administrators and project managers alike.
