Recently we have been noticing the following type of errors in the application-s

Todd Tran

Certified Associate Developer

over 9 years ago

Recently we have been noticing errors of the following type in the application-server.log file: "Unable to acquire a Read connection..." and "Unable to acquire a Write connection...". These errors appear only in our Appian 7.3 PROD environment. The ones related to "evaluating expression" are more critical because they stop the node execution and the process flow; we have to restart the node manually for the process instance to continue. The error details are below:

Details: ERROR:An error occurred while evaluating expression: SigImagesDocument : if ( pv!apr_super_index > 0 , if ( isnull ( #"0000cd79-25d0-8000-73cd-0a000064044c" ( pv!apr_users [ pv!apr_super_index ] ) ) ,
append ( pv!SigImagesDocument , #"0000cd79-3ecd-8000-73cd-0a000064044c" ) , append ( pv!SigImagesDocument , #"0000cd79-25d0-8000-73cd-0a000064044c" ( pv!apr_users [ pv!apr_super_index ] ) ) ) , pv!SigImagesDocument )
(Expression evaluation error in rule 'getdocumentid'...

0 Eduardo Fuentes
Appian Employee
over 9 years ago

In addition to working with Support: now that you mention that MNI seems to be contributing, are you running the subprocesses one at a time or all at the same time? You can check the MNI settings on the node for more details.

Can you also attach /runtime_ear/suite.ear/conf/build.info to see what hotfix you are running?

0 ryanh over 9 years ago

Eduardo,

I wasn't sure if you were replying about MNI to me or Titran.

Windows 2008 servers running 7.11 hotfix D/E in Dev, B in production (upgrading this weekend to D or E).

The parent process pulls data; say it finds 532 new rows in one system (DB2). We loop over the 532 in sets of 100 (so 6 loops). Each loop increments a counter and calls a single subprocess synchronously, passing in the 100 small sets of data. It now waits 10-20+ seconds after each 100 before repeating the loop.
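To make the loop count concrete, here is a minimal Python sketch of the batching described above. It is not Appian code; the `batches` helper and the stand-in row list are assumptions for illustration only.

```python
def batches(rows, size=100):
    """Yield consecutive slices of at most `size` rows."""
    for start in range(0, len(rows), size):
        yield rows[start:start + size]

# Stand-in for the 532 new DB2 rows from the example above.
rows = list(range(532))

# One entry per loop iteration: five full sets of 100 and a final set of 32.
sizes = [len(batch) for batch in batches(rows)]
print(len(sizes), sizes)
```

The last loop carries only 32 rows, so the final MNI fan-out is smaller than the first five.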

The 1st subprocess makes a couple minor changes to data, writes it to the datastore (100 rows updated) and then uses MNI to spawn 100 subprocesses asynchronously (2nd level), 1 for each row of data. These are 'spawn all', 'run all instances at the same time - move on when all instances are done'.

In the 2nd subprocess (running asynchronously) we perform 1-2 queries (to MS SQL) via query rules, make a few updates to the 1-10 CDT records returned, and then write the updates back to the datastore.

My assumption is that running 100 asynchronous subprocesses with multiple database queries/updates in each one could be causing the problem. Throw in the rapid succession of additional sets of 100 processes until everything is loaded, and we are forcing our dual servers with 3 execution engines on each (6 total) to generate hundreds of subprocesses and potentially thousands of database calls in rapid succession. This happens about 10 times a day (a total of maybe 3-5k DB2 records processed), and sometimes 1-2 times a day a few of these hiccup.
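A rough back-of-the-envelope estimate of the database load for one 532-row run, as a Python sketch. The counting model is an assumption, not something measured: one datastore write per set of 100 in the 1st subprocess, and 1-2 queries plus one write per row in the 2nd subprocess.

```python
import math

ROWS = 532                # rows found in one run (from the example above)
BATCH_SIZE = 100          # rows handed to each 1st-level subprocess
QUERIES_PER_ROW = (1, 2)  # low/high number of query rules per 2nd-level subprocess
WRITES_PER_ROW = 1        # assumed: one write-back per 2nd-level subprocess
WRITES_PER_BATCH = 1      # assumed: one datastore write per 1st-level subprocess

loops = math.ceil(ROWS / BATCH_SIZE)

# Total database calls for the run under the low and high query counts.
low = ROWS * (QUERIES_PER_ROW[0] + WRITES_PER_ROW) + loops * WRITES_PER_BATCH
high = ROWS * (QUERIES_PER_ROW[1] + WRITES_PER_ROW) + loops * WRITES_PER_BATCH

# With "run all instances at the same time", up to one full set runs at once.
peak_concurrent_subprocesses = min(ROWS, BATCH_SIZE)

print(loops, low, high, peak_concurrent_subprocesses)
```

Under these assumptions a single run issues somewhere between roughly 1,000 and 1,600 calls, with up to 100 subprocesses competing for connections at the same moment.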

All of the hiccups appear to be between our Windows based Appian servers and our MS SQL 2008 (upgrading to 2012 in production in the next few weeks) instance. We sometimes experience similar "Unable to acquire a Write connection..." (or Read connection) with other processes in different applications.

My hope is that adding some pauses between each loop of 100 subprocesses, as well as random pauses of 1-10 seconds between some of the queries in the lowest-level process, will functionally throttle the loading process and give the engines and the database time to work through their queues. Most of the queries take <1 second to run (based on the 'Process Nodes' duration in the dashboard), but when we see timeouts some of these queries take 2-5 seconds each, with the total subprocess approaching or passing 10 seconds. My assumption is that the nodes are sitting in queues (in the engines or at the DB level) too long and Appian is timing them out. We are able to restart the nodes later just fine.
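The throttling idea can be modelled outside Appian with a small Python sketch. This is only an illustration of the concept, not something a process model exposes directly: `MAX_CONCURRENT_DB_CALLS`, `process_row` and `run_batch` are hypothetical names, the sleeps are scaled-down stand-ins for the 1-10 second pauses and the real queries, and the semaphore plays the role of a limited pool of database connections.

```python
import random
import threading
import time
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT_DB_CALLS = 10  # hypothetical cap, standing in for a connection pool size

_slots = threading.BoundedSemaphore(MAX_CONCURRENT_DB_CALLS)
_lock = threading.Lock()
_active = 0
peak_active = 0

def process_row(row_id):
    """Stand-in for one 2nd-level subprocess: query, update, write back."""
    global _active, peak_active
    # Random pause (scaled down from 1-10 s) so the workers do not all hit the DB at once.
    time.sleep(random.uniform(0, 0.01))
    with _slots:  # wait for a free slot before touching the "database"
        with _lock:
            _active += 1
            peak_active = max(peak_active, _active)
        time.sleep(0.005)  # simulated query and write
        with _lock:
            _active -= 1
    return row_id

def run_batch(row_ids):
    # 100 workers mimic MNI "run all instances at the same time".
    with ThreadPoolExecutor(max_workers=100) as pool:
        return list(pool.map(process_row, row_ids))

results = run_batch(range(100))
print(len(results), peak_active <= MAX_CONCURRENT_DB_CALLS)
```

All 100 workers still finish, but no more than 10 of them are ever inside the simulated database call at once, which is the effect the pauses are meant to approximate.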

0 Eduardo Fuentes
Appian Employee
over 9 years ago

Based on forum.appian.com/.../Hotfixes.html and the description of the issue, I think you should benefit from the improvements in Hotfix E, but I'll follow up with you in the open support ticket.

0 Eduardo Fuentes
Appian Employee
over 9 years ago

The new reference number is 424869.
