I have a question about the performance of data processing.
I query a list from record type A, and then I want to filter some data out of the results. I can do that in two ways: one is to query the record type with a filter added so it returns only the data I want; the other is to use logic functions to pull the data out of the list I already queried. My question is: is there a big performance difference between these two approaches? If yes, which one is preferred? If not, does the same answer apply to CDTs? When I code in Java, our strategy is to query the database only once, but I'm not sure whether that applies to Appian. I think record types have a cache similar to Redis, so even if I query the record type many times it shouldn't cause performance problems, but I'm not sure whether that assumption is right.
I would recommend always using a!queryRecordType() with filters to fetch only needed data, as it pushes filtering to the database layer for optimized queries. Filtering large lists in memory is slower and not preferred. This matches your Java strategy: query the database efficiently once with proper filters. Use in-memory filtering only for small datasets.
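Something along these lines (a rough sketch; the Case record type, its fields, the filter value, and the batch size are placeholders, not anything from your app):

a!queryRecordType(
  recordType: recordType!Case,
  fields: {
    recordType!Case.fields.id,
    recordType!Case.fields.status
  },
  filters: a!queryFilter(
    field: recordType!Case.fields.status,
    operator: "=",
    value: "OPEN"
  ),
  pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 500)
).data

Only the matching rows come back from the database, and only the two listed fields are held in memory.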
Querying with filters applied is better because the query returns just the data needed for the use case. It avoids pulling 'extra' data that you would only discard after further filtering anyway. Also, in-memory filtering functions can be slow when working with a large dataset. So I suggest using filters when querying large datasets.
If the query returns a small dataset, then depending on reusability opportunities or the manipulations needed, functions like index(), property(), etc. are more suitable than running multiple queries! With a small dataset the performance and memory risks are reduced.
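For the small-dataset case, the in-memory version looks roughly like this (again a sketch with a placeholder record type and fields): query once, then use wherecontains()/index() on the list you already have.

a!localVariables(
  local!cases: a!queryRecordType(
    recordType: recordType!Case,
    fields: {recordType!Case.fields.id, recordType!Case.fields.status},
    pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 200)
  ).data,
  /* keep only the open cases from the list already queried */
  index(
    local!cases,
    wherecontains(
      "OPEN",
      a!forEach(local!cases, fv!item[recordType!Case.fields.status])
    ),
    {}
  )
)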
Filtering in the DB is almost always the better option.
If you need to loop over your data in order to process it, particularly if it needs to be cross-referenced (like many Java map/reduce paradigms need to), you might be better off querying the data at once and then processing it. Just make sure you limit the queried fields to exactly what is needed for your scenario.
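A hedged sketch of that pattern (the Order record type, its fields, and the uploaded IDs are invented for illustration): one query trimmed to the identifier fields, the identifier column pulled out once, and then the loop works purely in memory.

a!localVariables(
  local!uploadedCustomerIds: {101, 204, 307},  /* e.g. parsed from a user upload */
  /* one query, limited to exactly the fields the loop needs */
  local!orders: a!queryRecordType(
    recordType: recordType!Order,
    fields: {recordType!Order.fields.id, recordType!Order.fields.customerId},
    pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 1000)
  ).data,
  /* extract the cross-reference column once, outside the loop */
  local!orderCustomerIds: a!forEach(local!orders, fv!item[recordType!Order.fields.customerId]),
  /* cross-reference each uploaded id against the queried list, with no further queries */
  a!forEach(
    items: local!uploadedCustomerIds,
    expression: index(local!orders, wherecontains(fv!item, local!orderCustomerIds), {})
  )
)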
Here is a scenario I just optimized last week since it was causing a major issue in practice, at scale. The data model contains a complex relationship with several 1:many nested records; all of these records are sync'ed from an external system and Appian has no control over the external system's data model. This rule attempts to match identifiers from an Excel sheet uploaded by a user to the identifiers from those sync'ed records in Appian. The 'local!e1_oldMatch' variable was, effectively, multiple queries in a loop. The 'local!e1_newMatch' variable represents a single query with fn!filter in a loop (using a predicate on the data type). At first I attempted to use an index/wherecontains method for filtering, but that returned incorrect data because of the nested many:1 relationships involved.
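To make the comparison concrete, here is a heavily simplified sketch of the two variables (the SyncedRecord record type, its fields, and the rule!MatchesIdentifier predicate are stand-ins; the real model has the nested relationships described above):

a!localVariables(
  local!excelIdentifiers: {"A-100", "A-200"},  /* identifiers parsed from one uploaded row */
  /* old approach: a separate query for every identifier, i.e. queries inside a loop */
  local!e1_oldMatch: a!forEach(
    items: local!excelIdentifiers,
    expression: a!queryRecordType(
      recordType: recordType!SyncedRecord,
      fields: {recordType!SyncedRecord.fields.id},
      filters: a!queryFilter(
        field: recordType!SyncedRecord.fields.externalId,
        operator: "=",
        value: fv!item
      ),
      pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 1)
    ).data
  ),
  /* new approach: one query up front, then fn!filter with a predicate rule per identifier */
  local!allRecords: a!queryRecordType(
    recordType: recordType!SyncedRecord,
    fields: {
      recordType!SyncedRecord.fields.id,
      recordType!SyncedRecord.fields.externalId
    },
    pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 1000)
  ).data,
  local!e1_newMatch: a!forEach(
    items: local!excelIdentifiers,
    expression: fn!filter(rule!MatchesIdentifier(_, fv!item), local!allRecords)
  ),
  local!e1_newMatch
)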
In this scenario, approximately 600 total elements are queried into memory, each with 1-6 nested many:1 relationships, and each of those has another nested many:1 relationship of 1-2 elements each. So to protect the heap, I made sure to include only the mapping identifiers (integers), rather than every field, in the query.
The caveat to this approach is that you absolutely need to make sure you're not pulling too much data into Appian's Java heap. For example, if there is a query filter on your dataset, make sure that a null filter value doesn't accidentally cause the query to return the entire dataset.
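Something like this guard (the isnull() check and the ri!statusFilter input are just illustrative) keeps a null input from turning into an unfiltered query:

if(
  isnull(ri!statusFilter),
  /* null input: return an empty set instead of accidentally querying every row */
  {},
  a!queryRecordType(
    recordType: recordType!Case,
    fields: {recordType!Case.fields.id},
    filters: a!queryFilter(
      field: recordType!Case.fields.status,
      operator: "=",
      value: ri!statusFilter
    ),
    pagingInfo: a!pagingInfo(startIndex: 1, batchSize: 500)
  ).data
)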
This is a screenshot from the performance tab of a sandbox rule I made to compare the scenario's old way of data processing to the method that uses fn!filter. The term "e1" in the variable names represents just one row of the Excel upload, so when a user uploads 20 rows there are substantial performance gains to be found when performing this mapping. The third column shows how many milliseconds it took to evaluate each variable.
Over the years I have repeatedly found that brute-force processing data in loops (rather than using index/wherecontains or fn!filter) is a root cause of performance issues (both CPU and memory). Sometimes the loops cannot be avoided. But just like in Java, there are better ways to handle it. Another root cause is that casting data elements in Appian (in order to get around issues with soft-typing in SAIL for comparison purposes) is relatively slow when looped over.
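As a small illustration of the casting point (values and variable names invented): hoist the cast out of the loop so it runs once instead of once per iteration.

a!localVariables(
  local!uploadedIds: {"101", "204", "307"},    /* text values, e.g. from an Excel upload */
  local!recordIds: {101, 204, 307, 450, 512},  /* integer identifiers already in memory */
  /* slower pattern: the whole list is re-cast on every iteration */
  local!slowMatches: a!forEach(
    items: local!recordIds,
    expression: contains(tointeger(local!uploadedIds), fv!item)
  ),
  /* faster pattern: cast once, before the loop, then compare */
  local!uploadedIdsAsInt: tointeger(local!uploadedIds),
  local!fastMatches: a!forEach(
    items: local!recordIds,
    expression: contains(local!uploadedIdsAsInt, fv!item)
  ),
  local!fastMatches
)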