Should I Perform Multiple Queries or Just One and Sift Through the Dataset?

Suppose I have a large table in the database with some number of columns, and Im interested in 4 of them: id, level1, level2, and level3. 

While id is a unique identifier for each row, so is the combination of level1, level2 and level3. 

I want to create three dropdowns:

The first one chooses from available level1 values.

Given the choice in the first dropdown, the second chooses from corresponding level2 values.

Given the choice in the first and second dropdown, the third chooses from corresponding level3 values, and saves the corresponding row id. 

Currently I am performing a new but smaller query every time a choice is made in any of the three dropwdowns. Is this the more efficient way? Or is there a nicer option where I do one large query of everything at the start, and slowly filter out using the information I get from each dropdown? If the latter is more efficient, what would be the best way to implement this?

  • It kinda depends on the expected size of the table in question.  If it's a lookup table where the size will never grow (or at least, never grow much), then it may be worthwhile (if slightly) to query the whole set and divide up the queried dataset on-form as it sounds like you're already thinking of.

    The downside here is, it seems like it might be quite a bit harder to maintain if there are ever any data or structural changes down the road - and additionally, unless the query is particularly expensive in terms of performance, I wouldn't expect this method to be necessarily all that much more efficient in total than doing smaller, column-limited queries at each level of user input.  Plus consider that when it's set up this way, with multiple smaller queries, the query load on the system is more spread out considering that when a user first loads the form, it just does the smaller initial query, and then subsequent queries are only done when (and if) the user drills down.

  • We had a similar problem, where some developer thought it was a good idea to gather the entire dataset and filter through it after the fact, though in our case not for any reason nearly as practical as what you have suggested.

    Your case is an attempt to avoid hitting the DB connection pool 3 times when you could do so only once.  That may be a great plan, but there may be a significant downside to doing it the other way.

    The PM in question, where we were gathering the entire dataset, was the absolute worst thing ever on our Appian health check, because each PM was storing basically the entire database table for 7 days before archiving.  We didn't notice until the records were hitting the several thousands and we were continually RAM spiking and having to restart the servers.

    The only problem I see with doing 3 queries is that you're running the risk of having a large part of the table, a subset of that, and a subset of that all stored in RAM.  Depending on how big your table and subsets get, this could be more crippling to your engines.  If a finite number of possibilities for level 1, level 2, and level 3 exist, you might try waiting until the last to run the query at all, perhaps using expression rules or Appian decision objects to filter down to avoid 3 queries.

 Discussion posts and replies are publicly visible