MySQL Cloud - Slow Database Queries with Large Datasets - Replacing queryEntity with Stored Procedure

We keep running into a timeout issue when paging queries on a large table. 

 

For example, we have single table with about 2,000,000 rows. When we query this table from Appian (via a queryEntity rule) it times out when we click the double arrows to view the data at the end of the gridField.

 

 

Error message from the logs:

18:33:37,968 INFO  [stdout] (ajp-/0.0.0.0:8009-40) Caused by: com.appiancorp.suiteapi.common.exceptions.AppianRuntimeException: com.appiancorp.suiteapi.common.exceptions.AppianException: Unexpected error executing query (type: [APPIANDATATYPE1234], query: [queryentity expression], order by: [[Sort[lastName asc]]], filters:[(status<> TypedValue[it=3,v=DL])]) (APNX-1-4164-028)
...

18:33:37,969 INFO [stdout] (ajp-/0.0.0.0:8009-40) Caused by: org.hibernate.exception.GenericJDBCException: could not execute query
...
18:33:37,969 INFO [stdout] (ajp-/0.0.0.0:8009-40) Caused by: com.mysql.jdbc.exceptions.MySQLTimeoutException: Statement cancelled due to timeout or client request
...



When we convert this to a SQL query (SELECT * FROM APPIANDATATYPE1234_TABLE ORDER BY LastName ASC LIMIT 1799975, 25) it takes about 40 seconds to complete. The cloud typically gives only 10 seconds before it times the query out, and is not configurable last time I asked.

The issue (and solution) in regards to the SQL Query length is posted on this stack overflow page: https://stackoverflow.com/questions/4481388/why-does-mysql-higher-limit-offset-slow-the-query-down

 Implementing this solution in a SQL query results in only take 0.78 seconds:

SELECT t.* FROM ( SELECT ID FROM APPIANDATATYPE1234_TABLE WHERE Status <> 'DL' ORDER BY LastName LIMIT 1799975, 25) q JOIN APPIANDATATYPE1234_TABLE t ON t.ID = q.ID

 

We've had success in the past utilizing stored procedures to avoid other timeout limits from MySQL.

 

I was planning on replacing the a!queryEntity rule with a fn!executestoredprocedure() to call a stored procedure that uses a prepared statement with inputs that would be sent from Appian (i.e. pagingInfo.startIndex for OFFSET, pagingInfo.batchSize for LIMIT, and using SQL_CALC_FOUND_ROWS to pass into an INOUT for the totalCount  and convert the results into a datasubset.

 

Are there any issues concerning Best Practices in this regard? Has anyone run into a similar issue and what was a solution you've found that works? Are there other solutions here that I'm missing?

 

I'd like some feedback before I go down the rabbit hole of re-vamping this queryEntity rule into a stored procedure. Also the tables have been indexed so that's not the issue here.

Thanks in advance!

  Discussion posts and replies are publicly visible

Parents
  • Have you tried replicating this on a local MySQL database and switching on the query log to see exactly what SQL queries are being run by Appian? A single query entity often actually consists of as much as 2 or 3 SQL queries, so your conversion to SQL almost certainly doesn't replicate exactly what Appian is doing - it may be even worse that the 40 seconds you've mentioned. If you knew what they were then you might be able to mitigate somehow using something like partitioning...? Still not sure if you'd be able to use query entity, but at least you'd know exactly what is running.

    One other thing to try is an EXPLAIN on (SELECT * FROM APPIANDATATYPE1234_TABLE ORDER BY LastName ASC LIMIT 1799975, 25) - is it actually using the index(es)? I'm assuming you've checked that but thought it's worth asking. If you regularly retrieve by last name, then it might be worth partitioning by last name - but it seems like you're sorting on last name rather than retrieving by. In fact, I think I'd put money on the last name sort being the problem rather than the limit... definitely worth an EXPLAIN!

    I can't think of a better way of implementing that specific workaround, but I'd say the only thing to watch out for is making data retrieval through stored procedures a "standard" - it should be an exception only. The OOTB (ie query entity) retrieval is fine 99% of the time, but definitely has some limitations for larger tables.
Reply
  • Have you tried replicating this on a local MySQL database and switching on the query log to see exactly what SQL queries are being run by Appian? A single query entity often actually consists of as much as 2 or 3 SQL queries, so your conversion to SQL almost certainly doesn't replicate exactly what Appian is doing - it may be even worse that the 40 seconds you've mentioned. If you knew what they were then you might be able to mitigate somehow using something like partitioning...? Still not sure if you'd be able to use query entity, but at least you'd know exactly what is running.

    One other thing to try is an EXPLAIN on (SELECT * FROM APPIANDATATYPE1234_TABLE ORDER BY LastName ASC LIMIT 1799975, 25) - is it actually using the index(es)? I'm assuming you've checked that but thought it's worth asking. If you regularly retrieve by last name, then it might be worth partitioning by last name - but it seems like you're sorting on last name rather than retrieving by. In fact, I think I'd put money on the last name sort being the problem rather than the limit... definitely worth an EXPLAIN!

    I can't think of a better way of implementing that specific workaround, but I'd say the only thing to watch out for is making data retrieval through stored procedures a "standard" - it should be an exception only. The OOTB (ie query entity) retrieval is fine 99% of the time, but definitely has some limitations for larger tables.
Children
No Data