Database Volume Testing determines the amount of real-world data your database can handle efficiently without performance issues. It is also an indicator of scalability issues that may arise once the database application is deployed to production. It involves simulating real-life user load on the target database while taking into consideration key factors, such as the number of concurrent users and database response times, that have been identified as part of the non-functional requirements.
Why is it important?
Volume testing databases prior to deploying them to production helps you design databases that are performant, reliable, and scalable. It also helps determine whether any performance tuning is needed based on the results of the testing.
Key Considerations
Choosing the Right Environment for Testing
When data volume testing your application, you will see the most accurate results by testing in a Prod-like environment, one that has the latest Production code and similar resources. Therefore, it is recommended to perform this testing in the most Prod-like environment available to you at least every 4-5 sprints, or more frequently on a shorter project.
Data volume testing can also be performed every sprint, or even for a single story, so that performance issues can be identified and resolved early. This will save the development team time and ensure that all necessary changes can be made before the end of the project. Because most teams are not able to volume test new development in a Prod-like environment every sprint, testing will need to be performed in a lower environment which contains all functionality developed in the current sprint. This can be a development or a test environment. To decide which environment to use, consider the following:
Recommended Approaches
If using Appian 19.4+, the recommended approach for loading application tables is the Data Generation Application and Plugin. This application queries the tables in your application and allows you to configure different data generators per column.
A potential approach to loading a predefined amount of data is to spin up processes in Appian that perform the data load. It is important to run these off-hours to avoid adding load on the database during business hours. It is also recommended to archive these processes so that the number of active processes does not grow.
Another approach is to write stored procedures that load a predefined amount of data into the database. A stored procedure is a set of SQL statements that runs every time the procedure is called. As an example, we could write a stored procedure 'INSERT_EMPLOYEE' to populate an 'EMPLOYEE' table:
DELIMITER //
CREATE PROCEDURE INSERT_EMPLOYEE()
BEGIN
  -- EMP_ID is assumed to be an AUTO_INCREMENT primary key, so it is omitted
  INSERT INTO `EMPLOYEE` (`EMP_NAME`, `EMP_TITLE`, `EMP_LOCATION`, `EMP_SALARY`, `EMP_MANAGER`)
  VALUES ('John Doe', 'Software Engineer', 'Boston', 2000, 'John Smith');
END //
DELIMITER ;
We could then create and call another stored procedure 'POPULATE_EMPLOYEE' that calls the 'INSERT_EMPLOYEE' stored procedure a specified number of times (10 in our example), creating 10 rows in the 'EMPLOYEE' table.
DELIMITER //
CREATE PROCEDURE POPULATE_EMPLOYEE()
BEGIN
  DECLARE counter INT DEFAULT 1; -- counter for loop
  WHILE counter <= 10 DO
    CALL INSERT_EMPLOYEE();
    SET counter = counter + 1;
  END WHILE;
END //
DELIMITER ;

-- Load the data
CALL POPULATE_EMPLOYEE();
It is possible to generate data for multiple tables as part of your use case using the SQL Stored Procedure approach.
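As an illustrative sketch, a single procedure can populate related tables in one pass by capturing generated keys with LAST_INSERT_ID(). The 'DEPARTMENT' table and the 'EMP_DEPT_ID' foreign key below are hypothetical, and both IDs are assumed to be AUTO_INCREMENT columns:

```sql
DELIMITER //
CREATE PROCEDURE INSERT_DEPARTMENT_WITH_EMPLOYEE()
BEGIN
  DECLARE new_dept_id INT;

  -- Insert the parent row first
  INSERT INTO `DEPARTMENT` (`DEPT_NAME`, `DEPT_LOCATION`)
  VALUES ('Engineering', 'Boston');

  -- LAST_INSERT_ID() returns the auto-generated key of the row just inserted
  SET new_dept_id = LAST_INSERT_ID();

  -- Insert the child row referencing the new parent
  INSERT INTO `EMPLOYEE` (`EMP_NAME`, `EMP_TITLE`, `EMP_DEPT_ID`)
  VALUES ('John Doe', 'Software Engineer', new_dept_id);
END //
DELIMITER ;
```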
In addition to the above approaches, commercial tools such as dbForge Data Generator can be used to generate data for database volume testing.
Testing the application
Once the data has been loaded, it is important to perform functional testing to identify any performance issues. Instructions and information on functional testing can be found here: Functional Testing. More specifically, run the processes, records, and reports to test for issues caused by the amount of data in the application. It is also recommended to run Health Check to identify any slow queries and/or processes. These queries can be further analyzed using SQL statements that show the execution plan and highlight less performant queries (for example, EXPLAIN in MySQL, EXPLAIN PLAN in Oracle). For more information on analyzing slow queries and smart services, refer to Analyzing Performance Issues.
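For example, in MySQL a slow query identified by Health Check can be prefixed with EXPLAIN to inspect its execution plan. The table and index below are illustrative:

```sql
-- Shows how MySQL will execute the query: which indexes are considered,
-- which one is chosen ("key"), and roughly how many rows will be examined.
EXPLAIN
SELECT `EMP_NAME`, `EMP_TITLE`
FROM `EMPLOYEE`
WHERE `EMP_LOCATION` = 'Boston';

-- If the plan shows a full table scan (type = ALL), adding an index on the
-- filtered column is a common remediation:
CREATE INDEX `IDX_EMPLOYEE_LOCATION` ON `EMPLOYEE` (`EMP_LOCATION`);
```

Re-running EXPLAIN after adding the index should show the new index in the plan and a much smaller estimated row count.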
System Resource Monitoring
When database volume testing, it is important to monitor the health of the Appian services and system resources in addition to measuring the performance of queries and other application operations. For example, inserting or querying millions of rows may be performant on a system where nothing else is running, but if those queries consume 100% of the CPU and/or most of the memory, there is no headroom for other application functions. Likewise, your application will be less performant when resources are used up by other applications. Monitoring system resources during the test reveals these contention issues.
Synced Records
If you are using synced records then you will want to resync them after the database has been loaded. This is also a good opportunity to ensure any source filters are properly configured, in case the database volume exceeds the row limit of the record sync features.
Access the Database Load Generation Application via the AppMarket.
Consider how “perfect” the data needs to be. The goal of database volume testing is to validate application scalability. This can likely be achieved without meticulously setting 100% of fields with variety and integrity. For example, metadata fields such as “lastModifiedOn” may be skipped when configuring the generators if they won’t add value to the testing.
If you would like to delete all or a subset of the rows in a table prior to loading, use the “Deleter” and “Delete Settings” columns in the application view. To keep valid in-flight data, do not fully truncate tables if regression testing will need to be performed in the future. Instead, consider deleting only certain rows: allocate a range of IDs as test IDs and delete only those IDs (using the “Where Clause” or “SQL Query” Deleter), leaving the existing transactional data as is.
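As a sketch of this technique, a “SQL Query” Deleter could target only the allocated test IDs. The ID range below is an arbitrary example:

```sql
-- Deletes only rows whose IDs fall in the range reserved for generated
-- test data, leaving real transactional rows untouched.
DELETE FROM `EMPLOYEE`
WHERE `EMP_ID` BETWEEN 1000000 AND 1999999;
```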
Try to include a representative variety in your data. For example, in a case management system, 5 years down the road, you may have 95% of the cases in a closed status and 5% open at any given time. Ensure your generators are configured to create rows that most accurately reflect the state of the database in the future.
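If loading with stored procedures rather than the generators, a similar weighting can be approximated with RAND(). The 'CASE_TABLE' table and 'CASE_STATUS' column below are hypothetical:

```sql
-- Assigns 'CLOSED' to roughly 95% of generated rows and 'OPEN' to the rest,
-- approximating the expected long-term distribution of case statuses.
UPDATE `CASE_TABLE`
SET `CASE_STATUS` = CASE WHEN RAND() < 0.95 THEN 'CLOSED' ELSE 'OPEN' END;
```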
Many processes rely on data from external sources and you can’t always make the data perfect. For example, if a process model calls an integration and stores an ID from the response, you won’t have unique IDs for each row loaded using the generator because the integration wouldn’t actually be called. Use your best judgement on what data can be loaded to make the database volume testing most valid.
Q: What is “Random By Seed” in the Weighted List Generator?
A: Random seed specifies the start point when a computer generates a random number sequence. Use the same random seed to associate results from different weighted lists across fields.
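The behavior is analogous to seeding RAND() in MySQL: the same seed always reproduces the same sequence, so seeded generators on different fields can stay aligned row by row. This is a sketch of the concept, not the plugin's implementation:

```sql
-- Passing a seed re-initializes the generator, so repeated calls with the
-- same seed produce the same sequence of values.
SELECT RAND(42);  -- deterministic: always the same value for seed 42
SELECT RAND();    -- non-deterministic without a seed
```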
Basic Weighted List
Q: What is a good use case for the SQL Query Generator?
A: The SQL Query Generator is commonly used for inserting reference data.
Q: Why won’t the time generator work?
A: Try adding +0 to the query to ensure the formatting is correct:
SQL: SELECT CURRENT_TIME()
SQL: SELECT CURRENT_TIME()+0 -- +0 coerces the result to a numeric value