Best Practices: Database Volume Testing

Database Volume Testing

Database Volume Testing is used to test and determine the amount of real-time data your database can handle efficiently without any performance issues. It is also an indicator of any scalability issues that may arise once the database application is deployed to production. It involves simulating real-life user load on the target database while taking into consideration key factors such as the number of concurrent users, database response times, etc.. which have been identified as part of non-functional requirements.

Why is it important?

Volume Testing databases prior to deploying them in production helps you design databases that are performant, reliable, and scalable. It also helps in determining if any performance tuning is needed based on the results of the testing.

Key Considerations

The "right" amount of data that needs to be populated in order to perform testing should be determined by the project sponsor and the stakeholders based on their estimation and should be documented in the release plan
It is also recommended to obtain information on non-functional requirements(if any) as early on in the project as possible so as to be able to design a truly efficient and scalable data model that can handle increased volumes of data
Testing with the right amount of data should be in the definition of done as part of each sprint. Changing the data model is difficult towards the end and we should identify risks associated with database scalability early on in the project.

Choosing the Right Environment for Testing

When data volume testing your application, you will see the most accurate results by testing in a Prod-like environment, one that has the latest Production code and similar resources. Therefore, it is recommended to perform this testing in the most Prod-like environment available to you at least every 4-5 sprints, or more frequently on a shorter project.

Data volume testing can also be performed every sprint, or even for a single story, so that performance issues can be identified and resolved early. This will save the development team time and ensure that all necessary changes can be made before the end of the project. Because most teams are not able to volume test new development in a Prod-like environment every sprint, testing will need to be performed in a lower environment which contains all functionality developed in the current sprint. This can be a development or a test environment. To decide which environment to use, consider the following:

Which environment has enough resources to allow for appropriate database volume testing?
- Depending on the resources allocated to lower environments, you may always need to load only a portion of the data you would normally load. Data volume testing in a lower environment is still valuable in this case, because even lower volumes of data can indicate where performance will be slow.
Which environment will be least impacted by risks introduced by data volume testing?
- If you have a Test environment where users are continuously testing and you cannot risk interrupting their testing, you may want to use a different environment for your data volume testing.
- If you have a Dev environment where many teams are developing concurrently and would be blocked by any issues caused by loading data into the system, you may want to load data in an environment with less traffic.

Recommended Approaches

Data Generation Application

If using Appian 19.4+, the recommended approach for loading application tables is using the Data Generation Application and Plugin. This application will query the tables for your applications and allow you to configure different data generators per column.

Process-based testing

A potential approach to loading a predefined amount of data is to spin up processes in Appian that perform the data load. It is important that this is done off-hours so as to avoid increased load on the database during business hours. It is also recommended to archive these processes so as to not have an increased number of active processes.

Using SQL stored procedures

Another approach is to write stored procedures that can be used to load a predefined amount of data in the database. A stored procedure is a set of SQL statements that will be run every time the stored procedure is called. As an example, we could write a stored procedure 'INSERT_EMPLOYEE' to populate an 'Employee' table as:

BEGININSERT INTO 'EMPLOYEE'('EMP_ID','EMP_NAME','EMP_TITLE','EMP_LOCATION','EMP_SALARY','EMP_MANAGER')VALUES('John Doe' AS 'EMP_NAME','Software Engineer' AS 'EMP_TITLE','Boston' AS 'EMP_LOCATION','$2000' AS 'EMP_SALARY','John Smith' AS 'EMP_MANAGER');END

We could create and call another stored procedure ‘POPULATE_EMPLOYEE’ that would call the ‘INSERT_EMPLOYEE’ stored procedure a specified number of times(10 in our example), and so would create 10 rows in the ‘EMPLOYEE’ table.

BEGINDECLARE counter INT(11) DEFAULT 1; -- counter for loopWHILE counter <= 10 DOCALL INSERT_EMPLOYEE();SET counter = counter + 1;END WHILE;END

It is possible to generate data for multiple tables as part of your use case using the SQL Stored Procedure approach.

Commercial tools

In addition to the above approaches, commercial tools such as dbForge Data Generator can be used to perform volume testing for databases.

Testing the application

Once the data has been loaded, it is important to perform functional testing to identify performance issues, if any. Instructions and information on functional testing can be found here: Functional Testing. More specifically, run the processes, records, and reports to test for issues caused by the amount of data in the application. It is also recommended to run Health Check in order to identify any slow queries and/or processes. These queries can be further analyzed using SQL statements that help to understand the execution plan and identify less performant queries(example: EXPLAIN in MySQL, EXPLAIN PLAN in Oracle). For more information on analyzing slow queries and smart services, refer to Analyzing Performance Issues

System Resource Monitoring

When database volume testing, it is important to monitor the health of the Appian services and system resources in addition to measuring the performance of queries and other application operations. For example, inserting or querying against millions of rows may be performant on a system where no other operations are happening. But if your queries utilize 100% CPU and/or most of the memory, there won’t be room for other application functions. Or, your application won't be as performant when resources are used up by other applications. That is an example of system resource monitoring.

Synced Records

If you are using synced records then you will want to resync them after the database has been loaded. This is also a good opportunity to ensure any source filters are properly configured, in case the database volume exceeds the row limit of the record sync features.

Database Load Generation App

Application

Access the Database Load Generation Application via the AppMarket.

Consider how “perfect” the data needs to be. The goal of database volume testing is to validate application scalability. This can likely be achieved without meticulously setting 100% of fields with variety and integrity. For example, metadata fields such as “lastModifiedOn” may be skipped when configuring the generators if they won’t add value to the testing.

Truncation

If you would like to delete all or a subset of the rows in a table prior to loading, use the “Deleter” and “Delete Settings” columns in the application view. In order to keep valid in flight data, do not fully truncate tables when regression testing will need to be performed in the future. Instead, consider the following technique to delete only certain rows: Allocate a range of IDs as test IDs and only delete those IDs (using the “Where Clause” or “SQL Query” Deleter), while leaving the existing transactional data as is.

Example:

Allocate IDs 1,000,000 through 1,999,999 as test IDs
Set the database to auto increment starting at 2,000,000 for any non database volume testing transactions moving forward. Setting the auto increment after the designated testing range allows the transactional data to continue to grow over time without worry of hitting the testing range.
Configure the Deleter using a SQL Query
When using the database load generator application, set the start index to 1,000,000 and ensure you do not go past 1999999 when loading rows.
Adjust the designated range as needed.

Data

Try to include a representative variety in your data. For example, in a case management system, 5 years down the road, you may have 95% of the cases in a closed status and 5% open at any given time. Ensure your generators are configured to create rows that most accurately reflect the state of the database in the future.

Many processes rely on data from external sources and you can’t always make the data perfect. For example, if a process model calls an integration and stores an ID from the response, you won’t have unique IDs for each row loaded using the generator because the integration wouldn’t actually be called. Use your best judgement on what data can be loaded to make the database volume testing most valid.

Generator FAQs & Examples

Q: What is “Random By Seed” in the Weighted List Generator?

A: Random seed specifies the start point when a computer generates a random number sequence. Use the same random seed to associate results from different weighted lists across fields.

Basic Weighted List