Database Performance Best Practices

The document provides a list of best practices applicable to database query performance.

Performance Considerations

Reduce the Size of the Response

The more data the database needs to return the longer the query will take to execute and respond. Reducing the requested data to only what’s required is an effective method to improve response times.

The functions a!queryEntity() and a!queryRecordType() can be configured to return only the data that is needed using the selection parameter. Specify only the necessary columns in the selection parameter to reduce the size of the response. Use a!pagingInfo() with a specific batch size and keep it as small as possible. Avoid using -1 for batchSize to return all results from the query.

A common use case involves retrieving the primary key and name of an entity for displaying in a dropdown. This is best implemented using a!queryEntity() as it avoids retrieving unnecessary data which can slow the response.

Minimize the Number of Calls

The more calls that are made to a database, the longer the interfaces takes to respond for end-users, due to the network latency overhead and need to access the same table multiple separate times.

If the same data is required in multiple locations, it should be requested once upfront, for example using a local variable, and passed through to where it’s used rather than querying for it where it’s used each time.
Do not use looping functions to execute database queries. Refactor the query to retrieve all the data in a single call.
Merge multiple individual calls to the same entity into a single query that retrieves all that data in a single call.

Example:

/* bad */ 
a!localVariables( 
    local!ids: {1,2,3,4,5}, 
    a!forEach(
        items: local!ids, 
        expression: rule!APP_getEntityById(fv!index)
        )
    )
    
/* good */
a!localVariables(
    local!ids: {1,2,3,4,5}, 
    rule!APP_getEntityByIds(local!ids)
    )

It is also important to understand how nested CDT’s affect the number of calls made to a database when using one-to-many relationships. For example, consider a Company CDT with a nested multiple OfficeLocation CDT field. To return 5 companies, the system will execute 6 queries in total:

1 query to retrieve the 5 companies
1 query for each company to retrieve their office locations

This is more commonly known as the N+1 select problem and is expounded when using multiple levels of nesting. You can identify if you’re affected by this by enabling SQL logging and reviewing the number of queries that are executed. Use one or more of the following to reduce the number of calls:

Avoid using one-to-many annotations (nesting and multiples within CDTs).
Use a!queryEntity() to only retrieve the elements that are needed (particularly if it can avoid pulling back nested data).
- NOTE: during development, while the CDT is still being shaped with field edits and removals (not supported after deployment to production) a!queryEntity doesn’t get automatically updated via Check Outdated Dependents action. Any changes to fields in the CDT referenced by a!queryEntity will need to be manually applied on all a!queryEntity invocations that interact with the respective field. Where possible, try to create a reusable expression rule that invokes a!queryEntity to simplify maintenance.
If you can use synced record types in place of nested CDTs, use Record Type Relationships and query the Record Type and desired relationships using a!queryRecordType().

Let the Database Handle the Heavy Data-Centric Tasks

Databases are heavily optimized for manipulating data; don’t attempt to implement the same functionality in Appian that’s already available in the database.

Instead of pulling back all the database data into Appian and operating on it using Looping Functions, let the database handle as much of the data manipulation and transformation as possible. Actions such as sorting, aggregation, filtering, joining datasets, generating sequences and transforming data is ideally suited for a database. For example, when you need to present a sorted list of text values, make sure that the pagingInfo object contains a sort parameter. Otherwise, you will have to bring back the entire dataset into Appian, sort it, and then display the first page back to the user.

Within Appian this may involve mapping data store entities to a view or using a!queryEntity() to request an aggregated result set.

Use Indexes When Appropriate

In the context of performance, an index can help significantly improve data retrieval speed by avoiding slow table scans. However, indexes don’t come for free as they slow write operations.

The best column(s) to apply an index will depend on the size and shape of the data being retrieved. Typically they are added to improve the performance of sorting, aggregation, filtering and join operations.

Many databases come with tools available that can suggest where to add an index. It is important to thoroughly test both read and write based operations after adding an index to ensure performance in other areas is not negatively affected. See Query Optimization - Indexing for introductory advice. For even more on indices, see http://use-the-index-luke.com/

Consider using a synced record type to store data. Data is cached in Appian for record types that have data sync enabled and this enables data retrieval from a!queryRecordType() to be faster.

Considerations for Views

A view is a query stored on the database that can be used like a table. Views are usually used for hiding the complexity of the underlying data structure and simplifying data retrieval (such as allowing all required data to be retrieved with a single query).

Most often a view will be queried with additional filters by the application requesting a specific context. Depending on the database and abilities of the query optimizer this will often result in the view being fully executed before any additional filters, aggregation or sorting is applied.

As such, care must be taken to understand how the database handles this scenario and how the view will be used by the application. This also means that nested views can significantly worsen performance in addition to adding overhead for debugging. This does not mean that usage of views is entirely discouraged but that the team needs to ensure that views perform well at expected scale.

Troubleshooting view performance should be performed independently from Appian using the tools provided by the database vendor such as explain plans. The data store performance logs can be used to identify if the database is the cause by reviewing the Execute time. If a query exhibits a high percentage of its total time in the Execution phase, this indicates the database is the performance bottleneck.

When to Use Stored Procedures

A stored procedure is a subroutine that provides a mechanism for access control, processing and business logic to be stored within the database rather than within the application.

In the context of database performance, this allows complex and extensive processing that involves executing multiple consecutive queries to occur wholly within the database. This helps leverage operations the database is good at and simplifies the calling application logic.

A good example of this is a very large data set coming from another system that requires cleansing. The data would be loaded into staging tables, transformed and inserted into the destination tables.

Using stored procedures for all or most database operations instead of Appian OOTB datastore features is only recommended if no other OOTB features support the requirement. This is because stored procedures:

Do not allow the power of CDTs to be fully utilized within the application. Many core product features will be unavailable when using stored procedures.
Require additional expertise and domain knowledge to implement and maintain.
Add an additional dependency and operational overhead for performing any CRUD operations on the database.

Leverage stored procedures with the help of the "Execute Stored Procedure" Smart Service component, the a!executeStoredProcedureOnSave(), or the a!executeStoredProcedureForQuery() function. The a!executeStoredProcedureOnSave() function should be used to modify data based on a!saveInto parameter from an interface component or web API, and the a!executeStoredProcedureForQuery() should be used when running a procedure that will output data to be read and used in the application. The a!executeStoredProcedureForQuery() is not restricted to be used only in a saveInto parameter and should not be used for procedures that modify data. A guiding use case for a!executeStoredProcedureOnSave():

System X writes data represented by a comma delimited string to Table A
The string needs to parsed and individual data points inserted into Table B
No business logic is involved during the data manipulation

Operational Versus Reporting Data

The schema for operational and reporting data should not necessarily be the same nor held in the same database. They may be very different to handle the separate scaling or functional characteristics, although it's likely that the reporting data will be sourced at least in part from operational data.

Operational databases manage real-time transactional data. They need to be fast and responsive at high-volumes of data and concurrency. Operations on these types of databases are typically focused on a single record.

Reporting databases combine data from multiple sources to provide meaningful reports at an aggregate level such as a time period or region. They need to be optimized for read operations on large sets of data and will rarely perform write operations.

The volumes and historical storage requirements expected should be understood to define a plan that is capable of scaling to the required levels. A data warehouse may not be part of the initial implementation, but part of a future rollout plan. Use the data store performance logs to identify queries that are slow and track performance over time to minimize the impact on operational activity.

Some strategies to manage historical data and table growth:

Create Active vs. Historical tables
- You can utilize the same data type mapped to different data stores/data sources.
- Data can be moved manually or using process models running on a timer.
Purge data periodically
- Some organizations have strict data retention periods.
- This can be done manually or via a process model running on a timer.

Performance Troubleshooting

Involve the DBA early in the process of troubleshooting database performance issues.

Generally slow query performance that is worse or only occurring during peak usage is an indication the server is under-resourced.
Performance slowing due to an increase in data is an indication that suitable indexes may be missing. separation of reporting and operational data is required or the schema needs to be optimized to scale. When using synced records, consider adding date range source filters to restrict the volume of data to query from.
Generally slow transactional operations are an indication the schema is not designed to handle the operational activity.
Utilize Explain Statement on views to look for table scans.
Execute a SQL query against the database metadata tables (varies by vendor) to look for worst performing queries/views.
A passive, yet effective method is to archive (or delete, if applicable) data after a certain period of time so that the database is always procured/maintained and reduced to what is necessary/relevant for querying purposes.

Individual operations that are slow can be monitored and identified over time from the data store performance logs. These logs contain a breakdown of where the query spent most of its time:

Prepare: The amount of time spent generating the query for execution. This will increase the more complex the query is, such as a long list of filters or a nested entity.
Execute: The amount of time spent executing the query on the database. The investigation into the reason for the high execution time needs to be performed independently of Appian using the tools provided by the database vendor (such as query execution plans).
Transform: The amount of time spent processing the results. The larger the returned dataset the longer this will take. Use a!queryEntity() instead of query rules and target the exact data that’s needed.

Note: A data store entity that contains nesting will require multiple queries to complete. The times listed represent the total of all queries executed. The executed queries can be viewed by enabling SQL logging, however this can affect performance so should be disabled by default in production and also disabled when comparing performance.

Topics