A Case Study in Data Modeling for an Appian Application

Certain Appian applications are a mix of transaction-heavy OLTP (Online Transaction Processing), with a high volume of insert, update, and delete transactions, and read-heavy or CPU-intensive OLAP (Online Analytical Processing), with reporting and analytical capabilities. The database schema needs to be designed to cater reasonably well to both paradigms. This document is intended to assist the lead Appian application designer by demonstrating a sample Appian application's database design and explaining the reasoning behind each design decision. It also complements the Database Performance Best Practices playbook play, which highlights the best practices for a performant database.

Consider an application called Quote to Order for this case study, whose Entity Relationship diagram is shown above. This app is used by the Acme sales team to track sales opportunities. New accounts for each prospective customer are created in the app by an Acme Account Executive. When a new sales opportunity is identified for an account, a quote is prepared and presented to the customer. Any available offer prices are included in the quote. If the customer accepts the quote, an order is generated. OLTP and OLAP design considerations for this application are explained below.

Creating new schema

It may be beneficial to create a new database schema in order to separate data within a single instance. We recommend creating additional schemas only if users require segregated database access: too many schemas can contribute to performance degradation due to contention over pooled database connections. Users should access any shared data through a shared schema. Refer to Data Migration to New Schema for information on creating additional schemas.
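As a sketch, creating a separate schema for shared data and granting an application user read-only access to it might look like the following. The schema and user names here are illustrative, not part of the case study:

```sql
-- Illustrative only: 'acme_shared' and 'qto_app_user' are assumed names.
CREATE SCHEMA acme_shared;

-- Give the application's database user read-only access to the shared schema.
GRANT SELECT ON acme_shared.* TO 'qto_app_user'@'%';
```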

Write-Heavy OLTP Design:

Data Modeling:

  • Identify the main entities: Account, Opportunity, Quote, Order, and Offering. Define tables for each of the main entities with their attributes. Some of these are good candidates for an Appian record. In this application, Account was chosen as an Appian record.
  • Split one-to-many relationships into separate entities. For example, a single Order can be composed of several individual order items, so those items are stored separately in the OrderLineItems table.
  • Ensure that there is no redundancy in data storage. 

In this example, the listPrice is stored in both the Quote Line Item table and the Order Line Item table. This might seem like a redundancy, but in this case it is required to distinguish between the listPrice at the time the quote is generated and at the time the order is created.
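The one-to-many split described above might be sketched as follows. Table and column names beyond those mentioned in this document (such as quantity and offeringId) are illustrative assumptions:

```sql
-- One row per Order...
CREATE TABLE QTOT_ORDER (
    orderId   INT AUTO_INCREMENT PRIMARY KEY,
    orderDate DATE
);

-- ...and many OrderLineItems rows per Order.
-- listPrice is deliberately stored here as well as on the quote line item,
-- to capture the price as it was when the order was created.
CREATE TABLE QTOT_ORDER_LINE_ITEM (
    orderLineItemId INT AUTO_INCREMENT PRIMARY KEY,
    orderId         INT NOT NULL,
    offeringId      INT NOT NULL,
    quantity        INT,
    listPrice       DECIMAL(12, 2),
    FOREIGN KEY (orderId) REFERENCES QTOT_ORDER (orderId)
);
```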

Reference Table:

It is also good practice to have a master reference table to store the data used to populate dropdowns, from which the user picks from a set of predefined values. If a new value needs to be added to the reference list in the future, it is just a quick database change.

In the Quote to Order application, the stages of Opportunity can only be one of the following: Prospecting, Needs Analysis, Value Proposition, Price Quote, Negotiation. These values can be stored in the Reference table QTOT_REFERENCE and the primary key REF_ID is stored as the foreign key in the QTOT_OPPORTUNITY table where it is referenced.
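A minimal sketch of this pattern is shown below. The REF_ID column and table names come from the case study; the remaining column names (refType, refValue, oppStageRefId) are assumptions for illustration:

```sql
-- Master reference table; one table can hold several dropdown lists,
-- distinguished by a type column.
CREATE TABLE QTOT_REFERENCE (
    REF_ID   INT AUTO_INCREMENT PRIMARY KEY,
    refType  VARCHAR(50)  NOT NULL,  -- e.g. 'OPP_STAGE'
    refValue VARCHAR(100) NOT NULL   -- e.g. 'Prospecting'
);

INSERT INTO QTOT_REFERENCE (refType, refValue) VALUES
    ('OPP_STAGE', 'Prospecting'),
    ('OPP_STAGE', 'Needs Analysis'),
    ('OPP_STAGE', 'Value Proposition'),
    ('OPP_STAGE', 'Price Quote'),
    ('OPP_STAGE', 'Negotiation');

-- Assuming the opportunity table stores the stage in a column oppStageRefId,
-- constrain it to valid reference values.
ALTER TABLE QTOT_OPPORTUNITY
    ADD CONSTRAINT FK_OPP_STAGE
    FOREIGN KEY (oppStageRefId) REFERENCES QTOT_REFERENCE (REF_ID);
```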

Normalization:

  • Designing a data model to conform to the rules of database normalization ensures that there is no data redundancy and data integrity is maintained.
  • For a transaction-heavy system, ensure that all tables are in the Third Normal Form (3NF): every column in a table depends only on the primary key column(s).
  • A normalized database design improves the speed of transactions; however, over-normalized tables can decrease the efficiency of queries.
  • For example, if you need to store addresses, storing postal code and city in the same Address table is not strictly in 3NF, because the city column depends on the postal code (a non-key column) as well as on the primary key. There is therefore a chance that the postal code gets updated while the city still holds the old value. To bring this to 3NF, postal code and city should be stored in a separate table. Realistically, this will not be required, as most real-world applications use an external address validation service (like the USPS address validation service) before persisting addresses to the database. The tradeoff is to have the Address table remain in 2NF.
  • As a result of normalization, data is distributed across several tables to maintain data integrity, so retrieving all the related data for an entity requires several join statements. For example, to display a summary of an Order, we would need to fetch data from the Order, Account, Order Line Item, Price Quote, and Quote Line Item tables, and possibly even the Offering and Offering Price tables. This is a tradeoff of normalization: it increases the complexity of queries and makes read operations slower. Database views can be created to simplify queries, since Appian records can be built from views as well. Nested CDTs are another option for joining related data in Appian, keeping these best practices in mind.
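As a sketch, a view can perform the joins once so that reads (and an Appian record built on the view) query a single source. The view name QTO_orderSummary is an assumption; the tables and join keys mirror those used elsewhere in this case study:

```sql
CREATE VIEW QTO_orderSummary AS
SELECT
    QTO_order.orderId,
    QTO_order.orderEffectiveDate,
    QTO_account.accountName,
    QTO_orderLineItem.listPrice,
    QTO_orderLineItem.actualPrice
FROM QTO_order
JOIN QTO_orderLineItem ON QTO_order.orderId = QTO_orderLineItem.orderId
JOIN QTO_priceQuote    ON QTO_order.priceQuoteId = QTO_priceQuote.priceQuoteId
JOIN QTO_opportunity   ON QTO_priceQuote.opportunityId = QTO_opportunity.opportunityId
JOIN QTO_account       ON QTO_opportunity.accountId = QTO_account.accountId;
```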

Indexing:

  • To determine whether indexes are required, or which indexes to create, make a list of the queries that will be run and their frequency. An estimate of the frequently run queries can be made by considering the core use cases of the application. If you’re looking to tune your database performance, a good place to start is the Appian health check, which provides data on slow performing queries.
  • For example, in the Quote to Order application, an account executive who is assigned many accounts might want to filter by account name to view and update the status of one or more accounts on a daily basis. The query below fetches the account information filtered by account name and is determined to be the most frequently run query in the application.

SELECT 
	a.accountName, a.accountExec, o.oppName, o.oppStage
FROM
	QTO_account a
INNER JOIN
	QTO_opportunity o ON a.accountId = o.accountId
WHERE 
	a.accountName = 'Fictional Corp';

  • This query can benefit from an index on the accountName column in the account table. The best candidates for an index are the column names in the WHERE clause. In this query, the accountId (the primary key) is already indexed, so to further improve the performance of this query, an index can be created on the accountName column.

CREATE INDEX IDX_QTO_account_accountName ON QTO_account(accountName);

  • When choosing your index, it is important to weigh the impact of write statements. Indexes have a positive impact on the performance of queries; however, inserts, updates, and deletes are slowed down by the extra work of maintaining each index (though an update whose WHERE clause uses an indexed column can itself be sped up).
  • For this index, we have to consider that each time an account name is updated, or an account is inserted or deleted, the index on the accountName column must be maintained.
  • It is estimated that the query to fetch the details of an account runs far more often than updates to the name of an account. The benefit from speeding up the query is larger than the efficiency lost in maintaining the index on each write transaction.
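To confirm that the optimizer actually uses the new index, the query can be inspected with MySQL's EXPLAIN; the key column of its output should show IDX_QTO_account_accountName for the QTO_account table:

```sql
-- Run after creating IDX_QTO_account_accountName.
EXPLAIN
SELECT a.accountName, a.accountExec, o.oppName, o.oppStage
FROM QTO_account a
INNER JOIN QTO_opportunity o ON a.accountId = o.accountId
WHERE a.accountName = 'Fictional Corp';
```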

Also refer to An Introduction on Query Optimization for a deeper dive into query optimization.

Stored procedures:

  • Stored procedures should be used when a set of operations needs to be performed repeatedly. An Appian process model can be used to schedule the run of a stored procedure. The best use case for stored procedure updates is one that is not time critical.
  • For example, in the Quote to Order application, every account with no active opportunities, or whose active opportunity has remained in oppStage = 'Prospecting' for more than 2 months, needs to be set to "Inactive" status. This use case is a good candidate for a stored procedure. The Appian process can schedule the stored procedure to perform this update every night, and then send emails to the account executive to alert them of the change in status.
  • It's important to consider the best practices for database performance when choosing a use case for a stored procedure, as mentioned here, and to discuss with your DBA to decide what would be best for optimal application performance.
  • Note: To call a stored procedure from the Appian process model, use the Execute Stored Procedure smart service (native to Appian 22.1 or later, or available via plugin for older versions).

DELIMITER //
CREATE PROCEDURE QTO_updateAccountStatus (IN inactiveDaysThreshold INT)
BEGIN
   UPDATE
        QTO_account AS account
   INNER JOIN QTO_opportunity AS opportunity
        ON account.accountId = opportunity.accountId
   SET
        account.accountStatus = 'Inactive'
   WHERE
        opportunity.oppStage = 'Prospecting'
        AND DATEDIFF(NOW(), opportunity.lastUpdated) > inactiveDaysThreshold;
END //
DELIMITER ;

Triggers:

Triggers should be used when there is a need to track the changes in values of attributes. This is especially useful for a historical audit.

For example: In the Quote to Order application, a trigger can be specified to run on update of the Account table and store the accountId, the old and new accountStatus, and the time of update in an Audit table.

DELIMITER //
CREATE TRIGGER TRG_AFTER_QTO_accountStatusUpdate
AFTER UPDATE ON QTO_account
FOR EACH ROW
BEGIN
   IF (NEW.accountStatus <> OLD.accountStatus) THEN
      INSERT INTO QTO_accountAudit
         (accountId, lastUpdated, oldValue, newValue)
      VALUES
         (NEW.accountId, NOW(), OLD.accountStatus, NEW.accountStatus);
   END IF;
END //
DELIMITER ;
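The trigger above assumes an audit table shaped roughly as follows; this is a sketch, and the exact column types (and the surrogate key auditId) are assumptions:

```sql
CREATE TABLE QTO_accountAudit (
    auditId     INT AUTO_INCREMENT PRIMARY KEY,
    accountId   INT NOT NULL,
    lastUpdated DATETIME NOT NULL,
    oldValue    VARCHAR(50),
    newValue    VARCHAR(50)
);
```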

Read-Heavy OLAP Design:

  • When the data being reported on does not change frequently, such as historical data, it might make sense to store this data separately from the tables that are transaction-heavy and have frequent updates. This way, if a complex query (maybe one that includes a join, a subquery, and an aggregation) needs to be run to retrieve the data that the report needs, the query can be run at regular intervals during non-business hours to refresh the reporting table data. This is a good use case for a stored procedure scheduled by an Appian process. This approach could also be considered for slow-performing queries identified in database performance testing.
  • For example in the Quote to Order application, transaction-heavy OLTP tables are prefixed with QTOT (Quote To Order Transactions) and the read-heavy and CPU-intensive reporting tables have a prefix QTOA (Quote To Order Analytics). A good use case for this approach in our example application is reporting on the Orders that were completed in the previous 3 years. This data can be fetched and updated by a stored procedure that is scheduled to run every night by an Appian process and stored in the table QTOA_HISTORICAL_ORDERS, which is independent from the write-heavy transaction tables.
  • Querying and storing data separately, as we have done with historical orders in this case study, results in a denormalized database design that allows for fast retrieval of data and better performance as numerous join statements are not required every time this data needs to be read and displayed.
  • For frequently run queries that are relatively straightforward (e.g., fetch all orders by a customer), a materialized view could be considered for ease of querying (on databases such as MySQL that do not support materialized views natively, a summary table refreshed on a schedule serves the same purpose). However, for a more complex query that uses many JOIN statements (e.g., fetch the last comment on all high-priority cases from the previous quarter), using a stored procedure would be more performant.

Stored procedure for QTOA_HISTORICAL_ORDERS:

DELIMITER //
CREATE PROCEDURE QTO_updateHistoricalOrders()
BEGIN
   -- Insert only the previous day's changes.
   INSERT INTO QTOA_HISTORICAL_ORDERS (
      orderId,
      totalQuoteListPrice,
      totalOrderListPrice,
      totalOrderActualPrice,
      orderDate,
      accountId,
      accountName,
      accountExecutiveId,
      accountExecutiveName
   )
   SELECT
      QTO_order.orderId,
      SUM(QTO_quoteLineItem.listPrice),
      SUM(QTO_orderLineItem.listPrice),
      SUM(QTO_orderLineItem.actualPrice),
      QTO_order.orderEffectiveDate,
      QTO_account.accountId,
      QTO_account.accountName,
      QTO_accountExecutive.accountExecutiveId,
      CONCAT(QTO_accountExecutive.firstName, ' ', QTO_accountExecutive.lastName)
   FROM QTO_order
   JOIN QTO_orderLineItem ON QTO_order.orderId = QTO_orderLineItem.orderId
   JOIN QTO_priceQuote ON QTO_order.priceQuoteId = QTO_priceQuote.priceQuoteId
   JOIN QTO_quoteLineItem ON QTO_priceQuote.priceQuoteId = QTO_quoteLineItem.priceQuoteId
   JOIN QTO_opportunity ON QTO_priceQuote.opportunityId = QTO_opportunity.opportunityId
   JOIN QTO_account ON QTO_opportunity.accountId = QTO_account.accountId
   JOIN QTO_accountExecutive ON QTO_account.accountExecutiveId = QTO_accountExecutive.accountExecutiveId
   WHERE
      DATEDIFF(NOW(), QTO_order.lastUpdated) = 1
   GROUP BY
      QTO_order.orderId;

   -- Delete orders older than 3 years.
   DELETE FROM QTOA_HISTORICAL_ORDERS
   WHERE
      DATEDIFF(NOW(), QTOA_HISTORICAL_ORDERS.orderDate) > 1094;
END //
DELIMITER ;

  • Several other valuable reports could be built off of this table. For example:
    • A breakdown of Orders by each quarter.
    • A comparison of the number of Orders for each quarter in the past 3 years.
  • Indexes can also be applied liberally to the columns of the reporting tables, since these tables are only written during the scheduled refresh and there is little impact on the performance of update transactions to weigh.
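For example, since the nightly stored procedure filters and deletes on orderDate, and quarterly reports will group on it, an index on that column of the reporting table is an easy win. A sketch:

```sql
-- Reporting table is only written by the nightly refresh,
-- so the maintenance cost of this index is negligible.
CREATE INDEX IDX_QTOA_HISTORICAL_ORDERS_orderDate
    ON QTOA_HISTORICAL_ORDERS (orderDate);
```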