Writing to Synced Record Types at High Throughput

Introduction

Appian's data fabric allows you to unify, secure, and optimize data spread across the enterprise. Typically, writing and syncing data in record types is as easy as using one of Appian's smart services, such as Write Records, Write to Data Store Entity, or Write to Multiple Data Store Entities. However, when designing autoscaled processes, you may want to avoid using these smart services directly: synced record types are limited to handling ~1,000 write requests per minute for processes writing data in small batches (1-10 rows).

Trying to sync data above this level of throughput will fail and can cause record type invalidations. When using the Write Records smart service, writes that exceed the limit are throttled: the affected nodes fail and must be manually restarted, but the synced record type remains available. When using the Write to Data Store Entity smart service, failed writes cause an invalidation, bringing down the impacted record type until a manual full sync is performed.

This guide explains how to design a process that supports writes to data fabric when peak times require more than ~1,000 write requests per minute across all of the applications on your site. This is accomplished through an incremental sync pattern: data is written to the source table first and synced into the record type independently, in large batches, enabling up to 200,000 rows to be synced per minute.

Example Process

Let’s say we have a process model with Autoscale enabled, which receives data over an API call, performs some transformations on the data, and then writes a single record to a synced record type.

Process Model example

Typically, the API is called only 200 to 300 times per minute. However, at peak times, the API is called over 10,000 times in a single minute. At these peak times, the Write Records node fails because the data sync write throughput limit is exceeded: write performance degrades and requests time out. Failed requests will return either 'could not obtain a reservation within the time limit' or 'the request has timed out due to the system being busy.'

Deferred Sync Pattern

In order to allow the process model to continue to process data even in times of peak load, we need to implement a deferred sync pattern, where data is synced in batches rather than one row at a time.

To prevent the process model from automatically syncing data to data fabric, create a separate data source connected system and data store, and use them in the autoscaled process model to write data. This allows us to write data only to the source table at first and sync it into the record type independently.

Configure a New Data Source, Data Type, and Data Store

  1. Create a new Data Source Connected System that connects to the same database as your existing record type.
  2. Create a CDT from the table that already backs the synced record type.
    Process Model example
  3. Create a new data store with that CDT as an entity. Use the connected system data source you just created as the data source.
    Process Model example
If your record type does not already have a modifiedOn timestamp, you will need to add that to your data structure, and ensure it is being updated every time the row is updated.
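
One way to satisfy this requirement is to stamp the field in the application before every write. Below is a minimal sketch, assuming a hypothetical CDT-typed process variable named pv!recordToWrite; adjust the names to your own data structure.

  /* Stamp the current time on the row before it is written, so the
     incremental sync query later in this guide will pick it up.
     pv!recordToWrite is a hypothetical process variable of the new CDT. */
  a!update(
    data: pv!recordToWrite,
    index: "modifiedOn",
    value: now()
  )

Whatever mechanism you use, every write path that updates the table must keep modifiedOn current; otherwise those rows will be missed by the incremental sync.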

Replace the Write Records Node

In the autoscaled process model:

  1. Update the process variable that holds the data being written so that it uses the CDT you just created instead of the record type. Update all usages of this variable.
  2. Replace the Write Records node with a Write to Data Store Entity node, pointing it at the new data store entity through a constant.
  3. Click Save & Publish.

Now, your autoscaled process model will only write to the source table instead of automatically syncing data into the record type.

Process Model example
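
As a concrete illustration, here is a minimal sketch of the value the CDT-typed process variable might hold once the transformations are applied. The type namespace, field names, and pv!payload are hypothetical placeholders; your own CDT and data will differ.

  /* Hypothetical construction of the new CDT value from the transformed
     API payload; note the modifiedOn stamp required by the earlier note. */
  'type!{urn:com:appian:types:JLT}JLT_Record'(
    id: pv!payload.id,
    name: pv!payload.name,
    modifiedOn: now()
  )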

Sync Batches of Data Every 5 Minutes

After re-configuring the autoscaled model, create a separate process model, without autoscale enabled, to incrementally sync data into the record type from the source table.

To configure incremental sync:

  1. Create a new process model without autoscale enabled.
  2. Create process variables to track the timestamp at which data was last synced (lastSyncTime) and the list of IDs to sync (recordsToSync).
  3. Set the default value of the lastSyncTime variable to now() - intervalds(0, 5, 0). Because each timer-triggered run starts from this default, every run will sync anything updated in the 5 minutes before it started.
  4. Configure a timer event on the start node to run the process every 5 minutes.
  5. In a new expression rule, query the IDs of any rows modified after the lastSyncTime timestamp, using a query that looks like this:
    a!queryEntity(
      entity: cons!JLT_RecordEntity,
      query: a!query(
        /* Only the primary key is needed; the sync step takes record IDs. */
        selection: a!querySelection(
          columns: {
            a!queryColumn(
              field: "id"
            )
          }
        ),
        logicalExpression: a!queryLogicalExpression(
          operator: "AND",
          filters: {
            /* Only rows changed since the last sync window. */
            a!queryFilter(
              field: "modifiedOn",
              operator: ">",
              value: ri!lastSyncTime
            )
          },
          ignoreFiltersWithEmptyValues: true
        ),
        pagingInfo: a!pagingInfo(
          startIndex: 1,
          /* Return all matching IDs so the sync loop can batch them. */
          batchSize: -1,
          sort: a!sortInfo(
            field: "modifiedOn",
            ascending: true
          )
        )
      ),
      fetchTotalCount: false
    )
    
  6. Add a Script Task to the process that calls the expression rule created above, passing in lastSyncTime, and saves the returned IDs into the recordsToSync process variable.
  7. Add a Sync Records smart service node to sync the rows you just queried into the record type. Since this will most likely require syncing more than 1,000 rows at a time, you will need to use a loop to iterate over recordsToSync in batches (see the sketch after this list).
    Process Model example
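
To make the loop in step 7 concrete, the sketch below shows one way to carve the queried IDs into batches for the Sync Records node. It assumes a hypothetical expression rule with inputs ri!ids (the full list saved into recordsToSync in step 6) and ri!batchNumber (the current loop iteration, starting at 1); the 1,000-row chunk size follows the per-execution limit implied in step 7.

  /* Hypothetical helper (e.g. JLT_getSyncBatch): returns the slice of record
     IDs to pass to the Sync Records node on iteration ri!batchNumber. */
  a!localVariables(
    local!batchSize: 1000,
    todatasubset(
      ri!ids,
      a!pagingInfo(
        startIndex: (ri!batchNumber - 1) * local!batchSize + 1,
        batchSize: local!batchSize
      )
    ).data
  )

One way to drive the loop is a counter process variable and an XOR gateway that loops back to the Sync Records node while the counter is less than ceiling(length(pv!recordsToSync) / 1000), incrementing the counter after each sync.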

That’s it! This pattern of ingesting data in larger batches can support syncing up to 200,000 rows per minute into record types across all applications on your site, with data being updated in 5 minutes or less.