Better Approach for 1M+ Record Updates: Appian Write Records vs Stored Procedure

Certified Lead Developer

We have an external event that can impact up to 1M+ Case records. For each affected Case, we need to:

  • Update 3 fields in the Case table.
  • Create a corresponding Event History record.

We're evaluating the following approaches:

Option 1 – Appian Write Records

Approach A

  • Query Cases in batches of 5000
  • Use MNI to process each Case in a single Write Records node to update the Case and create the corresponding Event History record
  • Loop until all affected Cases are processed

Approach B

  • Query Cases in batches of 5000
  • Update all affected Cases for the batch in a Script Task
  • Construct the corresponding 5,000 Event History records
  • Use one Write Records node to bulk update the 5000 Case records
  • Use a second Write Records node to bulk insert the 5000 Event History records
  • Loop until all affected Cases are processed

Option 2 – Stored Procedure

  • Use a stored procedure to bulk update Case records and bulk insert Event History records
  • Persist affected Case IDs and Event History IDs in staging tables
  • Query the staged IDs and performs Sync Records in batches of 1000 (due to Sync Records Smart Service limits until all ids are processed)
  • Loop until all IDs are synced

Question

For workloads that can exceed 1 million records, which approach would you recommend and why?

While Option 1 benefits from automatic record synchronization, it may place significant load on Appian process execution and engines at this scale. Option 2 shifts the bulk processing to the database but requires additional ID tracking and explicit record synchronization.

Has anyone implemented a similar solution at this volume? Would you recommend Option 1A, Option 1B, Option 2, or another approach altogether for the best balance of performance, scalability, and Appian platform load?


  Discussion posts and replies are publicly visible

Parents
  • Before ranking 1A/1B/2, the first question is whether this belongs in Appian at all.

    Is this even an Appian job? Appian isn't an ETL engine. If those 1M+ rows live in a database Appian doesn't exclusively own, the cheapest and safest bulk mutation is usually a native DB job or the customer's existing ETL tooling, and then you let Appian sync the result instead of performing the write. Keep Appian as orchestrator and record consumer, not bulk data processor. Worth confirming there isn't already a more cost-effective tool in the estate before pushing a seven-figure operation through process execution.

    Rule out 1A first, it's the dangerous one. It queries 5k per batch but then uses MNI to write one record at a time. Across the full job that's up to ~1M process instances, each holding its Case payload plus the paired Event History record as resident process state. You stack two failure modes: a large memory footprint and saturation of the execution engine queues not even easy to stop if any issue arises. At this volume 1A is the fastest way to take an environment down. Avoid it outright.

    That leaves 1B and 2, and they differ on one axis: stay in-platform with automatic sync (1B), or offload the heavy DML to the database (2). Here's how I'd rank them for your scenario:

    2, stored procedure. > Best default for scale and platform load. The procedure runs the set-based UPDATE and INSERT directly and returns only an array of affected IDs. Set-based DML is far faster, simpler to recover, and keeps the process memory-light. Two things to get right: don't pass full record arrays into the procedure (that reintroduces 1A's memory problem, let the DB own the data and hand back only IDs), and because stored-proc changes bypass automatic sync you must run an explicit Sync Records loop. That loop caps at 1,000 IDs per iteration and Sync Records is meant for low-throughput use, so loop it per batch rather than as one monolithic pass: https://docs.appian.com/suite/help/26.6/Sync_Records_Smart_Service.html#syncing-more-than-1000-records

    1B, bulk Write Records > Acceptable fully-in-platform fallback if DB writes aren't possible. Automatic sync, but more engine load. A script task builds the 5k Case array and 5k Event array (only if you really need them), then a sequential MNI subprocess node performs the Write Records each in bulk.  Its advantage is automatic sync: each write syncs the rows it touched, so there's no separate sync loop to build or throttle. The cost is that the process still constructs and holds those arrays per batch, and Appian does all the write-and-sync work across ~200 iterations, so engine load is real. Fine when you want zero DB development and volumes are moderate. You also have to ensure each subprocess completed successfully. 

    If you have a high-throughput sync pattern in general, this Appian guide may help: https://community.appian.com/success/w/guide/3623/writing-to-synced-record-types-at-high-throughput

    One thing worth thinking about beyond the write itself: at this volume the Event History table also raises questions about how many of those events truly need to be queryable in Appian versus just persisted for audit, plus process retention on the batch models. There are a few levers there that change the answer depending on your record tier and how the audit trail is consumed. Happy to go deeper if useful, but that part depends a lot on your specific setup.

    Net: first question whether the bulk write belongs in Appian at all. If it does, use a stored procedure for the set-based work that returns only IDs, drive a per-batch Sync Records loop from those IDs, and rule out 1A. Don't recreate its footprint by shipping full records around in 1B or into the procedure.

Reply
  • Before ranking 1A/1B/2, the first question is whether this belongs in Appian at all.

    Is this even an Appian job? Appian isn't an ETL engine. If those 1M+ rows live in a database Appian doesn't exclusively own, the cheapest and safest bulk mutation is usually a native DB job or the customer's existing ETL tooling, and then you let Appian sync the result instead of performing the write. Keep Appian as orchestrator and record consumer, not bulk data processor. Worth confirming there isn't already a more cost-effective tool in the estate before pushing a seven-figure operation through process execution.

    Rule out 1A first, it's the dangerous one. It queries 5k per batch but then uses MNI to write one record at a time. Across the full job that's up to ~1M process instances, each holding its Case payload plus the paired Event History record as resident process state. You stack two failure modes: a large memory footprint and saturation of the execution engine queues not even easy to stop if any issue arises. At this volume 1A is the fastest way to take an environment down. Avoid it outright.

    That leaves 1B and 2, and they differ on one axis: stay in-platform with automatic sync (1B), or offload the heavy DML to the database (2). Here's how I'd rank them for your scenario:

    2, stored procedure. > Best default for scale and platform load. The procedure runs the set-based UPDATE and INSERT directly and returns only an array of affected IDs. Set-based DML is far faster, simpler to recover, and keeps the process memory-light. Two things to get right: don't pass full record arrays into the procedure (that reintroduces 1A's memory problem, let the DB own the data and hand back only IDs), and because stored-proc changes bypass automatic sync you must run an explicit Sync Records loop. That loop caps at 1,000 IDs per iteration and Sync Records is meant for low-throughput use, so loop it per batch rather than as one monolithic pass: https://docs.appian.com/suite/help/26.6/Sync_Records_Smart_Service.html#syncing-more-than-1000-records

    1B, bulk Write Records > Acceptable fully-in-platform fallback if DB writes aren't possible. Automatic sync, but more engine load. A script task builds the 5k Case array and 5k Event array (only if you really need them), then a sequential MNI subprocess node performs the Write Records each in bulk.  Its advantage is automatic sync: each write syncs the rows it touched, so there's no separate sync loop to build or throttle. The cost is that the process still constructs and holds those arrays per batch, and Appian does all the write-and-sync work across ~200 iterations, so engine load is real. Fine when you want zero DB development and volumes are moderate. You also have to ensure each subprocess completed successfully. 

    If you have a high-throughput sync pattern in general, this Appian guide may help: https://community.appian.com/success/w/guide/3623/writing-to-synced-record-types-at-high-throughput

    One thing worth thinking about beyond the write itself: at this volume the Event History table also raises questions about how many of those events truly need to be queryable in Appian versus just persisted for audit, plus process retention on the batch models. There are a few levers there that change the answer depending on your record tier and how the audit trail is consumed. Happy to go deeper if useful, but that part depends a lot on your specific setup.

    Net: first question whether the bulk write belongs in Appian at all. If it does, use a stored procedure for the set-based work that returns only IDs, drive a per-batch Sync Records loop from those IDs, and rule out 1A. Don't recreate its footprint by shipping full records around in 1B or into the procedure.

Children
No Data