We have an external event that can impact up to 1M+ Case records. For each affected Case, we need to:
We're evaluating the following approaches:
For workloads that can exceed 1 million records, which approach would you recommend and why?
While Option 1 benefits from automatic record synchronization, it may place significant load on Appian process execution and engines at this scale. Option 2 shifts the bulk processing to the database but requires additional ID tracking and explicit record synchronization.
Has anyone implemented a similar solution at this volume? Would you recommend Option 1A, Option 1B, Option 2, or another approach altogether for the best balance of performance, scalability, and Appian platform load?
Discussion posts and replies are publicly visible
Before ranking 1A/1B/2, the first question is whether this belongs in Appian at all.
Is this even an Appian job? Appian isn't an ETL engine. If those 1M+ rows live in a database Appian doesn't exclusively own, the cheapest and safest bulk mutation is usually a native DB job or the customer's existing ETL tooling, and then you let Appian sync the result instead of performing the write. Keep Appian as orchestrator and record consumer, not bulk data processor. Worth confirming there isn't already a more cost-effective tool in the estate before pushing a seven-figure operation through process execution.
Rule out 1A first, it's the dangerous one. It queries 5k per batch but then uses MNI to write one record at a time. Across the full job that's up to ~1M process instances, each holding its Case payload plus the paired Event History record as resident process state. You stack two failure modes: a large memory footprint and saturation of the execution engine queues not even easy to stop if any issue arises. At this volume 1A is the fastest way to take an environment down. Avoid it outright.
That leaves 1B and 2, and they differ on one axis: stay in-platform with automatic sync (1B), or offload the heavy DML to the database (2). Here's how I'd rank them for your scenario:
2, stored procedure. > Best default for scale and platform load. The procedure runs the set-based UPDATE and INSERT directly and returns only an array of affected IDs. Set-based DML is far faster, simpler to recover, and keeps the process memory-light. Two things to get right: don't pass full record arrays into the procedure (that reintroduces 1A's memory problem, let the DB own the data and hand back only IDs), and because stored-proc changes bypass automatic sync you must run an explicit Sync Records loop. That loop caps at 1,000 IDs per iteration and Sync Records is meant for low-throughput use, so loop it per batch rather than as one monolithic pass: https://docs.appian.com/suite/help/26.6/Sync_Records_Smart_Service.html#syncing-more-than-1000-records1B, bulk Write Records > Acceptable fully-in-platform fallback if DB writes aren't possible. Automatic sync, but more engine load. A script task builds the 5k Case array and 5k Event array (only if you really need them), then a sequential MNI subprocess node performs the Write Records each in bulk. Its advantage is automatic sync: each write syncs the rows it touched, so there's no separate sync loop to build or throttle. The cost is that the process still constructs and holds those arrays per batch, and Appian does all the write-and-sync work across ~200 iterations, so engine load is real. Fine when you want zero DB development and volumes are moderate. You also have to ensure each subprocess completed successfully. If you have a high-throughput sync pattern in general, this Appian guide may help: https://community.appian.com/success/w/guide/3623/writing-to-synced-record-types-at-high-throughput
One thing worth thinking about beyond the write itself: at this volume the Event History table also raises questions about how many of those events truly need to be queryable in Appian versus just persisted for audit, plus process retention on the batch models. There are a few levers there that change the answer depending on your record tier and how the audit trail is consumed. Happy to go deeper if useful, but that part depends a lot on your specific setup.
Net: first question whether the bulk write belongs in Appian at all. If it does, use a stored procedure for the set-based work that returns only IDs, drive a per-batch Sync Records loop from those IDs, and rule out 1A. Don't recreate its footprint by shipping full records around in 1B or into the procedure.