Best way to migrate a large number of documents (70-80 GB)

Hi All,

I'm looking for the best way to move a large number of documents from an external system into Appian. We are replacing an existing mainframe system, which involves both a document and a data migration. The documents are organized in a logical folder structure, and I tried the zip upload feature, but Appian only accepts folders under 1 GB. I then split the folders into smaller ones, but even at 600 MB per folder the system still choked. They are in the cloud. Any ideas or approaches for executing a large document migration?

  • Certified Lead Developer

    Appian can build a great many things, but a data warehouse or document repository is not necessarily among them. You have between 3 and 32 shards (process execution / analytics engine pairs), but even with 32 pairs you still get only one content engine per environment. Ours is bogged down under several million objects, most of them empty folders, and we're seeing numerous incidents where nodes that query the content engine time out and jam processes. Total size may matter less than the number of objects. Think carefully about how many documents the system is intended to hold before it is sunset.
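
    Since object count may matter more than total size, it can help to measure the source tree before migrating anything. The following is only a rough sketch in Python, run against the exported files on disk and not against Appian itself; the function name `survey_tree` and the shape of its result are my own assumptions, not an Appian API.

```python
import os


def survey_tree(root):
    """Count folders, empty folders, files, and total bytes under root.

    Hypothetical helper for sizing a migration; it only reads the local
    file system and knows nothing about Appian.
    """
    stats = {"folders": 0, "empty_folders": 0, "files": 0, "bytes": 0}
    for dirpath, dirnames, filenames in os.walk(root):
        stats["folders"] += 1  # the root folder itself is counted too
        if not dirnames and not filenames:
            # A folder with no files and no subfolders still becomes
            # an object if it is migrated as-is.
            stats["empty_folders"] += 1
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.islink(path):
                continue  # skip symlinks so broken links do not raise
            stats["files"] += 1
            stats["bytes"] += os.path.getsize(path)
    return stats
```

    Calling `survey_tree("/path/to/export")` returns a dictionary of the four counts. If the empty-folder count is large, pruning those folders before the migration is one way to reduce the number of objects that get created.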

    If your users are going to query or use these documents on a regular basis, you may want to consider off-site storage and pull only the documents in active use into your Appian memory / storage, which is what we're contemplating ourselves.
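
    To illustrate the "pull only what is actively used" idea, here is a minimal sketch in Python of a bounded working set with least-recently-used eviction. It is not Appian code: the class name `WorkingSetCache`, the `fetch` callable (standing in for whatever external store you pick), and the `max_items` limit are all assumptions made for the example.

```python
from collections import OrderedDict


class WorkingSetCache:
    """Keep at most max_items documents locally; fetch the rest on demand.

    `fetch` is a hypothetical callable that takes a document ID and
    returns its content from the external store.
    """

    def __init__(self, fetch, max_items):
        if max_items < 1:
            raise ValueError("max_items must be at least 1")
        self._fetch = fetch
        self._max_items = max_items
        self._items = OrderedDict()  # oldest entry first

    def get(self, doc_id):
        if doc_id in self._items:
            # Already local: mark as most recently used.
            self._items.move_to_end(doc_id)
            return self._items[doc_id]
        content = self._fetch(doc_id)  # pull from the external store
        self._items[doc_id] = content
        if len(self._items) > self._max_items:
            # Evict the least recently used document.
            self._items.popitem(last=False)
        return content

    def __contains__(self, doc_id):
        return doc_id in self._items

    def __len__(self):
        return len(self._items)
```

    The point of the design is that the number of documents held locally is capped regardless of how large the full archive grows; how that maps onto Appian folders and documents would depend on your integration.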
