Best way to migrate a large number of documents (70-80 GB)

Hi All,

I'm looking for the best way to move a large number of documents from an external system into Appian. We are replacing an existing mainframe system, and as part of that we are doing a document and data migration. The documents are broken out into a logical folder structure, and I tried to use the zip upload feature, but Appian only allows folders under 1 GB. I tried breaking the folders out into smaller ones, but even at 600 MB per folder the system was still getting choked up. They are in the cloud. Any ideas or approaches for executing a large document migration?

  • Certified Lead Developer
    in reply to Eliot Gerson

    Now I will say that the majority of the time, the one little content engine we have still runs like the dickens, even with several, several million documents and folders. It seems that multiple concurrent, time-consuming queries on the content engine can eventually cause slowdown and node stoppage. If your design limits how often multiple users look for a document at the same time, or if you limit how often folders get moved, renamed, created, deleted, added to knowledge centers, or removed from knowledge centers, or how often the security on those objects changes, you'll probably feel less heat from the content engine.

    To that end, I would take the time to migrate your documents one at a time.  I would avoid concurrency as much as possible, because tiny hiccups and delays might begin to compound over that many documents, and as slowdown increases, the likelihood of a node timing out and grinding your process to a halt goes up.  Slow and steady wins the race.
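
To make the one-at-a-time approach concrete, here is a minimal sketch of a sequential driver with retries and a resume log. It is only an illustration under assumptions: `upload` is a hypothetical callable standing in for whatever mechanism you actually use to push a single document into Appian, and `migrate_sequentially`, `done_log`, `max_attempts`, and `base_delay` are names made up for this example, not part of any Appian API.

```python
from __future__ import annotations

import time
from pathlib import Path
from typing import Callable, Iterable


def migrate_sequentially(
    paths: Iterable[Path],
    upload: Callable[[Path], None],  # hypothetical: sends ONE document to the target system
    done_log: Path,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> list[Path]:
    """Upload documents strictly one at a time; return the paths that still failed."""
    # Resume support: anything listed in done_log was uploaded by an earlier run.
    done = set(done_log.read_text().splitlines()) if done_log.exists() else set()
    failed: list[Path] = []
    with done_log.open("a") as log:
        for path in paths:
            key = str(path)
            if key in done:
                continue  # already migrated, skip it
            for attempt in range(1, max_attempts + 1):
                try:
                    upload(path)
                except Exception:
                    if attempt == max_attempts:
                        failed.append(path)  # give up on this file, keep going
                    else:
                        # Back off (1x, 2x, 4x base_delay) so a struggling engine can recover.
                        sleep(base_delay * 2 ** (attempt - 1))
                else:
                    # Record success immediately so an interrupted run can resume.
                    log.write(key + "\n")
                    log.flush()
                    break
    return failed
```

Because there is no concurrency and every success is written to the log right away, a timeout halfway through 70-80 GB only costs you the file in flight; rerunning the script picks up where it left off, and the returned list tells you which documents need a manual look.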

  • If you're running into performance issues, I would encourage you to create a ticket with our support team. They will be able to make suggestions based on the symptoms you're seeing. For example, if you're seeing bottlenecks, they may recommend adding replicas of your content engine to improve throughput.