
Hello Everyone,

We have a scenario where a user uploads an Excel file containing data, and the system parses the data and writes the details to the database.
Each record in the Excel file corresponds to a record in the database. We are using the Read Excel plugin to parse the file and the Appian Write to Data Store Entity smart service to write to the database.
There is no upper limit on the number of records in the Excel file; a user can upload 40,000 or even more records. Writing such large data sets to the database is causing a server outage.

Also, after the data is written to the database, an approval task is generated. In this task, an approver can upload a new version of the Excel file.
If a new version is uploaded, then all the records written to the database earlier must be deleted and new records must be written for the case.

We would like to know the various design approaches we can consider for writing and deleting large numbers of records in the database.

OriginalPostID-189203


  • Hi Shailja, AFAIK
    To read an Excel file, you may also utilize "Parse Excel Spreadsheet to CDT".
    To write CDT data into the DB, you can design a batch-size operation (see the expression sketch at the end of this post), i.e.,
    1) Get the first 1-1000 index values from the CDT
    2) Store the returned data in a temporary process variable of the same data type
    3) Pass this temp variable to a sub-process which will write the CDT data into the database.
    4) Once the data is written, reset your temp variable to null.
    5) Get the next 1001-2000 index values and repeat steps 2 to 5 until you finish writing all the data.
    6) Delete the process instance as soon as you finish all writing activities.

    In addition to this, I have a few questions on your requirement.
    1) When the approver uploads a new version of the file, is all the data different from what was previously written? (i.e., are all rows and column values different, or only a few rows/column values?) If all rows are not modified, then it is better to identify which rows are modified and only update the modified rows in the DB.
    2) Is there any specific need to write data into the database before the approval task? If not, I would suggest you write the data once, when you have the final file from the approval task.
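
    A minimal expression sketch of the slicing in steps 1 and 5, assuming the parsed rows sit in a CDT array pv!allRecords and a counter pv!batchNumber is incremented by a looping gateway in the process model (both variable names are hypothetical; 1000 is just the example batch size from the steps above):

      /* Slice of up to 1000 rows for the current batch */
      with(
        local!batch: todatasubset(
          pv!allRecords,
          topaginginfo((pv!batchNumber - 1) * 1000 + 1, 1000)
        ),
        local!batch.data
      )

      /* Looping (XOR) gateway condition: do more batches remain? */
      pv!batchNumber * 1000 < length(pv!allRecords)

    The same slicing expression can populate the temporary variable in step 2, so that variable only ever holds one batch at a time.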
  • @shailjas As per my previous experience, I would like to comment on the batching approach mentioned above. Instead of fixing the batch size at 1000, it would be better to determine it based on the time Appian takes to insert the records. For instance, Appian Health Check flagged a simple write operation (which inserts 1000 records, each record having just 6 columns, with a flat record containing no foreign keys) in our project as 'High' risk. So it would be good to determine the batch size (in fact, for any operation) based on the time it consumes, as per the Health Check constraint on the time consumed by any operation (a small expression note on keeping the batch size tunable follows at the end of this post). If the batch size is not properly determined, it leads to the problems described at https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html under the section named 'Slow Smart Services' and at https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html#USRX-0028.

    If you are interested in initiating a process per batch, then I would suggest going through the 'Uneven Process Instance Distribution' section at https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html. Ignoring this might lead to an imbalance in the distribution of processes among the execution engines, as the 40,000 you specified is a very high number.

    I would also suggest reviewing the documentation at https://forum.appian.com/suite/help/7.11/Database_Performance_Best_Practices.html before opting for any approach. Check whether the complex processing can be delegated to the database as much as possible. For instance, we used to insert the new data into a fresh staging table, dropping any data already in that table before inserting the new data. That data would then be compared against the data in the existing table; we used a view for this, flagging the records according to the operation (delete/update/insert) they needed to undergo, and finally the view was used to determine which records should be updated accordingly. You may also opt for the 'Query Database' smart service if you think a simple native SQL query would be quicker than building a complex mechanism in Appian. Check how best you can achieve the use case by leveraging the features of both Appian and the database.

    In short, there are many aspects you should take care of before finalising the design, and once you have finalised it there are some more aspects to take care of. It would probably be better to finalise a design based on what you know of the requirements (i.e. the table structure, the number of columns in the table, how frequently these operations should be done, how complex the comparisons between new and existing data are, etc.) and then let the practitioners here know, so that they can provide some valuable suggestions.
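
    As a small, purely illustrative note on keeping that tunable: if the batch size in the approach above is read from a constant (cons!IMPORT_BATCH_SIZE is a hypothetical name) instead of being hard-coded to 1000, it can be adjusted whenever Health Check flags the write, without touching the process model logic:

      /* Total number of batches needed for the current upload */
      ceiling(length(pv!allRecords) / cons!IMPORT_BATCH_SIZE)

    The value of the constant itself still has to be chosen by measuring how long the write node actually takes, as described above.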
  • I also faced the same issue in one of my recent projects. A couple more questions to the practitioners here:

    1) Did you also experience any issues with the data parsed from the xls sheet, e.g. not having all fields read correctly from the xls into PVs?

    2) After you write your batches to the DB and check the data, have you ever had issues where not all of the data was written to the DB from the PVs, or where empty rows/columns were inserted, requiring you to manually cross-check the xls against the DB and correct deviations?
  • @mohamedb Hi, I haven't dealt with parsing a huge dataset so far, so I am not answering the first question.


    2) After you write your batches to the DB and check the data, have you ever had issues where not all of the data was written to the DB from the PVs, or where empty rows/columns were inserted, requiring you to manually cross-check the xls against the DB and correct deviations?

    To the best of my knowledge, there won't be any deviation, and the operation completes correctly as long as the data is properly parsed from the source (Excel, CSV, etc.) and stored into PVs. I have written hundreds of thousands of rows into the data store entity in a single shot and haven't seen any issue while writing such huge amounts of data.

    Deviations aside, I don't think practitioners write such huge datasets to a data store entity in a single shot nowadays, since the introduction of Health Check: Health Check flags operations that interact with the data store using huge datasets as High risk. Still, keeping the Health Check suggestions and implementation guidelines aside, there definitely won't be any problem in making the updates as long as the PV holds properly parsed data.
  • @shailjas, this might be an unrelated question, but I wanted to check: when you say that after the approver approves a separate Excel file you have to delete all the rows and write them back again, will the primary key be present in your Excel file, or is it going to be constructed while the data is being written to the database? If your Excel file has the primary key, then maybe we can use the array functions to get only the rows which are modified and write just those, rather than repeating the entire data load from scratch (a rough sketch of that idea follows below).
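    Purely as a rough sketch of that idea, and only applicable if the Excel file did carry a key column (pv!newRecords, pv!oldRecords and the field id are hypothetical names):

      /* Keys present in the new file but not in the data already written -> rows to insert */
      index(
        pv!newRecords,
        wherecontains(
          difference(pv!newRecords.id, pv!oldRecords.id),
          pv!newRecords.id
        ),
        null
      )

    A similar expression with the arguments to difference() swapped would identify the previously written rows that no longer appear in the new file and therefore need to be deleted.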
  • @narasimhaadityac The Excel file will not have primary keys; they will be constructed as part of the write operation. When the user uploads a new file, all the existing records need to be deleted from the database and new records have to be written. We do not need to check whether any records were modified. The requirement is to delete all the old data and write new records for the new file.
  • @mohanp,

    1) When the approver uploads a new version of the file, is all the data different from what was previously written? (i.e., are all rows and column values different, or only a few rows/column values?) If all rows are not modified, then it is better to identify which rows are modified and only update the modified rows in the DB.

    When the user uploads a new file, there is a possibility that a few object numbers may be the same.
    2) Is there any specific need to write data into the database before the approval task? If not, I would suggest you write the data once, when you have the final file from the approval task.
    Yes. We also have a reporting functionality where users will search based on the Excel data and the report will list the cases that contain the data.
    If we write the data only after the approval, then the search functionality between the first task and the approval task will be broken.