Hello Everyone, We have a scenario where an user will upload an excel

shailjas over 9 years ago

Hello Everyone,

We have a scenario where an user will upload an excel file containing data and system will parse the data and write the details to the database.
Each record in the excel corresponds to a record in the data base. We are using the Read Excel plugin to parse and Appian Write to data stores to write to the database.
There is no upper limit on the number of records in the excel. User can upload 40000 or even more records. If we try to write such large data sets into the database, it is causing a server outage.

Also, after the data is written to the database, an approval task is generated. In this task, an approver can upload a new version of the Excel File.
If a new version is available, then all the records written to the database earlier must be deleted and new records will be written for the case.

We would like to know the various design approaches we can consider to write and delete large number of records into the database.

Discussion posts and replies are publicly visible

Parents

0 sikhivahans over 9 years ago

@shailjas As per my previous experiences, I would like to make a comment on the batching approach mentioned above. Instead of determining the batch size as 1000, it would be good to do so based on the time consumed by Appian to insert records. For instance, Appian Health Check has flagged a simple Write operation (which inserts 1000 records, each record having just 6 columns, and the record is flat containing no foreign keys) in our project as 'High' risk. So it would be good to determine the batch size (infact, for any operation) based on the time(As per the health check, there is a constraint on the time consumed by any operation) it consumes. If a batch size is not properly determined, it leads to the problems mentioned at 'https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html' under the section named 'Slow Smart Services' and https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html#USRX-0028.

If you are interested in initiating any process that should be run per each batch, then I would like to suggest to go through the 'Uneven Process Instance Distribution' section at https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html. Ignoring this might lead to imbalance in the distribution of processes among execution engines, as 40000 specified by you is a very high number.

Also I would like to suggest to review the documentation at https://forum.appian.com/suite/help/7.11/Database_Performance_Best_Practices.html before opting for any approach. Check if there is a possibility for delegating the complex processing to database as much as possible. For instance, we used to insert the new data into a fresh table and the data(if any) in the table would be dropped before inserting the new data. Now this data would be compared against the data in the existing table, and we used the view for this and used to flag the records based on the operation(delete/update/insert) they need to undergo. Finally this view would be useful in determining the records that should be updated accordingly. You may also opt for a 'Query Database' smart service if you think that a simple native SQL query would do a quicker operation rather than building a complex mechanism in Appian. Check how best you can achieve the usecase by leveraging the features of Appian and database.

In short, there are many aspects which you should take care of before finalising the design, and once when you finalise the design there are some more aspects that should be taken care of. Probably it would be better to finalise a design as per the knowledge you have re requirements (i.e. the table structure, number of columns in the table, how frequently these operations should be done, how complex the comparisons are between new and existing data etc) and then let the practitioners here know so that they can provide some valuable suggestions.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel

Reply

0 sikhivahans over 9 years ago

@shailjas As per my previous experiences, I would like to make a comment on the batching approach mentioned above. Instead of determining the batch size as 1000, it would be good to do so based on the time consumed by Appian to insert records. For instance, Appian Health Check has flagged a simple Write operation (which inserts 1000 records, each record having just 6 columns, and the record is flat containing no foreign keys) in our project as 'High' risk. So it would be good to determine the batch size (infact, for any operation) based on the time(As per the health check, there is a constraint on the time consumed by any operation) it consumes. If a batch size is not properly determined, it leads to the problems mentioned at 'https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html' under the section named 'Slow Smart Services' and https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html#USRX-0028.

If you are interested in initiating any process that should be run per each batch, then I would like to suggest to go through the 'Uneven Process Instance Distribution' section at https://forum.appian.com/suite/help/7.11/Appian_Health_Check.html. Ignoring this might lead to imbalance in the distribution of processes among execution engines, as 40000 specified by you is a very high number.

Also I would like to suggest to review the documentation at https://forum.appian.com/suite/help/7.11/Database_Performance_Best_Practices.html before opting for any approach. Check if there is a possibility for delegating the complex processing to database as much as possible. For instance, we used to insert the new data into a fresh table and the data(if any) in the table would be dropped before inserting the new data. Now this data would be compared against the data in the existing table, and we used the view for this and used to flag the records based on the operation(delete/update/insert) they need to undergo. Finally this view would be useful in determining the records that should be updated accordingly. You may also opt for a 'Query Database' smart service if you think that a simple native SQL query would do a quicker operation rather than building a complex mechanism in Appian. Check how best you can achieve the usecase by leveraging the features of Appian and database.

In short, there are many aspects which you should take care of before finalising the design, and once when you finalise the design there are some more aspects that should be taken care of. Probably it would be better to finalise a design as per the knowledge you have re requirements (i.e. the table structure, number of columns in the table, how frequently these operations should be done, how complex the comparisons are between new and existing data etc) and then let the practitioners here know so that they can provide some valuable suggestions.
Cancel
Vote Up 0 Vote Down

Sign in to reply

Verify Answer

Cancel

Children

No Data