Recommended approach to process a large amount of data from an Excel sheet

Hi there,

I have a question on the design of one of the projects I am working on. We have a requirement to fetch an Excel sheet (size ~12 MB, ~55,000 rows) into Appian daily from an external source and process it. I would like to know which would be the better/recommended approach:

a) fetch and parse the Excel file through Appian (using the Excel Tools plugin). [Per my understanding, fetching and parsing 55k rows of data daily in Appian may impact Appian's performance.]

b) fetch and parse the Excel file through another tool such as Informatica, and implement only the workflow part in Appian.

Thanks!


  • 0
    Certified Lead Developer
    Hello Rohit,

    Processing a large amount of data will impact the performance of the Appian environment. We use the smart services in the Excel Tools plugin to read the Excel file and import it into the database, and then a separate call to a stored procedure to execute the business logic, refine the data, and process it further.
    This can be one approach where we do not hold the data in a CDT or in the process model, so it will not consume much AMU.

    Another approach is to handle the complete operation outside the Appian environment. We can have a Java batch program / web service / file-upload-and-process program outside Appian that processes the data directly; the processing statistics can be stored in a table initially and later fetched into Appian so that business users can check the processing results.
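    The external-loader idea above can be sketched as follows. This is a minimal illustration, not the actual smart service: the table names (`staging`, `run_stats`), the two-column schema, and the use of an in-memory SQLite database are all assumptions made for the example; a real implementation would target the Appian business data source.

```python
import csv
import io
import sqlite3

def load_in_batches(conn, csv_text, batch_size=1000):
    """Load a CSV export of the Excel sheet into a staging table in
    batches, then record row statistics for Appian to query later."""
    conn.execute("CREATE TABLE IF NOT EXISTS staging (id TEXT, amount REAL)")
    conn.execute("CREATE TABLE IF NOT EXISTS run_stats (rows_loaded INTEGER)")
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    batch, total = [], 0
    for row in reader:
        batch.append(row)
        if len(batch) == batch_size:
            # insert one bounded batch instead of holding all rows in memory
            conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)
            total += len(batch)
            batch = []
    if batch:  # flush the final partial batch
        conn.executemany("INSERT INTO staging VALUES (?, ?)", batch)
        total += len(batch)
    # statistics table that an Appian interface could read afterwards
    conn.execute("INSERT INTO run_stats VALUES (?)", (total,))
    conn.commit()
    return total

conn = sqlite3.connect(":memory:")
data = "id,amount\n" + "\n".join(f"r{i},{i * 1.5}" for i in range(2500))
print(load_in_batches(conn, data, batch_size=1000))  # prints 2500
```

    The point of the design is that Appian only ever reads the small `run_stats` summary, never the 55k raw rows.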

    I would suggest keeping the implementation outside Appian, which is also a more scalable solution for future enhancements.

    Thank You
    Bibhuti
  • 0
    A Score Level 1
    in reply to Bibhuti
    Hi Bibhuti,

    Can you please let me know the different Excel Tools plugin smart services to read the Excel file and import it to the database, which you mentioned in your comment?

    Thanks.

    Regards,
    Balaji.R
  • 0
    Certified Lead Developer
    in reply to rp_balaji
    Hello Balaji,

    You can use the "Import Excel to Database" and "Load CSV File to Database" smart services, which come under the Excel Tools plugin. These two smart services take the Excel file as input along with the number of columns to read, the sheet number, the table name, the data source, and New Column / New Column Value (the last two are marked optional in the configuration, but they require a value, otherwise an error is thrown).

    One more thing that needs to be taken care of during implementation: if the Excel sheet has blank rows or columns to which formatting such as a border colour has been applied, the smart service will import those blank rows into the table, and in the case of blank columns it will try to find matching columns in the table and the execution will fail. Hence take care that only the exact data to be imported is present in the sheet during upload.

    Thank You.
  • 0
    A Score Level 1
    in reply to Bibhuti
    Hi Bibhuti,

    Thanks for the details. I have a question: while using the smart service you mentioned, if we are processing a file of 12 MB (as mentioned in the post), won't the smart service get timed out because of the large amount of data?

    Regards,
    Balaji.R
  • 0
    Certified Lead Developer
    in reply to rp_balaji
    Hello Balaji,

    Yes, I guess that is possible depending on the heap memory usage, and it can cause a timeout as well. That is why I suggested importing the data externally using another application or tool.

    Thank You.
  • 0
    Certified Lead Developer
    in reply to rp_balaji
    Just to add on: I guess there is also one more way if it needs to be handled from Appian only. The smart service has one more parameter, batch size, which is 1000 by default. So another solution can be to break the large Excel file into chunks and then upload them using Appian with a safer batch size that will not consume too much heap memory.
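    The chunking step itself would happen outside Appian before upload. A minimal sketch, assuming the sheet has first been exported to CSV (the chunk size of 10,000 rows is an arbitrary example value, not a recommendation):

```python
import csv
import io

def split_csv(csv_text, rows_per_chunk=1000):
    """Split one large CSV into smaller chunks, repeating the header
    in each chunk, so every upload stays within a safe batch size."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)
    chunks, current = [], []
    for row in reader:
        current.append(row)
        if len(current) == rows_per_chunk:
            chunks.append([header] + current)
            current = []
    if current:  # final partial chunk
        chunks.append([header] + current)
    return chunks

# 55,000 data rows split into chunks of at most 10,000 rows each
data = "id,val\n" + "\n".join(f"{i},{i}" for i in range(55000))
chunks = split_csv(data, rows_per_chunk=10000)
print(len(chunks))  # prints 6 (five full chunks plus one of 5,000 rows)
```

    Each chunk can then be written out as its own file and fed to the smart service separately.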
  • 0
    A Score Level 1
    in reply to Bibhuti
    Hi Bibhuti,

    Thanks for the response. I have a query here: do we need to execute the smart service in a loop, setting the batch size explicitly, until the last row of the file is parsed, or will the smart service take care of batching until EOF by itself when executed only once?

    Regards,
    Balaji.R
  • 0
    Certified Lead Developer
    in reply to rp_balaji
    Hello Balaji,

    We need to explicitly pass the batch size, and the smart service will not loop automatically; instead we need to build a loop in the process model that runs until completion. But again, we need to be very careful about the memory usage.
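    The loop shape being described is roughly the following. This is only an illustration of the control flow, not Appian code: `process_all` stands in for the process-model loop and the slice stands in for one smart service invocation.

```python
def process_all(rows, batch_size):
    """Repeat the batched import explicitly until EOF, the way the
    process model must, since the smart service does not loop itself."""
    processed, offset, calls = 0, 0, 0
    while offset < len(rows):
        batch = rows[offset:offset + batch_size]  # one "smart service" call
        processed += len(batch)
        offset += batch_size
        calls += 1
    return processed, calls

total, calls = process_all(list(range(55000)), 1000)
print(total, calls)  # prints: 55000 55
```

    With 55,000 rows and the default batch size of 1000, the process model would invoke the import 55 times.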
    Thank You