Performance issue with batch processing of data stored in a database

Hi All,

I have a performance issue with batch processing of data.

Requirement: I need to update around 3L (300,000) records in the DB.

First try: I implemented a process where 200 records are fetched at a time and passed to a subprocess configured for MNI (Multiple Node Instances). The issue is that MNI has a limit of 1,000 instances per node, so after that the process moves to a paused state.

Second try: I updated the same process without MNI. Instead, I pass 200 records to the subprocess and loop internally until those 200 records are done, then return to the parent process, which fetches the next 200 records and passes them on. This continues until all 3L records in the DB are processed.

With the second try I was finally able to loop through successfully, without any pauses, breaks, or failures after 1,000 records. But performance is very poor: it takes 20 hours to update 35k records, so processing all 3L records would take a week or more.

Can anyone suggest a better approach?

Thanks

Vasu J

  • Hi Vasu

    First question: what is the nature of the update you're performing? Can this update be performed by a Stored Procedure, for example? The benefits are two-fold: first, you won't be shipping data across the network; and secondly, you can control the level of transactionality across your update (from an individual-record scope to an all-or-nothing scope, or anything in between).
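    For illustration, here is a minimal sketch of what such a procedure could look like, assuming a MySQL-style database; the table order_records, its columns, and the status values are hypothetical placeholders for your actual update logic:

    ```sql
    -- Hedged sketch: table and column names are hypothetical placeholders.
    DELIMITER //
    CREATE PROCEDURE update_pending_records()
    BEGIN
      -- One set-based statement updates every candidate row inside the
      -- database engine; no rows travel across the network to the
      -- application tier, and it runs as a single statement rather than
      -- 300,000 individual round trips.
      UPDATE order_records
         SET status       = 'PROCESSED',
             processed_on = NOW()
       WHERE status = 'PENDING';
    END //
    DELIMITER ;
    ```

    A set-based update like this will typically work through 3L rows in minutes rather than days, because the per-record network and process-model overhead disappears entirely.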

  • Stewart,

    You mean that instead of hitting the DB multiple times, we should store all the updated records in a PV and then write everything to the DB in one hit.

    Correct?

    No. I was thinking more along the lines of invoking a Stored Procedure that performs the whole process: selecting the candidate records to be updated and applying the updates. But it's hard to know if this is a viable option until you explain a bit more about the specific problem you're trying to solve.
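    To make that concrete, here is a sketch of such a procedure (same hypothetical order_records schema and MySQL syntax as above), this time committing in chunks so you can tune the transaction scope between per-record and all-or-nothing:

    ```sql
    DELIMITER //
    CREATE PROCEDURE update_pending_records_chunked(IN p_chunk_size INT)
    BEGIN
      DECLARE v_rows INT DEFAULT 1;
      -- Repeat until no candidate rows remain. Each pass selects and
      -- updates at most p_chunk_size candidate rows, then commits,
      -- keeping lock/undo footprints small.
      WHILE v_rows > 0 DO
        START TRANSACTION;
        UPDATE order_records
           SET status       = 'PROCESSED',
               processed_on = NOW()
         WHERE status = 'PENDING'
         LIMIT p_chunk_size;
        SET v_rows = ROW_COUNT();  -- rows changed by this pass
        COMMIT;
      END WHILE;
    END //
    DELIMITER ;
    ```

    Calling CALL update_pending_records_chunked(10000); would then work through all the candidate records in batches of 10,000, entirely inside the database.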

  • Hi Stewart,

    I know you posted this reply five years ago, but I have a question.

    I have a scenario where I need to store a large amount of data in a database. Could you please provide guidance on the process for accomplishing this?

    As per my understanding:

     - I believe I need to use the Excel to Database Smart Service to store the data.
     - I can implement looping to handle the data processing, but I am concerned about potential performance issues.
     - I am considering using stored procedures; could you please explain this further? (I have put a rough sketch of what I have in mind just below this list.)
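    The pattern I'm imagining is only a rough sketch, assuming MySQL; the staging table excel_staging and target table customer are made-up names. The idea: land the spreadsheet rows unchanged in a staging table (e.g. via the Excel to Database Smart Service), then let one stored procedure move them into the operational table set-based:

    ```sql
    -- Hypothetical staging table mirroring the spreadsheet columns.
    CREATE TABLE excel_staging (
      row_id     INT AUTO_INCREMENT PRIMARY KEY,
      cust_name  VARCHAR(255),
      cust_email VARCHAR(255)
    );

    DELIMITER //
    CREATE PROCEDURE load_customers_from_staging()
    BEGIN
      -- One set-based INSERT ... SELECT moves every staged row into the
      -- operational table; any cleansing/casting happens here, inside the DB.
      INSERT INTO customer (name, email)
      SELECT TRIM(cust_name), LOWER(TRIM(cust_email))
        FROM excel_staging;

      -- Clear the staging area ready for the next file.
      TRUNCATE TABLE excel_staging;
    END //
    DELIMITER ;
    ```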

    I hope you can help me resolve this issue. Please let me know when you are available.
  • So, first of all: performance is only an issue if it's an issue! That is: what timeframes do you have to work with? (E.g. do you need to load the data overnight so that it's available to users the following day? Is it once a week? A month? Or is it on a continuous basis?) So that's the first question: how much time do you have between receiving the data you need to load and the deadline by which it needs to be loaded?

    Secondly: what format is the data arriving in? (And do you have any influence on the format it's being extracted into?) And what level of transformation do you need to apply before it can be loaded into the operational tables? The less work you have to do on the data between receiving it and loading it, the shorter the load time will be.

    Thirdly: how volatile is the structure of the data? That is, how often does it change in a way that would mean a maintenance task in the notional ETL process between the source system and your target system? And how much notice do you get of such changes? There's a balancing act between building something that performs best and having a design that allows changes to be made in time to meet a given deadline. So you'll also need to assess what knowledge/skills you need to conduct such maintenance. You should be prepared to sacrifice some performance in exchange for ease of change.

    Only once you have answers to the above should we then start to consider some design options.
