Hi All,
I have nearly 45,000 rows of data with 11 columns in a CDT and need to upload this data to the Appian database without affecting performance. Kindly provide your suggestions for splitting the data into batches, say 2,000 rows at a time, and then uploading them to the DB.
Thanks in Advance.
Pradeep.B
This largely depends on where your 45,000 rows of data are currently located. Is it already in some running process? Or in an Excel file? Or something else?
First, the data will be provided in a Notepad (plain text) file; then I will convert and map the data into the CDT using expression rules.
From Notepad, would you just be pasting it into Appian via an interface, or uploading the text file, etc.? I suppose either could work, but each would require a slightly different approach up front.
Either way, presuming that you end up with all 45,000 rows within a PV, preferably parsed into a CDT array, you can create a looping process that takes, for example, 1,000 rows at a time, stores them in a separate PV (overwriting that PV's previous contents), writes that batch to the DB, then moves on to the next set. I've done this before, and assuming it's not something that needs to be done routinely or constantly, it should work well.
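A rough sketch of the batch-slicing expression, assuming the full data set lives in a CDT-array PV (the rule and input names here, such as rule!getNextBatch, ri!allRows, ri!startIndex, and ri!batchSize, are placeholders I've made up for the example, not anything Appian provides out of the box):

/* Hypothetical rule!getNextBatch                              */
/* ri!allRows    (CDT array) - the full parsed data set        */
/* ri!startIndex (Integer)   - 1-based index of the first row  */
/* ri!batchSize  (Integer)   - e.g. 1,000 or 2,000             */
if(
  ri!startIndex > length(ri!allRows),
  {}, /* nothing left, so the looping process can exit cleanly */
  todatasubset(
    ri!allRows,
    topaginginfo(ri!startIndex, ri!batchSize)
  ).data
)

A script task would store the result in the "current batch" PV, increment the start-index PV by the batch size, and then hand off to the Write to Data Store Entity node before looping back.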
Note that if this process is something that will need to be done more often, you will need to consider further safety checks (like handling empty data, making sure the loop doesn't try to execute infinitely many times, etc).
Hi Pradeep,
If the data is already in a database, create a scheduled process to write it to the target database, and schedule that process during non-business hours.
For further details, see Transferring/Processing Large Data Sets (ETL).
Thank you, Dude, for your reply.
I will read, say, 10 characters from each row and map them to a particular column in the CDT. Similarly, I will read characters from specific positions using the mid() function for the remaining columns and map them to the CDT. Then I have to write the data to the DB in batches without affecting performance.
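To illustrate that fixed-width parsing approach (the CDT name MyUploadRecord, its field names, and the character positions below are all made up for the example):

/* Hypothetical rule!parseFixedWidthRow - ri!rawRow is one line of the text file */
type!MyUploadRecord(
  column1: trim(mid(ri!rawRow, 1, 10)),  /* characters 1-10  */
  column2: trim(mid(ri!rawRow, 11, 10)), /* characters 11-20 */
  column3: trim(mid(ri!rawRow, 21, 10))  /* characters 21-30 */
  /* ...and so on for the remaining columns */
)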
For your purposes it might be best to batch rows of your raw text and pass each batch into a subprocess (for example) where the parsing occurs, row by row, just before the DB write happens. In the parent process you would store a separate PV, integer type, to reflect which row will be the next starting point (so if you've parsed rows 1-10, your marker PV would store a value of "11" to indicate that the next loop will start at row 11).
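Sketching the subprocess side of that (placeholder names again; rule!parseFixedWidthRow is the hypothetical row parser from the earlier example, and ri!rawRows would be the Text array holding one batch of lines):

/* Turn one batch of raw text lines into a CDT array ready for the */
/* Write to Data Store Entity node.                                 */
a!forEach(
  items: ri!rawRows,
  expression: rule!parseFixedWidthRow(fv!item)
)

In the parent process, a script task would then bump the marker PV by the batch size so the next iteration picks up where this one left off.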
As with most things, I suggest you write your parser in an Expression Rule and call that expression rule from a script task in your process model. This will allow you to paste in small chunks of your raw text, run the rule many times, and confirm that it returns exactly the output you're wanting. I encourage you to look into the extract() function, as well as the utilities available within the Regex Functions plug-in, in case either of those would make parsing easier.
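As a purely illustrative example of extract(), if your rows ever carry delimited values instead of fixed widths, it pulls out everything between a pair of markers (the sample input below is made up):

extract("id=123;name=Pradeep;", "=", ";")  /* returns {"123", "Pradeep"} */

And if you end up with the whole file as one big text value, split() on the newline character is one way to break it into rows before batching and parsing.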