Hello,
I'm facing the following task - I need to be able to generate a unique sample from a table/view. A couple of notes:
These are the things I've tried/investigated:
a!localVariables( local!random: ri!min + tointeger(rand() * (ri!max - ri!min)), local!arrray: append(ri!array, local!random), if( length(union(local!arrray, local!arrray)) >= ri!count, local!arrray, rule!KS_recursion_rand_dev_selector( min: ri!min, max: ri!max, count: ri!count, array: union(local!arrray, local!arrray) ) ) )
Any suggestions? Thank you!
Discussion posts and replies are publicly visible
Why not use the TotalCount to determine the table's current size, then generate a random value between 1 and N, and query that using the generated value as the StartIndex (with a page size of 1)? You could do this for one or several queries into a random row.
a!localVariables( local!totalEntries: rule!ASDF_QRY_PersonDocuments( pagingInfo: a!pagingInfo(1, 0), fetchTotalCount: true() ).totalCount, local!totalSamples: 5, local!sampleIndices: a!forEach( enumerate(local!totalSamples), tointeger(rand()*local!totalEntries) ), local!sampleQueries: a!forEach( local!sampleIndices, index(rule!ASDF_QRY_PersonDocuments( pagingInfo: a!pagingInfo( startIndex: fv!item, batchSize: 1 ) ).data, 1) ), {} )
This takes about 1 second to execute for a table with just shy of 700,000 entries. I imagine (depending on the number of samples desired) that it would scale up to your 25M scope decently(?) though i'd have a hard time testing that for you, as this is my prod system's largest table.
I tried that, but if the start index is big enough, Appian times out with this error:
I'll need to query from a view since I can make sure the view has the correct fields that I need to filter on.
Youch. Then I must defer to prior suggestions to run it through a SP.
What are you thinking of in terms of running it through an SP?
The SP can recreate much of the view logic you're thinking of internally and do more efficient / more advanced calculations in advance (i assume up to and including using some db-internal logic to select some samples) - since it's all done internal to the DB, it'll process faster than trying to query through Appian.
Do you have examples of random functions that can be done on the db side? The only ones I've been able to find are DBMS_RANDOM.VALUE and SAMPLE() for oracle, but I've heard DBMS_RANDOM.VALUE can be very slow on large tables, and SAMPLE() tends to return skewed data.
True random values often appear "skewed" due to observer bias. If you want something that's evenly distributed instead of truly random, you'll probably need to determine your own algorithm for that (i.e. take indexes from 1 to N divided by samples desired, then randomize each sample index by taking some deviation based on a generated random). The DB software has loads of functions available and I don't even begin to claim to fathom them all. If you were working in MariaDB i would be more confident in sharing some of my stored proc tricks (disclaimering that i haven't done much play with random numbers in it), but you could probably discover the ones you need by googling the Oracle DB docs.
gotcha, I'll do some more investigation into that then..thanks!