Hello,
I'm facing the following task: I need to be able to generate a unique random sample from a table/view.
These are the things I've tried/investigated:
a!localVariables(
  /* Pick a random integer in [min, max] */
  local!random: ri!min + tointeger(rand() * (ri!max - ri!min)),
  local!array: append(ri!array, local!random),
  if(
    /* union() of a list with itself deduplicates it;
       stop once we have enough unique values */
    length(union(local!array, local!array)) >= ri!count,
    local!array,
    rule!KS_recursion_rand_dev_selector(
      min: ri!min,
      max: ri!max,
      count: ri!count,
      array: union(local!array, local!array)
    )
  )
)
Any suggestions? Thank you!
Please consider a stored procedure for the calculation and logical manipulation; it gives you more options and flexibility than a view. You can have the stored procedure collect the sample and dump it into a sample table, from which Appian can take the IDs to query and act upon.
I suggest implementing this in a stored procedure and calling it from Appian.
Do you happen to have an example of how it can be done in a stored procedure? Do you use SAMPLE() in Oracle and then filter on that, or something else?
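For reference, a minimal sketch of what a SAMPLE()-based procedure could look like, assuming a hypothetical source table person_docs and staging table sample_ids (both names are placeholders, not from this thread). Note that SAMPLE() takes a percentage of rows rather than a row count, and applies to a single table rather than a joined view, so it is usually paired with a ROWNUM cap:

    CREATE OR REPLACE PROCEDURE collect_sample (p_size IN PLS_INTEGER) AS
    BEGIN
        -- Clear out the previous sample
        DELETE FROM sample_ids;

        -- SAMPLE(1) reads roughly 1% of the rows; the ROWNUM cap then
        -- trims the result down to the requested sample size
        INSERT INTO sample_ids (id)
        SELECT id
          FROM person_docs SAMPLE (1)
         WHERE ROWNUM <= p_size;

        COMMIT;
    END collect_sample;
    /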
Try using DBMS_RANDOM.VALUE
SELECT *
  FROM (SELECT t1.column1,
               t2.column2,
               t3.column3,
               t4.column4,                      -- Specify the columns you need
               DBMS_RANDOM.VALUE AS random_val  -- Generate a random number for each row
          FROM table1 t1
          JOIN table2 t2 ON t1.common_key = t2.common_key  -- Adjust this join condition
          JOIN table3 t3 ON t2.common_key = t3.common_key  -- Adjust this join condition
          JOIN table4 t4 ON t3.common_key = t4.common_key  -- Adjust this join condition
         ORDER BY DBMS_RANDOM.VALUE)            -- Order rows randomly
 WHERE ROWNUM <= :sample_size;
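Note that :sample_size is a bind variable supplied at execution time; wrapped in a stored procedure along the lines suggested above, the same SELECT would feed an INSERT into the sample table rather than being returned directly.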
Why not use the totalCount to determine the table's current size, then generate a random value between 1 and N, and query using the generated value as the startIndex (with a batch size of 1)? You could do this once, or several times, to pull random rows.
a!localVariables(
  /* Total row count behind the query rule */
  local!totalEntries: rule!ASDF_QRY_PersonDocuments(
    pagingInfo: a!pagingInfo(1, 0),
    fetchTotalCount: true()
  ).totalCount,
  local!totalSamples: 5,
  /* Pick a random start index for each sample */
  local!sampleIndices: a!forEach(
    enumerate(local!totalSamples),
    tointeger(rand() * local!totalEntries)
  ),
  /* Query one row at each random index */
  local!sampleQueries: a!forEach(
    local!sampleIndices,
    index(
      rule!ASDF_QRY_PersonDocuments(
        pagingInfo: a!pagingInfo(
          startIndex: fv!item,
          batchSize: 1
        )
      ).data,
      1
    )
  ),
  {}
)
This takes about 1 second to execute for a table with just shy of 700,000 entries. I imagine (depending on the number of samples desired) it would scale up to your 25M scope decently, though I'd have a hard time testing that for you, as this is my prod system's largest table.
I tried that, but if the start index is big enough, Appian times out with an error (presumably because the database has to scan past all the preceding rows to reach a large start index).
I'll also need to query from a view, since that lets me make sure it has the fields I need to filter on.
Youch. Then I must defer to the prior suggestions to run it through an SP.
I've seen some comments that this is a very heavy operation on large tables. If, even after all the joins, the table is still 25 million rows, is DBMS_RANDOM.VALUE still feasible performance-wise?
What are you thinking of in terms of running it through an SP?
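For what it's worth, the heavy part of the DBMS_RANDOM.VALUE approach is the ORDER BY: at 25 million rows the database has to generate a random value for every row and sort them all. A lighter-weight pattern, in the spirit of the random-start-index idea above but done inside the stored procedure, is to probe an indexed key at random positions instead of sorting. A hedged sketch, reusing the same hypothetical person_docs / sample_ids names as above and assuming a numeric indexed id column with a unique constraint on sample_ids.id:

    CREATE OR REPLACE PROCEDURE collect_sample_by_id (p_size IN PLS_INTEGER) AS
        v_min NUMBER;
        v_max NUMBER;
        v_id  NUMBER;
    BEGIN
        SELECT MIN(id), MAX(id) INTO v_min, v_max FROM person_docs;

        DELETE FROM sample_ids;

        FOR i IN 1 .. p_size LOOP
            -- Pick a random point in the key range and take the first row
            -- at or above it: an index range scan, not a 25M-row sort
            SELECT id
              INTO v_id
              FROM (SELECT id
                      FROM person_docs
                     WHERE id >= v_min + DBMS_RANDOM.VALUE * (v_max - v_min)
                     ORDER BY id)
             WHERE ROWNUM = 1;

            -- Skip duplicates so the sample stays unique (relies on the
            -- unique constraint on sample_ids.id); the final sample may
            -- come up short and need a top-up pass
            BEGIN
                INSERT INTO sample_ids (id) VALUES (v_id);
            EXCEPTION
                WHEN DUP_VAL_ON_INDEX THEN NULL;
            END;
        END LOOP;

        COMMIT;
    END collect_sample_by_id;
    /

The caveat is that rows immediately after large gaps in the id sequence are more likely to be picked, so this trades perfect uniformity for speed.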