<?xml version="1.0" encoding="UTF-8" ?>
<?xml-stylesheet type="text/xsl" href="https://community.appian.com/cfs-file/__key/system/syndication/rss.xsl" media="screen"?><rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:slash="http://purl.org/rss/1.0/modules/slash/" xmlns:wfw="http://wellformedweb.org/CommentAPI/"><channel><title>Selecting a random sample from a large table</title><link>https://community.appian.com/discussions/f/data/37690/selecting-a-random-sample-from-a-large-table</link><description>Hello, 
 I&amp;#39;m facing the following task - I need to be able to generate a unique sample from a table/view. A couple of notes: 
 
 This is a large table, about 25 million rows 
 This table will need to be filtered first, then sampled
 
 To filter, this</description><dc:language>en-US</dc:language><generator>Telligent Community 12</generator><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141594?ContentTypeID=1</link><pubDate>Fri, 11 Oct 2024 06:02:08 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:6b3a1b3a-a85d-44ed-b065-58489f4ec565</guid><dc:creator>Soma</dc:creator><description>&lt;p&gt;Since you are using a stored procedure, you should be able to filter on certain parameters to bring down the data size.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141591?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 18:55:55 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:7fbe8bf4-4535-4c47-85fc-8dc766c168ab</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;Gotcha, I&amp;#39;ll do some more investigation into that then. Thanks!&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141590?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 17:01:56 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:f9931406-06cb-4f11-a93e-9368818fc4fa</guid><dc:creator>Mike Schmitt</dc:creator><description>&lt;p&gt;True random values often appear &amp;quot;skewed&amp;quot; due to observer bias.&amp;nbsp; If you want something that&amp;#39;s evenly distributed instead of truly random, you&amp;#39;ll probably need to determine your own algorithm for that (i.e. 
take evenly spaced indexes from 1 to N, spaced N divided by the number of samples desired apart, then randomize each sample index by adding some deviation based on a generated random value).&amp;nbsp; The DB software has loads of functions available and I don&amp;#39;t even begin to claim to fathom them all.&amp;nbsp; If you were working in MariaDB I would be more confident in sharing some of my stored proc tricks (with the disclaimer that I haven&amp;#39;t played much with random numbers in it), but you could probably discover the ones you need by googling the Oracle DB docs.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141589?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 16:57:48 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:9081d09d-d98a-4384-b59b-d6f0bde108b7</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;Do you have examples of random functions that can be done on the DB side? 
The only ones I&amp;#39;ve been able to find are&amp;nbsp;&lt;span&gt;DBMS_RANDOM.VALUE and SAMPLE() for oracle, but I&amp;#39;ve heard&amp;nbsp;DBMS_RANDOM.VALUE can be very slow on large tables, and SAMPLE() tends to return skewed data.&lt;/span&gt;&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141587?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 16:20:06 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:35c24732-482d-4b29-884d-41c2ddaabe73</guid><dc:creator>Mike Schmitt</dc:creator><description>&lt;p&gt;The SP can recreate much of the view logic you&amp;#39;re thinking of internally and do more efficient / more advanced calculations in advance (i assume up to and including using some db-internal logic to select some samples) - since it&amp;#39;s all done internal to the DB, it&amp;#39;ll process faster than trying to query through Appian.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141586?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 16:17:27 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:022f3188-c77c-4034-af2b-bf58e85ac75f</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;What are you thinking of in terms of running it through an SP?&amp;nbsp;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141585?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 16:15:35 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:d873e580-08fa-4808-af54-9ec0932d5972</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;I&amp;#39;ve seen some comments that this is a very 
heavy operation on large tables. If the result is still 25 million rows even after all the joins, is DBMS_RANDOM.VALUE still viable performance-wise?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141584?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 15:51:08 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:183c8dd4-b133-4e14-b607-767712c29e80</guid><dc:creator>Mike Schmitt</dc:creator><description>&lt;p&gt;Youch.&amp;nbsp; Then I must defer to prior suggestions to run it through an SP.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141583?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 15:41:43 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:8531c4a5-2729-421c-b6a8-5c0fe55b1eba</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;I tried that, but if the start index is big enough, Appian times out with this error:&lt;img style="max-height:240px;max-width:320px;" src="/resized-image/__size/640x480/__key/communityserver-discussions-components-files/16/pastedimage1728574831374v1.png" alt=" " /&gt;&lt;/p&gt;
&lt;p&gt;I&amp;#39;ll need to query from a view since I can make sure the view has the correct fields that I need to filter on.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141582?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 15:25:18 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:704fd7ed-1e81-41f4-a553-075a8e5fc978</guid><dc:creator>Mike Schmitt</dc:creator><description>&lt;p&gt;Why not use the TotalCount to determine the table&amp;#39;s current size, then generate a random value between 1 and N, and query that using the generated value as the StartIndex (with a page size of 1)?&amp;nbsp; You could do this for one or several queries into a random row.&lt;/p&gt;
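&lt;p&gt;As a language-neutral illustration of the index-picking step described above, here is a minimal Python sketch (not Appian code). The function name pick_start_indices and its parameters are hypothetical; the sketch assumes the total row count is already known and that each returned value is used as a 1-based start index with a page size of 1. It draws without replacement, so no row is picked twice, which independent random draws do not guarantee.&lt;/p&gt;

```python
import random


def pick_start_indices(total_entries, total_samples, seed=None):
    """Return distinct 1-based start indices in ascending order.

    Hypothetical helper: total_entries is the row count reported by the
    query, total_samples is how many single-row pages to fetch.
    """
    rng = random.Random(seed)
    # Never ask for more rows than exist; sample() would raise otherwise.
    count = min(total_samples, total_entries)
    # range() is lazy, so no list of 25 million integers is ever built.
    return sorted(rng.sample(range(1, total_entries + 1), count))


if __name__ == "__main__":
    # Example call with the table size mentioned in the question.
    print(pick_start_indices(total_entries=25_000_000, total_samples=5))
```

&lt;p&gt;Sorting is optional; it only keeps the follow-up single-row queries in ascending order.&lt;/p&gt;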
&lt;p&gt;&lt;pre class="ui-code" data-mode="java"&gt;a!localVariables(
  local!totalEntries: rule!ASDF_QRY_PersonDocuments(
    pagingInfo: a!pagingInfo(1, 0),
    fetchTotalCount: true()
  ).totalCount,
  
  local!totalSamples: 5,
  
  local!sampleIndices: a!forEach(
    enumerate(local!totalSamples),
    
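    /* rand() returns a value between 0 and 1, so the floored product can be 0; add 1 because startIndex is 1-based */
    1 + tointeger(floor(rand() * local!totalEntries))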
  ),
  
  local!sampleQueries: a!forEach(
    local!sampleIndices,
    index(rule!ASDF_QRY_PersonDocuments(
      pagingInfo: a!pagingInfo(
        startIndex: fv!item,
        batchSize: 1
      )
    ).data, 1)
  ),
  
  
  {}
)&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;This takes about 1 second to execute for a table with just shy of 700,000 entries.&amp;nbsp; I imagine (depending on the number of samples desired) that it would scale up to your 25M scope decently(?) though i&amp;#39;d have a hard time testing that for you, as this is my prod system&amp;#39;s largest table.&lt;br /&gt;&lt;img style="max-height:240px;max-width:320px;" src="/resized-image/__size/640x480/__key/communityserver-discussions-components-files/16/pastedimage1728574409174v1.png" alt=" " /&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141581?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 15:17:36 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:198dd9c6-bab6-4501-a771-57fdbb1bf964</guid><dc:creator>Soma</dc:creator><description>&lt;p&gt;Try using DBMS_RANDOM.VALUE&lt;/p&gt;
&lt;p&gt;&lt;pre class="ui-code" data-mode="text"&gt;SELECT *
     FROM (SELECT t1.column1, t2.column2, t3.column3, t4.column4, -- Specify the columns you need
                  DBMS_RANDOM.VALUE AS random_val -- Generate a random number for each row
           FROM table1 t1
           JOIN table2 t2 ON t1.common_key = t2.common_key -- Adjust this join condition
           JOIN table3 t3 ON t2.common_key = t3.common_key -- Adjust this join condition
           JOIN table4 t4 ON t3.common_key = t4.common_key -- Adjust this join condition
           ORDER BY DBMS_RANDOM.VALUE) -- Order rows randomly
     WHERE ROWNUM &amp;lt;= :sample_size;&lt;/pre&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141554?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 12:52:05 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:aa886cf8-cb5d-4a88-82e7-806f528d965b</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;&lt;span&gt;Do you happen to have an example of how it can be done in a stored procedure? Do you use&amp;nbsp;SAMPLE() in Oracle and then filter on that, or something else?&lt;/span&gt;&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141553?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 12:51:48 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:c29ec529-b01f-4896-b738-0e140c2c752a</guid><dc:creator>kl0001</dc:creator><description>&lt;p&gt;Do you happen to have an example of how it can be done in a stored procedure? 
Do you use&amp;nbsp;SAMPLE() in Oracle and then filter on that, or something else?&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141511?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 06:44:43 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:0c854413-269e-41f1-b0ee-57d184d162fa</guid><dc:creator>Stefan Helzle</dc:creator><description>&lt;p&gt;I suggest implementing this in a stored procedure and calling it from Appian.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item><item><title>RE: Selecting a random sample from a large table</title><link>https://community.appian.com/thread/141503?ContentTypeID=1</link><pubDate>Thu, 10 Oct 2024 06:22:22 GMT</pubDate><guid isPermaLink="false">d3a83456-d57b-489c-a84c-4e8267bb592a:1ffc043e-9717-4354-b061-312d95b8049c</guid><dc:creator>Soma</dc:creator><description>&lt;p&gt;Please consider a stored procedure&amp;nbsp;for calculation and logical manipulation. It gives more options and flexibility than a view.&amp;nbsp;You can have the stored procedure collect the sample and dump it in a sample table, from which Appian can take the IDs to query and act upon.&lt;/p&gt;&lt;div style="clear:both;"&gt;&lt;/div&gt;</description></item></channel></rss>