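How to select random rows faster in Oracle from a table with millions of rows

Time: 2010-06-30 15:05:57

Tags: oracle

Is there a way to select random rows faster in Oracle from a table that has millions of rows? I tried sample(x) and dbms_random.value, and both take a long time to run.

Thanks!

7 answers: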

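Answer 0 (score: 10)

Using sample(x) with an appropriate value is the fastest way. It is block-random and row-random within blocks, so if you only want one random row: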

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum = 1

I'm using a subpartitioned table, and I get pretty good randomness even when grabbing multiple rows:

select dbms_rowid.rowid_relative_fno(rowid) as fileno,
       dbms_rowid.rowid_block_number(rowid) as blockno,
       dbms_rowid.rowid_row_number(rowid) as offset
  from (select rowid from [my_big_table] sample (.01))
 where rownum <= 5

    FILENO    BLOCKNO     OFFSET
---------- ---------- ----------
       152    2454936         11
       152    2463140         32
       152    2335208          2
       152    2429207         23
       152    2746125         28

I suspect you should tune your SAMPLE clause so that the sample size is appropriate for the number of rows you are fetching.
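One way to pick a starting percentage is to derive it from the optimizer statistics. The query below is only a sketch: MY_BIG_TABLE, the target of 5 rows, and the 10x oversampling factor are assumptions made for illustration, and it relies on statistics having been gathered so that NUM_ROWS is populated. The result is clamped to the range the SAMPLE clause accepts (0.000001 up to, but not including, 100).

```sql
-- Assumption: statistics exist, so USER_TABLES.NUM_ROWS is not null.
-- MY_BIG_TABLE, the 5-row target and the 10x factor are placeholders.
select num_rows,
       least(99.999999,
             greatest(0.000001,
                      round(100 * (5 * 10) / nullif(num_rows, 0), 6))) as suggested_pct
  from user_tables
 where table_name = 'MY_BIG_TABLE';
```

The suggested value can then be pasted as the literal inside sample (...) in the queries above. Oversampling leaves some slack, because the number of rows a sample returns varies from run to run.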

Answer 1 (score: 3)

Start with Adam's answer, but if SAMPLE just isn't fast enough, even with the ROWNUM optimization, you can use block sampling:

....FROM [table] SAMPLE BLOCK (0.01)

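This applies the sampling at the block level instead of to each row. It means Oracle can skip large swathes of the table's data, so the sample percentage will be very rough. It is not unusual for a SAMPLE BLOCK with a low percentage to return zero rows.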
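To get a feel for how rough block sampling is on a particular table, it can help to run both modes a few times and compare the counts. This is a minimal sketch; my_big_table is a placeholder name, and the counts will differ from run to run.

```sql
-- Row sampling: each row is considered individually, so the count
-- tends to stay close to 0.01% of the table.
select count(*) as rows_sampled
  from my_big_table sample (0.01);

-- Block sampling: whole blocks are kept or skipped, so rows arrive in
-- clumps and a count of zero is possible at low percentages.
select count(*) as rows_sampled
  from my_big_table sample block (0.01);
```

If the block-sampled count is often zero, raising the percentage is the simplest remedy.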

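Answer 2 (score: 2)

Here is the same question on AskTom:

http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522

If you know how big your table is, use sample block as described above. If you don't, you can modify the routine below to get however many rows you want.

Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861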

create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
  for exp_v in -6..2 loop
    exit when (urowid_t.count > 0);
    if (exp_v < 2) then
      sql_v := 'select rowid from ' || table_name
      || ' sample block (' || power(10, exp_v) || ')';
    else
      sql_v := 'select rowid from ' || table_name;
    end if;
    cursor_v := dbms_sql.open_cursor;
    dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
    dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
    status_v := dbms_sql.execute(cursor_v);
    loop
      rows_v := dbms_sql.fetch_rows(cursor_v);
      dbms_sql.column_value(cursor_v, 1, urowid_t);
      exit when rows_v != 100;
    end loop;
    dbms_sql.close_cursor(cursor_v);
  end loop;
  if (urowid_t.count > 0) then
    return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
  end if;
  return null;
exception when others then
  if (dbms_sql.is_open(cursor_v)) then
    dbms_sql.close_cursor(cursor_v);
  end if;
  raise;
end;
/
show errors
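A possible way to call the function is from a small PL/SQL block, so that it runs exactly once and the returned rowid is then used for a direct lookup. This is a sketch only: MY_BIG_TABLE is a placeholder, and the table name is concatenated into dynamic SQL by the function, so it should come from a trusted source.

```sql
set serveroutput on

declare
  v_rid urowid;
  v_row my_big_table%rowtype;  -- my_big_table is a placeholder name
begin
  -- Call the function once, outside of any WHERE clause.
  v_rid := get_random_rowid('MY_BIG_TABLE');

  if v_rid is null then
    dbms_output.put_line('table is empty');
  else
    -- Direct rowid access; no scan is needed for the lookup itself.
    select * into v_row from my_big_table where rowid = v_rid;
    dbms_output.put_line('picked rowid ' || v_rid);
  end if;
end;
/
```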

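Answer 3 (score: 1)

The solution below is not an exact answer to this question, but in many scenarios you select a row, use it for some purpose, and then update its status to "used" or "done" so that you do not select it again.

Solution:

The query below works, but I tried it and you will definitely face performance problems with it if your table is large.

SELECT * FROM (SELECT * FROM table ORDER BY dbms_random.value) WHERE rownum = 1

If you limit rownum as shown below, you can work around the performance problem. Raising the rownum limit widens the pool of candidates and so reduces the chance of picking the same row twice. With this approach you will always draw from the same set of roughly 1000 rows. However, if you take one of those rows and update its status to "USED", the next query for "ACTIVE" rows will almost always return a different row.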

SELECT * FROM
( SELECT * FROM table
where rownum < 1000
  and status = 'ACTIVE'
  ORDER BY dbms_random.value  )
WHERE rownum = 1

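Update the row's status after selecting it. If the update fails, another transaction has already used the row, so you should fetch a new row and update its status instead. By the way, since the rownum limit is 1000, the probability of two different transactions getting the same row is about 0.001.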
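The select-then-update loop described above could be sketched in PL/SQL roughly as follows. The table name my_table, the retry limit of 10, and the variable names are assumptions made for illustration; the status values follow the answer. Repeating status = 'ACTIVE' in the UPDATE is what detects a row that another transaction claimed first: in that case the update touches zero rows.

```sql
set serveroutput on

declare
  v_rowid   rowid;
  v_claimed boolean := false;
begin
  for attempt in 1 .. 10 loop          -- retry limit is an arbitrary choice
    begin
      select rid
        into v_rowid
        from (select rowid as rid
                from my_table          -- placeholder table name
               where rownum < 1000
                 and status = 'ACTIVE'
               order by dbms_random.value)
       where rownum = 1;
    exception
      when no_data_found then
        v_rowid := null;               -- no ACTIVE rows left
    end;

    exit when v_rowid is null;

    -- Claim the row only if it is still ACTIVE.
    update my_table
       set status = 'USED'
     where rowid = v_rowid
       and status = 'ACTIVE';

    if sql%rowcount = 1 then
      v_claimed := true;
      exit;
    end if;
  end loop;

  if v_claimed then
    dbms_output.put_line('claimed row ' || rowidtochar(v_rowid));
  else
    dbms_output.put_line('no row claimed');
  end if;
  -- Committing is left to the caller.
end;
/
```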

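Answer 4 (score: 1)

Someone said that sample(x) is the fastest way. For me, though, this method is slightly faster than the sample(x) method. It should take a fraction of a second (0.2 in my case) no matter how large the table is. If it takes longer, try adding hints (--+ leading(e) use_nl(e t) rowid(t)), which can help.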

SELECT *
  FROM My_User.My_Table
 WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
                  FROM (SELECT o.Data_Object_Id,
                               e.Relative_Fno,
                               e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
                          FROM Dba_Extents e
                          JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                         WHERE e.Segment_Name = 'MY_TABLE'
                           AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                              (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                      MAX(e.Owner)        AS Owner,
                                      MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                                 FROM Dba_Extents e
                                WHERE e.Segment_Name = 'MY_TABLE'
                                  AND e.Owner = 'MY_USER'
                                  AND e.Segment_Type = 'TABLE')) e
                  JOIN My_User.My_Table t
                    ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                   AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))

Answer 5 (score: 0)

A version that retries when no row is returned:

WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
                     MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
                FROM (SELECT o.Data_Object_Id,
                             e.Relative_Fno,
                             e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id 
                        FROM Dba_Extents e
                        JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
                       WHERE e.Segment_Name = 'MY_TABLE'
                         AND(e.Segment_Type, e.Owner, e.Extent_Id) =
                            (SELECT MAX(e.Segment_Type) AS Segment_Type,
                                    MAX(e.Owner)        AS Owner,
                                    MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
                               FROM Dba_Extents e
                              WHERE e.Segment_Name = 'MY_TABLE'
                                AND e.Owner = 'MY_USER'
                                AND e.Segment_Type = 'TABLE')) e
                JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
                                                  AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
  Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
                             FROM Dual
                             LEFT JOIN gen ON 1=1
                            UNION ALL
                           SELECT Cnt + 1, gen.Row_Id
                             FROM Retries
                             LEFT JOIN gen ON 1=1
                            WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
  FROM MY_USER.MY_TABLE
 WHERE ROWID = (SELECT Row_Id
                  FROM Retries
                 WHERE Row_Id IS NOT NULL)

Answer 6 (score: 0)

Can you use pseudo-random rows?

select * from (
  select * from ... where... order by ora_hash(rowid)
) where rownum<100
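A filled-in version of that idea might look like the sketch below. The table name my_big_table and the seed value 42 are placeholders. ORA_HASH accepts an optional bucket count and seed, so changing the seed gives a different but repeatable ordering. Note that this still has to read and sort every row, so it is a way to get a repeatable pseudo-random sample rather than a way to make the query faster.

```sql
-- my_big_table and the seed 42 are placeholders.
-- 4294967295 is the maximum (and default) bucket value for ORA_HASH.
select *
  from (select t.*
          from my_big_table t
         order by ora_hash(t.rowid, 4294967295, 42))
 where rownum < 100;
```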