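Is there a way to select random rows faster in Oracle from a table with millions of rows? I tried sample(x) and dbms_random.value, and both take a long time to run.
Thanks!
Answer 0 (score: 10)
Using an appropriate value for sample(x) is the fastest way. It is block-random and row-random within blocks, so if you only want one random row: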
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum = 1
I am using a subpartitioned table, and I get fairly good randomness even when grabbing multiple rows:
select dbms_rowid.rowid_relative_fno(rowid) as fileno,
dbms_rowid.rowid_block_number(rowid) as blockno,
dbms_rowid.rowid_row_number(rowid) as offset
from (select rowid from [my_big_table] sample (.01))
where rownum <= 5
FILENO BLOCKNO OFFSET
---------- ---------- ----------
152 2454936 11
152 2463140 32
152 2335208 2
152 2429207 23
152 2746125 28
I suspect you should tune your SAMPLE clause so that the sample size suits the number of rows you are fetching.
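One way to pick a starting percentage is to derive it from the table's approximate row count. The query below is only a sketch: MY_BIG_TABLE is a placeholder name, the target of 1000 sampled rows is arbitrary, and it assumes the optimizer statistics in USER_TABLES are reasonably current.

```sql
-- Assumptions: MY_BIG_TABLE is a placeholder, NUM_ROWS is only as fresh as
-- the last statistics gathering, and 1000 is an arbitrary target row count.
select table_name,
       num_rows,
       -- SAMPLE accepts percentages from 0.000001 up to, but not including, 100
       greatest(0.000001,
                least(99.999999,
                      round(100 * 1000 / nullif(num_rows, 0), 6))) as sample_pct
  from user_tables
 where table_name = 'MY_BIG_TABLE';
```

For a table of about 10 million rows this works out to 0.01, so sample (.01) would be expected to return roughly 1000 rows, which the rownum filter then trims.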
Answer 1 (score: 3)
Start with Adam's answer, but if SAMPLE is not fast enough even with the ROWNUM optimization, you can use block sampling:
....FROM [table] SAMPLE BLOCK (0.01)
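This samples at the block level rather than per row. That means it can skip large amounts of the table's data, so the sample percentage becomes very coarse. It is not unusual for SAMPLE BLOCK with a low percentage to return zero rows.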
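To see how coarse block sampling is on your own data, you could compare the two forms directly. This is a rough check rather than a benchmark, and MY_BIG_TABLE is a placeholder name.

```sql
-- MY_BIG_TABLE is a placeholder. Run each statement several times and compare.
select count(*) as row_sampled   from my_big_table sample (0.01);
select count(*) as block_sampled from my_big_table sample block (0.01);
```

The row-level count would be expected to stay close to 0.01% of the table, while the block-level count may swing widely between runs and can come back as 0.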
Answer 2 (score: 2)
Here is the same question on AskTom:
http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522
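If you know how big your table is, use SAMPLE BLOCK as described above. If you do not, you can modify the routine below to get the number of rows you need.
Copied from: http://asktom.oracle.com/pls/apex/f?p=100:11:0::::P11_QUESTION_ID:6075151195522#56174726207861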
create or replace function get_random_rowid
( table_name varchar2
) return urowid
as
sql_v varchar2(100);
urowid_t dbms_sql.urowid_table;
cursor_v integer;
status_v integer;
rows_v integer;
begin
for exp_v in -6..2 loop
exit when (urowid_t.count > 0);
if (exp_v < 2) then
sql_v := 'select rowid from ' || table_name
|| ' sample block (' || power(10, exp_v) || ')';
else
sql_v := 'select rowid from ' || table_name;
end if;
cursor_v := dbms_sql.open_cursor;
dbms_sql.parse(cursor_v, sql_v, dbms_sql.native);
dbms_sql.define_array(cursor_v, 1, urowid_t, 100, 0);
status_v := dbms_sql.execute(cursor_v);
loop
rows_v := dbms_sql.fetch_rows(cursor_v);
dbms_sql.column_value(cursor_v, 1, urowid_t);
exit when rows_v != 100;
end loop;
dbms_sql.close_cursor(cursor_v);
end loop;
if (urowid_t.count > 0) then
return urowid_t(trunc(dbms_random.value(0, urowid_t.count)));
end if;
return null;
exception when others then
if (dbms_sql.is_open(cursor_v)) then
dbms_sql.close_cursor(cursor_v);
end if;
raise;
end;
/
show errors
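A possible way to call the function, shown here as a sketch: MY_BIG_TABLE is a placeholder, and count(*) stands in for whatever columns you actually need. The rowid is fetched once into a variable so the function is not re-evaluated for every candidate row.

```sql
set serveroutput on
declare
  rid  urowid;
  hits pls_integer;
begin
  -- MY_BIG_TABLE is a placeholder table name
  rid := get_random_rowid('MY_BIG_TABLE');
  if rid is null then
    -- the function returns null when even the unsampled query finds no rows
    dbms_output.put_line('no row found');
  else
    select count(*) into hits from my_big_table where rowid = rid;
    dbms_output.put_line('rows matched: ' || hits);
  end if;
end;
/
```

Because the table name is concatenated into dynamic SQL, pass only trusted names to the function.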
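Answer 3 (score: 1)
The solution below is not an exact answer to this question, but in many cases you select a row, use it for some purpose, and then update its status to "used" or "done" so that you do not select it again.
Solution:
The query below works, but I just tried it, and if your table is large you will definitely run into performance problems with it.
SELECT * FROM (SELECT * FROM table ORDER BY dbms_random.value) WHERE rownum = 1
If you restrict rownum as shown below, you avoid the performance problem, because only the first rows are sorted. Raising the rownum limit lowers the chance that two sessions pick the same row. The drawback is that you always draw from the same 1000 rows. However, if you take one row out of the 1000 and update its status to 'USED', then each query for 'ACTIVE' rows will almost always return a different row.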
SELECT * FROM
( SELECT * FROM table
where rownum < 1000
and status = 'ACTIVE'
ORDER BY dbms_random.value )
WHERE rownum = 1
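After selecting the row, update its status. If the update fails, another transaction has already used the row, so fetch a new row and update its status instead. Incidentally, because rownum is limited to 1000, the probability that two different transactions get the same row is about 0.001.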
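The select-then-claim loop described above could look roughly like the block below. It is a sketch under assumptions: my_table and the status values follow the query above, the retry limit of 5 is arbitrary, and "cannot update" is interpreted as the UPDATE affecting zero rows.

```sql
set serveroutput on
declare
  picked_rowid rowid;
begin
  for attempt in 1 .. 5 loop            -- arbitrary retry limit
    begin
      select rid into picked_rowid
        from (select rowid as rid
                from my_table           -- placeholder table name
               where rownum < 1000
                 and status = 'ACTIVE'
               order by dbms_random.value)
       where rownum = 1;
    exception
      when no_data_found then
        picked_rowid := null;           -- no ACTIVE rows left
    end;
    exit when picked_rowid is null;

    -- Claim the row only if it is still ACTIVE.
    update my_table
       set status = 'USED'
     where rowid = picked_rowid
       and status = 'ACTIVE';
    exit when sql%rowcount = 1;         -- claimed successfully
    picked_rowid := null;               -- someone else took it, try again
  end loop;

  if picked_rowid is null then
    dbms_output.put_line('no row claimed');
  else
    dbms_output.put_line('claimed ' || rowidtochar(picked_rowid));
  end if;
  -- commit or roll back as the surrounding transaction requires
end;
/
```

Note that the UPDATE waits if another session holds an uncommitted lock on the same row; once that session commits, the status check fails and the loop moves on to another row.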
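Answer 4 (score: 1)
Someone said that sample(x) is the fastest way, but for me the method below is slightly faster than sample(x). It should take a fraction of a second (0.2 seconds in my case) regardless of the table size. If it takes longer, try adding the hints (--+ leading(e) use_nl(e t) rowid(t)), which may help.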
SELECT *
FROM My_User.My_Table
WHERE ROWID = (SELECT MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value)
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN My_User.My_Table t
ON t.Rowid BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))
Answer 5 (score: 0)
A version that retries when no row is returned:
WITH gen AS ((SELECT --+ inline leading(e) use_nl(e t) rowid(t)
MAX(t.ROWID) KEEP(DENSE_RANK FIRST ORDER BY dbms_random.value) Row_Id
FROM (SELECT o.Data_Object_Id,
e.Relative_Fno,
e.Block_Id + TRUNC(Dbms_Random.Value(0, e.Blocks)) AS Block_Id
FROM Dba_Extents e
JOIN Dba_Objects o ON o.Owner = e.Owner AND o.Object_Type = e.Segment_Type AND o.Object_Name = e.Segment_Name
WHERE e.Segment_Name = 'MY_TABLE'
AND(e.Segment_Type, e.Owner, e.Extent_Id) =
(SELECT MAX(e.Segment_Type) AS Segment_Type,
MAX(e.Owner) AS Owner,
MAX(e.Extent_Id) KEEP(DENSE_RANK FIRST ORDER BY Dbms_Random.Value) AS Extent_Id
FROM Dba_Extents e
WHERE e.Segment_Name = 'MY_TABLE'
AND e.Owner = 'MY_USER'
AND e.Segment_Type = 'TABLE')) e
JOIN MY_USER.MY_TABLE t ON t.ROWID BETWEEN Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 0)
AND Dbms_Rowid.Rowid_Create(1, Data_Object_Id, Relative_Fno, Block_Id, 32767))),
Retries(Cnt, Row_Id) AS (SELECT 1, gen.Row_Id
FROM Dual
LEFT JOIN gen ON 1=1
UNION ALL
SELECT Cnt + 1, gen.Row_Id
FROM Retries
LEFT JOIN gen ON 1=1
WHERE Retries.Row_Id IS NULL AND Retries.Cnt < 10)
SELECT *
FROM MY_USER.MY_TABLE
WHERE ROWID = (SELECT Row_Id
FROM Retries
WHERE Row_Id IS NOT NULL)
Answer 6 (score: 0)
Could you use pseudo-random rows?
select * from (
select * from ... where... order by ora_hash(rowid)
) where rownum<100
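A filled-in variant of the query above, as a sketch: my_big_table is a placeholder, and the seed value 42 is arbitrary. ORA_HASH takes an optional maximum bucket and seed, so changing the seed gives a different, but repeatable, ordering.

```sql
-- my_big_table is a placeholder; 42 is an arbitrary seed value.
select *
  from (select t.*
          from my_big_table t
         -- 4294967295 is the default (largest) max_bucket value
         order by ora_hash(rowid, 4294967295, 42))
 where rownum < 100;
```

This still has to hash and sort every row the inner query returns, so it is likely to help only when a WHERE clause has already narrowed the set.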