Selecting table records in pseudo-random order

Date: 2016-03-21 23:00:41

Tags: java sql h2

In my case I am using an embedded H2 database, but my question really applies to SQL databases in general.

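Consider this table, in which a record may or may not reference another record, and the same record is never referenced from more than one place.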

CREATE TABLE test (id NUMBER, data VARCHAR, reference NUMBER) ;
INSERT INTO test (id, data) 
SELECT x, 'P'||x FROM system_range(0, 9);
UPDATE test SET reference = 2 where id = 4;
UPDATE test SET reference = 4 where id = 6;
UPDATE test SET reference = 1 where id = 7;
UPDATE test SET reference = 8 where id = 9;

SELECT * FROM test ORDER BY id;

ID  DATA    REFERENCE
----------------------------------
0   P0      null 
1   P1      null 
2   P2      null 
3   P3      null 
4   P4      2
5   P5      null 
6   P6      4 
7   P7      1 
8   P8      null 
9   P9      8 

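Now I would like an SQL query that selects the test records in random order with a single constraint: a referencing record must never be selected before the record it references.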

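One thing that works is SELECT * FROM test ORDER BY reference, RAND(), but it does not seem random enough to me: it always selects all the records that reference nothing (those with a null reference) first, which reduces the level of randomness.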

For example, a good and valid result set could be the following.

ID  DATA    REFERENCE
----------------------------------
8   P8      null 
2   P2      null 
1   P1      null 
4   P4      2
3   P3      null 
9   P9      8 
5   P5      null 
6   P6      4 
0   P0      null
7   P7      1 
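
To make the constraint concrete, here is a minimal Java sketch of a checker for it. The class and method names (ReferenceOrderCheck, sampleReferences, isValidOrder) are hypothetical and only for illustration; the reference map simply mirrors the four UPDATE statements above.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper for illustration; not part of H2 or of the original post.
final class ReferenceOrderCheck {

    // id -> referenced id, mirroring the four UPDATE statements on the test table.
    static Map<Integer, Integer> sampleReferences() {
        Map<Integer, Integer> refs = new HashMap<>();
        refs.put(4, 2);
        refs.put(6, 4);
        refs.put(7, 1);
        refs.put(9, 8);
        return refs;
    }

    // Returns true when every referencing id appears after the id it references.
    static boolean isValidOrder(List<Integer> order, Map<Integer, Integer> refs) {
        Set<Integer> seen = new HashSet<>();
        for (Integer id : order) {
            Integer referenced = refs.get(id);
            if (referenced != null && !seen.contains(referenced)) {
                return false; // the referenced row has not been emitted yet
            }
            seen.add(id);
        }
        return true;
    }
}
```

Called with the id column of the result set above (8, 2, 1, 4, 3, 9, 5, 6, 0, 7), isValidOrder accepts the order; an order that lists 4 before 2 is rejected.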

I would prefer a pure SQL solution, but since H2 is easy to extend, I would not rule out creating a custom function by exposing my own Java method.
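
As a rough idea of what such a Java method might compute, here is a sketch under my own assumptions: it keeps a list of rows that are currently allowed (rows referencing nothing, or rows whose referenced row has already been emitted) and repeatedly picks one of them at random. The names RandomReferenceOrder and randomOrder are hypothetical, the wiring into H2 is left out, and I make no claim that every valid order is equally likely.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical sketch: random order in which a referenced id always precedes
// the id that references it.
final class RandomReferenceOrder {

    // ids:  all record ids
    // refs: id -> referenced id (no entry when the record references nothing)
    static List<Integer> randomOrder(List<Integer> ids, Map<Integer, Integer> refs, Random rnd) {
        // referenced id -> ids that reference it
        Map<Integer, List<Integer>> referencing = new HashMap<>();
        // ids that may be emitted right now
        List<Integer> ready = new ArrayList<>();
        for (Integer id : ids) {
            Integer referenced = refs.get(id);
            if (referenced == null) {
                ready.add(id);
            } else {
                referencing.computeIfAbsent(referenced, k -> new ArrayList<>()).add(id);
            }
        }

        List<Integer> result = new ArrayList<>(ids.size());
        while (!ready.isEmpty()) {
            // Pick a random ready id and remove it by swapping with the last element.
            int pick = rnd.nextInt(ready.size());
            Integer id = ready.get(pick);
            ready.set(pick, ready.get(ready.size() - 1));
            ready.remove(ready.size() - 1);
            result.add(id);

            // Rows referencing the emitted id become eligible.
            List<Integer> unlocked = referencing.get(id);
            if (unlocked != null) {
                ready.addAll(unlocked);
            }
        }
        return result;
    }
}
```

Unlike ORDER BY reference, RAND(), a referencing row can appear here as soon as its referenced row has been emitted, so the rows that reference nothing are not forced to the front. Rows that are part of a reference cycle would never become ready and would be missing from the result.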

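Update: This is not a duplicate of How to request a random row in SQL, because:

  1. Besides the randomness requirement, I also have the reference constraint. In fact, the complexity of my question comes from this reference constraint, not from the randomness.
  2. I need to select all table records, not just one.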

1 Answer:

Answer 0 (score: 1)

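Well, you should never say never before digging a little further. When I added my comment to Jim, I actually asked myself whether H2 offers an equivalent of Oracle's Hierarchical Queries. Sure enough, there is some explanation in the H2 documentation, in the advanced section under H2 recursive queries.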

So here is my working query, which almost satisfies my requirements:

WITH link(id, data, reference, sort_val, level, tree_id) AS (
    -- Each tree root starts with a random sort value of up to half the number of records.
    -- Half the number of records is not strictly needed; it could be a hard-coded value.
    -- I chose half to get a relatively uniform distribution of tree ids.
    -- The id of the starting row is used as the tree id.
    SELECT id, data, reference, round(rand()*(select count(*) FROM test)/2) AS sort_val, 0, id FROM test WHERE reference IS NULL

    UNION ALL

    -- Increase the sort value by level for each referencing row
    SELECT test.id, test.data, test.reference, link.sort_val + (level + 1) AS sort_val, level + 1, link.tree_id
       FROM link
       JOIN test ON link.id = test.reference
)
-- sort value, level and tree id are printed here just to make it easier to understand how it works
SELECT id, data, reference, sort_val, level, tree_id
  FROM link
 ORDER BY sort_val;
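
To show how the sort values propagate, here is a small Java sketch that mirrors the query outside the database. It is only an illustration under assumptions: the names SortValueSketch, order and assign are hypothetical, ids are assumed to run from 0 to rowCount - 1, and Math.round stands in for the SQL round().

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

// Hypothetical re-implementation of the recursive query's sort-value idea.
final class SortValueSketch {

    // refs: id -> referenced id; ids are assumed to be 0 .. rowCount - 1.
    static List<Integer> order(int rowCount, Map<Integer, Integer> refs, Random rnd) {
        // referenced id -> ids that reference it (the JOIN direction of the query)
        Map<Integer, List<Integer>> children = new HashMap<>();
        for (Map.Entry<Integer, Integer> e : refs.entrySet()) {
            children.computeIfAbsent(e.getValue(), k -> new ArrayList<>()).add(e.getKey());
        }

        Map<Integer, Long> sortVal = new HashMap<>();
        for (int id = 0; id < rowCount; id++) {
            if (!refs.containsKey(id)) {
                // Tree root: random sort value between 0 and rowCount / 2.
                long start = Math.round(rnd.nextDouble() * rowCount / 2.0);
                assign(id, start, 0, children, sortVal);
            }
        }

        List<Integer> ids = new ArrayList<>(sortVal.keySet());
        ids.sort(Comparator.comparingLong(id -> sortVal.get(id)));
        return ids;
    }

    // Stores the value for id, then gives each referencing row value + (level + 1).
    private static void assign(int id, long value, int level,
                               Map<Integer, List<Integer>> children,
                               Map<Integer, Long> sortVal) {
        sortVal.put(id, value);
        for (int child : children.getOrDefault(id, Collections.emptyList())) {
            assign(child, value + level + 1, level + 1, children, sortVal);
        }
    }
}
```

Because a referencing row always gets a strictly larger sort value than the row it references, the constraint holds however ties between different trees are broken. The randomness comes only from the starting value of each tree root, since the offsets inside a tree are fixed by the level.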