PostgreSQL: combining ORDER BY RANDOM() with DISTINCT / GROUP BY

Time: 2012-08-24 20:53:01

Tags: database postgresql random

I have a table like this:

id| page |   text
------------------------
1 | page1 | Hello World
2 | page1 | Foo Bar
3 | page2 | Baz Baz
3 | page2 | Some Text
4 | page3 | Some Other Text

I want to select 2 random entries, but each page may appear at most once in the result.

I have

SELECT * FROM mydata ORDER BY RANDOM() LIMIT 2;

but can I combine this with DISTINCT or GROUP BY?

4 answers:

Answer 0 (score: 2)

Something like:

select id, page, text
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from mydata
) x
where rn <= 2

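Answer 1 (score: 1)

If you want:
...two rows in total from the base table
...and every page to have the same chance to appear in the sample, regardless of how many entries it has in the table: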

SELECT *
FROM  (
    SELECT DISTINCT ON (page) *
    FROM   mydata
    ORDER  BY page, random() -- pick one random entry per page
    ) x
ORDER BY random() -- pick two random pages
LIMIT 2;
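To make "every page has the same chance" concrete, here is a small Python sketch (not part of the original answer) that computes the exact probability of each row of the question's sample table ending up in the result of this two-step query. It models the query rather than running it; `ROWS` and `inclusion_probabilities` are hypothetical names chosen for illustration.

```python
from fractions import Fraction

# Sample data from the question: (id, page, text). id 3 really appears twice there.
ROWS = [
    (1, "page1", "Hello World"),
    (2, "page1", "Foo Bar"),
    (3, "page2", "Baz Baz"),
    (3, "page2", "Some Text"),
    (4, "page3", "Some Other Text"),
]


def inclusion_probabilities(rows, k=2):
    """Exact chance of each row appearing when we first keep one random row
    per page (DISTINCT ON (page) ... random()) and then keep k random pages
    (outer ORDER BY random() LIMIT k)."""
    rows_per_page = {}
    for _, page, _ in rows:
        rows_per_page[page] = rows_per_page.get(page, 0) + 1
    # Every page is equally likely to be among the k pages kept ...
    p_page = Fraction(min(k, len(rows_per_page)), len(rows_per_page))
    # ... and within a page, each of its rows is equally likely to be the one kept.
    return [p_page / rows_per_page[page] for _, page, _ in rows]
```

For the sample table every page is picked with probability 2/3, so each entry of page1 and page2 gets 1/3 while the single entry of page3 gets 2/3.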

Or, using a window function:

WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
   FROM   mydata
   )
SELECT id, page, text
FROM   x
WHERE  rn = 1
ORDER  BY random()
LIMIT  2;
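If you want to try the window-function variant without a PostgreSQL server, a minimal sketch is to run the same query through Python's built-in `sqlite3` module. This assumes SQLite 3.25 or later (the first version with window functions); the `DISTINCT ON` variant is PostgreSQL-only and will not run there. `sample_two_pages` is a hypothetical helper name.

```python
import sqlite3

# Sample data from the question: (id, page, text).
ROWS = [
    (1, "page1", "Hello World"),
    (2, "page1", "Foo Bar"),
    (3, "page2", "Baz Baz"),
    (3, "page2", "Some Text"),
    (4, "page3", "Some Other Text"),
]

# The window-function query from this answer, unchanged apart from the semicolon.
QUERY = """
WITH x AS (
   SELECT *, row_number() OVER (PARTITION BY page ORDER BY random()) AS rn
   FROM   mydata
   )
SELECT id, page, text
FROM   x
WHERE  rn = 1
ORDER  BY random()
LIMIT  2
"""


def sample_two_pages(rows):
    """Load rows into an in-memory SQLite table and run QUERY once."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE mydata (id INTEGER, page TEXT, text TEXT)")
        conn.executemany("INSERT INTO mydata VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()
```

Calling `sample_two_pages(ROWS)` repeatedly returns different pairs, but never two rows from the same page.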

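You'd have to test which one is faster. If you are dealing with big tables and need fast performance, you can do better. Here is one way how.


If, on the other hand, you want:
...two rows in total from table mydata
...and every entry to have (almost) the same chance to appear in the sample, which effectively gives pages with more entries in the table a better chance.
The chances are still not truly equal: by definition, your restriction increases the chance for entries of rare pages.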

WITH x AS (
   SELECT *
   FROM   mydata
   ORDER  BY random()
   LIMIT 1
   )
SELECT * FROM x
UNION ALL
(
SELECT m.*
FROM mydata m
   , x
WHERE m.page <> x.page -- assuming page IS NOT NULL
ORDER BY random()
LIMIT 1
);
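The "(almost) the same chance" claim can be checked with the same kind of Python sketch (hypothetical names, exact arithmetic via `fractions.Fraction`, modeling the query rather than running it): the first row is drawn uniformly from all rows, the second uniformly from the rows of the other pages.

```python
from collections import Counter
from fractions import Fraction

# Sample data from the question; rows are identified by position because id 3 repeats.
ROWS = [
    (1, "page1", "Hello World"),
    (2, "page1", "Foo Bar"),
    (3, "page2", "Baz Baz"),
    (3, "page2", "Some Text"),
    (4, "page3", "Some Other Text"),
]


def inclusion_probabilities(rows):
    """Exact chance of each row appearing in the two-row UNION ALL sample."""
    n = len(rows)
    probs = [Fraction(0)] * n
    for first in range(n):
        p_first = Fraction(1, n)  # first part: ORDER BY random() LIMIT 1 over all rows
        probs[first] += p_first
        # second part: only rows whose page differs from the first pick
        others = [i for i in range(n) if rows[i][1] != rows[first][1]]
        for second in others:
            probs[second] += p_first * Fraction(1, len(others))
    return probs


def page_probabilities(rows):
    """Sum the per-row probabilities by page."""
    per_page = Counter()
    for row, p in zip(rows, inclusion_probabilities(rows)):
        per_page[row[1]] += p
    return dict(per_page)
```

For the sample table each entry of page1 and page2 gets 23/60 and the lone page3 entry gets 7/15 (= 28/60): close to uniform, with the rare page's entry slightly favored, while page1 and page2 as pages (23/30 each) beat page3 (7/15).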

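The parentheses around the second SELECT of the UNION are required so that it can have its own ORDER BY and LIMIT.
Tested with PostgreSQL 9.1. Window functions require version 8.4 or later.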

Answer 2 (score: 1)

Same as Erwin's answer, just a bit more structured: http://www.sqlfiddle.com/#!1/d3e83/6

with first_random as
(
  select * from tbl order by random() limit 1
)
, second_random as
(
  select * 
  from tbl 
  where page <> (select page from first_random)
  order by random() limit 1
)
select * from first_random
union
select * from second_random;

Same as a_horse_with_no_name's answer, but this one is correct: http://www.sqlfiddle.com/#!1/d3e83/12

select id, page, text, rn
from (
  select id, page, text,
         row_number() over (partition by page order by random()) as rn
  from tbl
) x
where rn = 1
order by random() 
limit 2;

Go with the latter; it has a simpler execution plan.

Answer 3 (score: 0)

This might work:

SELECT * FROM
  (SELECT * FROM mydata GROUP BY page) t
ORDER BY RANDOM() LIMIT 2