Comparison with a computed value fails in PostgreSQL

Date: 2015-03-11 11:01:18

Tags: sql postgresql

I want to select roughly 20 random rows from a large table, and the following query works fine:

SELECT id
FROM timeseriesentry
WHERE random() < 20*1.0/12940622

(12940622 is the number of rows in the table.) I now want to retrieve the row count automatically, so I tried

WITH tmp AS (SELECT COUNT(*) n FROM timeseriesentry)
SELECT id
FROM timeseriesentry, tmp
WHERE random() < 20*1.0/n

This returns zero rows, even though n is correct.

What am I missing here?

Edit: id is not numeric, which is why I cannot generate a random series to select from. I need the proposed structure because my actual goal is

WITH npt AS (
    SELECT type, COUNT(*) n
    FROM timeseriesentry
    GROUP BY type
)
SELECT v.id
FROM timeseriesentry v
JOIN npt ON npt.type= v.type
WHERE random() < 200*1.0/npt.n

to get a sample of roughly the same size for each type.

3 answers:

Answer 0 (score: 1)

This is ugly, but it works. It also avoids using the identifier type, which is a (non-reserved) keyword.

WITH zzz AS (
        SELECT ztype
        , COUNT(*) AS cnt
        FROM timeseriesentry
        GROUP BY ztype
        )
SELECT *
FROM timeseriesentry src
WHERE random() < 20.0 / (SELECT cnt FROM zzz
                        WHERE zzz.ztype = src.ztype)
ORDER BY src.ztype
        ;

Update: the same thing with a window function in a subquery:

SELECT *
FROM (
        SELECT *
        , sum(1) OVER (PARTITION BY ztype) AS cnt
        FROM timeseriesentry
        ) src
WHERE random() < 20.0 / src.cnt
ORDER BY src.ztype
        ;

Or, more compactly, the same thing using a CTE:

WITH src AS (
        SELECT *
        , sum(1) OVER (PARTITION BY ztype) AS cnt
        FROM timeseriesentry
        )
SELECT *
FROM src
WHERE random() < 20.0 / src.cnt
ORDER BY src.ztype
        ;

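Note: the CTE versions do not necessarily perform the same as the subquery version. In fact, they tend to be slower. (Since the original query has to visit all rows of the timeseriesentry table in either case, it will not make much difference in this particular case.)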
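
If an exact number of rows per type is preferred over an approximate one, one possible variation (a sketch, not part of the queries above) ranks the rows of each type in random order and keeps the first 20. It assumes the same timeseriesentry table and ztype column used in this answer; the alias rn is made up for illustration. Like the versions above, it has to read the whole table, and it additionally sorts the rows within each partition.

```sql
-- Sketch: exactly 20 random rows per ztype (fewer if a type has fewer than 20 rows).
-- Assumes the timeseriesentry table and ztype column from the queries above;
-- "rn" is an illustrative alias.
SELECT *
FROM (
        SELECT t.*
        , row_number() OVER (PARTITION BY ztype ORDER BY random()) AS rn
        FROM timeseriesentry t
        ) src
WHERE src.rn <= 20
ORDER BY src.ztype
        ;
```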

Answer 1 (score: 1)

I created a table without a numeric field:

CREATE TABLE timeseriesentry AS
SELECT generate_series('2015-01-01'::timestamptz,
                       '2015-01-02'::timestamptz,
                       '1 second'::interval) AS id
     , 'ret'::text AS v
;

and used window aggregates again:

WITH tmp AS (
    SELECT round(count(*) OVER () * random()) AS n
    FROM timeseriesentry
    LIMIT 20
)
SELECT id
FROM (
    SELECT row_number() OVER () AS rn, id
    FROM timeseriesentry
) sel, tmp
WHERE rn = n
;

So it gives a "random" 20:

2015-01-01 01:27:22+01
2015-01-01 03:33:51+01
2015-01-01 06:15:28+01
2015-01-01 09:52:21+01
2015-01-01 10:00:02+01
2015-01-01 10:08:33+01
2015-01-01 10:26:31+01
2015-01-01 12:55:21+01
2015-01-01 14:03:54+01
2015-01-01 14:05:36+01
2015-01-01 15:12:08+01
2015-01-01 15:45:55+01
2015-01-01 16:10:35+01
2015-01-01 17:11:02+01
2015-01-01 18:18:32+01
2015-01-01 19:35:51+01
2015-01-01 22:06:08+01
2015-01-01 22:12:42+01
2015-01-01 22:43:45+01
2015-01-01 22:49:55+01
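
For comparison, a shorter way to draw exactly 20 distinct rows is to sort by random() and cut off the result. This is only a sketch against the test table created above; it sorts the whole table, so it may be slow on a table with millions of rows.

```sql
-- Sketch: 20 distinct rows in random order; works with a non-numeric id.
-- Sorts the entire table, which can be expensive for large tables.
SELECT id
FROM timeseriesentry
ORDER BY random()
LIMIT 20
;
```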

Answer 2 (score: 0)

I guess the closest I can get is:

WITH tmp AS (SELECT round(count(*) over()*random()) n FROM timeseriesentry limit 20)
SELECT id
FROM timeseriesentry, tmp
WHERE id=n