The following query groups the results from first into 4 equally spaced date bins and aggregates an average of the_value for each bin.
WITH first as(
    SELECT
        extract(EPOCH FROM foo.t_date) as the_date,
        foo_val as the_value
    FROM bar
    INNER JOIN foo
    ON
        foo.user_id = bar.x_id
    and
        foo.user_name = 'xxxx'
)
SELECT bin, round(sum(bin_sum) OVER w /sum(bin_ct) OVER w, 2) AS running_avg
FROM (
    SELECT width_bucket(first.the_date
                      , x.min_epoch, x.max_epoch, x.bins) AS bin
         , sum(first.the_value) AS bin_sum
         , count(*) AS bin_ct
    FROM first
       , (SELECT MIN(first.the_date) AS min_epoch
               , MAX(first.the_date) AS max_epoch
               , 4 AS bins
          FROM first
         ) x
    GROUP BY 1
    ) sub
WINDOW w AS (ORDER BY bin)
ORDER BY 1;
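To make the binning step concrete, here is a minimal Python sketch of how width_bucket(operand, low, high, count) assigns bins, based on its documented behavior for low < high: the range is inclusive at the lower bound and exclusive at the upper bound, and values outside it map to 0 or count + 1. The helper and the sample epochs are hypothetical and only model the semantics; this is not PostgreSQL's implementation. One consequence worth noting: because max_epoch is passed as the upper bound, the row holding the maximum date lands in bin 5 rather than bin 4.

```python
def width_bucket(operand, low, high, count):
    """Hypothetical model of PostgreSQL's width_bucket for low < high."""
    if operand < low:
        return 0                      # below the range
    if operand >= high:
        return count + 1              # upper bound is exclusive
    # Equal-width buckets numbered 1..count
    return int((operand - low) * count / (high - low)) + 1


# Made-up epoch values standing in for first.the_date
epochs = [0, 10, 25, 49, 50, 75, 99, 100]
bins = [width_bucket(e, min(epochs), max(epochs), 4) for e in epochs]
print(bins)
```

If exactly 4 bins are wanted, one option is to clamp the result with least(width_bucket(...), 4) or to pass an upper bound slightly above the maximum; which of the two fits depends on the data.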
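I would like to calculate the average over only the 20 lowest the_value entries in each bin. From other posts on Stack Overflow I have seen that this is possible, and that ORDER BY the_value combined with rank() may be the best way to do it. My difficulty is that I am not sure where to modify my current query to achieve this.
Any insight would be appreciated.
Postgres version 9.3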
Answer 0 (score: 1)
Use row_number() within each bin. First compute the row number rn, then apply WHERE rn < 21 in the next step:
WITH first AS (
    SELECT extract(EPOCH FROM foo.t_date) AS the_date
         , foo_val AS the_value
    FROM   bar
    JOIN   foo ON foo.user_id = bar.x_id
              AND foo.user_name = 'xxxx'
    )
, x AS (  -- overall date range
    SELECT MIN(the_date) AS min_epoch
         , MAX(the_date) AS max_epoch
    FROM   first
    )
, y AS (  -- assign every row to a bin
    SELECT width_bucket(f.the_date, x.min_epoch, x.max_epoch, 4) AS bin, *
    FROM   first f, x
    )
, z AS (  -- number rows within each bin, lowest the_value first
    SELECT row_number() OVER (PARTITION BY bin ORDER BY the_value) AS rn, *
    FROM   y
    )
SELECT bin, round(sum(bin_sum) OVER w / sum(bin_ct) OVER w, 2) AS running_avg
FROM  (
    SELECT bin
         , sum(the_value) AS bin_sum
         , count(*) AS bin_ct
    FROM   z
    WHERE  rn < 21  -- max 20 lowest values
    GROUP  BY 1
    ) sub
WINDOW w AS (ORDER BY bin)
ORDER  BY 1;
The CTEs y and z could be merged into one. Likewise, first and x could be merged, but it is clearer this way.
Untested, since we have no test data.
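Since there is no test data, the logic of the query can at least be checked on a toy input. The following Python sketch mirrors the steps under stated assumptions: rows arrive as (bin, the_value) pairs with the bin already assigned, the function name and sample rows are made up, and n stands in for the limit of 20. It keeps the n lowest values per bin (the equivalent of rn < n + 1) and then accumulates sums and counts in bin order, as sum(...) OVER w does with ORDER BY bin. Note that this yields a cumulative average across bins, matching the column name running_avg, not an independent average per bin.

```python
from collections import defaultdict


def lowest_n_running_avg(rows, n=20):
    """Running average over the n lowest values of each bin (hypothetical helper)."""
    by_bin = defaultdict(list)
    for bin_no, value in rows:
        by_bin[bin_no].append(value)

    result = []
    run_sum = 0
    run_ct = 0
    for bin_no in sorted(by_bin):
        kept = sorted(by_bin[bin_no])[:n]   # same effect as WHERE rn < n + 1
        run_sum += sum(kept)                # sum(bin_sum) OVER w
        run_ct += len(kept)                 # sum(bin_ct) OVER w
        result.append((bin_no, round(run_sum / run_ct, 2)))
    return result


# Made-up rows: (bin, the_value)
rows = [(1, 5), (1, 1), (1, 3), (2, 10), (2, 2), (2, 6)]
out = lowest_n_running_avg(rows, n=2)
print(out)
```

row_number() is used rather than rank() because rank() gives tied values the same rank, so a filter like rank < 21 could let through more than 20 rows per bin when there are ties at the cutoff.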