I have a table with this schema:
create table mytable (creation_date timestamp,
value int,
category int);
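For each category and each hour of the day, I want the value that occurs most often, counting weekdays only. I have made some progress and now have a query like this: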
select category,foo.h as h,value, count(value) from mytable, (
select date_trunc('hour',
'2000-01-01 00:00:00'::timestamp+generate_series(0,23)*'1 hour'::interval)::time as h) AS foo
where date_part('hour',creation_date) = date_part('hour',foo.h) and
date_part('dow',creation_date) > 0 and date_part('dow',creation_date) < 6
group by category,h,value;
As a result I get something like this:
category | h | value | count
---------+----------+---------+-------
1 | 00:00:00 | 2 | 1
1 | 01:00:00 | 2 | 1
1 | 02:00:00 | 2 | 6
1 | 03:00:00 | 2 | 31
1 | 03:00:00 | 3 | 11
1 | 04:00:00 | 2 | 21
1 | 04:00:00 | 3 | 9
1 | 13:00:00 | 1 | 14
1 | 14:00:00 | 1 | 10
1 | 14:00:00 | 2 | 7
1 | 15:00:00 | 1 | 52
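For example, at 04:00 I get the values 2 and 3, with counts of 21 and 9 respectively. I only need the value with the highest count, which would be the statistical mode.
BTW, I have more than 2M records.
Answer 0 (score: 2)
This can be simpler: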
SELECT DISTINCT ON (category, extract(hour FROM creation_date)::int)
category
, extract(hour FROM creation_date)::int AS h
, count(*)::int AS max_ct
, value
FROM mytable
WHERE extract(isodow FROM creation_date) < 6 -- no sat or sun
GROUP BY 1,2,4
ORDER BY 1,2,3 DESC;
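- Exclude weekends (WHERE ...). Use ISODOW to simplify the expression.
- Extract the hour from the timestamp as h.
- Group by category, h and value.
- Count the rows per group and cast the count to integer; we don't need bigint.
- Order by category, h and the highest count (DESC).
- Use DISTINCT ON (category, h) to pick the first row (the highest count) per category and hour. I am able to do this at a single query level because DISTINCT is applied after the aggregate functions.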
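As an aside, since the question asks for the statistical mode: PostgreSQL 9.4 and later ship an ordered-set aggregate mode() that returns the most frequent value directly. The sketch below is an alternative, not the method of this answer, and it assumes a server version that has mode(). It does not return the count, and when several values are equally frequent it picks one of them arbitrarily.

```sql
-- Alternative sketch (assumes PostgreSQL 9.4+): most frequent value
-- per category and hour of day, weekdays only.
SELECT category
     , extract(hour FROM creation_date)::int AS h
     , mode() WITHIN GROUP (ORDER BY value)  AS value
FROM   mytable
WHERE  extract(isodow FROM creation_date) < 6  -- no sat or sun
GROUP  BY 1, 2
ORDER  BY 1, 2;
```

If the count is needed as well, the DISTINCT ON form is the better fit.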
The DISTINCT ON query returns no row for any (category, h) combination that has no entries at all. If you need to fill in the blanks, LEFT JOIN to this:
SELECT c.category, h.h
FROM cat_tbl c
CROSS JOIN (SELECT generate_series(0, 23) AS h) h
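Putting the pieces together might look like the following sketch. It assumes cat_tbl holds one row per category, as the snippet above implies; the alias m is made up for illustration. Combinations without any entries come back with NULL in max_ct and value.

```sql
-- Sketch: one row per (category, hour), including hours with no data.
SELECT c.category, h.h, m.max_ct, m.value
FROM   cat_tbl c
CROSS  JOIN (SELECT generate_series(0, 23) AS h) h
LEFT   JOIN (
   -- the DISTINCT ON query from this answer, unchanged
   SELECT DISTINCT ON (category, extract(hour FROM creation_date)::int)
          category
        , extract(hour FROM creation_date)::int AS h
        , count(*)::int AS max_ct
        , value
   FROM   mytable
   WHERE  extract(isodow FROM creation_date) < 6  -- no sat or sun
   GROUP  BY 1, 2, 4
   ORDER  BY 1, 2, 3 DESC
   ) m ON m.category = c.category
      AND m.h = h.h
ORDER  BY 1, 2;
```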
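Answer 1 (score: 1)
Given the size of your table, I would be tempted to use your query to build a temporary table, and then run a query against that table to finalize the result.
Assuming you called the temporary table "summary_table", the following query should do it.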
select
category, h, value, count
from
summary_table s1
where
not exists
(select * from summary_table s2
where s1.category = s2.category and
s1.h = s2.h and
(s1.count < s2.count
OR (s1.count = s2.count and s1.value > s2.value)));
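For reference, building the temporary table from the question's query could be sketched as follows. The index and the ANALYZE call are optional additions of this sketch, not something the answer prescribes; they are only meant to help the planner with the self-join.

```sql
-- Sketch: materialize the question's aggregate once per session.
CREATE TEMP TABLE summary_table AS
SELECT category, foo.h AS h, value, count(value) AS count
FROM   mytable, (
   SELECT date_trunc('hour',
          '2000-01-01 00:00:00'::timestamp
          + generate_series(0,23) * '1 hour'::interval)::time AS h) AS foo
WHERE  date_part('hour', creation_date) = date_part('hour', foo.h)
AND    date_part('dow', creation_date) > 0   -- exclude Sunday (0)
AND    date_part('dow', creation_date) < 6   -- exclude Saturday (6)
GROUP  BY category, h, value;

-- Optional (assumption): index the join columns and refresh statistics.
CREATE INDEX ON summary_table (category, h);
ANALYZE summary_table;
```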
If you don't want to create a table, you can use a WITH clause to attach your query to this one.
with summary_table as (
select category,foo.h as h,value, count(value) as count from mytable, (
select date_trunc('hour',
'2000-01-01 00:00:00'::timestamp+generate_series(0,23)*'1 hour'::interval)::time as h) AS foo
where date_part('hour',creation_date) = date_part('hour',foo.h) and
date_part('dow',creation_date) > 0 and date_part('dow',creation_date) < 6
group by category,h,value)
select
category, h, value, count
from
summary_table s1
where
not exists
(select * from summary_table s2
where s1.category = s2.category and
s1.h = s2.h and
(s1.count < s2.count
OR (s1.count = s2.count and s1.value > s2.value)));