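I have a task: for each group of rows (grouped by time), get the first, last, maximum, and minimum values. My solution works, but it is very slow because the table has about 50 million rows.
How can I improve the performance of this query: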
SELECT
date_trunc('minute', t_ordered."timestamp"),
MIN (t_ordered.price),
MAX (t_ordered.price),
FIRST (t_ordered.price),
LAST (t_ordered.price)
FROM(
SELECT t.price, t."timestamp"
FROM trade t
WHERE t."timestamp" >= '2016-01-01' AND t."timestamp" < '2016-09-01'
ORDER BY t."timestamp" ASC
) t_ordered
GROUP BY 1
ORDER BY 1
FIRST and LAST are aggregate functions from the PostgreSQL wiki.
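For readers without the wiki page at hand, the two aggregates are defined there roughly as in the sketch below. It is reproduced from memory, so check the wiki for the current version. The function names first_agg and last_agg and the public schema are as I recall them and should be treated as assumptions.

```sql
-- State function for FIRST: always keep the existing state ($1).
-- STRICT means NULL inputs are skipped and the first non-NULL
-- value seeds the state.
CREATE OR REPLACE FUNCTION public.first_agg (anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT $1;
$$;

CREATE AGGREGATE public.first (anyelement) (
    sfunc = public.first_agg,
    stype = anyelement
);

-- State function for LAST: always replace the state with the new value ($2).
CREATE OR REPLACE FUNCTION public.last_agg (anyelement, anyelement)
RETURNS anyelement
LANGUAGE sql IMMUTABLE STRICT AS $$
    SELECT $2;
$$;

CREATE AGGREGATE public.last (anyelement) (
    sfunc = public.last_agg,
    stype = anyelement
);
```

Both aggregates depend on the order in which rows reach them, which is why the query sorts by "timestamp" in the subquery first.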
The timestamp column is indexed. EXPLAIN (ANALYZE, VERBOSE):
GroupAggregate (cost=13112830.84..33300949.59 rows=351556 width=14) (actual time=229538.092..468212.450 rows=351138 loops=1)
Output: (date_trunc('minute'::text, t_ordered."timestamp")), min(t_ordered.price), max(t_ordered.price), first(t_ordered.price), last(t_ordered.price)
Group Key: (date_trunc('minute'::text, t_ordered."timestamp"))
-> Sort (cost=13112830.84..13211770.66 rows=39575930 width=14) (actual time=229515.281..242472.677 rows=38721704 loops=1)
Output: (date_trunc('minute'::text, t_ordered."timestamp")), t_ordered.price
Sort Key: (date_trunc('minute'::text, t_ordered."timestamp"))
Sort Method: external sort Disk: 932656kB
-> Subquery Scan on t_ordered (cost=6848734.55..7442373.50 rows=39575930 width=14) (actual time=102166.368..155540.492 rows=38721704 loops=1)
Output: date_trunc('minute'::text, t_ordered."timestamp"), t_ordered.price
-> Sort (cost=6848734.55..6947674.38 rows=39575930 width=14) (actual time=102165.836..130971.804 rows=38721704 loops=1)
Output: t.price, t."timestamp"
Sort Key: t."timestamp"
Sort Method: external merge Disk: 993480kB
-> Seq Scan on public.trade t (cost=0.00..1178277.21 rows=39575930 width=14) (actual time=0.055..25726.038 rows=38721704 loops=1)
Output: t.price, t."timestamp"
Filter: ((t."timestamp" >= '2016-01-01 00:00:00'::timestamp without time zone) AND (t."timestamp" < '2016-09-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 9666450
Planning time: 1.663 ms
Execution time: 468949.753 ms
Maybe it can be done with window functions? I have tried, but I don't know them well enough to use them.
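For reference, a window-function form of the same query might look like the sketch below. It is untested against this table and is not claimed to be faster; it only shows the shape such a query could take, using the built-in first_value and last_value functions instead of the wiki aggregates.

```sql
-- Sketch only: same result columns as the query above, via window functions.
SELECT DISTINCT
    date_trunc('minute', t."timestamp")  AS minute,
    MIN(t.price)         OVER w          AS price_min,
    MAX(t.price)         OVER w          AS price_max,
    first_value(t.price) OVER w          AS price_first,
    last_value(t.price)  OVER w          AS price_last
FROM trade t
WHERE t."timestamp" >= '2016-01-01' AND t."timestamp" < '2016-09-01'
WINDOW w AS (
    PARTITION BY date_trunc('minute', t."timestamp")
    ORDER BY t."timestamp"
    -- Without an explicit full frame, last_value would only see rows
    -- up to the current one.
    ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
ORDER BY 1;
```

DISTINCT collapses the per-row window results to one row per minute. Rows that share the same timestamp can still make the first and last values ambiguous, as in the original query.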
Answer 0 (score: 1)
Creating a composite type and matching aggregates will hopefully work better:
create type tp as (timestamp timestamp, price int);
create or replace function min_tp (tp, tp)
returns tp as $$
select least($1, $2);
$$ language sql immutable;
create aggregate min (tp) (
sfunc = min_tp,
stype = tp
);
The min and max (not listed) aggregate functions will reduce the query to a single loop:
select
date_trunc('minute', timestamp) as minute,
min (price) as price_min,
max (price) as price_max,
(min ((timestamp, price)::tp)).price as first,
(max ((timestamp, price)::tp)).price as last
from trade
where timestamp >= '2016-01-01' and timestamp < '2016-09-01'
group by 1
order by 1
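The max aggregate that this query relies on is not listed in the answer. By symmetry with min_tp, it could plausibly be defined as in the sketch below. The name max_tp is an assumption made here, not something taken from the answer.

```sql
-- Hypothetical counterpart to min_tp: keep the larger of two tp values.
-- Composite values compare field by field, so the timestamp decides first
-- and price only breaks ties.
create or replace function max_tp (tp, tp)
returns tp as $$
    select greatest($1, $2);
$$ language sql immutable;

create aggregate max (tp) (
    sfunc = max_tp,
    stype = tp
);
```

Because timestamp is the first field of tp, max over (timestamp, price) picks the row with the latest timestamp, and its price field is the last price in the group.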
EXPLAIN (ANALYZE, VERBOSE):
GroupAggregate (cost=6954022.61..27159050.82 rows=287533 width=14) (actual time=129286.817..510119.582 rows=351138 loops=1)
Output: (date_trunc('minute'::text, "timestamp")), min(price), max(price), (min(ROW("timestamp", price)::tp)).price, (max(ROW("timestamp", price)::tp)).price
Group Key: (date_trunc('minute'::text, trade."timestamp"))
-> Sort (cost=6954022.61..7053049.25 rows=39610655 width=14) (actual time=129232.165..156277.718 rows=38721704 loops=1)
Output: (date_trunc('minute'::text, "timestamp")), price, "timestamp"
Sort Key: (date_trunc('minute'::text, trade."timestamp"))
Sort Method: external merge Disk: 1296392kB
-> Seq Scan on public.trade (cost=0.00..1278337.71 rows=39610655 width=14) (actual time=0.035..45335.947 rows=38721704 loops=1)
Output: date_trunc('minute'::text, "timestamp"), price, "timestamp"
Filter: ((trade."timestamp" >= '2016-01-01 00:00:00'::timestamp without time zone) AND (trade."timestamp" < '2016-09-01 00:00:00'::timestamp without time zone))
Rows Removed by Filter: 9708857
Planning time: 0.286 ms
Execution time: 510648.395 ms