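I want to query a table and sum a column over all rows whose date is the last day of a month.
Let's use the following table as an example: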
CREATE TABLE example(dt date, value int, other1 int, other2 int, other3 int);
CREATE INDEX ON example (dt);
My query is as follows:
SELECT dt, SUM(value)
FROM example
WHERE dt in (select date_trunc('month', d) + interval '1 month - 1 day'
from generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d))
GROUP BY dt
If I look at the query plan, I see that it performs a sequential scan on the table:
EXPLAIN ANALYSE SELECT dt, SUM(value)
FROM example
WHERE dt in (select date_trunc('month', d) + interval '1 month - 1 day'
from generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d))
GROUP BY dt
GroupAggregate (cost=825385.12..871490.30 rows=1536 width=12) (actual time=4323.887..6141.401 rows=56 loops=1)
Group Key: example.dt
-> Merge Join (cost=825385.12..863846.28 rows=1525732 width=12) (actual time=4323.811..6118.514 rows=101102 loops=1)
Merge Cond: (example.dt = ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval)))
-> Sort (cost=825312.64..832941.30 rows=3051464 width=12) (actual time=4323.585..5303.902 rows=3051464 loops=1)
Sort Key: example.dt
Sort Method: external merge Disk: 77512kB
-> Seq Scan on example (cost=0.00..392353.64 rows=3051464 width=12) (actual time=10.385..1748.592 rows=3051464 loops=1)
-> Sort (cost=72.48..72.98 rows=200 width=8) (actual time=0.168..18.248 rows=101105 loops=1)
Sort Key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))
Sort Method: quicksort Memory: 27kB
-> Unique (cost=59.84..64.84 rows=200 width=8) (actual time=0.108..0.143 rows=59 loops=1)
-> Sort (cost=59.84..62.34 rows=1000 width=8) (actual time=0.106..0.112 rows=59 loops=1)
Sort Key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))
Sort Method: quicksort Memory: 27kB
-> Function Scan on generate_series dates (cost=0.01..10.01 rows=1000 width=8) (actual time=0.042..0.097 rows=59 loops=1)
However, if I add more SUMs to the query, the planner decides to use the index on dt:
EXPLAIN ANALYSE SELECT dt, SUM(value), SUM(other1), SUM(other2), SUM(other3)
FROM example
WHERE dt in (select date_trunc('month', d) + interval '1 month - 1 day'
from generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d))
GROUP BY dt
HashAggregate (cost=1005765.17..1005780.53 rows=1536 width=61) (actual time=225.249..225.276 rows=56 loops=1)
Group Key: l.as_of
-> Nested Loop (cost=60.27..975250.53 rows=1525732 width=61) (actual time=0.141..173.853 rows=101102 loops=1)
-> Unique (cost=59.84..64.84 rows=200 width=8) (actual time=0.100..0.192 rows=59 loops=1)
-> Sort (cost=59.84..62.34 rows=1000 width=8) (actual time=0.099..0.125 rows=59 loops=1)
Sort Key: ((date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))
Sort Method: quicksort Memory: 27kB
-> Function Scan on generate_series dates (cost=0.01..10.01 rows=1000 width=8) (actual time=0.031..0.080 rows=59 loops=1)
-> Index Scan using dashboard_loanhistory_95daa586 on dashboard_loanhistory l (cost=0.43..4856.06 rows=1987 width=61) (actual time=0.025..1.579 rows=1714 loops=59)
Index Cond: (as_of = (date_trunc('month'::text, dates.d) + '1 mon -1 days'::interval))
Planning time: 0.228 ms
Execution time: 225.379 ms
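What is going on here? I would like the original query to run using the index on dt, and I would rather not add unnecessary aggregates to the query just to get that plan.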
Answer 0 (score: 0)
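This is based on the comments on the question, in particular @joop's answer. It is a bit of a hack because it requires another index, and I do not really understand why the query planner will not use the existing index on dt here, but it works ¯\_(ツ)_/¯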
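For anyone who wants to dig into the planner's choice, one common diagnostic (not something tried in this thread) is to disable sequential scans for a single transaction and compare the estimated costs of the resulting plan with the original one. The sketch below assumes the example table and the query from the question. In the plans above the join is estimated at rows=1525732 but actually returns rows=101102, so a cost comparison may show whether that overestimate is what tips the decision.

```sql
-- Diagnostic sketch only: SET LOCAL limits the setting to this transaction,
-- and ROLLBACK guarantees nothing persists afterwards.
BEGIN;

-- Makes sequential scans look very expensive to the planner, so it will
-- pick another access path whenever one is available.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYSE
SELECT dt, SUM(value)
FROM example
WHERE dt IN (SELECT date_trunc('month', d) + interval '1 month - 1 day'
             FROM generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') AS dates(d))
GROUP BY dt;

ROLLBACK;
```

If the plan produced this way reports a higher estimated cost than the Merge Join plan, the planner was choosing the sequential scan because it believed that plan to be cheaper, not because the index was unusable.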
I added a partial index on the dt column, restricted to the days that could be the last day of a month:
CREATE INDEX ON example (dt) WHERE date_part('day', dt) IN (28, 29, 30, 31);
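The predicate is safe because every month ends on day 28, 29, 30 or 31. As a quick sanity check, the following sketch reuses the subquery from the question and reports, for each generated month-end date, whether it satisfies the index predicate. The aliases month_end and covered are made up for illustration.

```sql
-- Sketch: list each month-end date in the queried range and check that it
-- falls inside the partial index predicate (day of month in 28..31).
SELECT month_end,
       date_part('day', month_end) IN (28, 29, 30, 31) AS covered
FROM (
    SELECT (date_trunc('month', d) + interval '1 month - 1 day')::date AS month_end
    FROM generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') AS dates(d)
) AS month_ends
ORDER BY month_end;
```

Every row should report covered as true. The index still contains rows for days 28 to 30 that are not month ends (for example the 28th of a 31-day month), which is why the query keeps the dt IN (...) condition as well.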
Then I changed my query to include a predicate on the day of the month:
SELECT dt, SUM(value)
FROM example
WHERE date_part('day', dt) IN (28, 29, 30, 31) AND
dt IN (select date_trunc('month', d) + interval '1 month - 1 day'
from generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') dates(d))
GROUP BY dt;
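To check that the planner actually picks up the partial index, the rewritten query can be run under EXPLAIN ANALYSE. This is only a sketch: no plan output is reproduced here, because what the planner chooses depends on the data and statistics of the table.

```sql
-- Sketch: the first WHERE condition repeats the partial index predicate
-- verbatim so the planner can recognise that the index applies.
EXPLAIN ANALYSE
SELECT dt, SUM(value)
FROM example
WHERE date_part('day', dt) IN (28, 29, 30, 31)
  AND dt IN (SELECT date_trunc('month', d) + interval '1 month - 1 day'
             FROM generate_series('2012-01-01'::date, '2016-11-10'::date, interval '1 month') AS dates(d))
GROUP BY dt;
```

If the workaround behaves as described, the plan should show an index or bitmap scan on the new partial index instead of a Seq Scan on example.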