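We have a view that runs string_agg over a table with 54 million records, where string_agg has to handle at most 2 strings per group. Defining the view with the following query causes the PostgreSQL server process to die from running out of memory.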
SELECT id, string_agg(msg, ',') FROM msgs GROUP BY id
When the view query is changed as follows, it works fine.
SELECT id, string_agg(msg, ',') as msgs
FROM ( SELECT id, msg, row_number() over (partition by id) as row_num
FROM msgs) as limited_alerts
WHERE row_num < 5
GROUP BY id
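What is the reason? Is it because Postgres is able to use temporary files in the second case? Are there any links/articles that explain the details?
Total rows: 54 million
Maximum number of messages per id: 2
It looks like the query that crashes uses HashAggregate and the other one uses GroupAggregate.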
explain select count(*) from (SELECT msgs.id, string_agg(msgs.alerts, ','::text) AS alerts FROM msgs GROUP BY msgs.id) as a;
                                              QUERY PLAN
------------------------------------------------------------------------------------------------------
 Aggregate  (cost=3032602.58..3032602.59 rows=1 width=0)
   ->  HashAggregate  (cost=3032597.58..3032600.08 rows=200 width=108)
         Group Key: msgs.id
         ->  Append  (cost=0.00..2587883.05 rows=88942906 width=108)
               ->  Seq Scan on msgs  (cost=0.00..0.00 rows=1 width=548)
               ->  Seq Scan on msgs_2016_01_24  (cost=0.00..506305.40 rows=17394640 width=107)
               ->  Seq Scan on msgs_2016_01_31  (cost=0.00..509979.80 rows=17512480 width=107)
               ->  Seq Scan on msgs_2016_02_07  (cost=0.00..491910.32 rows=16883332 width=108)
               ->  Seq Scan on msgs_2016_02_14  (cost=0.00..496443.84 rows=17071384 width=108)
               ->  Seq Scan on msgs_2016_02_21  (cost=0.00..552162.84 rows=19026084 width=108)
               ->  Seq Scan on msgs_2016_02_28  (cost=0.00..31038.05 rows=1054705 width=111)
               ->  Seq Scan on msgs_2016_03_06  (cost=0.00..10.70 rows=70 width=548)
               ->  Seq Scan on msgs_2016_03_13  (cost=0.00..10.70 rows=70 width=548)
               ->  Seq Scan on msgs_2016_03_20  (cost=0.00..10.70 rows=70 width=548)
               ->  Seq Scan on msgs_2016_03_27  (cost=0.00..10.70 rows=70 width=548)
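
One thing that stands out in this plan is that the HashAggregate node is estimated at rows=200, while with at most 2 messages per id over tens of millions of rows the real number of groups must be in the tens of millions. A rough way to check whether the plan choice is what makes the difference, sketched here as an assumption rather than a confirmed diagnosis, would be to discourage hash aggregation for a single transaction and compare the resulting plan. enable_hashagg is a standard PostgreSQL planner setting; the rest is the same query as above.

-- Sketch only: try this in a test session first.
BEGIN;
-- Discourage HashAggregate for this transaction only, so the planner
-- is pushed towards a sort-based GroupAggregate instead.
SET LOCAL enable_hashagg = off;

EXPLAIN
SELECT count(*)
FROM (SELECT msgs.id, string_agg(msgs.alerts, ','::text) AS alerts
      FROM msgs
      GROUP BY msgs.id) AS a;

-- SET LOCAL is discarded at the end of the transaction anyway;
-- ROLLBACK just makes it explicit that nothing is changed.
ROLLBACK;

If the plan switches to GroupAggregate and the query then completes, that would suggest the crash is tied to the in-memory hash table being far larger than the planner expected.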