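I recently asked a question and someone gave me a working solution, but I forgot to mention that my tables have millions of rows (the item table has about 10 million, the other tables about 1 million each). They assumed I was working with a small data set like the one in the example I provided.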
Here is the SQL:
WITH a AS (
    SELECT item.id, string_agg(prefered_store.store::varchar, ',') wishlist_stores
    FROM item, list_wishlist, wishlist, prefered_store
    WHERE item.list=list_wishlist.list
      AND list_wishlist.wishlist=wishlist.id
      AND wishlist.prefered_stores=prefered_store.id
    GROUP BY item.id
), b AS (
    SELECT item.id,
           string_agg(
               prefered_store.store::varchar || ',' || prefered_store.comment,
               ' ; ') item_stores_comments
    FROM item, prefered_store
    WHERE item.prefered_stores=prefered_store.id
    GROUP BY item.id
)
SELECT a.id,item_stores_comments,wishlist_stores
FROM a,b
WHERE a.id=b.id
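While it does exactly what I need, it is very slow: about 10 minutes when limited to a single row and about 15 minutes for 10 rows. I am still waiting to see how long a thousand rows takes (it has been almost an hour). Granted, my desktop is not the fastest (a Pentium 4 with 1.5 GB of RAM), but this still doesn't feel right.
I have indexed all the fields used in the WHERE clauses and created primary keys where needed. Is there anything else I can do to make this query run faster?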
PostgreSQL 9.2
DDL:https://docs.google.com/file/d/0BwiyuwRCaqkCM09LVkJ4YlVNLWM/edit
Simple diagram with only the relevant tables and fields:
EXPLAIN ANALYZE:
Merge Join (cost=23342752.95..12971604557.95 rows=863210883998 width=68) (actual time=1182616.544..1251542.167 rows=13139337 loops=1)
Merge Cond: (a.id = b.id)
CTE a
-> GroupAggregate (cost=8477658.65..8992463.86 rows=13139337 width=8) (actual time=252170.500..307061.559 rows=13139337 loops=1)
-> Sort (cost=8477658.65..8547771.35 rows=28045080 width=8) (actual time=252170.391..282495.516 rows=14870222 loops=1)
Sort Key: public.item.id
Sort Method: external merge Disk: 261528kB
-> Merge Join (cost=3010452.34..3474579.76 rows=28045080 width=8) (actual time=138444.102..210768.838 rows=14870222 loops=1)
Merge Cond: (list_wishlist.list = public.item.list)
-> Sort (cost=689954.53..695268.01 rows=2125390 width=8) (actual time=30482.447..55193.049 rows=1286901 loops=1)
Sort Key: list_wishlist.list
Sort Method: external merge Disk: 22624kB
-> Hash Join (cost=66643.55..408462.52 rows=2125390 width=8) (actual time=10417.244..26147.517 rows=1286901 loops=1)
Hash Cond: (wishlist.prefered_stores = public.prefered_store.id)
-> Hash Join (cost=38565.70..96225.43 rows=1226863 width=8) (actual time=8188.097..19815.024 rows=1226863 loops=1)
Hash Cond: (list_wishlist.wishlist = wishlist.id)
-> Seq Scan on list_wishlist (cost=0.00..22266.63 rows=1226863 width=8) (actual time=12.786..7467.442 rows=1226863 loops=1)
-> Hash (cost=20352.20..20352.20 rows=1110120 width=8) (actual time=7314.531..7314.531 rows=1110087 loops=1)
Buckets: 4096 Batches: 64 Memory Usage: 689kB
-> Seq Scan on wishlist (cost=0.00..20352.20 rows=1110120 width=8) (actual time=7.621..6572.731 rows=1110087 loops=1)
-> Hash (cost=14027.49..14027.49 rows=856349 width=8) (actual time=2159.339..2159.339 rows=856349 loops=1)
Buckets: 4096 Batches: 64 Memory Usage: 536kB
-> Seq Scan on prefered_store (cost=0.00..14027.49 rows=856349 width=8) (actual time=0.071..1602.971 rows=856349 loops=1)
-> Materialize (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=107961.603..149020.809 rows=14870219 loops=1)
-> Sort (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=107961.575..145971.848 rows=13139337 loops=1)
Sort Key: public.item.list
Sort Method: external merge Disk: 231088kB
-> Seq Scan on item (cost=0.00..228006.37 rows=13139337 width=8) (actual time=27.636..47661.750 rows=13139337 loops=1)
CTE b
-> GroupAggregate (cost=7166704.38..7843349.46 rows=13139337 width=12) (actual time=524258.000..794537.585 rows=13139337 loops=1)
-> Sort (cost=7166704.38..7223638.09 rows=22773483 width=12) (actual time=524257.908..755765.703 rows=13858612 loops=1)
Sort Key: public.item.id
Sort Method: external merge Disk: 297912kB
-> Merge Join (cost=2448353.26..2826901.79 rows=22773483 width=12) (actual time=201205.036..425873.108 rows=13858612 loops=1)
Merge Cond: (public.prefered_store.id = public.item.prefered_stores)
-> Sort (cost=127685.43..129826.31 rows=856349 width=12) (actual time=4545.447..12507.054 rows=856346 loops=1)
Sort Key: public.prefered_store.id
Sort Method: external merge Disk: 18408kB
-> Seq Scan on prefered_store (cost=0.00..14027.49 rows=856349 width=12) (actual time=0.060..2707.353 rows=856349 loops=1)
-> Materialize (cost=2320484.45..2386181.13 rows=13139337 width=8) (actual time=196659.554..406944.706 rows=13858611 loops=1)
-> Sort (cost=2320484.45..2353332.79 rows=13139337 width=8) (actual time=196659.535..396917.629 rows=13139337 loops=1)
Sort Key: public.item.prefered_stores
Sort Method: external merge Disk: 231096kB
-> Seq Scan on item (cost=0.00..228006.37 rows=13139337 width=8) (actual time=0.032..54885.583 rows=13139337 loops=1)
-> Sort (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=344329.838..353118.692 rows=13139337 loops=1)
Sort Key: a.id
Sort Method: external sort Disk: 259792kB
-> CTE Scan on a (cost=0.00..262786.74 rows=13139337 width=36) (actual time=252170.512..320132.738 rows=13139337 loops=1)
-> Materialize (cost=3253469.82..3319166.50 rows=13139337 width=36) (actual time=838286.670..888495.578 rows=13139337 loops=1)
-> Sort (cost=3253469.82..3286318.16 rows=13139337 width=36) (actual time=838286.652..886198.912 rows=13139337 loops=1)
Sort Key: b.id
Sort Method: external sort Disk: 385320kB
-> CTE Scan on b (cost=0.00..262786.74 rows=13139337 width=36) (actual time=524258.017..811717.462 rows=13139337 loops=1)
Total runtime: 1253101.865 ms
Answer 0 (score: 4)
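In PostgreSQL (at least in the 9.2 release used here), CTEs act as optimization fences. No matter which predicates you add to the outer part of the query, PostgreSQL will still execute the WITH clauses in full. So to optimize this query you need to get rid of the CTEs.
This is fairly simple, because the prefered_store table is denormalized.
Try this query (also on SQL Fiddle):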
SELECT i.id item,
       (SELECT string_agg(store||','||comment, ';')
        FROM prefered_store WHERE id=i.prefered_stores) item_stores_comments,
       string_agg(ps.store::text, ',') wishlist_stores
FROM item i
JOIN list_wishlist lw ON lw.list=i.list
JOIN wishlist w ON w.id=lw.wishlist
JOIN prefered_store ps ON ps.id=w.prefered_stores
GROUP BY i.id;
That said, I suggest you review your schema design.
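To see the optimization-fence effect described in this answer, the two forms can be compared with EXPLAIN. This is only a sketch: the item id 42 is a made-up value, and the exact plans depend on the real schema and statistics. On 9.2 the CTE form is expected to aggregate every item before the filter is applied, whereas in the subquery form the planner is generally able to push a filter on the grouping column down into the subquery.

```sql
-- Form 1: CTE. The filter on id is applied only after the whole CTE is computed.
-- (42 is a hypothetical item id used purely for illustration.)
EXPLAIN
WITH b AS (
    SELECT item.id,
           string_agg(prefered_store.store::varchar || ',' || prefered_store.comment,
                      ' ; ') AS item_stores_comments
    FROM item
    JOIN prefered_store ON item.prefered_stores = prefered_store.id
    GROUP BY item.id
)
SELECT * FROM b WHERE id = 42;

-- Form 2: plain subquery. Same result, but the planner may move id = 42
-- inside the subquery and use an index on item.id.
EXPLAIN
SELECT *
FROM (
    SELECT item.id,
           string_agg(prefered_store.store::varchar || ',' || prefered_store.comment,
                      ' ; ') AS item_stores_comments
    FROM item
    JOIN prefered_store ON item.prefered_stores = prefered_store.id
    GROUP BY item.id
) AS b
WHERE id = 42;
```

If the first plan still shows a sequential scan and full sort of item while the second shows an index lookup, the CTE is what prevents small result sets from returning quickly.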
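Answer 1 (score: 1)
Building a column of comma-separated values in SQL is almost always the wrong approach. It is better to return rows of data and let the application code handle the display formatting.
As a test, try eliminating the string_agg() functions and the GROUP BY clauses.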
with a as (
    select item.id,
           prefered_store.store wishlist_stores
    from item
    inner join list_wishlist on item.list=list_wishlist.list
    inner join wishlist on list_wishlist.wishlist=wishlist.id
    inner join prefered_store on wishlist.prefered_stores=prefered_store.id
), b as (
    select item.id,
           prefered_store.store,
           prefered_store.comment item_stores_comments
    from item
    inner join prefered_store on item.prefered_stores=prefered_store.id
)
select * from a
inner join b on a.id = b.id
The SQL Fiddle you posted is of little use. It has no primary keys, no secondary indexes, and not enough rows to avoid sequential scans.
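To make a fiddle or a local test database more representative in that respect, secondary indexes on the join columns could be added along these lines. This is a sketch under assumptions: the index names are invented, the column names are taken from the queries in this thread, and the real DDL may already define some of these (or define them differently).

```sql
-- Hypothetical index names; columns are the join columns used in the queries above.
-- Skip any index the real schema already has.
CREATE INDEX item_list_idx ON item (list);
CREATE INDEX item_prefered_stores_idx ON item (prefered_stores);
CREATE INDEX list_wishlist_list_idx ON list_wishlist (list);
CREATE INDEX list_wishlist_wishlist_idx ON list_wishlist (wishlist);
CREATE INDEX wishlist_prefered_stores_idx ON wishlist (prefered_stores);
CREATE INDEX prefered_store_id_idx ON prefered_store (id);

-- Refresh planner statistics so row estimates reflect the loaded data.
ANALYZE;
```

Even with these in place, a query that has to touch every row of item will likely still use sequential scans; the indexes matter mostly once the query is restricted to a subset of items.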
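Answer 2 (score: 0)
Is it because you are using "WITH"?
Why not use "UNION"?
If none of the foreign-key tables can be empty, you could use "LEFT JOIN", but this is just a quick comparison.
If you post the diagram I can redo the SQL :D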