I have Foo and Buzz tables, as shown below:
Foos
buzz_id
date
Foo has an index on the foreign key buzz_id. It also has an index on date.
Buzzes
name
group
Buzz has an index on name, an index on group, and a multicolumn unique index on name and group combined. A Buzz has many Foos.
I am running the following query, which is taking too long:
SELECT DISTINCT ON (foos.buzz_id) foos.id, foos.date, buzzes.name, buzzes.group FROM foos INNER JOIN buzzes ON buzzes.id = foos.buzz_id
WHERE (buzzes.group = ANY (ARRAY_OF_GROUPS)
AND buzzes.name = ANY (ARRAY_OF_NAMES)
AND foos.date <= GIVEN_DATE) ORDER BY foos.buzz_id DESC, foos.date DESC;
I am joining the two tables on the foreign key and trying to get, for each buzz_id, the foo with the latest date (restricted to buzzes whose name and group appear in the given arrays, and to foos that satisfy the date condition).
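For reference, the same "latest foo per buzz" result can also be expressed with a window function instead of DISTINCT ON. This is only a sketch of an equivalent formulation (GIVEN_DATE, ARRAY_OF_NAMES, and ARRAY_OF_GROUPS remain placeholders as in the original query):

```sql
-- Equivalent formulation using ROW_NUMBER() instead of DISTINCT ON.
-- "group" is quoted because it is a reserved word in PostgreSQL.
SELECT id, date, name, "group"
FROM (
  SELECT foos.id, foos.date, buzzes.name, buzzes."group",
         ROW_NUMBER() OVER (PARTITION BY foos.buzz_id
                            ORDER BY foos.date DESC) AS rn
  FROM foos
  JOIN buzzes ON buzzes.id = foos.buzz_id
  WHERE buzzes."group" = ANY (ARRAY_OF_GROUPS)
    AND buzzes.name = ANY (ARRAY_OF_NAMES)
    AND foos.date <= GIVEN_DATE
) ranked
WHERE rn = 1;
```

The planner may choose a different strategy for this form, but it is not guaranteed to be faster than DISTINCT ON.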
I have two environments where the query runs: my local machine and a Heroku environment. As you can see, the tables in my local environment are smaller:
Local:
foos | r | 4.013e+06 | 639 MB
foos_pkey | i | 4.19832e+06 | 198 MB
index_foos_on_buzz_id | i | 4.19832e+06 | 285 MB
index_foos_on_date | i | 4.19832e+06 | 330 MB
buzzes | r | 2298 | 184 kB
index_buzzes_on_name_and_group | i | 2298 | 120 kB
index_buzzes_on_group | i | 2298 | 104 kB
index_buzzes_on_name | i | 2298 | 88 kB
Heroku:
foos | r | 4.92772e+07 | 6653 MB
foos_pkey | i | 4.90556e+07 | 3151 MB
index_foos_on_buzz_id | i | 4.90556e+07 | 2462 MB
index_foos_on_date | i | 4.90556e+07 | 2421 MB
buzzes | r | 328250 | 24 MB
index_buzzes_on_name_and_group | i | 328250 | 10200 kB
index_buzzes_on_group | i | 328250 | 8624 kB
index_buzzes_on_name | i | 328250 | 7224 kB
My local tables are missing data, so the query returns fewer rows there than in the Heroku environment.
I have a large number of items in ARRAY_OF_NAMES, 500 in this case, and relatively few in ARRAY_OF_GROUPS, say 4.
My Heroku environment does not have enough RAM to keep all my data in cache, so I know the query is currently slower than it would be if everything were cached.
Running the query with EXPLAIN ANALYZE gives me the following output:
Local:
Unique (cost=330087.91..336514.17 rows=1485 width=46) (actual time=3602.511..4131.322 rows=736 loops=1)
-> Sort (cost=330087.91..333301.04 rows=1285252 width=46) (actual time=3602.509..4003.598 rows=1404653 loops=1)
Sort Key: foos.buzz_id DESC, foos.date DESC
Sort Method: external merge Disk: 96096kB
-> Hash Join (cost=311.50..160136.33 rows=1285252 width=46) (actual time=10.815..1438.885 rows=1404653 loops=1)
Hash Cond: (foos.buzz_id = buzz.id)
-> Seq Scan on foos (cost=0.00..131923.55 rows=4013004 width=32) (actual time=1.728..925.871 rows=4186572 loops=1)
Filter: (date <= GIVEN_DATE)
-> Hash (cost=301.48..301.48 rows=801 width=18) (actual time=9.035..9.035 rows=736 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 47kB
-> Index Scan using index_buzzes_on_name on buzzes (cost=0.28..301.48 rows=801 width=18) (actual time=0.057..8.189 rows=736 loops=1)
Index Cond: ((name)::text = ANY (ARRAY_OF_NAMES::text[]))
Filter: ((group)::text = ANY (ARRAY_OF_GROUPS::text[]))
Rows Removed by Filter: 5
Planning time: 5.804 ms
Execution time: 4151.021 ms
(16 rows)
Heroku:
Unique (cost=1086348.46..1086579.42 rows=17073 width=44) (actual time=64428.256..64980.542 rows=1467 loops=1)
-> Sort (cost=1086348.46..1086463.94 rows=230962 width=44) (actual time=64428.254..64801.540 rows=1889788 loops=1)
Sort Key: foos.buzz_id DESC, foos.date DESC
Sort Method: external merge Disk: 129240kB
-> Gather (cost=3860.54..1082233.34 rows=230962 width=44) (actual time=20.290..61998.689 rows=1889788 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Hash Join (cost=2860.54..1058137.14 rows=135860 width=44) (actual time=16.240..61607.831 rows=944894 loops=2)
Hash Cond: (foos.buzz_id = buzz.id)
-> Parallel Seq Scan on foos (cost=0.00..953099.09 rows=28986598 width=32) (actual time=0.312..59310.999 rows=24527783 loops=2)
Filter: (date <= GIVEN_DATE)
-> Hash (cost=2855.15..2855.15 rows=1539 width=16) (actual time=9.022..9.022 rows=1467 loops=2)
Buckets: 2048 Batches: 1 Memory Usage: 90kB
-> Bitmap Heap Scan on buzzes (cost=973.85..2855.15 rows=1539 width=16) (actual time=6.534..8.753 rows=1467 loops=2)
Recheck Cond: (((group)::text = ANY (ARRAY_OF_GROUPS::text[])) AND ((name)::text = ANY (ARRAY_OF_NAMES::text[])))
Heap Blocks: exact=902
-> BitmapAnd (cost=973.85..973.85 rows=1539 width=0) (actual time=6.416..6.416 rows=0 loops=2)
-> Bitmap Index Scan on index_buzzes_on_group (cost=0.00..87.51 rows=10174 width=0) (actual time=1.059..1.059 rows=10504 loops=2)
Index Cond: ((group)::text = ANY (ARRAY_OF_GROUPS::text[]))
-> Bitmap Index Scan on index_buzzes_on_name (cost=0.00..886.14 rows=49668 width=0) (actual time=5.168..5.168 rows=50042 loops=2)
Index Cond: ((name)::text = ANY (ARRAY_OF_NAMES::text[]))
Planning time: 1.993 ms
Execution time: 64999.534 ms
Do you have any suggestions for speeding up this query, or is this expected behavior? I assume I must be doing something wrong, since I sincerely doubt I am pushing Postgres to the edge of its performance.
Answer 0 (score: 0)
If you only need the latest date from foo, an aggregate query is a better fit. It may also be faster, so it is worth a try:
select
b.id,
b.name,
b."group",
max(f.date)
from buzz b,
foo f
where b.id = f.buzz_id
and b."group" in [ARRAY]
and b.name in [ARRAY]
and f.date < NOW()
group by b.id;
Regarding the indexes: if this is the only query you run against this database, the single-column indexes (on name and on group) will not help, because your condition uses both columns together. If no other query joins or filters on just one of those columns, you can drop those two indexes to speed up inserts and updates.
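On the foos side, a multicolumn index covering both the join key and the date is worth considering (a suggestion, not something already present in the schema above): it matches the ORDER BY foos.buzz_id DESC, foos.date DESC of the original query and can help Postgres avoid the large external-merge sorts visible in both plans. A sketch:

```sql
-- Hypothetical index matching DISTINCT ON (buzz_id) ... ORDER BY buzz_id DESC, date DESC.
-- The index name is illustrative only.
CREATE INDEX index_foos_on_buzz_id_and_date
    ON foos (buzz_id DESC, date DESC);
```

Whether the planner actually uses it depends on selectivity, so check with EXPLAIN ANALYZE before and after.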
Answer 1 (score: 0)
Also move your WHERE conditions into the INNER JOIN, so that your data set is reduced as early as possible. Ideally you want to shrink the data as soon as you can (and definitely avoid a cross join).
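A sketch of what that could look like, filtering and aggregating foos in a subquery before joining (placeholders as in the question; untested against the poster's schema):

```sql
-- Pre-aggregate foos so only one row per buzz_id reaches the join.
SELECT b.id, b.name, b."group", f.max_date
FROM buzzes b
INNER JOIN (
    SELECT buzz_id, MAX(date) AS max_date
    FROM foos
    WHERE date <= GIVEN_DATE
    GROUP BY buzz_id
) f ON f.buzz_id = b.id
WHERE b."group" = ANY (ARRAY_OF_GROUPS)
  AND b.name = ANY (ARRAY_OF_NAMES);
```

Note that with an inner join and simple equality/filter predicates, Postgres's planner usually pushes WHERE conditions down on its own, so the gain here comes mainly from the early GROUP BY, not from where the predicates are written.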