Question

I have the following query:

SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a 
  JOIN b on b.fk_a_id = a.id 
  JOIN c on c.fk_a_id = a.id
  INNER JOIN d on d.fk_c_id = c.id 
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id 
ORDER BY a.created_date desc
LIMIT __ OFFSET __

Indexes: a.id, a.owner_id, b.id, c.id

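However, none of these indexes is used by this query. I have another, similar query that joins one additional table, and it uses the indexes as I expected. Any ideas why this query is not using the indexes?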

Edited to include the EXPLAIN output:

"Limit  (cost=7.88..7.89 rows=4 width=243) (actual time=175.824..175.825 rows=10 loops=1)" 
"  ->  Sort  (cost=7.88..7.89 rows=4 width=243) (actual time=175.822..175.822 rows=10 loops=1)" 
"        Sort Key: a.created_date DESC" 
"        Sort Method: quicksort  Memory: 27kB" 
"        ->  HashAggregate  (cost=7.78..7.84 rows=4 width=243) (actual time=175.771..175.778 rows=10 loops=1)" 
"              Group Key: a.id, b.id, c.id" 
"              ->  Hash Join  (cost=5.12..7.75 rows=4 width=243) (actual time=0.072..0.099 rows=20 loops=1)" 
"                    Hash Cond: (a.id = b.fk_a_id)" 
"                    ->  Hash Join  (cost=2.85..5.43 rows=4 width=163) (actual time=0.041..0.063 rows=20 loops=1)" 
"                          Hash Cond: (a.id = d.fk_a_id)" 
"                          ->  Seq Scan on table a  (cost=0.00..2.44 rows=27 width=126) (actual time=0.008..0.025 rows=28 loops=1)" 
"                                Filter: (owner_id <> ALL ('{1,3,100,41}'::bigint[]))" 
"                                Rows Removed by Filter: 1" 
"                          ->  Hash  (cost=2.76..2.76 rows=7 width=53) (actual time=0.027..0.027 rows=3 loops=1)" 
"                                Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                                ->  Hash Join  (cost=1.16..2.76 rows=7 width=53) (actual time=0.019..0.023 rows=3 loops=1)" 
"                                      Hash Cond: (c.id = d.fk_c_id)" 
"                                      ->  Seq Scan on table c  (cost=0.00..1.39 rows=39 width=45) (actual time=0.002..0.004 rows=39 loops=1)" 
"                                      ->  Hash  (cost=1.07..1.07 rows=7 width=8) (actual time=0.007..0.007 rows=3 loops=1)" 
"                                            Buckets: 1024  Batches: 1  Memory Usage: 9kB"
"                                            ->  Seq Scan on table d  (cost=0.00..1.07 rows=7 width=8) (actual time=0.003..0.004 rows=3 loops=1)" 
"                    ->  Hash  (cost=2.12..2.12 rows=12 width=88) (actual time=0.022..0.022 rows=12 loops=1)" 
"                          Buckets: 1024  Batches: 1  Memory Usage: 9kB" 
"                          ->  Seq Scan on table b  (cost=0.00..2.12 rows=12 width=88) (actual time=0.005..0.013 rows=12 loops=1)"
"Planning time: 210.946 ms"
"Execution time: 175.987 ms"

Answer 1

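There is one criterion on table A: WHERE a.owner_id NOT IN (1, 3, 100, 41). That sounds like "select all records except a few". Reading all of those records via an index and then doing most of the work anyway is still a lot of work. It is quicker to simply read the table and discard the few records that don't match.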
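A quick way to see how selective that condition is on the current data is to count how many rows it keeps. This is a sketch using the table and column names from the question; the FILTER clause needs PostgreSQL 9.4 or later.

```sql
-- How many rows of a survive the NOT IN filter, compared with the whole table?
-- Rows with a NULL owner_id are not counted as kept, matching the WHERE clause.
SELECT count(*) FILTER (WHERE owner_id NOT IN (1, 3, 100, 41)) AS rows_kept,
       count(*)                                              AS rows_total
FROM a;
```

If rows_kept is close to rows_total (the plan shows 28 rows kept and 1 removed), an index on owner_id has almost nothing to skip.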

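Then, with those many A records, we match many, many B, C and D records. Again, rather than reading an index plus most of the table data, it is better to just put the data into buckets (hash joins). That seems to be the fastest approach.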

So the optimizer's decision not to use indexes for this query seems like a good one. It looks like it is doing a good job :-)
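One way to test that claim is to discourage sequential scans for a single transaction and compare the resulting plan with the one in the question. enable_seqscan is a real planner setting intended for experiments like this, not for production use. It only makes Seq Scans look very expensive; the planner still uses one where no usable index exists (for example on d, which has no index listed in the question). The LIMIT/OFFSET values below are example values, since the question leaves them blank.

```sql
BEGIN;
-- SET LOCAL reverts automatically at the end of the transaction.
SET LOCAL enable_seqscan = off;

EXPLAIN ANALYZE
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
  JOIN d ON d.fk_c_id = c.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
GROUP BY a.id, b.id, c.id
ORDER BY a.created_date DESC
LIMIT 10 OFFSET 0;  -- example values only

ROLLBACK;
```

If the estimated cost and the actual time of this forced plan are higher than in the original plan, the optimizer was right to skip the indexes.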

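The only way I can see to speed this up with indexes would be covering indexes, i.e. indexes that contain all the columns the query needs: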

a (owner_id, id, created_date, col2)
b (fk_a_id, id, col2)
c (fk_a_id, id, col2)
d (fk_c_id)

Then the database would not have to read both the index and the table, only the index.
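In PostgreSQL, the covering indexes suggested above could be created roughly as follows. The index names are hypothetical; the column lists follow the list above. PostgreSQL can only answer a query from the index alone (an Index Only Scan) when the visibility map marks the table pages as all-visible, which is why the sketch vacuums the tables afterwards. With tables this small, the planner may still prefer Seq Scans even with these indexes in place.

```sql
-- Hypothetical index names; column order follows the suggestion above.
CREATE INDEX a_owner_covering_idx ON a (owner_id, id, created_date, col2);
CREATE INDEX b_fk_a_covering_idx  ON b (fk_a_id, id, col2);
CREATE INDEX c_fk_a_covering_idx  ON c (fk_a_id, id, col2);
CREATE INDEX d_fk_c_idx           ON d (fk_c_id);

-- Refresh planner statistics and the visibility map so index-only scans are possible.
VACUUM ANALYZE a;
VACUUM ANALYZE b;
VACUUM ANALYZE c;
VACUUM ANALYZE d;
```

On PostgreSQL 11 and later, the non-key columns (such as col2) could instead be placed in an INCLUDE clause, which keeps them out of the index's search key.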

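By the way: since you don't select anything from D, either you want to check that matching D rows exist, in which case I think you should use EXISTS or IN instead of a join for readability; or you don't, in which case you can remove D from the query entirely.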
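A sketch of the EXISTS form, assuming the join to d exists only to require at least one matching d row per c row. The LIMIT/OFFSET values are examples, since the question leaves them blank.

```sql
SELECT a.id, a.col2, b.id, b.col2, c.id, c.col2
FROM a
  JOIN b ON b.fk_a_id = a.id
  JOIN c ON c.fk_a_id = a.id
WHERE a.owner_id NOT IN (1, 3, 100, 41)
  -- Keep only c rows that have at least one d row; d contributes no columns.
  AND EXISTS (SELECT 1 FROM d WHERE d.fk_c_id = c.id)
GROUP BY a.id, b.id, c.id
ORDER BY a.created_date DESC
LIMIT 10 OFFSET 0;  -- example values only
```

Unlike a join, EXISTS never multiplies rows. If the GROUP BY was there only to collapse duplicates caused by several d rows per c row, it may no longer be needed; that depends on the rest of the schema, so it is kept here.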

Answer 2

Indexes are used to (quickly) find "a few" rows among "many" rows. They are not a magic silver bullet that makes everything faster.

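Your tables don't have enough rows for an index lookup to be efficient, and you are retrieving nearly all of them, not just a small fraction.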

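If you look at the plan, you will see that the actual data retrieval never takes more than 0.1 milliseconds (that's a tenth of a millisecond). The Seq Scans on tables c and d take only 0.004 milliseconds; no index is going to speed that up.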

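Fetching only 20 or 30 rows through an index, with the random I/O that involves, would definitely be slower. Depending on the number of columns, even the 39 rows of the "largest" table are probably stored in a single block, which means the database needs just one I/O operation to read all of them with a Seq Scan.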
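Whether the tables really fit in a block or two can be checked in the system catalog. relpages and reltuples in pg_class are estimates refreshed by VACUUM and ANALYZE, and pg_relation_size returns the actual size in bytes; the table names are the ones used in the question.

```sql
-- Approximate size of each table: pages (8 kB each with the default block size),
-- estimated row count, and actual on-disk size of the main table data.
SELECT relname,
       relpages,
       reltuples,
       pg_size_pretty(pg_relation_size(oid)) AS on_disk
FROM pg_class
WHERE relname IN ('a', 'b', 'c', 'd')
  AND relkind = 'r';  -- ordinary tables only
```

A relpages value of 1 means a Seq Scan reads the whole table in a single page access, which no index lookup can beat.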

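The slowest part of the plan is the HashAggregate (about 175 ms, versus under 0.1 ms for the joins below it), not the reading of the data, so the choice not to use an index seems right.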

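What kind of hardware is this? 175 ms to aggregate 20 rows seems extremely slow, and so does the planning time of 210 ms. For such a simple statement, planning should take more like 1 ms.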

Postgres SQL query ignoring indexes?
