为什么不使用PostgreSQL索引并建议索引

时间:2014-07-07 09:24:40

标签: sql ruby-on-rails postgresql indexing

我正在进行查询以选择位于某个地理区域的记录,我正在进行一些连接和几次过滤。

这是我的疑问:

SELECT "events".* FROM "events" INNER JOIN "albums" ON "albums"."event_id" = "events"."id" INNER JOIN "photos" ON "photos"."album_id" = "albums"."id" WHERE "events"."deleted_at" IS NULL AND "albums"."deleted_at" IS NULL AND "photos"."deleted_at" IS NULL AND (events.latitude BETWEEN -44.197088742316055 AND -23.22003941183816 AND events.longitude BETWEEN 133.226480859375 AND 165.570230859375) GROUP BY events.id HAVING count(albums.id) > 0  ORDER BY start_date DESC

我有以下索引:

活动:

"events_pkey" PRIMARY KEY, btree (id)
"index_events_on_deleted_at" btree (deleted_at)
"index_events_on_latitude_and_longitude" btree (latitude, longitude)

相册:

"albums_pkey" PRIMARY KEY, btree (id)
"index_albums_on_deleted_at" btree (deleted_at)
"index_albums_on_event_id" btree (event_id)

照片:

"photos_pkey" PRIMARY KEY, btree (id)
"index_photos_on_album_id" btree (album_id)
"index_photos_on_deleted_at" btree (deleted_at)

执行EXPLAIN ANALYZE会导致此问题,而且我没有看到我的索引的任何用法。我不知道如何强制它使用索引。任何人都可以帮我优化吗?

 Sort  (cost=4057.46..4057.84 rows=150 width=668) (actual time=556.114..556.187 rows=76 loops=1)
   Sort Key: events.start_date
   Sort Method: quicksort  Memory: 78kB
   ->  HashAggregate  (cost=4050.16..4052.04 rows=150 width=668) (actual time=555.667..555.783 rows=76 loops=1)
         Filter: (count(albums.id) > 0)
         ->  Hash Join  (cost=76.14..3946.54 rows=20724 width=668) (actual time=3.675..467.578 rows=48050 loops=1)
               Hash Cond: (photos.album_id = albums.id)
               ->  Seq Scan on photos  (cost=0.00..3441.87 rows=59013 width=4) (actual time=0.008..169.206 rows=60599 loops=1)
                     Filter: (deleted_at IS NULL)
               ->  Hash  (cost=74.10..74.10 rows=163 width=668) (actual time=3.633..3.633 rows=318 loops=1)
                     Buckets: 1024  Batches: 1  Memory Usage: 176kB
                     ->  Hash Join  (cost=49.80..74.10 rows=163 width=668) (actual time=1.195..2.519 rows=318 loops=1)
                           Hash Cond: (albums.event_id = events.id)
                           ->  Seq Scan on albums  (cost=0.00..21.47 rows=321 width=8) (actual time=0.011..0.458 rows=321 loops=1)
                                 Filter: (deleted_at IS NULL)
                           ->  Hash  (cost=47.92..47.92 rows=150 width=664) (actual time=1.151..1.151 rows=195 loops=1)
                                 Buckets: 1024  Batches: 1  Memory Usage: 126kB
                                 ->  Seq Scan on events  (cost=0.00..47.92 rows=150 width=664) (actual time=0.007..0.488 rows=195 loops=1)
                                       Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision) AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
 Total runtime: 556.459 ms

谢谢!

编辑:感谢您的链接。我尝试过禁用seqscan。现在我的计划是:

 Sort  (cost=5565.73..5566.10 rows=150 width=46) (actual time=451.208..451.290 rows=76 loops=1)
   Sort Key: (date(events.start_date))
   Sort Method: quicksort  Memory: 31kB
   ->  GroupAggregate  (cost=0.00..5560.31 rows=150 width=46) (actual time=2.990..450.850 rows=76 loops=1)
         Filter: (count(albums.id) > 0)
         ->  Nested Loop  (cost=0.00..5454.44 rows=20724 width=46) (actual time=0.077..278.319 rows=48050 loops=1)
               ->  Merge Join  (cost=0.00..205.35 rows=163 width=46) (actual time=0.051..2.856 rows=318 loops=1)
                     Merge Cond: (events.id = albums.event_id)
                     ->  Index Scan using events_pkey on events  (cost=0.00..118.72 rows=150 width=42) (actual time=0.024..0.792 rows=195 loops=1)
                           Filter: ((deleted_at IS NULL) AND (latitude >= (-44.1970887423161)::double precision) AND (latitude <= (-23.2200394118382)::double precision)
 AND (longitude >= 133.226480859375::double precision) AND (longitude <= 165.570230859375::double precision))
                     ->  Index Scan using index_albums_on_event_id on albums  (cost=0.00..83.83 rows=321 width=8) (actual time=0.017..0.832 rows=321 loops=1)
                           Filter: (deleted_at IS NULL)
               ->  Index Scan using index_photos_on_album_id on photos  (cost=0.00..30.27 rows=155 width=4) (actual time=0.010..0.409 rows=151 loops=318)
                     Index Cond: (album_id = albums.id)
                     Filter: (deleted_at IS NULL)
 Total runtime: 451.562 ms

静止索引并未完全用于纬度和长期条件。我是否正确设置了索引?

编辑:在http://stackoverflow.com/questions/8228326/how-can-i-avoid-postgresql-sometimes-choosing-a-bad-query-plan-for-one-of-two-ne查看答案后,我认为它的行为是这样的,因为我的查询返回了所有记录,然后我更新了条件,新的查询计划是:

 Sort  (cost=786.18..786.22 rows=19 width=668) (actual time=3.754..3.755 rows=2 loops=1)
   Sort Key: events.start_date
   Sort Method: quicksort  Memory: 25kB
   ->  HashAggregate  (cost=785.54..785.77 rows=19 width=668) (actual time=3.700..3.703 rows=2 loops=1)
         Filter: (count(albums.id) > 0)
         ->  Nested Loop  (cost=48.39..765.51 rows=2670 width=668) (actual time=1.116..2.968 rows=543 loops=1)
               ->  Hash Join  (cost=48.39..89.25 rows=21 width=668) (actual time=1.093..1.128 rows=3 loops=1)
                     Hash Cond: (events.id = albums.event_id)
                     ->  Bitmap Heap Scan on events  (cost=9.42..49.44 rows=19 width=664) (actual time=0.061..0.080 rows=9 loops=1)
                           Recheck Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
                           Filter: (deleted_at IS NULL)
                           ->  Bitmap Index Scan on index_events_on_latitude_and_longitude  (cost=0.00..9.42 rows=28 width=0) (actual time=0.050..0.050 rows=9 loops=1)
                                 Index Cond: ((latitude >= (-33.7474111086624)::double precision) AND (latitude <= (-33.581678187556)::double precision) AND (longitude >= 151.193933862305::double precision) AND (longitude <= 151.44661940918::double precision))
                     ->  Hash  (cost=34.95..34.95 rows=321 width=8) (actual time=0.992..0.992 rows=321 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 13kB
                           ->  Bitmap Heap Scan on albums  (cost=14.74..34.95 rows=321 width=8) (actual time=0.069..0.570 rows=321 loops=1)
                                 Recheck Cond: (deleted_at IS NULL)
                                 ->  Bitmap Index Scan on index_albums_on_deleted_at  (cost=0.00..14.66 rows=321 width=0) (actual time=0.056..0.056 rows=321 loops=1)
                                       Index Cond: (deleted_at IS NULL)
               ->  Index Scan using index_photos_on_album_id on photos  (cost=0.00..30.27 rows=155 width=4) (actual time=0.014..0.273 rows=181 loops=3)
                     Index Cond: (album_id = albums.id)
                     Filter: (deleted_at IS NULL)
 Total runtime: 3.958 ms

时间少了! 有什么建议吗?

1 个答案:

答案 0 :(得分:1)

这主要是因为您没有在查询中设置LIMIT子句。如果没有LIMIT,您总是会从表中请求所有数据,因此查看其索引也是不够的。

SQLFiddle 1, 2 vs. 3, 4

另请注意,FOREIGN KEY不会添加索引(但UNIQUE&amp; PRIMARY KEY约束确实存在)。因此,您可能希望为albums.event_id&amp;添加索引。 photos.album_id

SQLFiddle 3 vs. 4

排序可以使用索引(如果存在)。在您的查询中,这意味着events.start_date上的索引。