慢查询有很多连接

时间:2012-12-09 06:06:20

标签: sql postgresql join

为混合格式提前道歉,我把它分解了一下,看起来会更好。

我通常能够使用成本来确定查询的缓慢部分,但在这一部分中,它似乎真的很普遍而且没什么太明显的。出于某种原因,当我通过random()对它进行排序时,它会从几毫秒减慢到8秒:

EXPLAIN for:
  SELECT songs.*
  FROM "songs"
    INNER JOIN "posts" ON "posts"."id" = "songs"."post_id"
    INNER JOIN broadcasts on broadcasts.song_id = songs.id
    inner join stations as ss on ss.id = broadcasts.station_id
    inner join artists on artists.station_slug = ss.slug
    inner join artists_genres on artists_genres.artist_id = artists.id
    inner join genres on genres.id = artists_genres.genre_id
    inner join broadcasts bb on bb.song_id = songs.id
    inner join stations sss on sss.id = bb.station_id
    inner join blogs b on b.station_slug = sss.slug
    inner join blogs_genres on blogs_genres.blog_id = b.id
    inner join genres gg on gg.id = blogs_genres.genre_id
    INNER JOIN "stations" on "stations"."blog_id" = "songs"."blog_id"
    INNER JOIN blogs on blogs.id = posts.blog_id
  WHERE "songs"."working" = 't'
    AND "songs"."processed" = 't'
    AND (genres.id = 20 and gg.id = 20)
    AND (NOT EXISTS(
      SELECT NULL
      FROM authors
      WHERE authors.song_id = songs.id
      AND authors.role IN ('remixer', 'mashup')
    ))
    AND (songs.source != 'youtube')
    AND (songs.seconds < 600)
    AND (songs.matching_id = songs.id)
    ORDER BY random() desc
                                                                                                             QUERY PLAN
 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
  Sort  (cost=3770.97..3770.98 rows=1 width=2132)
    Sort Key: (random())
    ->  Nested Loop  (cost=3142.28..3770.96 rows=1 width=2132)
          Join Filter: ((b.station_slug)::text = (sss.slug)::text)
          ->  Nested Loop Anti Join  (cost=3142.28..3762.68 rows=1 width=2150)
                ->  Nested Loop  (cost=2526.24..3135.35 rows=1 width=2150)
                      Join Filter: (posts.blog_id = blogs.id)
                      ->  Nested Loop  (cost=2526.24..3128.65 rows=1 width=2127)
                            ->  Nested Loop  (cost=2526.24..3127.39 rows=1 width=2131)
                                  ->  Hash Join  (cost=2515.19..3112.87 rows=1 width=2113)
                                        Hash Cond: (stations.blog_id = songs.blog_id)
                                        ->  Seq Scan on stations  (cost=0.00..475.49 rows=19549 width=40)
                                        ->  Hash  (cost=2515.15..2515.15 rows=3 width=2077)
                                              ->  Nested Loop  (cost=1265.66..2515.15 rows=3 width=2077)
                                                    ->  Nested Loop  (cost=1265.66..2506.44 rows=1 width=2077)
                                                          ->  Nested Loop  (cost=1265.66..2499.97 rows=1 width=1809)
                                                                ->  Nested Loop  (cost=1265.66..2498.71 rows=1 width=1813)
                                                                      ->  Hash Join  (cost=1265.66..2357.90 rows=277 width=8)
                                                                            Hash Cond: (broadcasts.station_id = ss.id)
                                                                            ->  Seq Scan on broadcasts  (cost=0.00..895.98 rows=51598 width=8)
                                                                            ->  Hash  (cost=1264.35..1264.35 rows=105 width=8)
                                                                                  ->  Hash Join  (cost=714.50..1264.35 rows=105 width=8)
                                                                                        Hash Cond: ((ss.slug)::text = (artists.station_slug)::text)
                                                                                        ->  Seq Scan on stations ss  (cost=0.00..475.49 rows=19549 width=18)
                                                                                        ->  Hash  (cost=713.20..713.20 rows=104 width=18)
                                                                                              ->  Nested Loop  (cost=5.06..713.20 rows=104 width=18)
                                                                                                    ->  Bitmap Heap Scan on artists_genres  (cost=5.06..99.85 rows=104 width=8)
                                                                                                          Recheck Cond: (genre_id = 20)
                                                                                                          ->  Bitmap Index Scan on index_artists_genres_on_genre_id_and_artist_id  (cost=0.00..5.03 rows=104 width=0)
                                                                                                                Index Cond: (genre_id = 20)
                                                                                                    ->  Index Scan using artists_pkey on artists  (cost=0.00..5.89 rows=1 width=18)
                                                                                                          Index Cond: (artists.id = artists_genres.artist_id)
                                                                      ->  Index Scan using index_songs_on_shared_id on songs  (cost=0.00..0.50 rows=1 width=1805)
                                                                            Index Cond: (songs.matching_id = broadcasts.song_id)
                                                                            Filter: (songs.working AND songs.processed AND ((songs.source)::text <> 'youtube'::text) AND (songs.seconds < 600) AND (songs.id = songs.matching_id))
                                                                ->  Seq Scan on genres  (cost=0.00..1.25 rows=1 width=4)
                                                                      Filter: (genres.id = 20)
                                                          ->  Index Scan using posts_pkey on posts  (cost=0.00..6.46 rows=1 width=268)
                                                                Index Cond: (posts.id = songs.post_id)
                                                    ->  Index Scan using index_songs_stations_on_song_id_and_station_id on broadcasts bb  (cost=0.00..8.67 rows=3 width=8)
                                                          Index Cond: (bb.song_id = broadcasts.song_id)
                                  ->  Hash Join  (cost=11.05..14.39 rows=13 width=18)
                                        Hash Cond: (blogs_genres.blog_id = b.id)
                                        ->  Bitmap Heap Scan on blogs_genres  (cost=4.35..7.51 rows=13 width=8)
                                              Recheck Cond: (genre_id = 20)
                                              ->  Bitmap Index Scan on index_blogs_genres_on_genre_id_and_blog_id  (cost=0.00..4.35 rows=13 width=0)
                                                    Index Cond: (genre_id = 20)
                                        ->  Hash  (cost=5.20..5.20 rows=120 width=18)
                                              ->  Seq Scan on blogs b  (cost=0.00..5.20 rows=120 width=18)
                            ->  Seq Scan on genres gg  (cost=0.00..1.25 rows=1 width=4)
                                  Filter: (gg.id = 20)
                      ->  Seq Scan on blogs  (cost=0.00..5.20 rows=120 width=31)
                ->  Bitmap Heap Scan on authors  (cost=616.04..623.56 rows=2 width=4)
                      Recheck Cond: (authors.song_id = songs.id)
                      Filter: ((authors.role)::text = ANY ('{remixer,mashup}'::text[]))
                      ->  Bitmap Index Scan on index_authors_on_artist_id_and_song_id_and_role  (cost=0.00..616.04 rows=2 width=0)
                            Index Cond: (authors.song_id = songs.id)
          ->  Index Scan using stations_pkey on stations sss  (cost=0.00..8.27 rows=1 width=18)
                Index Cond: (sss.id = bb.station_id)

有什么想法导致经济放缓?在Mac上使用Postgres 9.1。

2 个答案:

答案 0 :(得分:1)

您希望确保为要加入的所有列构建索引。您还应该尝试避免嵌套查询(根据我的经验),因此您可能需要重写NOT EXISTS()部分。

答案 1 :(得分:0)

我理解的问题是order by random()会对查询速度产生巨大影响的原因。对于两个版本的查询,我不确定并且不知道没有EXPLAIN ANALYSE,但我怀疑这个问题与处理volatile函数有关。无法列出易失性函数,在这种情况下,PostgreSQL可能会返回并向每一行添加random()值,然后在已经扫描了表之后进行排序等。这可能是您的问题。

更好的方法可能是将random()添加到列列表中,然后按别名或列号排序。这样可以在进程更早的时候更早地计算random()。