Apologies in advance for the mixed formatting; I've broken it up so it reads a little better.
I'm usually able to use the costs to pinpoint the slow part of a query, but in this one they seem pretty evenly spread and nothing stands out. For some reason, when I sort it by random() it slows down from a few milliseconds to 8 seconds:
EXPLAIN for:
SELECT songs.*
FROM "songs"
INNER JOIN "posts" ON "posts"."id" = "songs"."post_id"
INNER JOIN broadcasts on broadcasts.song_id = songs.id
inner join stations as ss on ss.id = broadcasts.station_id
inner join artists on artists.station_slug = ss.slug
inner join artists_genres on artists_genres.artist_id = artists.id
inner join genres on genres.id = artists_genres.genre_id
inner join broadcasts bb on bb.song_id = songs.id
inner join stations sss on sss.id = bb.station_id
inner join blogs b on b.station_slug = sss.slug
inner join blogs_genres on blogs_genres.blog_id = b.id
inner join genres gg on gg.id = blogs_genres.genre_id
INNER JOIN "stations" on "stations"."blog_id" = "songs"."blog_id"
INNER JOIN blogs on blogs.id = posts.blog_id
WHERE "songs"."working" = 't'
AND "songs"."processed" = 't'
AND (genres.id = 20 and gg.id = 20)
AND (NOT EXISTS(
SELECT NULL
FROM authors
WHERE authors.song_id = songs.id
AND authors.role IN ('remixer', 'mashup')
))
AND (songs.source != 'youtube')
AND (songs.seconds < 600)
AND (songs.matching_id = songs.id)
ORDER BY random() desc
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3770.97..3770.98 rows=1 width=2132)
Sort Key: (random())
-> Nested Loop (cost=3142.28..3770.96 rows=1 width=2132)
Join Filter: ((b.station_slug)::text = (sss.slug)::text)
-> Nested Loop Anti Join (cost=3142.28..3762.68 rows=1 width=2150)
-> Nested Loop (cost=2526.24..3135.35 rows=1 width=2150)
Join Filter: (posts.blog_id = blogs.id)
-> Nested Loop (cost=2526.24..3128.65 rows=1 width=2127)
-> Nested Loop (cost=2526.24..3127.39 rows=1 width=2131)
-> Hash Join (cost=2515.19..3112.87 rows=1 width=2113)
Hash Cond: (stations.blog_id = songs.blog_id)
-> Seq Scan on stations (cost=0.00..475.49 rows=19549 width=40)
-> Hash (cost=2515.15..2515.15 rows=3 width=2077)
-> Nested Loop (cost=1265.66..2515.15 rows=3 width=2077)
-> Nested Loop (cost=1265.66..2506.44 rows=1 width=2077)
-> Nested Loop (cost=1265.66..2499.97 rows=1 width=1809)
-> Nested Loop (cost=1265.66..2498.71 rows=1 width=1813)
-> Hash Join (cost=1265.66..2357.90 rows=277 width=8)
Hash Cond: (broadcasts.station_id = ss.id)
-> Seq Scan on broadcasts (cost=0.00..895.98 rows=51598 width=8)
-> Hash (cost=1264.35..1264.35 rows=105 width=8)
-> Hash Join (cost=714.50..1264.35 rows=105 width=8)
Hash Cond: ((ss.slug)::text = (artists.station_slug)::text)
-> Seq Scan on stations ss (cost=0.00..475.49 rows=19549 width=18)
-> Hash (cost=713.20..713.20 rows=104 width=18)
-> Nested Loop (cost=5.06..713.20 rows=104 width=18)
-> Bitmap Heap Scan on artists_genres (cost=5.06..99.85 rows=104 width=8)
Recheck Cond: (genre_id = 20)
-> Bitmap Index Scan on index_artists_genres_on_genre_id_and_artist_id (cost=0.00..5.03 rows=104 width=0)
Index Cond: (genre_id = 20)
-> Index Scan using artists_pkey on artists (cost=0.00..5.89 rows=1 width=18)
Index Cond: (artists.id = artists_genres.artist_id)
-> Index Scan using index_songs_on_shared_id on songs (cost=0.00..0.50 rows=1 width=1805)
Index Cond: (songs.matching_id = broadcasts.song_id)
Filter: (songs.working AND songs.processed AND ((songs.source)::text <> 'youtube'::text) AND (songs.seconds < 600) AND (songs.id = songs.matching_id))
-> Seq Scan on genres (cost=0.00..1.25 rows=1 width=4)
Filter: (genres.id = 20)
-> Index Scan using posts_pkey on posts (cost=0.00..6.46 rows=1 width=268)
Index Cond: (posts.id = songs.post_id)
-> Index Scan using index_songs_stations_on_song_id_and_station_id on broadcasts bb (cost=0.00..8.67 rows=3 width=8)
Index Cond: (bb.song_id = broadcasts.song_id)
-> Hash Join (cost=11.05..14.39 rows=13 width=18)
Hash Cond: (blogs_genres.blog_id = b.id)
-> Bitmap Heap Scan on blogs_genres (cost=4.35..7.51 rows=13 width=8)
Recheck Cond: (genre_id = 20)
-> Bitmap Index Scan on index_blogs_genres_on_genre_id_and_blog_id (cost=0.00..4.35 rows=13 width=0)
Index Cond: (genre_id = 20)
-> Hash (cost=5.20..5.20 rows=120 width=18)
-> Seq Scan on blogs b (cost=0.00..5.20 rows=120 width=18)
-> Seq Scan on genres gg (cost=0.00..1.25 rows=1 width=4)
Filter: (gg.id = 20)
-> Seq Scan on blogs (cost=0.00..5.20 rows=120 width=31)
-> Bitmap Heap Scan on authors (cost=616.04..623.56 rows=2 width=4)
Recheck Cond: (authors.song_id = songs.id)
Filter: ((authors.role)::text = ANY ('{remixer,mashup}'::text[]))
-> Bitmap Index Scan on index_authors_on_artist_id_and_song_id_and_role (cost=0.00..616.04 rows=2 width=0)
Index Cond: (authors.song_id = songs.id)
-> Index Scan using stations_pkey on stations sss (cost=0.00..8.27 rows=1 width=18)
Index Cond: (sss.id = bb.station_id)
Any ideas what's causing the slowdown? Using Postgres 9.1 on a Mac.
Answer 0 (score: 1)
You want to make sure indexes are built for all of the columns you are joining on. You should also try to avoid nested queries (in my experience), so you may want to rewrite the NOT EXISTS() part.
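For example, given the join conditions in the query above, the missing indexes and an anti-join rewrite might look like this. This is only a sketch: the index names are illustrative, some of these indexes may already exist, and whether the LEFT JOIN form actually plans better than NOT EXISTS should be checked with EXPLAIN ANALYZE.

```sql
-- Hypothetical indexes on join columns used in the query (names are made up):
CREATE INDEX index_broadcasts_on_song_id ON broadcasts (song_id);
CREATE INDEX index_artists_on_station_slug ON artists (station_slug);
CREATE INDEX index_blogs_on_station_slug ON blogs (station_slug);
CREATE INDEX index_authors_on_song_id ON authors (song_id);

-- One way to rewrite the NOT EXISTS anti-join as a LEFT JOIN:
SELECT songs.*
FROM songs
LEFT JOIN authors
  ON authors.song_id = songs.id
 AND authors.role IN ('remixer', 'mashup')
WHERE authors.song_id IS NULL;
```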
Answer 1 (score: 0)
The question, as I understand it, is why order by random() has such a huge impact on query speed. Without EXPLAIN ANALYZE output for both versions of the query I can't be sure, but I suspect the issue has to do with how volatile functions are handled. Volatile functions can't be optimized away, so in this case PostgreSQL may go back and attach a random() value to each row after the table has already been scanned, and only then sort, etc. That could be your problem.
A better approach might be to add random() to the column list and then order by the alias or by column number. That lets random() be computed earlier in the process.
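A minimal sketch of that rewrite, reusing the tables from the question. Untested; whether the planner actually computes random() earlier this way should be confirmed with EXPLAIN ANALYZE on both forms.

```sql
-- Compute random() in the select list and sort by the alias
SELECT songs.*, random() AS r
FROM songs
-- ... same joins and WHERE clauses as in the original query ...
ORDER BY r DESC;
```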