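I wrote a query that joins fourteen tables. When the condition matches a large number of rows, the query takes a long time. This is the original query, with a large IN condition: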
SELECT r.source_uri AS su_on_r, r.title AS t_on_r, r.subtitle AS s_on_r, r.artist_name AS an_on_r, r.asin AS a_on_r, r.country AS c_on_r, r.release_date AS rd_on_r, string_agg(DISTINCT barcode.barcode::TEXT, '|') AS b_on_barcode, string_agg(DISTINCT genre.genre::TEXT, '|') AS g_on_genre, string_agg(DISTINCT typ.type::TEXT, '|') AS t_on_typ, string_agg(tag.voted_tag::TEXT, '|') AS vt_on_tag, IMAGE.uri AS u_on_image, IMAGE.width AS w_on_image, IMAGE.height AS h_on_image, IMAGE.score AS s_on_image, string_agg(DISTINCT imageType.image_type::TEXT, '|') AS it_on_imageType, string_agg(tag.votes::TEXT, '|') AS v_on_tag, string_agg(DISTINCT url.url::TEXT, '|') AS u_on_url, event.label_name AS ln_on_event, event.cat AS c_on_event, m.position AS p_on_m, m.title AS t_on_m, m.format AS f_on_m, t.position AS p_on_t, t.title AS t_on_t, string_agg(DISTINCT t.duration::TEXT, '|') AS d_on_t, string_agg(DISTINCT tArtist.artist::TEXT, '|') AS a_on_tArtist, string_agg(DISTINCT tComposer.composer::TEXT, '|') AS c_on_tComposer, string_agg(DISTINCT tIsrc.isrc::TEXT, '|') AS i_on_tIsrc
FROM release r
LEFT JOIN release_barcode barcode ON r.source_uri = barcode.source_uri
LEFT JOIN release_genre genre ON r.source_uri = genre.source_uri
LEFT JOIN release_type typ ON r.source_uri = typ.source_uri
LEFT JOIN release_voted_tag tag ON r.source_uri = tag.source_uri
LEFT JOIN release_image IMAGE ON r.source_uri = IMAGE.source_uri
LEFT JOIN release_image_type imageType ON IMAGE.id = imageType.image_id
LEFT JOIN release_url url ON r.source_uri = url.source_uri
LEFT JOIN release_event event ON r.source_uri = event.source_uri
LEFT JOIN medium m ON r.source_uri = m.source_uri
LEFT JOIN track t ON m.id = t.medium
LEFT JOIN track_artist tArtist ON t.id = tArtist.track
LEFT JOIN track_composer tComposer ON t.id = tComposer.track
LEFT JOIN track_isrc tIsrc ON t.id = tIsrc.track
WHERE r.source_uri IN (
'https://api.discogs.com/releases/1955915'
,'https://api.discogs.com/releases/8602631'
,[and so on for about thirty more URIs]
)
GROUP BY su_on_r, t_on_r, s_on_r, an_on_r, a_on_r, c_on_r, rd_on_r, u_on_image, w_on_image, h_on_image, s_on_image, ln_on_event, c_on_event, p_on_m, t_on_m, f_on_m, p_on_t, t_on_t;
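Looking at the EXPLAIN output, most of the work goes into sorting because of the large GROUP BY clause: https://explain.depesz.com/s/dV5o
You can see that the aggregate works on > 90k rows. The row count is this large because of the number of joins: the many 1:m tables lead to exponential growth in the number of rows.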
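To make the row multiplication concrete, here is a minimal self-contained sketch. The counts (3 barcodes, 4 genres, 10 tags) are invented for illustration, and generate_series only stands in for the real 1:m tables. It shows that the joined row count for one release is the product of the per-table match counts.

```sql
-- One release row joined to three independent 1:m "tables".
-- generate_series stands in for release_barcode, release_genre and
-- release_voted_tag; the counts 3, 4 and 10 are made up.
SELECT count(*) AS joined_rows
FROM (VALUES ('release-1')) AS r(source_uri)
LEFT JOIN generate_series(1, 3)  AS barcode(n) ON true
LEFT JOIN generate_series(1, 4)  AS genre(n)   ON true
LEFT JOIN generate_series(1, 10) AS tag(n)     ON true;
-- joined_rows = 3 * 4 * 10 = 120 for a single release
```

A side effect worth checking in the real query: the aggregates without DISTINCT (vt_on_tag and v_on_tag) see each tag once per combination of the other tables, so their output would presumably contain repeated values.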
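So I wondered how I could rewrite the query without having to group all of these rows. I decided to write the joins as subqueries and move the aggregation into those subqueries.
My first attempt was (an example for release_barcode only, repeated for all tables):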
LEFT JOIN (
SELECT source_uri, string_agg(DISTINCT barcode::TEXT, '|') AS b_on_barcode
FROM release_barcode
GROUP BY source_uri
) AS barcode ON r.source_uri = barcode.source_uri
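The reasoning behind this is that fewer rows are returned, and I don't need to do a lot of sorting because there is no GROUP BY in the top-level query.
However, this is slower! That is because the query planner does not seem to apply the top-level query's condition first. Instead, it just joins everything together.
So I tried something different; to force the filter into each subquery, I simply copied the criteria: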
LEFT JOIN (
SELECT source_uri, string_agg(DISTINCT barcode::TEXT, '|') AS b_on_barcode
FROM release_barcode
WHERE source_uri IN (
'https://api.discogs.com/releases/1955915'
,'https://api.discogs.com/releases/8602631'
,[and so on for about thirty more URIs]
)
GROUP BY source_uri
) AS barcode ON r.source_uri = barcode.source_uri
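Only the WHERE clause was copied into each subquery.
The result speaks for itself: https://explain.depesz.com/s/exSw
A more complex query, but a 100x speedup!
But of course, the duplicated criteria are a bad smell.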
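One possible way to state the URI list only once is to put it into a CTE and let every aggregating subquery join against it. This is a sketch, not a measured solution: the CTE name wanted is hypothetical, only the release_barcode aggregate is shown, and whether the planner then uses index lookups on source_uri would have to be confirmed with EXPLAIN ANALYZE.

```sql
-- Sketch only: the IN list lives in one CTE ("wanted" is a made-up name)
-- and each aggregating subquery filters through it.
WITH wanted AS (
    SELECT source_uri
    FROM release
    WHERE source_uri IN (
         'https://api.discogs.com/releases/1955915'
        ,'https://api.discogs.com/releases/8602631'
        -- ,[and so on for about thirty more URIs]
    )
)
, barcode AS (
    -- Aggregate only the barcodes of the wanted releases.
    SELECT b.source_uri
         , string_agg(DISTINCT b.barcode::TEXT, '|') AS b_on_barcode
    FROM release_barcode b
    JOIN wanted w ON w.source_uri = b.source_uri
    GROUP BY b.source_uri
)
SELECT r.source_uri AS su_on_r, r.title AS t_on_r, barcode.b_on_barcode
FROM release r
JOIN wanted w ON w.source_uri = r.source_uri
LEFT JOIN barcode ON r.source_uri = barcode.source_uri;
```

The other 1:m tables would each get a CTE of the same shape, so the top-level query still needs no GROUP BY.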
So my question is twofold:
Answer 0 (score: 0)
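- Increase geqo_threshold (and perhaps even join_collapse_limit). Note: this could push the planning time to more than a second.
- Reduce the number of range table entries by splitting tightly related tables off into CTEs:
- (For the %uri field: put it into a separate table and reference it through a surrogate key.)
WITH rel AS (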
SELECT * FROM release
WHERE source_uri IN (
'https://api.discogs.com/releases/1955915'
,'https://api.discogs.com/releases/8602631'
-- ,[and so on for about thirty more URIs]
) -- end of the IN list
) -- end of the rel CTE
, media AS (
SELECT *
FROM medium m -- ON r.source_uri = m.source_uri
LEFT JOIN track t ON m.id = t.medium
LEFT JOIN track_artist tArtist ON t.id = tArtist.track
LEFT JOIN track_composer tComposer ON t.id = tComposer.track
LEFT JOIN track_isrc tIsrc ON t.id = tIsrc.track
)
SELECT r.source_uri AS su_on_r, r.title AS t_on_r, r.subtitle AS s_on_r, r.artist_name AS an_on_r
, r.asin AS a_on_r, r.country AS c_on_r, r.release_date AS rd_on_r
, string_agg(DISTINCT barcode.barcode::TEXT, '|') AS b_on_barcode
, string_agg(DISTINCT genre.genre::TEXT, '|') AS g_on_genre
, string_agg(DISTINCT typ.type::TEXT, '|') AS t_on_typ
, string_agg(tag.voted_tag::TEXT, '|') AS vt_on_tag
, img.uri AS u_on_image, img.width AS w_on_image
, img.height AS h_on_image, img.score AS s_on_image
, string_agg(DISTINCT imageType.image_type::TEXT, '|') AS it_on_imageType
, string_agg(tag.votes::TEXT, '|') AS v_on_tag
, string_agg(DISTINCT url.url::TEXT, '|') AS u_on_url
, event.label_name AS ln_on_event, event.cat AS c_on_event
, m.position AS p_on_m, m.title AS t_on_m, m.format AS f_on_m
, m.position AS p_on_t, m.title AS t_on_t -- <<-- !! need to fix these in the CTE
, string_agg(DISTINCT m.duration::TEXT, '|') AS d_on_t
, string_agg(DISTINCT m.artist::TEXT, '|') AS a_on_tArtist
, string_agg(DISTINCT m.composer::TEXT, '|') AS c_on_tComposer
, string_agg(DISTINCT m.isrc::TEXT, '|') AS i_on_tIsrc
FROM rel r -- <<--- ########################## CTE
LEFT JOIN release_barcode barcode ON r.source_uri = barcode.source_uri
LEFT JOIN release_genre genre ON r.source_uri = genre.source_uri
LEFT JOIN release_type typ ON r.source_uri = typ.source_uri
LEFT JOIN release_voted_tag tag ON r.source_uri = tag.source_uri
LEFT JOIN release_image img ON r.source_uri = img.source_uri
LEFT JOIN release_image_type imageType ON img.id = imageType.image_id
LEFT JOIN release_url url ON r.source_uri = url.source_uri
LEFT JOIN release_event event ON r.source_uri = event.source_uri
LEFT JOIN media m ON r.source_uri = m.source_uri -- <<--- ########################## CTE
GROUP BY su_on_r, t_on_r, s_on_r, an_on_r
, a_on_r, c_on_r, rd_on_r
, u_on_image, w_on_image, h_on_image
, s_on_image, ln_on_event, c_on_event
, p_on_m, t_on_m, f_on_m, p_on_t, t_on_t
;
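As an aside to the query above, the first suggestion (raising the planner limits) can be tried for a single session without touching the server configuration. This is a sketch: the value 16 is an arbitrary choice that merely exceeds the fourteen joined tables, and to my knowledge the stock defaults are 8 for join_collapse_limit and 12 for geqo_threshold.

```sql
-- Session-local experiment; 16 is an arbitrary value above the 14 joined tables.
SET join_collapse_limit = 16;
SET geqo_threshold = 16;

-- Re-run the query here under EXPLAIN (ANALYZE) and compare
-- the reported planning time and execution time with the earlier plans.

-- Restore the configured values afterwards.
RESET join_collapse_limit;
RESET geqo_threshold;
```

Because the planner then considers many more join orders, the planning time may rise noticeably, which is the trade-off the note in the list warns about.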
Note: I made some mistakes while moving terms into the media CTE. There is also some renaming left to do...
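The note about the media CTE can be made concrete. What follows is a minimal sketch, assuming the column names used in the question's original query (m.position, m.title, m.format, t.position, t.title, t.duration, artist, composer, isrc). The CTE lists its columns explicitly, so the duplicate names position and title no longer collide, and the outer query refers to them through media. Only the media-related output columns are shown; the remaining joins and aggregates would stay as in the query above.

```sql
WITH rel AS (
    SELECT * FROM release
    WHERE source_uri IN (
         'https://api.discogs.com/releases/1955915'
        ,'https://api.discogs.com/releases/8602631'
        -- ,[and so on for about thirty more URIs]
    )
)
, media AS (
    -- Explicit column list: medium and track both have position and title,
    -- so they are renamed here instead of using SELECT *.
    SELECT m.source_uri
         , m.position AS p_on_m, m.title AS t_on_m, m.format AS f_on_m
         , t.position AS p_on_t, t.title AS t_on_t
         , t.duration, tArtist.artist, tComposer.composer, tIsrc.isrc
    FROM medium m
    LEFT JOIN track t ON m.id = t.medium
    LEFT JOIN track_artist tArtist ON t.id = tArtist.track
    LEFT JOIN track_composer tComposer ON t.id = tComposer.track
    LEFT JOIN track_isrc tIsrc ON t.id = tIsrc.track
)
SELECT r.source_uri AS su_on_r
     , media.p_on_m, media.t_on_m, media.f_on_m
     , media.p_on_t, media.t_on_t
     , string_agg(DISTINCT media.duration::TEXT, '|') AS d_on_t
     , string_agg(DISTINCT media.artist::TEXT, '|') AS a_on_tArtist
     , string_agg(DISTINCT media.composer::TEXT, '|') AS c_on_tComposer
     , string_agg(DISTINCT media.isrc::TEXT, '|') AS i_on_tIsrc
FROM rel r
LEFT JOIN media ON r.source_uri = media.source_uri
GROUP BY r.source_uri
       , media.p_on_m, media.t_on_m, media.f_on_m
       , media.p_on_t, media.t_on_t;
```

On PostgreSQL versions before 12 every CTE is materialized, so an unfiltered media CTE would be computed for all media; in that case it may be necessary to join rel inside media as well. This has not been tested against the real tables.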