I am trying to use a subquery as a joined table to do some aggregation work:
SELECT r.source_uri AS su_on_r,
c_on_tComposer
FROM release r
LEFT JOIN (
SELECT track, string_agg(distinct composer, '|') as c_on_tComposer
FROM track_composer
GROUP BY track
) tComposer ON r.id = tComposer.track
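The reason I do this in a subquery is that if I simply join the track_composer table and then aggregate in the SELECT, I get duplicated data as soon as other 1:M tables (not shown here) are joined. If I join to a subquery that does the aggregation, I can be sure it returns at most one row per track, which avoids the duplication.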
The trouble is that the PostgreSQL query planner performs a seq scan on the track_composer table:
-> Materialize (cost=3567796.15..3806217.09 rows=2988177 width=48) (actual time=20629.349..76646.074 rows=12998764 loops=1)
-> GroupAggregate (cost=3567796.15..3768864.88 rows=2988177 width=48) (actual time=20629.342..70072.823 rows=12996153 loops=1)
Group Key: track_composer.track
-> Sort (cost=3567796.15..3622368.32 rows=21828868 width=30) (actual time=20629.309..36473.835 rows=21778170 loops=1)
Sort Key: track_composer.track
Sort Method: external merge Disk: 864192kB
-> Seq Scan on track_composer (cost=0.00..384612.68 rows=21828868 width=30) (actual time=0.041..5085.321 rows=21828
I have an index on track_composer.track.
It looks like the seq scan is caused by the GROUP BY, but I need the GROUP BY because of the string_agg aggregation. Am I wrong, or am I missing something?
Answer 0 (score: 2)
If you don't want the sequential scan, try defining an index on (track, composer):
create index idx_track_composer_track_composer on track_composer(track, composer);
This is called a composite index, which is a fancy way of saying that it has more than one key.
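One way to check whether the new index changes the plan is to refresh the table statistics and run the aggregation on its own under EXPLAIN. This is only a sketch, assuming the index above has already been created. Because the query aggregates every row of track_composer, the planner may still prefer a seq scan plus sort; whether it switches to an index scan or index-only scan depends on the statistics, the cost settings, and how recently the table was vacuumed.

```sql
-- Refresh planner statistics so the new index is costed with current data.
ANALYZE track_composer;

-- Run the aggregation from the question's subquery on its own and inspect the plan.
-- Note that EXPLAIN ANALYZE actually executes the query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT track, string_agg(DISTINCT composer, '|') AS c_on_tComposer
FROM track_composer
GROUP BY track;
```

If the plan shows an Index Scan or Index Only Scan using idx_track_composer_track_composer feeding the GroupAggregate, the external merge sort seen in the original plan is no longer needed.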
In this situation, I have had good luck in other databases with a correlated subquery:
SELECT r.source_uri AS su_on_r,
(SELECT string_agg(distinct composer, '|') as c_on_tComposer
FROM track_composer tc
WHERE r.id = tc.track
) as c_on_tComposer
FROM release r;
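In PostgreSQL (9.3 or later), the same per-row lookup can also be written as a LATERAL join. This is a swapped-in variant rather than something the answer itself proposes, and it is only a sketch: the alias tc is an illustrative choice, and whether it performs better than the correlated subquery would need to be checked with EXPLAIN on the real data.

```sql
-- For each release row, aggregate only the matching track_composer rows.
-- An aggregate without GROUP BY always yields exactly one row, so every
-- release row is kept; c_on_tComposer is NULL when there are no composers.
SELECT r.source_uri AS su_on_r,
       tc.c_on_tComposer
FROM release r
LEFT JOIN LATERAL (
    SELECT string_agg(DISTINCT composer, '|') AS c_on_tComposer
    FROM track_composer
    WHERE track = r.id
) tc ON true;
```

Like the correlated subquery, this form lets the planner probe an index on track (or on (track, composer)) once per outer row instead of aggregating the whole table up front.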