Subquery inside a JOIN causes a seq scan

Asked: 2017-11-08 17:14:29

Tags: sql postgresql

I am trying to use a subquery as a joined table to do some aggregation work:

SELECT r.source_uri       AS su_on_r,
       c_on_tComposer
FROM   release r
       LEFT JOIN (
             SELECT track, string_agg(distinct composer, '|') as c_on_tComposer
             FROM track_composer
             GROUP BY track
       ) tComposer ON r.id = tComposer.track

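The reason I do this in a subquery is that if I simply join the track_composer table and aggregate in the outer SELECT, I get duplicated data as soon as other 1:M tables (not shown here) are joined as well. If I instead join a subquery that has already aggregated, I can be sure it returns a single row per track, which avoids the duplication.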
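A minimal sketch of the duplication described above, for illustration only. The table track_genre(track, genre) is a hypothetical second 1:M table and is not part of the original schema:

```sql
-- Hypothetical second 1:M table: track_genre(track, genre).
-- Joining both child tables directly multiplies the rows: an id with
-- 2 composers and 3 genres produces 2 x 3 = 6 joined rows.
SELECT r.source_uri,
       string_agg(tc.composer, '|') AS composers  -- each composer appears 3 times
FROM   release r
       LEFT JOIN track_composer tc ON r.id = tc.track
       LEFT JOIN track_genre    tg ON r.id = tg.track
GROUP  BY r.id, r.source_uri;
```

Pre-aggregating each child table in its own subquery, as in the query above, keeps every joined input at one row per key, so the joins no longer multiply each other.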

The trouble is that the PostgreSQL query planner chooses a seq scan on the track_composer table:

->  Materialize  (cost=3567796.15..3806217.09 rows=2988177 width=48) (actual time=20629.349..76646.074 rows=12998764 loops=1)          
      ->  GroupAggregate  (cost=3567796.15..3768864.88 rows=2988177 width=48) (actual time=20629.342..70072.823 rows=12996153 loops=1) 
            Group Key: track_composer.track                                                                                            
            ->  Sort  (cost=3567796.15..3622368.32 rows=21828868 width=30) (actual time=20629.309..36473.835 rows=21778170 loops=1)    
                  Sort Key: track_composer.track                                                                                       
                  Sort Method: external merge  Disk: 864192kB                                                                          
                  ->  Seq Scan on track_composer  (cost=0.00..384612.68 rows=21828868 width=30) (actual time=0.041..5085.321 rows=21828

I have an index on track_composer.track.

The GROUP BY appears to be what causes this, but I need it because of the string_agg aggregate. Am I doing something wrong, or am I missing something?

1 answer:

Answer 0 (score: 2)

If you do not want the sequential scan, try defining an index on (track, composer):

create index idx_track_composer_track_composer on track_composer(track, composer);

This is called a composite index, which is a fancy way of saying that it has more than one key.
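One way to check whether the new index actually changes the plan is to run the aggregation on its own under EXPLAIN. This is a sketch that assumes the index above has been created. Because the query reads every row of track_composer, the planner may still prefer a sequential scan if it estimates that to be cheaper:

```sql
-- Refresh planner statistics and the visibility map; index-only scans
-- depend on the visibility map being reasonably up to date.
VACUUM ANALYZE track_composer;

-- Show the chosen plan with real timings and buffer usage.
EXPLAIN (ANALYZE, BUFFERS)
SELECT track, string_agg(DISTINCT composer, '|') AS c_on_tComposer
FROM   track_composer
GROUP  BY track;
```

If the index is used, the Sort node with "Sort Method: external merge" should no longer appear, since the index already returns rows ordered by track.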

In situations like this, I have had good luck in other databases with a correlated subquery:

SELECT r.source_uri AS su_on_r,
       (SELECT string_agg(distinct composer, '|') as c_on_tComposer
        FROM track_composer tc
        WHERE r.id = tc.track
       ) as c_on_tComposer
FROM release r;
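In PostgreSQL 9.3 and later, the same correlated lookup can also be written as a LATERAL join. This is not part of the original answer, only an equivalent sketch that keeps the subquery in the FROM clause as in the question:

```sql
SELECT r.source_uri AS su_on_r,
       tComposer.c_on_tComposer
FROM   release r
       LEFT JOIN LATERAL (
             -- An aggregate without GROUP BY always returns exactly one row,
             -- so each release still yields a single output row.
             SELECT string_agg(DISTINCT tc.composer, '|') AS c_on_tComposer
             FROM   track_composer tc
             WHERE  tc.track = r.id
       ) tComposer ON true;
```

Whether this performs better than the pre-aggregated join depends on how many rows of release the outer query touches; it tends to help most when the outer query is selective.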