TSQL: How to concatenate the strings of GROUPED values

Date: 2014-09-17 08:47:07

Tags: tsql

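I've come across many posts about this problem. The proposed solutions all tend to take the same approach, but in my case that approach is very inconvenient.

Most of the time, something like this is suggested: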

DECLARE @Actors TABLE ( [Id] INT , [Name] VARCHAR(20) , [MovieId] INT);
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );


INSERT  INTO @Actors
    ( Id, Name, MovieId )
VALUES  ( 1, 'Sean Connery', 1 ),
    ( 2, 'Gert Fröbe', 1 ),
    ( 3, 'Honor Blackman', 1 ),
    ( 4, 'Daniel Craig', 2 ),
    ( 5, 'Judi Dench', 2 ),
    ( 6, 'Harrison Ford', 3 )

INSERT  INTO @Movie
    ( Id, Name, FranchiseId )
VALUES  ( 1, 'Goldfinger', 1 ),
    ( 2, 'Skyfall', 1 ),
    ( 3, 'Return of the Jedi', 2 )


SELECT  m.Name ,
    STUFF(( SELECT  ',' + a_c.Name
            FROM    @Actors a_c
            WHERE   a_c.MovieId = m.Id
          FOR
            XML PATH('')
          ), 1, 1, '')
FROM    @Actors a
    JOIN @Movie m ON a.MovieId = m.Id
GROUP BY m.Id ,
    m.Name

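The problem is this (how do I explain it?): you don't really access the grouped items the way you do with Count(), Max(), Min(), and so on. Instead, you rebuild the "outer query" and use the WHERE clause to force the corresponding values to match those in the GROUP BY clause of the outer query.

If you don't see what I'm getting at, here is the example above extended with one more table. As you will see, I also have to extend the "inner query":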

DECLARE @Actors TABLE ( [Id] INT , [Name] VARCHAR(20) , [MovieId] INT);
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
DECLARE @Franchise TABLE ( [Id] INT , [Name] VARCHAR(20));


INSERT  INTO @Actors
    ( Id, Name, MovieId )
VALUES  ( 1, 'Sean Connery', 1 ),
    ( 2, 'Gert Fröbe', 1 ),
    ( 3, 'Honor Blackman', 1 ),
    ( 4, 'Daniel Craig', 2 ),
    ( 5, 'Judi Dench', 2 ),
    ( 6, 'Harrison Ford', 3 )

INSERT  INTO @Movie
    ( Id, Name, FranchiseId )
VALUES  ( 1, 'Goldfinger', 1 ),
    ( 2, 'Skyfall', 1 ),
    ( 3, 'Return of the Jedi', 2 )

INSERT  INTO @Franchise
    ( Id, Name )
VALUES  ( 1, 'James Bond' ),
    ( 2, 'Star Wars' );


SELECT  f.Name ,
    STUFF(( SELECT  ',' + a_c.Name
            FROM    @Actors a_c
                    JOIN @Movie m_c ON a_c.MovieId = m_c.Id
            WHERE   m_c.FranchiseId = f.Id
          FOR
            XML PATH('')
          ), 1, 1, '')
FROM    @Actors a
    JOIN @Movie m ON a.MovieId = m.Id
    JOIN @Franchise f ON m.FranchiseId = f.Id
GROUP BY f.Id ,
    f.Name

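Now take it a step further: imagine a huge, very complex query with several grouped values across many tables. Performance is a concern, and I don't want to rebuild the entire join pattern in the "inner query".

Is there another way that doesn't kill performance and doesn't require duplicating the join pattern?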

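2 answers:

Answer 0 (score: 0)

You can simplify the query with a common table expression (CTE). That way, you only have to specify the JOINs once. Also, you really need only one column in the GROUP BY: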

WITH idsAndNames AS -- the CTE that is used in two places
(
    SELECT f.Id AS FranchiseId,
           f.Name AS FranchiseName,
           m.Id AS MovieId,
           m.Name AS MovieName,
           a.Id AS ActorId,
           a.Name As ActorName
    FROM @Actors a
    JOIN @Movie m ON a.MovieId = m.Id
    JOIN @Franchise f ON m.FranchiseId = f.Id
)
SELECT n.FranchiseName,
       STUFF((SELECT ',' + x.ActorName -- you might need a DISTINCT here, btw.
              FROM idsAndNames x
              WHERE x.FranchiseName = n.FranchiseName -- correlate on the grouped column
             FOR XML PATH('')), 1, 1, '')
FROM idsAndNames n
GROUP BY n.FranchiseName -- name suffices only if it's unique; otherwise group and correlate on FranchiseId
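A caveat applies to the FOR XML PATH('') trick used above. It XML-encodes special characters, so a name containing & comes back as &amp;. A commonly used remedy is to add the TYPE directive and read the text back with the xml .value() method. The following is a minimal sketch. The actor name 'Smith & Jones' is hypothetical and was added only to trigger the problem:

```sql
DECLARE @Actors TABLE ( [Id] INT, [Name] VARCHAR(20), [MovieId] INT );
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
DECLARE @Franchise TABLE ( [Id] INT, [Name] VARCHAR(20) );

-- 'Smith & Jones' is a hypothetical name chosen only because it contains '&'.
INSERT INTO @Actors ( Id, Name, MovieId )
VALUES ( 1, 'Sean Connery', 1 ),
       ( 2, 'Smith & Jones', 1 );
INSERT INTO @Movie ( Id, Name, FranchiseId ) VALUES ( 1, 'Goldfinger', 1 );
INSERT INTO @Franchise ( Id, Name ) VALUES ( 1, 'James Bond' );

SELECT  f.Name AS FranchiseName,
        STUFF(( SELECT  ', ' + a.Name
                FROM    @Actors a
                        JOIN @Movie m ON a.MovieId = m.Id
                WHERE   m.FranchiseId = f.Id
                ORDER BY a.Name
                FOR XML PATH(''), TYPE                -- return the xml type instead of a string
              ).value('.', 'NVARCHAR(MAX)'),          -- read the text back without entity encoding
              1, 2, '') AS ActorNames                 -- strip the leading ', ' (2 characters)
FROM    @Franchise f;
```

Without ", TYPE" and ".value(...)", the same query would return the ampersand as &amp;.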

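Answer 1 (score: 0)

Contrary to what I said in this comment, you don't need a GROUP BY clause at all, ~~nor a WHERE clause~~.

You just need an outer SELECT to "iterate" over all franchises (or whatever you want to group by). In the inner SELECT, you then need some JOINs to get to the franchise key column. ~~Instead of a WHERE clause that filters by the outer franchise's key, simply use the outer franchise key directly in an INNER JOIN:~~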

SELECT f.Name AS FranchiseName,
       COALESCE(STUFF((SELECT DISTINCT ', ' + a.Name
                       FROM @Actors a
                       JOIN @Movie m ON a.MovieId = m.Id
                       WHERE m.FranchiseId = f.Id
                       ORDER BY ', ' + a.Name -- this is optional
                       FOR XML PATH('')), 1, 2, ''), '') AS ActorNames -- 2 = length of the ', ' separator
FROM @Franchise f

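Source: "High Performance T-SQL Using Window Functions" by Itzik Ben-Gan. Because SQL Server unfortunately has no aggregate or window function for concatenating values, the book's author recommends something like the query above as the next best solution.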
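The missing aggregate mentioned above was added in SQL Server 2017 as STRING_AGG, which is not available in earlier versions. Because it is a real aggregate, it reads the grouped rows directly, as COUNT() or MAX() does, so the join pattern appears only once. The following is a minimal sketch, assuming SQL Server 2017 or later. It reuses the question's tables, with the duplicated actor Id changed to 6:

```sql
-- Assumes SQL Server 2017 or later: STRING_AGG does not exist in earlier versions.
DECLARE @Actors TABLE ( [Id] INT, [Name] VARCHAR(20), [MovieId] INT );
DECLARE @Movie TABLE ( [Id] INT, [Name] VARCHAR(20), [FranchiseId] INT );
DECLARE @Franchise TABLE ( [Id] INT, [Name] VARCHAR(20) );

INSERT INTO @Actors ( Id, Name, MovieId )
VALUES ( 1, 'Sean Connery', 1 ),
       ( 2, 'Gert Fröbe', 1 ),
       ( 3, 'Honor Blackman', 1 ),
       ( 4, 'Daniel Craig', 2 ),
       ( 5, 'Judi Dench', 2 ),
       ( 6, 'Harrison Ford', 3 );

INSERT INTO @Movie ( Id, Name, FranchiseId )
VALUES ( 1, 'Goldfinger', 1 ),
       ( 2, 'Skyfall', 1 ),
       ( 3, 'Return of the Jedi', 2 );

INSERT INTO @Franchise ( Id, Name )
VALUES ( 1, 'James Bond' ),
       ( 2, 'Star Wars' );

-- The join pattern is written exactly once. STRING_AGG aggregates the
-- rows of each group, just like COUNT() or MAX() would.
SELECT  f.Name AS FranchiseName,
        STRING_AGG(a.Name, ', ') WITHIN GROUP (ORDER BY a.Name) AS ActorNames
FROM    @Actors a
        JOIN @Movie m ON a.MovieId = m.Id
        JOIN @Franchise f ON m.FranchiseId = f.Id
GROUP BY f.Id, f.Name;
```

One caveat applies to this design. Unlike the DISTINCT inside the FOR XML subquery, SQL Server's STRING_AGG does not accept DISTINCT. If the same actor can appear in several movies of one franchise, remove the duplicates in a derived table first.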

  

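P.S.: I have removed my previous solution that substituted an additional JOIN for a WHERE clause. I'm now fairly sure that a WHERE clause is likely to perform better. I have still left some traces of my previous solution (the struck-through text) because I referred to it earlier in a comment.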