How do I do grouped concatenation on multiple columns?

Asked: 2016-06-07 16:24:08

Tags: sql sql-server xml tsql xquery

Assume this table:

PurchaseID | Customer | Product  | Method
-----------|----------|----------|--------
 1         | John     | Computer | Credit
 2         | John     | Mouse    | Cash
 3         | Will     | Computer | Credit
 4         | Will     | Mouse    | Cash
 5         | Will     | Speaker  | Cash
 6         | Todd     | Computer | Credit

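For each customer, I want to generate a report of the products they purchased and the payment methods they used, but I want each customer's report to be a single row, for example: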

Customer | Products                 | Methods
---------|--------------------------|--------------
 John    | Computer, Mouse          | Credit, Cash
 Will    | Computer, Mouse, Speaker | Credit, Cash
 Todd    | Computer                 | Credit

What I have found so far is to do the grouped concatenation with the XML PATH method, for example:

SELECT
    p.Customer,
    -- STUFF(..., 1, 2, '') strips the leading ', ' separator
    STUFF((
        SELECT ', ' + xp.Product
        FROM Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH('')), 1, 2, '') AS Products,
    STUFF((
        SELECT ', ' + xp.Method
        FROM Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH('')), 1, 2, '') AS Methods
FROM Purchases p
GROUP BY p.Customer

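This gives me the result, but my concern is its speed. At first glance there are three separate SELECTs here, and two of them are multiplied by the number of rows in Purchases. Eventually this will slow things down considerably.

So, is there a way to do this with better performance? I want to add even more columns to aggregate; should I write one of these STUFF() blocks for every column? That doesn't sound fast enough to me.

Suggestions?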

3 answers:

Answer 0 (score: 4)

Just an idea:

DECLARE @t TABLE (
    Customer VARCHAR(50),
    Product VARCHAR(50),
    Method VARCHAR(50),
    INDEX ix CLUSTERED (Customer)
)

INSERT INTO @t (Customer, Product, Method)
VALUES
    ('John', 'Computer', 'Credit'),
    ('John', 'Mouse', 'Cash'),
    ('Will', 'Computer', 'Credit'),
    ('Will', 'Mouse', 'Cash'),
    ('Will', 'Speaker', 'Cash'),
    ('Todd', 'Computer', 'Credit')

SELECT t.Customer
     , STUFF(CAST(x.query('a/text()') AS NVARCHAR(MAX)), 1, 2, '')
     , STUFF(CAST(x.query('b/text()') AS NVARCHAR(MAX)), 1, 2, '')
FROM (
    SELECT DISTINCT Customer
    FROM @t
) t
OUTER APPLY (
    SELECT DISTINCT [a] = CASE WHEN id = 'a' THEN ', ' + val END
                  , [b] = CASE WHEN id = 'b' THEN ', ' + val END
    FROM @t t2
    CROSS APPLY (
        VALUES ('a', t2.Product)
             , ('b', t2.Method)
    ) t3 (id, val)
    WHERE t2.Customer = t.Customer
    FOR XML PATH(''), TYPE
) t2 (x)

Output:

Customer   Product                    Method     
---------- -------------------------- ------------------
John       Computer, Mouse            Cash, Credit
Todd       Computer                   Credit
Will       Computer, Mouse, Speaker   Cash, Credit
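To see why the query above works, it can help to run its inner part for a single customer. The sketch below is only an illustration and not part of the original answer: it rebuilds a small copy of @t and hard-codes 'Will' as the customer. CROSS APPLY (VALUES ...) turns each purchase into two (id, val) rows, and FOR XML PATH('') then wraps the non-NULL values in a and b elements, which x.query('a/text()') and x.query('b/text()') later pull apart.

```sql
-- Illustrative copy of the answer's table variable, limited to one customer.
DECLARE @t TABLE (
    Customer VARCHAR(50),
    Product VARCHAR(50),
    Method VARCHAR(50)
);

INSERT INTO @t (Customer, Product, Method)
VALUES
    ('Will', 'Computer', 'Credit'),
    ('Will', 'Mouse', 'Cash'),
    ('Will', 'Speaker', 'Cash');

-- Step 1: unpivot Product and Method into (id, val) pairs,
-- two rows per purchase.
SELECT t3.id, t3.val
FROM @t t2
CROSS APPLY (
    VALUES ('a', t2.Product)
         , ('b', t2.Method)
) t3 (id, val)
WHERE t2.Customer = 'Will';

-- Step 2: build one XML fragment from those pairs.
-- DISTINCT drops the repeated ', Cash'; a NULL column produces no element,
-- so each row contributes either one <a> or one <b> element.
-- The order of the elements is not guaranteed.
SELECT DISTINCT [a] = CASE WHEN t3.id = 'a' THEN ', ' + t3.val END
              , [b] = CASE WHEN t3.id = 'b' THEN ', ' + t3.val END
FROM @t t2
CROSS APPLY (
    VALUES ('a', t2.Product)
         , ('b', t2.Method)
) t3 (id, val)
WHERE t2.Customer = 'Will'
FOR XML PATH(''), TYPE;
```

The point of the design is that the source table is read once per customer no matter how many columns are concatenated; adding a column means adding one more VALUES row and one more CASE expression.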

Another idea, with better performance:

IF OBJECT_ID('tempdb.dbo.#EntityValues') IS NOT NULL
    DROP TABLE #EntityValues

DECLARE @Values1 VARCHAR(MAX)
      , @Values2 VARCHAR(MAX)

SELECT Customer
     , Product
     , Method
     , RowNum = ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY 1/0)
     , Values1 = CAST(NULL AS VARCHAR(MAX))
     , Values2 = CAST(NULL AS VARCHAR(MAX))
INTO #EntityValues
FROM @t

UPDATE #EntityValues
SET 
      @Values1 = Values1 =
        CASE WHEN RowNum = 1 
            THEN Product
            ELSE @Values1 + ', ' + Product 
        END
    , @Values2 = Values2 = 
        CASE WHEN RowNum = 1 
            THEN Method
            ELSE @Values2 + ', ' + Method
        END

SELECT Customer
      , Values1 = MAX(Values1) 
      , Values2 = MAX(Values2)
FROM #EntityValues
GROUP BY Customer

But it has some limitations; for example, duplicate values are not removed:

Customer      Values1                       Values2
------------- ----------------------------- ----------------------
John          Computer, Mouse               Credit, Cash
Todd          Computer                      Credit
Will          Computer, Mouse, Speaker      Credit, Cash, Cash

Also check out my old post about string aggregation:

http://www.codeproject.com/Articles/691102/String-Aggregation-in-the-World-of-SQL-Server

Answer 1 (score: 1)

This is one of the use cases for a recursive CTE (common table expression). You can learn more at https://technet.microsoft.com/en-us/library/ms190766(v=sql.105).aspx
;
WITH CTE1 (PurchaseID, Customer, Product, Method, RowID)
AS
(
    SELECT 
        PurchaseID, Customer, Product, Method, 
        ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY PurchaseID)
    FROM
        @tbl 
        /* This table holds the source data. I omitted declaring it and 
        inserting data into it because that's not important. */
)
, CTE2 (PurchaseID, Customer, Product, Method, RowID)
AS
(
    SELECT 
        PurchaseID, Customer, 
        CONVERT(VARCHAR(MAX), Product), 
        CONVERT(VARCHAR(MAX), Method), 
        1
    FROM 
        CTE1 
    WHERE 
        RowID = 1
    UNION ALL
    SELECT 
        CTE2.PurchaseID, CTE2.Customer, 
        CONVERT(VARCHAR(MAX), CTE2.Product + ',' + CTE1.Product), 
        CONVERT(VARCHAR(MAX), CTE2.Method + ',' + CTE1.Method), 
        CTE2.RowID + 1 
    FROM 
        CTE2 INNER JOIN CTE1 
            ON CTE2.Customer = CTE1.Customer
            AND CTE2.RowID + 1 = CTE1.RowID
)

SELECT Customer, MAX(Product) AS Products, MAX(Method) AS Methods 
FROM CTE2 
GROUP BY Customer

Output:

Customer    Products                Methods
John        Computer,Mouse          Credit,Cash
Todd        Computer                Credit
Will        Computer,Mouse,Speaker  Credit,Cash,Cash
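One thing to keep in mind with the recursive approach: SQL Server stops a recursive CTE after 100 levels by default, so a customer with more than about a hundred purchases would make the query fail. The sketch below is a trimmed-down variant of the query above (one concatenated column, and a hypothetical three-row @tbl, since the answer does not show its table) that lifts the limit with OPTION (MAXRECURSION 0). Treat it as an illustration under those assumptions rather than as the answer's own code.

```sql
-- Hypothetical stand-in for the answer's @tbl.
DECLARE @tbl TABLE (
    PurchaseID INT,
    Customer VARCHAR(50),
    Product VARCHAR(50)
);

INSERT INTO @tbl (PurchaseID, Customer, Product)
VALUES
    (1, 'John', 'Computer'),
    (2, 'John', 'Mouse'),
    (3, 'Todd', 'Computer');

WITH CTE1 AS
(
    -- Number each customer's purchases so the recursion can walk them in order.
    SELECT Customer, Product,
           ROW_NUMBER() OVER (PARTITION BY Customer ORDER BY PurchaseID) AS RowID
    FROM @tbl
)
, CTE2 AS
(
    -- Anchor: the first purchase of every customer.
    SELECT Customer, CONVERT(VARCHAR(MAX), Product) AS Product, RowID
    FROM CTE1
    WHERE RowID = 1
    UNION ALL
    -- Recursive step: append the next purchase to the running string.
    SELECT CTE2.Customer,
           CONVERT(VARCHAR(MAX), CTE2.Product + ',' + CTE1.Product),
           CTE2.RowID + 1
    FROM CTE2 INNER JOIN CTE1
        ON CTE2.Customer = CTE1.Customer
        AND CTE2.RowID + 1 = CTE1.RowID
)
-- MAX picks the longest string per customer, because every longer
-- string starts with the shorter ones.
SELECT Customer, MAX(Product) AS Products
FROM CTE2
GROUP BY Customer
OPTION (MAXRECURSION 0);  -- 0 removes the default limit of 100 levels
```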

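Answer 2 (score: 1)

Another solution is the CLR approach to grouped concatenation; @aaron bertrand has done a performance comparison on this. If you can deploy CLR, download the script from http://groupconcat.codeplex.com/, which is free. All the details are in the documentation. Your query would then simply look like this: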

SELECT Customer,dbo.GROUP_CONCAT(product),dbo.GROUP_CONCAT(method)
FROM Purchases
GROUP BY Customer

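This query is simple and easy to remember and use. The XML method also does the job, but the code is a bit harder to remember (at least for me), and it comes with some pitfalls; both the problems and the ways around them are described in his blog.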
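One pitfall that is often mentioned for FOR XML PATH (whether it is among the ones the blog covers is an assumption here) is that characters such as & and < come back as XML entities. A common way around it is to add TYPE and read the result back with .value(). The sketch below shows both variants side by side; the @Purchases table variable and its two rows are made up for the illustration.

```sql
-- Made-up data containing characters that XML has to escape.
DECLARE @Purchases TABLE (
    Customer VARCHAR(50),
    Product VARCHAR(50)
);

INSERT INTO @Purchases (Customer, Product)
VALUES
    ('John', 'Mouse & Pad'),
    ('John', 'Cable <USB>');

SELECT
    p.Customer,
    -- Plain FOR XML PATH: & and < are returned as &amp; and &lt;
    STUFF((
        SELECT ', ' + xp.Product
        FROM @Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH('')), 1, 2, '') AS Entitized,
    -- TYPE + .value(): the text is read back out of the XML unescaped
    STUFF((
        SELECT ', ' + xp.Product
        FROM @Purchases xp
        WHERE xp.Customer = p.Customer
        FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)'), 1, 2, '') AS Plain
FROM @Purchases p
GROUP BY p.Customer;
```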

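Also, from a performance perspective, using .query is time-consuming; I ran into the same performance problem. I hope you can find what you need in this question that I asked: https://dba.stackexchange.com/questions/125771/multiple-column-concatenation. Check version 2 of the nested XML concatenation method given by kenneth fisher, or the unpivot/pivot method suggested by spaggettidba.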
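None of the answers here use it, but if you are on SQL Server 2017 or later (an assumption about your environment), the built-in STRING_AGG aggregate handles several columns in a single pass without XML or CLR. A minimal sketch, using a table variable that mirrors the sample data from the question:

```sql
-- Table variable mirroring the question's sample data.
DECLARE @Purchases TABLE (
    PurchaseID INT,
    Customer VARCHAR(50),
    Product VARCHAR(50),
    Method VARCHAR(50)
);

INSERT INTO @Purchases (PurchaseID, Customer, Product, Method)
VALUES
    (1, 'John', 'Computer', 'Credit'),
    (2, 'John', 'Mouse', 'Cash'),
    (3, 'Will', 'Computer', 'Credit'),
    (4, 'Will', 'Mouse', 'Cash'),
    (5, 'Will', 'Speaker', 'Cash'),
    (6, 'Todd', 'Computer', 'Credit');

-- One aggregate per column; WITHIN GROUP fixes the order of the items.
SELECT Customer,
       STRING_AGG(Product, ', ') WITHIN GROUP (ORDER BY PurchaseID) AS Products,
       STRING_AGG(Method, ', ') WITHIN GROUP (ORDER BY PurchaseID) AS Methods
FROM @Purchases
GROUP BY Customer;
```

Note that STRING_AGG does not remove duplicates, so Will's Methods come back as Credit, Cash, Cash; to get the distinct list shown in the question you would have to deduplicate each column first.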