Getting the rows of each group after a GROUP BY in SQL Server

Time: 2017-01-20 10:01:43

Tags: sql sql-server tsql

I have a table with these columns:

UserID1, UserID2, ProductID, PurchaseDate

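The following query runs against the Purchases table and returns the number of interactions between pairs of users over the past 31 days, regardless of the order in which the two users appear: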

DECLARE @threshold AS INT
DECLARE @days AS INT

SET @threshold = 10
SET @days = 31

SELECT 
    UserID1, UserID2, COUNT(*) AS Counter
FROM 
    (SELECT
        -- order each pair so that (Col1, Col2) and (Col2, Col1) count as the same case
        CASE 
           WHEN UserID1 < UserID2 
              THEN UserID1 
              ELSE UserID2 
        END AS UserID1,
        CASE 
           WHEN UserID1 < UserID2 
              THEN UserID2 
              ELSE UserID1 
        END AS UserID2
    FROM
        Purchases WITH(NOLOCK)
    WHERE 
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()) t
GROUP BY 
    UserID1, UserID2
HAVING 
    COUNT(*) > @threshold

This yields:

UserID1  UserID2  Counter
1        2        10
3        2        5
4        1        8

However, what I want is to return a table that also includes ProductID and PurchaseDate, like this:

UserID1  UserID2  ProductID  PurchaseDate
1        2        12345      2017-01-18 00:13:52
1        2        5425       2017-01-12 15:10:02
1        2        64362      2017-01-05 10:10:02
..... for the 10 interactions
3        2        25235      2017-01-18 00:13:52
3        2        436346     2017-01-14 00:13:52
..... for the 5 interactions
4        1        23523      2017-01-14 00:13:52
4        1        135135     2017-01-09 00:13:52
..... for the 8 interactions

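Is there a way to do this without putting the results of the first query into a temporary table and then joining back to the Purchases table to find all the purchases?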

2 Answers:

Answer 0: (score: 2)

If I understand correctly, a simple windowed COUNT will do the job.

The optimizer should be smart enough to do this in a single scan of the table.

DECLARE @threshold AS INT;
DECLARE @days AS INT;

SET @threshold = 10;
SET @days = 31;

WITH
CTE_Purchases
AS
(
    SELECT
        -- order each pair so that (Col1, Col2) and (Col2, Col1) count as the same case
        CASE 
            WHEN UserID1 < UserID2 
            THEN UserID1 
            ELSE UserID2 
        END AS UserID1
        ,CASE 
            WHEN UserID1 < UserID2 
            THEN UserID2 
            ELSE UserID1 
        END AS UserID2
        ,ProductID
        ,PurchaseDate
    FROM
        Purchases
    WHERE 
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
,CTE_Counts
AS
(
    SELECT
        UserID1
        ,UserID2
        ,ProductID
        ,PurchaseDate
        ,COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
        -- calc COUNT for groups without explicit GROUP BY
    FROM CTE_Purchases
)
SELECT
    UserID1
    ,UserID2
    ,ProductID
    ,PurchaseDate
    ,Counter
FROM CTE_Counts
WHERE Counter > @threshold
-- this filter is instead of your HAVING
;
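
To see how the windowed COUNT keeps the detail rows, here is a minimal sketch on made-up data. The table variable @Sample, its values, and the threshold of 2 are assumptions chosen only for illustration; they are not from the question.

```sql
-- Hypothetical sample data; @Sample and its values are made up for illustration.
DECLARE @Sample TABLE (UserID1 INT, UserID2 INT, ProductID INT);

INSERT INTO @Sample (UserID1, UserID2, ProductID)
VALUES (1, 2, 100), (2, 1, 101), (1, 2, 102), (3, 4, 200);

WITH Normalized AS
(
    -- Put the smaller ID first so (1, 2) and (2, 1) fall into the same partition.
    SELECT
        CASE WHEN UserID1 < UserID2 THEN UserID1 ELSE UserID2 END AS UserID1,
        CASE WHEN UserID1 < UserID2 THEN UserID2 ELSE UserID1 END AS UserID2,
        ProductID
    FROM @Sample
),
Counted AS
(
    -- Every row keeps its own ProductID and also carries the size of its pair.
    SELECT
        UserID1,
        UserID2,
        ProductID,
        COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
    FROM Normalized
)
SELECT UserID1, UserID2, ProductID, Counter
FROM Counted
WHERE Counter > 2   -- plays the role of HAVING COUNT(*) > @threshold
ORDER BY ProductID;
```

With this sample, the pair (1, 2) has three rows, so the query should return products 100, 101 and 102, each with Counter = 3, while the single (3, 4) row is filtered out.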

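Answer 1: (score: 0)

Disclaimer: I have not tested this code; it was written outside a T-SQL IDE. The code below assumes that UserID1 != UserID2.

1) I suggest using a MAX/MIN over a VALUES list to treat [Col1, Col2] the same as [Col2, Col1]. It may perform better, and it handles NULLs correctly. It requires SQL Server 2008 (or later).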

SELECT
    -- the alias goes after the closing parenthesis of VALUES; User is a reserved word, so use v
    (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS UserID1,
    (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS UserID2
FROM
    Purchases
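
A small sketch of how the MAX/MIN-over-VALUES pattern behaves, including the NULL case. The table variable @Sample and its rows are hypothetical and exist only to make the snippet self-contained.

```sql
-- Hypothetical sample data; @Sample and its values are made up for illustration.
DECLARE @Sample TABLE (UserID1 INT, UserID2 INT);

INSERT INTO @Sample (UserID1, UserID2)
VALUES (1, 2), (2, 1), (5, NULL);

SELECT
    UserID1 AS OriginalUserID1,
    UserID2 AS OriginalUserID2,
    -- MAX and MIN skip NULLs, so a row with one NULL yields the non-NULL ID twice.
    (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS FirstUser,
    (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS SecondUser
FROM @Sample;
```

Both (1, 2) and (2, 1) should come back as FirstUser = 2, SecondUser = 1, and the (5, NULL) row as FirstUser = 5, SecondUser = 5. SQL Server may also emit a warning that a NULL value was eliminated by an aggregate.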

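2) Now we need to actually count the interactions between them, which should be easy. To keep the code clean, we can wrap the previous statement in a CTE, where I also add the deadline filter: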

;WITH CTE_UserInteractions AS (
    SELECT
        -- the alias goes after the closing parenthesis of VALUES; User is a reserved word, so use v
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS SecondUser
    FROM
        Purchases
    WHERE
        Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()
)

SELECT
    FirstUser,
    SecondUser
FROM
    CTE_UserInteractions
GROUP BY
    FirstUser, SecondUser
HAVING
    COUNT(*) > @Threshold

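Quick note: you may find that computing the lower bound of the date range ahead of time has a positive effect on performance. For example, before running the batch, we can do this: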

DECLARE @StartDate DATETIME = DATEADD(DAY,-@days,GETDATE())

We can then use @StartDate in the WHERE clause.
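
A minimal sketch of what that could look like, assuming the Purchases table and Deadline column from the question. The @EndDate variable is my own addition (not part of the original answer) so that both bounds are fixed once per batch.

```sql
DECLARE @days AS INT = 31;

-- Compute both bounds once, up front, instead of inside the WHERE clause.
DECLARE @StartDate DATETIME = DATEADD(DAY, -@days, GETDATE());
DECLARE @EndDate   DATETIME = GETDATE();   -- assumption: not in the original answer

SELECT
    UserID1,
    UserID2,
    ProductID,
    PurchaseDate
FROM
    Purchases
WHERE
    Deadline BETWEEN @StartDate AND @EndDate;
```

Whether this actually helps depends on the plan SQL Server picks, so it is worth comparing both versions against real data.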

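3) Finally, we can use CROSS APPLY to fetch the list of products and purchase dates for the user "pairs" that remain in the result. If performance suffers, we can either use a sub-select (my solution) or pre-populate a temporary table with the results of step #2.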

;WITH CTE_UserInteractions AS (
    SELECT
        -- the alias goes after the closing parenthesis of VALUES; User is a reserved word, so use v
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS SecondUser
    FROM
        Purchases AS p1
    WHERE
        Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()
)

SELECT
    groupedUsers.FirstUser as UserID1,
    groupedUsers.SecondUser as UserID2,
    products.ProductID,
    products.PurchaseDate
FROM (
    SELECT
        FirstUser,
        SecondUser
    FROM
        CTE_UserInteractions
    GROUP BY
        FirstUser, SecondUser
    HAVING
        COUNT(*) > @Threshold
) groupedUsers
CROSS APPLY (
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p1
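    WHERE
        p1.UserID1 = groupedUsers.FirstUser AND p1.UserID2 = groupedUsers.SecondUser
        -- same date window as the CTE, so only the counted purchases are returned
        AND p1.Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()
    UNION ALL
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p2
    WHERE
        p2.UserID2 = groupedUsers.FirstUser AND p2.UserID1 = groupedUsers.SecondUser
        -- same date window as the CTE, so only the counted purchases are returned
        AND p2.Deadline BETWEEN DATEADD(day,-@days,GETDATE()) AND GETDATE()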
) products
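
For comparison, the temporary-table alternative mentioned in step 3 might look roughly like the sketch below. It is untested, assumes the Purchases table and Deadline column from the question, and the names #QualifyingPairs, @StartDate and @EndDate are hypothetical.

```sql
DECLARE @threshold AS INT = 10;
DECLARE @days AS INT = 31;
DECLARE @StartDate DATETIME = DATEADD(DAY, -@days, GETDATE());
DECLARE @EndDate   DATETIME = GETDATE();

-- Step 2 result: one row per pair that exceeds the threshold.
-- #QualifyingPairs is a hypothetical temp table name.
SELECT
    FirstUser,
    SecondUser
INTO #QualifyingPairs
FROM (
    SELECT
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS v(usr)) AS SecondUser
    FROM
        Purchases
    WHERE
        Deadline BETWEEN @StartDate AND @EndDate
) AS pairs
GROUP BY
    FirstUser, SecondUser
HAVING
    COUNT(*) > @threshold;

-- Join back to Purchases to list the purchases behind each qualifying pair.
SELECT
    qp.FirstUser  AS UserID1,
    qp.SecondUser AS UserID2,
    p.ProductID,
    p.PurchaseDate
FROM
    #QualifyingPairs AS qp
JOIN
    Purchases AS p
    ON (p.UserID1 = qp.FirstUser  AND p.UserID2 = qp.SecondUser)
    OR (p.UserID1 = qp.SecondUser AND p.UserID2 = qp.FirstUser)
WHERE
    p.Deadline BETWEEN @StartDate AND @EndDate;

DROP TABLE #QualifyingPairs;
```

The OR in the join condition matches each purchase row at most once per pair, so unlike the UNION ALL variant it does not rely on UserID1 != UserID2, though it may be harder for the optimizer to use an index on it.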