I have a table with these columns:
UserID1, UserID2, ProductID, PurchaseDate
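The following query runs against the Purchases table and returns the number of interactions between each pair of users over the last 31 days, regardless of which of the two columns each user appears in: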
DECLARE @threshold AS INT
DECLARE @days AS INT
SET @threshold = 10
SET @days = 31
SELECT
    UserID1, UserID2, COUNT(*) AS Counter
FROM
    (SELECT
        -- swap the columns so that (Col1, Col2) and (Col2, Col1) count as the same pair
        CASE
            WHEN UserID1 < UserID2
            THEN UserID1
            ELSE UserID2
        END AS UserID1,
        CASE
            WHEN UserID1 < UserID2
            THEN UserID2
            ELSE UserID1
        END AS UserID2
    FROM
        Purchases WITH(NOLOCK)
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()) t
GROUP BY
    UserID1, UserID2
HAVING
    COUNT(*) > @threshold
Which yields:
UserID1 UserID2 Counter
1 2 10
3 2 5
4 1 8
However, what I want is to return a table that also includes ProductID and PurchaseDate, like this:
UserID1 UserID2 ProductID PurchaseDate
1 2 12345 2017-01-18 00:13:52
1 2 5425 2017-01-12 15:10:02
1 2 64362 2017-01-05 10:10:02
..... for the 10 interactions
3 2 25235 2017-01-18 00:13:52
3 2 436346 2017-01-14 00:13:52
..... for the 5 interactions
4 1 23523 2017-01-14 00:13:52
4 1 135135 2017-01-09 00:13:52
..... for the 8 interactions
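Is there a way to do this without putting the results of the first query into a temporary table and then joining it back to the Purchases table to find all the purchases?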
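Answer 0 (score: 2)
If I understand correctly, a simple windowed COUNT will help here.
The optimizer should be smart enough to do it in a single scan of the table.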
DECLARE @threshold AS INT;
DECLARE @days AS INT;
SET @threshold = 10;
SET @days = 31;
WITH
CTE_Purchases
AS
(
    SELECT
        -- swap the columns so that (Col1, Col2) and (Col2, Col1) count as the same pair
        CASE
            WHEN UserID1 < UserID2
            THEN UserID1
            ELSE UserID2
        END AS UserID1
        ,CASE
            WHEN UserID1 < UserID2
            THEN UserID2
            ELSE UserID1
        END AS UserID2
        ,ProductID
        ,PurchaseDate
    FROM
        Purchases
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
,CTE_Counts
AS
(
    SELECT
        UserID1
        ,UserID2
        ,ProductID
        ,PurchaseDate
        ,COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
        -- compute the per-pair COUNT without an explicit GROUP BY
    FROM CTE_Purchases
)
SELECT
    UserID1
    ,UserID2
    ,ProductID
    ,PurchaseDate
    ,Counter
FROM CTE_Counts
WHERE Counter > @threshold
-- this filter replaces your HAVING
;
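To check the windowed-COUNT idea without a SQL Server instance, here is a minimal sketch that ports the query above to SQLite through Python's built-in sqlite3 module. Assumptions: the date column is PurchaseDate (the T-SQL filters on Deadline, which is not among the listed columns), the sample rows are made up, GETDATE() is replaced by a fixed timestamp so the output is reproducible, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

# Hypothetical sample rows: (UserID1, UserID2, ProductID, PurchaseDate).
ROWS = [
    (1, 2, 12345, "2017-01-18 00:13:52"),
    (2, 1, 5425, "2017-01-12 15:10:02"),   # same pair, columns reversed
    (1, 2, 64362, "2017-01-05 10:10:02"),
    (3, 2, 25235, "2017-01-18 00:13:52"),
    (1, 2, 999, "2016-11-01 00:00:00"),    # outside the 31-day window
]

# SQLite port of the windowed-COUNT query. Assumes the date column is
# PurchaseDate and uses a fixed :now instead of GETDATE().
QUERY = """
WITH CTE_Purchases AS (
    SELECT
        CASE WHEN UserID1 < UserID2 THEN UserID1 ELSE UserID2 END AS UserID1,
        CASE WHEN UserID1 < UserID2 THEN UserID2 ELSE UserID1 END AS UserID2,
        ProductID,
        PurchaseDate
    FROM Purchases
    WHERE PurchaseDate BETWEEN datetime(:now, :offset) AND :now
),
CTE_Counts AS (
    SELECT UserID1, UserID2, ProductID, PurchaseDate,
           -- per-pair count attached to every row, no GROUP BY needed
           COUNT(*) OVER (PARTITION BY UserID1, UserID2) AS Counter
    FROM CTE_Purchases
)
SELECT UserID1, UserID2, ProductID, PurchaseDate, Counter
FROM CTE_Counts
WHERE Counter > :threshold
ORDER BY UserID1, UserID2, PurchaseDate DESC
"""


def pair_purchases(conn, now, days, threshold):
    """Return every purchase of every pair with more than `threshold` interactions."""
    params = {"now": now, "offset": f"-{days} days", "threshold": threshold}
    return conn.execute(QUERY, params).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases "
    "(UserID1 INTEGER, UserID2 INTEGER, ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany("INSERT INTO Purchases VALUES (?, ?, ?, ?)", ROWS)

for row in pair_purchases(conn, "2017-01-20 00:00:00", 31, 2):
    print(row)
```

With a threshold of 2 only the pair (1, 2) survives, and each of its three in-window purchases comes back with Counter = 3; the reversed row is folded into the same pair and the 2016-11-01 row is dropped by the date filter.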
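Answer 1 (score: 0)
Disclaimer: I have not tested this code; it was written outside a T-SQL IDE. The code below assumes that UserID1 != UserID2.
1) I suggest using a MAX/MIN value solution so that [Col1, Col2] is handled the same way as [Col2, Col1]. It may perform better and handles NULLs correctly. It requires SQL Server 2008 or later.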
SELECT
    (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS UserID1,
    (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS UserID2
FROM
    Purchases
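The pair normalization from step 1 can be sketched in SQLite as well (run here through Python's sqlite3 module). This is a port under stated assumptions: the rows are hypothetical, and SQLite's scalar multi-argument max()/min() are used as a stand-in for the T-SQL `(SELECT MAX(usr) FROM (VALUES ...))` subqueries, since they compute the same per-row larger and smaller value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases "
    "(UserID1 INTEGER, UserID2 INTEGER, ProductID INTEGER, PurchaseDate TEXT)"
)
# Hypothetical rows: the first two describe the same pair in opposite order.
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?, ?, ?)",
    [
        (1, 2, 10, "2017-01-18 00:13:52"),
        (2, 1, 11, "2017-01-12 15:10:02"),
        (4, 3, 12, "2017-01-09 00:13:52"),
    ],
)

# Multi-argument max()/min() are scalar in SQLite, so each row yields its
# own (larger, smaller) pair, just like the per-row VALUES subqueries.
normalized = conn.execute(
    """
    SELECT max(UserID1, UserID2) AS FirstUser,
           min(UserID1, UserID2) AS SecondUser
    FROM Purchases
    ORDER BY ProductID
    """
).fetchall()
print(normalized)
```

One difference worth knowing: SQLite's multi-argument max()/min() return NULL if any argument is NULL, whereas the T-SQL aggregate MAX/MIN over a VALUES list ignore NULLs, so the two versions only agree when both user IDs are non-NULL.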
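2) Now we need to actually count the interactions between them, which should be easy. To keep the code clean, we can wrap the previous statement in a CTE, where I also add the deadline filter: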
;WITH CTE_UserInteractions AS (
    SELECT
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS SecondUser
    FROM
        Purchases
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
SELECT
    FirstUser,
    SecondUser
FROM
    CTE_UserInteractions
GROUP BY
    FirstUser, SecondUser
HAVING
    COUNT(*) > @threshold
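A minimal SQLite sketch of step 2 (normalize, filter by date, then GROUP BY ... HAVING), again via Python's sqlite3. Assumed for illustration: the date column is PurchaseDate rather than Deadline, the rows are made up, and a fixed timestamp replaces GETDATE(). The helper name `interacting_pairs` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases "
    "(UserID1 INTEGER, UserID2 INTEGER, ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?, ?, ?)",
    [
        (1, 2, 10, "2017-01-18 00:13:52"),
        (2, 1, 11, "2017-01-12 15:10:02"),
        (1, 2, 12, "2017-01-05 10:10:02"),
        (3, 2, 13, "2017-01-14 00:13:52"),
        (2, 1, 14, "2016-10-01 00:00:00"),  # outside the window
    ],
)


def interacting_pairs(conn, now, days, threshold):
    """Count interactions per normalized pair inside the window and keep the busy ones."""
    return conn.execute(
        """
        WITH CTE_UserInteractions AS (
            SELECT max(UserID1, UserID2) AS FirstUser,
                   min(UserID1, UserID2) AS SecondUser
            FROM Purchases
            WHERE PurchaseDate BETWEEN datetime(:now, :offset) AND :now
        )
        SELECT FirstUser, SecondUser, COUNT(*) AS Counter
        FROM CTE_UserInteractions
        GROUP BY FirstUser, SecondUser
        HAVING COUNT(*) > :threshold
        ORDER BY FirstUser, SecondUser
        """,
        {"now": now, "offset": f"-{days} days", "threshold": threshold},
    ).fetchall()


print(interacting_pairs(conn, "2017-01-20 00:00:00", 31, 1))
```

The Counter column is added here only to make the result easy to inspect; the T-SQL in step 2 returns just the two user columns.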
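Quick note: you may find that computing the lower bound of the date range up front has a positive effect on performance. For example, before running the batch we could do: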
DECLARE @StartDate DATETIME = DATEADD(DAY,-@days,GETDATE())
We can then use @StartDate in the WHERE clause.
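The same "compute the bound once" idea, sketched on the client side with Python's datetime and sqlite3: the start of the window is calculated a single time and passed in as a plain parameter. The helper `window_bounds`, the PurchaseDate column, and the sample rows are assumptions for illustration; this shows the pattern, not a measured performance gain.

```python
import sqlite3
from datetime import datetime, timedelta
from typing import Tuple

FMT = "%Y-%m-%d %H:%M:%S"


def window_bounds(now: datetime, days: int) -> Tuple[str, str]:
    """Compute [start, end] once, analogous to DECLARE @StartDate ... before the batch."""
    start = now - timedelta(days=days)
    return start.strftime(FMT), now.strftime(FMT)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases "
    "(UserID1 INTEGER, UserID2 INTEGER, ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?, ?, ?)",
    [(1, 2, 10, "2017-01-18 00:13:52"), (1, 2, 11, "2016-10-01 00:00:00")],
)

start, end = window_bounds(datetime(2017, 1, 20), 31)
# The column is compared against two precomputed parameters.
recent = conn.execute(
    "SELECT ProductID FROM Purchases WHERE PurchaseDate BETWEEN ? AND ?",
    (start, end),
).fetchall()
print(start, end, recent)
```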
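3) Finally, we can use CROSS APPLY to fetch the list of products and purchases for the user "pairs" that remain in the result. If performance suffers, we can either use the subselect (my solution) or pre-populate a temporary table with the results of step 2.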
;WITH CTE_UserInteractions AS (
    SELECT
        (SELECT MAX(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS FirstUser,
        (SELECT MIN(usr) FROM (VALUES (UserID1), (UserID2)) AS u(usr)) AS SecondUser
    FROM
        Purchases AS p1
    WHERE
        Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
)
SELECT
    groupedUsers.FirstUser AS UserID1,
    groupedUsers.SecondUser AS UserID2,
    products.ProductID,
    products.PurchaseDate
FROM (
    SELECT
        FirstUser,
        SecondUser
    FROM
        CTE_UserInteractions
    GROUP BY
        FirstUser, SecondUser
    HAVING
        COUNT(*) > @threshold
) groupedUsers
CROSS APPLY (
    -- repeat the date filter so only purchases inside the window are returned
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p1
    WHERE
        p1.UserID1 = groupedUsers.FirstUser AND p1.UserID2 = groupedUsers.SecondUser
        AND p1.Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
    UNION ALL
    SELECT
        ProductID, PurchaseDate
    FROM
        Purchases AS p2
    WHERE
        p2.UserID2 = groupedUsers.FirstUser AND p2.UserID1 = groupedUsers.SecondUser
        AND p2.Deadline BETWEEN DATEADD(day, -@days, GETDATE()) AND GETDATE()
) products
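SQLite has no CROSS APPLY, so this sketch of step 3 swaps in an ordinary JOIN whose ON clause accepts either column order, which matches the two UNION ALL branches as long as UserID1 != UserID2 (the answer's own assumption). As before, the PurchaseDate column, the fixed timestamp, the sample rows, and the helper name `pair_products` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Purchases "
    "(UserID1 INTEGER, UserID2 INTEGER, ProductID INTEGER, PurchaseDate TEXT)"
)
conn.executemany(
    "INSERT INTO Purchases VALUES (?, ?, ?, ?)",
    [
        (1, 2, 10, "2017-01-18 00:13:52"),
        (2, 1, 11, "2017-01-12 15:10:02"),
        (1, 2, 12, "2017-01-05 10:10:02"),
        (3, 2, 13, "2017-01-14 00:13:52"),
        (2, 1, 14, "2016-10-01 00:00:00"),  # same pair, but outside the window
    ],
)

QUERY = """
WITH CTE_UserInteractions AS (
    SELECT max(UserID1, UserID2) AS FirstUser,
           min(UserID1, UserID2) AS SecondUser
    FROM Purchases
    WHERE PurchaseDate BETWEEN datetime(:now, :offset) AND :now
),
GroupedUsers AS (
    SELECT FirstUser, SecondUser
    FROM CTE_UserInteractions
    GROUP BY FirstUser, SecondUser
    HAVING COUNT(*) > :threshold
)
SELECT g.FirstUser AS UserID1, g.SecondUser AS UserID2, p.ProductID, p.PurchaseDate
FROM GroupedUsers AS g
JOIN Purchases AS p
  -- either column order matches, like the two UNION ALL branches
  ON ((p.UserID1 = g.FirstUser AND p.UserID2 = g.SecondUser)
      OR (p.UserID2 = g.FirstUser AND p.UserID1 = g.SecondUser))
 -- same date window as the CTE, so older purchases are not pulled back in
 AND p.PurchaseDate BETWEEN datetime(:now, :offset) AND :now
ORDER BY g.FirstUser, g.SecondUser, p.PurchaseDate DESC
"""


def pair_products(conn, now, days, threshold):
    """List each in-window purchase of every pair above the threshold."""
    params = {"now": now, "offset": f"-{days} days", "threshold": threshold}
    return conn.execute(QUERY, params).fetchall()


for row in pair_products(conn, "2017-01-20 00:00:00", 31, 1):
    print(row)
```

Dropping the second date condition would bring ProductID 14 back into the output even though it falls outside the 31-day window, which is why the filter appears on both sides of the join.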