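Hoping someone has run into this before and found a solution. I'm trying to find customers who have lapsed based on their subscription periods rather than on individual order dates. We define a lapse as no purchase or renewal within 30 days after a subscription ends. A customer can hold several subscriptions at the same time, and subscriptions can be of different lengths. I have a data set with the customer ID, the order, the subscription start date, the subscription end date, and the rank of that order in the customer's order history, like this: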
CREATE TABLE #Subscriptions
(CustomerID INT,
Orderid INT,
SubscriptionStart DATE,
SubscriptionEnd DATE,
OrderNumber INT);
INSERT INTO #Subscriptions
VALUES(1, 111111, '2017-01-01', '2017-12-31', 1),
(1, 211111, '2018-01-01', '2019-12-31' ,2),
(1, 311121, '2018-10-01', '2018-10-02', 3),
(1, 451515, '2019-02-01', '2019-02-28', 4),
(2, 158797, '2018-07-01', '2018-07-31', 1),
(2, 287584, '2018-09-01', '2018-12-31', 2),
(2, 387452, '2019-01-01', '2019-01-31', 3),
(3, 187498, '2019-01-01', '2019-02-28', 1),
(3, 284990, '2019-02-01', '2019-02-28', 2),
(4, 184849, '2019-02-01', '2019-02-28', 1)
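In this data set, customer 2 lapses on 2018-07-31. Customer 1 has a subscription from 2017-01-01 to 2017-12-31 and then another starting 2018-01-01 and ending 2019-12-31. So the subscription does not lapse anywhere in that period, even though the customer placed other orders in between that would appear to qualify if judged on their own dates.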
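This definition can also be seen as a "gaps and islands" problem. The following Python sketch is a quick, language-neutral check of the expected results. It is not SQL and not part of the original question. It merges each customer's subscriptions into continuous coverage periods. It assumes that a subscription starting no more than 30 days after the current period ends counts as a renewal. The names SUBSCRIPTIONS, GRACE and coverage_islands are illustrative only.

```python
from collections import defaultdict
from datetime import date, timedelta

# Assumption: a subscription starting no more than 30 days after the covered
# period ends keeps the customer active (the question's "within 30 days" rule).
GRACE = timedelta(days=30)

# (CustomerID, Orderid, SubscriptionStart, SubscriptionEnd), copied from the sample data
SUBSCRIPTIONS = [
    (1, 111111, date(2017, 1, 1), date(2017, 12, 31)),
    (1, 211111, date(2018, 1, 1), date(2019, 12, 31)),
    (1, 311121, date(2018, 10, 1), date(2018, 10, 2)),
    (1, 451515, date(2019, 2, 1), date(2019, 2, 28)),
    (2, 158797, date(2018, 7, 1), date(2018, 7, 31)),
    (2, 287584, date(2018, 9, 1), date(2018, 12, 31)),
    (2, 387452, date(2019, 1, 1), date(2019, 1, 31)),
    (3, 187498, date(2019, 1, 1), date(2019, 2, 28)),
    (3, 284990, date(2019, 2, 1), date(2019, 2, 28)),
    (4, 184849, date(2019, 2, 1), date(2019, 2, 28)),
]


def coverage_islands(rows, grace=GRACE):
    """Merge each customer's subscriptions into continuous coverage periods ("islands").

    Periods are sorted by start date; a period that starts no later than `grace`
    after the current island's end is folded into it, otherwise a new island begins.
    """
    by_customer = defaultdict(list)
    for customer_id, _order_id, start, end in rows:
        by_customer[customer_id].append((start, end))

    islands = {}
    for customer_id, periods in by_customer.items():
        periods.sort()
        merged = [list(periods[0])]
        for start, end in periods[1:]:
            if start <= merged[-1][1] + grace:
                # overlapping, or renewed within the grace period: extend the island
                merged[-1][1] = max(merged[-1][1], end)
            else:
                # gap of more than 30 days: the previous island lapsed
                merged.append([start, end])
        islands[customer_id] = [tuple(p) for p in merged]
    return islands
```

Customer 1 comes out as a single period from 2017-01-01 to 2019-12-31. Customer 2 splits into two periods, and the first one ends on 2018-07-31, which matches the lapse described above. Interval merging is a different technique from the answers below; it is only meant to make the intended result concrete.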
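I tried some simple gap calculations with LEAD() and LAG(), but I had no success because subscription periods vary in length (a single subscription can span several other orders). Ultimately we'll use this to calculate monthly churn across roughly 5 million records.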
Answer 0 (score: 0)
You're overthinking this by trying to use LEAD() and LAG(). All you need is a NOT EXISTS() predicate in the WHERE clause.
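Something like this in T-SQL:
SELECT s.CustomerID, s.Orderid, s.SubscriptionEnd
FROM #Subscriptions AS s
WHERE s.SubscriptionEnd <= DATEADD(DAY, -30, CAST(GETDATE() AS DATE)) -- ended at least 30 days ago
AND NOT EXISTS(
    -- same customer, starts no more than 30 days after this end date and runs past it (so a longer, overlapping subscription also counts)
    SELECT 1 FROM #Subscriptions AS n
    WHERE n.CustomerID = s.CustomerID
    AND n.SubscriptionStart <= DATEADD(DAY, 30, s.SubscriptionEnd)
    AND n.SubscriptionEnd > s.SubscriptionEnd
)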
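The Python sketch below applies the NOT EXISTS idea row by row to customers 1 and 2 from the sample data. A hypothetical as_of date stands in for GETDATE(). One caveat: suppose the inner condition only asks for a start date within 30 days after the end. Then order 311121 (ending 2018-10-02) would be reported as lapsed, because the next start is 2019-02-01, even though order 211111 covers the customer until 2019-12-31. For that reason the sketch also requires the other subscription to run past the end date. The names SUBSCRIPTIONS and lapsed_orders are illustrative.

```python
from datetime import date, timedelta

# (CustomerID, Orderid, SubscriptionStart, SubscriptionEnd): customers 1 and 2 from the sample data
SUBSCRIPTIONS = [
    (1, 111111, date(2017, 1, 1), date(2017, 12, 31)),
    (1, 211111, date(2018, 1, 1), date(2019, 12, 31)),
    (1, 311121, date(2018, 10, 1), date(2018, 10, 2)),
    (1, 451515, date(2019, 2, 1), date(2019, 2, 28)),
    (2, 158797, date(2018, 7, 1), date(2018, 7, 31)),
    (2, 287584, date(2018, 9, 1), date(2018, 12, 31)),
    (2, 387452, date(2019, 1, 1), date(2019, 1, 31)),
]


def lapsed_orders(rows, as_of, grace_days=30):
    """Row-by-row version of the NOT EXISTS idea.

    An order counts as lapsed when it ended at least `grace_days` before `as_of`
    and no order for the same customer both starts within `grace_days` after that
    end and runs past it (the second condition lets a longer, overlapping
    subscription such as 211111 cover the shorter ones inside it).
    """
    grace = timedelta(days=grace_days)
    lapsed = []
    for customer_id, order_id, _start, end in rows:
        if end > as_of - grace:
            continue  # not yet 30 days past its end: too early to call
        covered = any(
            other_customer == customer_id
            and other_start <= end + grace
            and other_end > end
            for other_customer, _other_id, other_start, other_end in rows
        )
        if not covered:
            lapsed.append((customer_id, order_id, end))
    return lapsed
```

Consider two orders that end on the same day with nothing afterwards, like customer 3 in the full sample. Each would be reported, so the result may need de-duplicating per customer and date.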
Answer 1 (score: 0)
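This looks tricky. You're right that LEAD() and LAG() are a problem here. The trouble comes from customers being able to hold several subscriptions of varying length, so we need to deal with that first. Rather than working with a list of SubscriptionStart/SubscriptionEnd pairs, let's start by building a single list of dates.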
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
ORDER BY
CustomerId,
ActivityDate
CustomerId  OrderId     Activity    ActivityDate
----------- ----------- ----------- ------------
1           111111      1           2017-01-01
1           111111      -1          2017-12-31
1           211111      1           2018-01-01
1           311121      1           2018-10-01
1           311121      -1          2018-10-02
1           451515      1           2019-02-01
1           451515      -1          2019-02-28
1           211111      -1          2019-12-31
2           158797      1           2018-07-01
2           158797      -1          2018-07-31
2           287584      1           2018-09-01
2           287584      -1          2018-12-31
2           387452      1           2019-01-01
2           387452      -1          2019-01-31
3           187498      1           2019-01-01
3           284990      1           2019-02-01
3           187498      -1          2019-02-28
3           284990      -1          2019-02-28
4           184849      1           2019-02-01
4           184849      -1          2019-02-28
Note the extra Activity field: it is 1 for a SubscriptionStart and -1 for a SubscriptionEnd.
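The UNION ALL step can be mirrored in a few lines of Python, for illustration only. The names activity_events and SUBSCRIPTIONS are made up. The sketch also makes explicit one tie-breaking choice that the SQL leaves open: when a start and an end fall on the same date, the start is ordered first.

```python
from datetime import date

# (CustomerID, Orderid, SubscriptionStart, SubscriptionEnd): customers 1 and 3 from the sample data
SUBSCRIPTIONS = [
    (1, 111111, date(2017, 1, 1), date(2017, 12, 31)),
    (1, 211111, date(2018, 1, 1), date(2019, 12, 31)),
    (1, 311121, date(2018, 10, 1), date(2018, 10, 2)),
    (1, 451515, date(2019, 2, 1), date(2019, 2, 28)),
    (3, 187498, date(2019, 1, 1), date(2019, 2, 28)),
    (3, 284990, date(2019, 2, 1), date(2019, 2, 28)),
]


def activity_events(rows):
    """Unpivot each subscription into (CustomerId, OrderId, Activity, ActivityDate) events:
    +1 on its start date and -1 on its end date, like the UNION ALL query."""
    events = []
    for customer_id, order_id, start, end in rows:
        events.append((customer_id, order_id, 1, start))
        events.append((customer_id, order_id, -1, end))
    # Order by customer, then date. Assumption: on the same date a start (+1) sorts
    # before an end (-1); ORDER BY ActivityDate alone leaves that tie unspecified.
    # list.sort is stable, so events with identical keys keep their input order.
    events.sort(key=lambda e: (e[0], e[3], -e[2]))
    return events
```

For customer 1 this reproduces the OrderId and Activity sequence in the listing above, including order 211111's end date landing last.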
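With this new Activity field we can find the points in a customer's subscription history where a lapse could occur. We'll also use LEAD() to find the NextDate.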
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
SELECT
CustomerId,
OrderId,
Activity,
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate) AS NextDate,
DATEDIFF(d, ActivityDate, LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate)) AS LapsedDays
FROM
SubscriptionList
ORDER BY
CustomerId,
ActivityDate
CustomerId  OrderId     Activity    SubscriptionCount ActivityDate NextDate   LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
1           111111      1           1                 2017-01-01   2017-12-31 364
1           111111      -1          0                 2017-12-31   2018-01-01 1
1           211111      1           1                 2018-01-01   2018-10-01 273
1           311121      1           2                 2018-10-01   2018-10-02 1
1           311121      -1          1                 2018-10-02   2019-02-01 122
1           451515      1           2                 2019-02-01   2019-02-28 27
1           451515      -1          1                 2019-02-28   2019-12-31 306
1           211111      -1          0                 2019-12-31   2019-02-28 -306
2           158797      1           1                 2018-07-01   2018-07-31 30
2           158797      -1          0                 2018-07-31   2018-09-01 32
2           287584      1           1                 2018-09-01   2018-12-31 121
2           287584      -1          0                 2018-12-31   2019-01-01 1
2           387452      1           1                 2019-01-01   2019-01-31 30
2           387452      -1          0                 2019-01-31   2019-02-28 28
3           187498      1           1                 2019-01-01   2019-02-01 31
3           284990      1           2                 2019-02-01   2019-02-28 27
3           187498      -1          1                 2019-02-28   2019-02-28 0
3           284990      -1          0                 2019-02-28   2019-02-28 0
4           184849      1           1                 2019-02-01   2019-02-28 27
4           184849      -1          0                 2019-02-28   2019-02-28 0
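A running total over the Activity field effectively gives the number of active subscriptions. While it is greater than 0 the customer cannot lapse, so focus on the rows where SubscriptionCount is zero.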
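The running total is a cumulative sum that restarts for each customer. Below is a minimal Python equivalent using the event rows for customers 1 and 2 from the listing above. The names with_subscription_count and EVENTS are hypothetical.

```python
from datetime import date
from itertools import accumulate, groupby

# (CustomerId, OrderId, Activity, ActivityDate) for customers 1 and 2, already sorted by customer and date
EVENTS = [
    (1, 111111, 1, date(2017, 1, 1)),
    (1, 111111, -1, date(2017, 12, 31)),
    (1, 211111, 1, date(2018, 1, 1)),
    (1, 311121, 1, date(2018, 10, 1)),
    (1, 311121, -1, date(2018, 10, 2)),
    (1, 451515, 1, date(2019, 2, 1)),
    (1, 451515, -1, date(2019, 2, 28)),
    (1, 211111, -1, date(2019, 12, 31)),
    (2, 158797, 1, date(2018, 7, 1)),
    (2, 158797, -1, date(2018, 7, 31)),
    (2, 287584, 1, date(2018, 9, 1)),
    (2, 287584, -1, date(2018, 12, 31)),
    (2, 387452, 1, date(2019, 1, 1)),
    (2, 387452, -1, date(2019, 1, 31)),
]


def with_subscription_count(events):
    """Append a per-customer running total of Activity to each event, i.e.
    SUM(Activity) OVER (PARTITION BY CustomerId ORDER BY ActivityDate ROWS UNBOUNDED PRECEDING).
    Expects events already sorted by customer and date (groupby only merges adjacent rows)."""
    counted = []
    for _customer_id, group in groupby(events, key=lambda e: e[0]):
        group = list(group)
        running = accumulate(activity for _c, _o, activity, _d in group)
        counted.extend(event + (count,) for event, count in zip(group, running))
    return counted
```

The counts reproduce the SubscriptionCount column for these customers. Every 0 marks a date on which the customer had no active subscription.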
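Use LEAD() to get the NextDate, defaulting to today (GETDATE()) when there is no next date. When SubscriptionCount is 0, the next row must come from a new subscription, so NextDate is the date that new subscription starts. DATEDIFF then gives the number of days between the SubscriptionEnd and the next SubscriptionStart. If that is more than 30 days, the customer has lapsed. That sounds like a good WHERE clause. (The results above were evidently produced on 2019-02-28. That is the GETDATE() default shown as NextDate on each customer's last row, and it is why customer 1's last row shows -306 days. A negative value just means that subscription has not ended yet.)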
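The LEAD() / DATEDIFF / WHERE combination can also be sketched in Python. This sketch uses the customer 2 and 4 rows from the result above, with a today parameter in place of GETDATE(). It is an illustration under assumptions: find_lapses and COUNTED are made-up names. It uses a strict "more than 30 days" test, following the question's "within 30 days" definition.

```python
from datetime import date
from itertools import groupby

# (CustomerId, OrderId, Activity, ActivityDate, SubscriptionCount) for customers 2 and 4,
# copied from the result above
COUNTED = [
    (2, 158797, 1, date(2018, 7, 1), 1),
    (2, 158797, -1, date(2018, 7, 31), 0),
    (2, 287584, 1, date(2018, 9, 1), 1),
    (2, 287584, -1, date(2018, 12, 31), 0),
    (2, 387452, 1, date(2019, 1, 1), 1),
    (2, 387452, -1, date(2019, 1, 31), 0),
    (4, 184849, 1, date(2019, 2, 1), 1),
    (4, 184849, -1, date(2019, 2, 28), 0),
]


def find_lapses(counted, today, grace_days=30):
    """Pair each event with the next event's date for the same customer (LEAD with a
    default of `today`), then keep rows where no subscription is active and the gap
    is longer than `grace_days`."""
    lapses = []
    for _customer_id, group in groupby(counted, key=lambda e: e[0]):
        group = list(group)
        # LEAD(ActivityDate, 1, today): shift dates up by one, pad the last row with today
        next_dates = [e[3] for e in group[1:]] + [today]
        for row, next_date in zip(group, next_dates):
            customer_id, order_id, _activity, activity_date, count = row
            gap = (next_date - activity_date).days  # DATEDIFF(day, ActivityDate, NextDate)
            if count == 0 and gap > grace_days:
                lapses.append((customer_id, order_id, activity_date, next_date, gap))
    return lapses
```

With today set to 2019-02-28, only customer 2's 2018-07-31 end qualifies (32 days). Moving today to 2019-03-15 also flags the 2019-01-31 end (43 days). This shows how the default NextDate makes a customer with no renewal appear once they are more than 30 days past their last subscription.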
;WITH SubscriptionList AS (
SELECT
CustomerId,
OrderId,
1 AS Activity,
SubscriptionStart AS ActivityDate
FROM
#Subscriptions
UNION ALL
SELECT
CustomerId,
OrderId,
-1 AS Activity,
SubscriptionEnd AS ActivityDate
FROM
#Subscriptions
)
, FindLapse AS (
SELECT
CustomerId,
OrderId,
Activity,
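-- Activity DESC puts a start ahead of an end on the same date, so the running total can't dip below zero (e.g. for a one-day subscription)
SUM(Activity) OVER(PARTITION BY CustomerId ORDER BY ActivityDate, Activity DESC ROWS UNBOUNDED PRECEDING) as SubscriptionCount,
ActivityDate,
LEAD(ActivityDate, 1, GETDATE()) OVER(PARTITION BY CustomerId ORDER BY ActivityDate, Activity DESC) AS NextDate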
FROM
SubscriptionList
)
SELECT
CustomerId,
OrderId,
Activity,
SubscriptionCount,
ActivityDate,
NextDate,
DATEDIFF(d, ActivityDate, NextDate) AS LapsedDays
FROM
FindLapse
WHERE
SubscriptionCount = 0
-- lapsed = no new subscription within 30 days of the end, so the gap must be more than 30 days
AND DATEDIFF(d, ActivityDate, NextDate) > 30
CustomerId  OrderId     Activity    SubscriptionCount ActivityDate NextDate   LapsedDays
----------- ----------- ----------- ----------------- ------------ ---------- -----------
2           158797      -1          0                 2018-07-31   2018-09-01 32
Looks like we have a winner!