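SQL to select lapsed customers with a 30-day frequency, by day

Time: 2017-02-08 03:47:38

Tags: sql oracle datetime window-functions

The goal is to select a count of distinct customer_id values that have not made a purchase in the rolling 30-day period preceding each day of the 2016 calendar year. I have created a calendar table in my database to join to.

Here is a sample table for reference; assume your customer orders are normalized as follows: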

+-------------+------------+----------+
| customer_id | date       | order_id |
+-------------+------------+----------+
| 123         | 01/25/2016 | 1000     |
+-------------+------------+----------+
| 123         | 04/27/2016 | 1025     |
+-------------+------------+----------+
| 444         | 02/02/2016 | 1010     |
+-------------+------------+----------+
| 521         | 01/23/2016 | 998      |
+-------------+------------+----------+
| 521         | 01/24/2016 | 999      |
+-------------+------------+----------+  

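The target output is effectively a calendar with one row for every day of 2016, with a count for each day of how many customers "lapsed" on that day, meaning their last purchase was 30 days or more before that day of the year. The final output would look like this: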

+------------+--------------+
| date       | lapsed_count |
+------------+--------------+
| 01/01/2016 | 0            |
+------------+--------------+
| 01/02/2016 | 0            |
+------------+--------------+
| ...        | ...          |
+------------+--------------+
| 03/01/2016 | 12           |
+------------+--------------+
| 03/02/2016 | 9            |
+------------+--------------+
| 03/03/2016 | 7            |
+------------+--------------+  

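This data does not exist for 2015, so no lapsed customers can be counted on 01/01/2016, because that is the first day on which a purchase was ever made.

So for customer_id #123, they purchased on 01/25/2016 and 04/27/2016. They should have 2 lapses, since their purchases are more than 30 days apart: one lapse occurs on 02/24/2016 and the other on 05/27/2016. Customer_id #444 purchased only once, so they should have one lapse counted 30 days after 02/02/2016.
Customer_id #521 is tricky: since they purchased with a frequency of 1 day, we would not count the first purchase on 01/23/2016, so there is only one lapse, starting from their last purchase on 01/24/2016. That lapse is counted on 02/23/2016 (+30 days).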
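The rule described above can be checked outside the database with a small sketch. This is only an illustration in Python, not part of the question: the function name `lapse_dates` and the constant `LAPSE_DAYS` are made up here, and it assumes that a customer's final order always produces a lapse 30 days later, as the 05/27/2016 example implies.

```python
from collections import defaultdict
from datetime import date, timedelta

LAPSE_DAYS = 30  # threshold taken from the question

# Sample rows from the question's table: (customer_id, order date, order_id)
orders = [
    (123, date(2016, 1, 25), 1000),
    (123, date(2016, 4, 27), 1025),
    (444, date(2016, 2, 2), 1010),
    (521, date(2016, 1, 23), 998),
    (521, date(2016, 1, 24), 999),
]

def lapse_dates(rows):
    """Return sorted (customer_id, lapse_date) pairs.

    An order produces a lapse when the same customer's next order is more
    than LAPSE_DAYS later, or when there is no later order at all
    (an assumption based on the question's examples).
    """
    by_customer = defaultdict(list)
    for customer_id, order_date, _order_id in rows:
        by_customer[customer_id].append(order_date)

    lapses = []
    for customer_id, dates in by_customer.items():
        dates.sort()
        for i, current in enumerate(dates):
            following = dates[i + 1] if i + 1 < len(dates) else None
            if following is None or (following - current).days > LAPSE_DAYS:
                lapses.append((customer_id, current + timedelta(days=LAPSE_DAYS)))
    return sorted(lapses)

for customer_id, lapsed_on in lapse_dates(orders):
    print(customer_id, lapsed_on.isoformat())
```

For the sample rows this yields two lapses for customer 123 and one each for customers 444 and 521, matching the walkthrough above.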

3 answers:

Answer 0: (score: 2)

If you have a table of dates, here is one expensive approach:

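-- Column names follow the question's sample table. DATE is a reserved word
-- in Oracle, so a column literally named "date" would need quoting or renaming.
select date,
       sum(case when prev_date < date - 30 then 1 else 0 end) as lapsed
from (select c.date, cust.customer_id, max(o.date) as prev_date
      from calendar c cross join
           (select distinct customer_id from orders) cust left join
           orders o
           on o.date <= c.date and o.customer_id = cust.customer_id
      -- one row per calendar date and customer, with that customer's latest order so far
      group by c.date, cust.customer_id
     ) oc
group by date;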

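For each date/customer pair, it determines the most recent purchase the customer made on or before that date. It then uses this information to count the customers who have lapsed.

Honestly, this will probably work fine for a handful of dates, but not for a full year's worth.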
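To make the per-day logic of this answer concrete, here is a rough Python sketch of the same idea, using the question's sample rows. The function name `lapsed_customers_on` is invented for illustration. Note that this counts customers who are in a lapsed state on a given day (latest purchase more than 30 days earlier), which is not quite the same as counting lapse events on the exact day they occur.

```python
from datetime import date, timedelta

# (customer_id, order date) pairs from the question's sample table
orders = [
    (123, date(2016, 1, 25)),
    (123, date(2016, 4, 27)),
    (444, date(2016, 2, 2)),
    (521, date(2016, 1, 23)),
    (521, date(2016, 1, 24)),
]

def lapsed_customers_on(day, rows, window_days=30):
    """Count customers whose latest order on or before `day` is older than the window."""
    latest = {}
    for customer_id, order_date in rows:
        # Keep only orders up to `day`, remembering the most recent one per customer.
        if order_date <= day and order_date > latest.get(customer_id, date.min):
            latest[customer_id] = order_date
    cutoff = day - timedelta(days=window_days)
    # Customers with no order yet are absent from `latest`, so they are not counted.
    return sum(1 for prev_date in latest.values() if prev_date < cutoff)

for day in (date(2016, 1, 1), date(2016, 2, 24), date(2016, 3, 1)):
    print(day.isoformat(), lapsed_customers_on(day, orders))
```

Every call scans all orders, so evaluating it for each day of a year repeats that work 366 times, which mirrors why the cross join is expensive.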

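Answer 1: (score: 1)

Apologies, I didn't read your question properly the first time. This query will give you all the lapses. It takes each order and uses an analytic function to work out the next order date; if the gap is greater than 30 days, a lapse is recorded.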

WITH
 cust_orders (customer_id , order_date , order_id   )
 AS
  (SELECT 1, TO_DATE('01/01/2016','DD/MM/YYYY'), 1001 FROM dual UNION ALL
   SELECT 1, TO_DATE('29/01/2016','DD/MM/YYYY'), 1002 FROM dual UNION ALL
   SELECT 1, TO_DATE('01/03/2016','DD/MM/YYYY'), 1003 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/01/2016','DD/MM/YYYY'), 1004 FROM dual UNION ALL
   SELECT 2, TO_DATE('29/01/2016','DD/MM/YYYY'), 1005 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/04/2016','DD/MM/YYYY'), 1006 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/06/2016','DD/MM/YYYY'), 1007 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/08/2016','DD/MM/YYYY'), 1008 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/09/2016','DD/MM/YYYY'), 1009 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/12/2016','DD/MM/YYYY'), 1010 FROM dual UNION ALL
   SELECT 3, TO_DATE('02/12/2016','DD/MM/YYYY'), 1011 FROM dual UNION ALL
   SELECT 3, TO_DATE('03/12/2016','DD/MM/YYYY'), 1012 FROM dual UNION ALL
   SELECT 3, TO_DATE('04/12/2016','DD/MM/YYYY'), 1013 FROM dual UNION ALL
   SELECT 3, TO_DATE('05/12/2016','DD/MM/YYYY'), 1014 FROM dual UNION ALL
   SELECT 3, TO_DATE('06/12/2016','DD/MM/YYYY'), 1015 FROM dual UNION ALL
   SELECT 3, TO_DATE('07/12/2016','DD/MM/YYYY'), 1016 FROM dual 
  )
SELECT
 customer_id
,order_date
,order_id
,next_order_date
,order_date + 30   lapse_date
FROM
 (SELECT
   customer_id
  ,order_date
  ,order_id
  ,LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) next_order_date
  FROM
   cust_orders
 )
WHERE NVL(next_order_date,sysdate) - order_date > 30
;

Now join that to a set of dates and run a COUNT function (enter the year parameter as YYYY):

WITH
 cust_orders (customer_id , order_date , order_id   )
 AS
  (SELECT 1, TO_DATE('01/01/2016','DD/MM/YYYY'), 1001 FROM dual UNION ALL
   SELECT 1, TO_DATE('29/01/2016','DD/MM/YYYY'), 1002 FROM dual UNION ALL
   SELECT 1, TO_DATE('01/03/2016','DD/MM/YYYY'), 1003 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/01/2016','DD/MM/YYYY'), 1004 FROM dual UNION ALL
   SELECT 2, TO_DATE('29/01/2016','DD/MM/YYYY'), 1005 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/04/2016','DD/MM/YYYY'), 1006 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/06/2016','DD/MM/YYYY'), 1007 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/08/2016','DD/MM/YYYY'), 1008 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/09/2016','DD/MM/YYYY'), 1009 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/12/2016','DD/MM/YYYY'), 1010 FROM dual UNION ALL
   SELECT 3, TO_DATE('02/12/2016','DD/MM/YYYY'), 1011 FROM dual UNION ALL
   SELECT 3, TO_DATE('03/12/2016','DD/MM/YYYY'), 1012 FROM dual UNION ALL
   SELECT 3, TO_DATE('04/12/2016','DD/MM/YYYY'), 1013 FROM dual UNION ALL
   SELECT 3, TO_DATE('05/12/2016','DD/MM/YYYY'), 1014 FROM dual UNION ALL
   SELECT 3, TO_DATE('06/12/2016','DD/MM/YYYY'), 1015 FROM dual UNION ALL
   SELECT 3, TO_DATE('07/12/2016','DD/MM/YYYY'), 1016 FROM dual 
  )
,calendar (date_value)
 AS
 (SELECT TO_DATE('01/01/'||:P_year,'DD/MM/YYYY') + (rownum -1) 
  FROM all_tables
  WHERE rownum < (TO_DATE('31/12/'||:P_year,'DD/MM/YYYY') - TO_DATE('01/01/'||:P_year,'DD/MM/YYYY')) + 2
 )
SELECT
 calendar.date_value
,COUNT(*)
FROM
 (
  SELECT
   customer_id
  ,order_date
  ,order_id
  ,next_order_date
  ,order_date + 30   lapse_date
  FROM
   (SELECT
     customer_id
    ,order_date
    ,order_id
    ,LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) next_order_date
    FROM
     cust_orders
   )
  WHERE NVL(next_order_date,sysdate) - order_date > 30
 )  lapses
,calendar
WHERE 1=1
AND calendar.date_value = TRUNC(lapses.lapse_date)
GROUP BY
 calendar.date_value
;

Or, if you really want every date printed out, use:

WITH
 cust_orders (customer_id , order_date , order_id   )
 AS
  (SELECT 1, TO_DATE('01/01/2016','DD/MM/YYYY'), 1001 FROM dual UNION ALL
   SELECT 1, TO_DATE('29/01/2016','DD/MM/YYYY'), 1002 FROM dual UNION ALL
   SELECT 1, TO_DATE('01/03/2016','DD/MM/YYYY'), 1003 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/01/2016','DD/MM/YYYY'), 1004 FROM dual UNION ALL
   SELECT 2, TO_DATE('29/01/2016','DD/MM/YYYY'), 1005 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/04/2016','DD/MM/YYYY'), 1006 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/06/2016','DD/MM/YYYY'), 1007 FROM dual UNION ALL
   SELECT 2, TO_DATE('01/08/2016','DD/MM/YYYY'), 1008 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/09/2016','DD/MM/YYYY'), 1009 FROM dual UNION ALL
   SELECT 3, TO_DATE('01/12/2016','DD/MM/YYYY'), 1010 FROM dual UNION ALL
   SELECT 3, TO_DATE('02/12/2016','DD/MM/YYYY'), 1011 FROM dual UNION ALL
   SELECT 3, TO_DATE('03/12/2016','DD/MM/YYYY'), 1012 FROM dual UNION ALL
   SELECT 3, TO_DATE('04/12/2016','DD/MM/YYYY'), 1013 FROM dual UNION ALL
   SELECT 3, TO_DATE('05/12/2016','DD/MM/YYYY'), 1014 FROM dual UNION ALL
   SELECT 3, TO_DATE('06/12/2016','DD/MM/YYYY'), 1015 FROM dual UNION ALL
   SELECT 3, TO_DATE('07/12/2016','DD/MM/YYYY'), 1016 FROM dual 
  )
,lapses
 AS
  (SELECT
    customer_id
   ,order_date
   ,order_id
   ,next_order_date
   ,order_date + 30   lapse_date
   FROM
    (SELECT
      customer_id
     ,order_date
     ,order_id
     ,LEAD(order_date) OVER (PARTITION BY customer_id ORDER BY order_date) next_order_date
     FROM
      cust_orders
    )
   WHERE NVL(next_order_date,sysdate) - order_date > 30
  )  
,calendar (date_value)
 AS
 (SELECT TO_DATE('01/01/'||:P_year,'DD/MM/YYYY') + (rownum -1) 
  FROM all_tables
  WHERE rownum < (TO_DATE('31/12/'||:P_year,'DD/MM/YYYY') - TO_DATE('01/01/'||:P_year,'DD/MM/YYYY')) + 2
 )
SELECT
 calendar.date_value
,(SELECT COUNT(*)
  FROM lapses
  WHERE calendar.date_value = lapses.lapse_date
 )
FROM
 calendar
WHERE 1=1
ORDER BY
 calendar.date_value
;
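The calendar CTE above builds one row per day of the chosen year, and the scalar subquery fills in a count (possibly zero) for each day. A rough Python equivalent is sketched below; the function names are hypothetical, and the lapse dates passed in are the four from the question's walkthrough rather than the output of this answer's sample data. One thing worth keeping in mind about the SQL version: it only produces a full year if all_tables has at least as many rows as the year has days.

```python
from collections import Counter
from datetime import date, timedelta

def calendar_for(year):
    """Every day from 1 January to 31 December of `year`, inclusive."""
    first = date(year, 1, 1)
    last = date(year, 12, 31)
    return [first + timedelta(days=n) for n in range((last - first).days + 1)]

def daily_lapse_counts(year, lapse_dates):
    """Pair each calendar day with the number of lapses falling on it (0 if none)."""
    counts = Counter(lapse_dates)
    return [(day, counts.get(day, 0)) for day in calendar_for(year)]

# Lapse dates from the question's examples (customers 123, 444 and 521).
example_lapses = [
    date(2016, 2, 24),
    date(2016, 5, 27),
    date(2016, 3, 3),
    date(2016, 2, 23),
]

rows = daily_lapse_counts(2016, example_lapses)
print(len(rows))  # one row per day of the year
print([(d.isoformat(), n) for d, n in rows if n > 0])
```

Days without any lapse stay in the output with a count of 0, which is the difference between this variant and the inner-join query before it.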

Answer 2: (score: 1)

Here's how I'd do it:

WITH your_table AS (SELECT 123 customer_id, to_date('24/01/2016', 'dd/mm/yyyy') order_date, 12345 order_id FROM dual UNION ALL
                    SELECT 123 customer_id, to_date('24/01/2016', 'dd/mm/yyyy') order_date, 12346 order_id FROM dual UNION ALL
                    SELECT 123 customer_id, to_date('25/01/2016', 'dd/mm/yyyy') order_date, 12347 order_id FROM dual UNION ALL
                    SELECT 123 customer_id, to_date('24/02/2016', 'dd/mm/yyyy') order_date, 12347 order_id FROM dual UNION ALL
                    SELECT 123 customer_id, to_date('16/03/2016', 'dd/mm/yyyy') order_date, 12348 order_id FROM dual UNION ALL
                    SELECT 123 customer_id, to_date('18/04/2016', 'dd/mm/yyyy') order_date, 12349 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('20/02/2016', 'dd/mm/yyyy') order_date, 12350 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('01/03/2016', 'dd/mm/yyyy') order_date, 12351 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('03/03/2016', 'dd/mm/yyyy') order_date, 12352 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('18/04/2016', 'dd/mm/yyyy') order_date, 12353 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('20/05/2016', 'dd/mm/yyyy') order_date, 12354 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('23/06/2016', 'dd/mm/yyyy') order_date, 12355 order_id FROM dual UNION ALL
                    SELECT 456 customer_id, to_date('19/01/2017', 'dd/mm/yyyy') order_date, 12356 order_id FROM dual),
-- end of mimicking your_table with data in it
    lapsed_info AS (SELECT customer_id,
                           order_date,
                           CASE WHEN TRUNC(SYSDATE) - order_date <= 30 THEN NULL
                                WHEN COUNT(*) OVER (PARTITION BY customer_id ORDER BY order_date RANGE BETWEEN 1 FOLLOWING AND 30 FOLLOWING) = 0 THEN order_date+30
                                ELSE NULL
                           END lapsed_date
                    FROM   your_table),
          dates AS (SELECT to_date('01/01/2016', 'dd/mm/yyyy') + LEVEL -1 dt
                    FROM   dual
                    CONNECT BY to_date('01/01/2016', 'dd/mm/yyyy') + LEVEL -1 <= TRUNC(SYSDATE))
SELECT dates.dt,
       COUNT(li.lapsed_date) lapsed_count
FROM   dates
       LEFT OUTER JOIN lapsed_info li ON dates.dt = li.lapsed_date
GROUP BY dates.dt
ORDER BY dates.dt;

Results:

DT         LAPSED_COUNT
---------- ------------
01/01/2016            0
<snip>
23/01/2016            0
24/01/2016            0
25/01/2016            0
26/01/2016            0
<snip>
19/02/2016            0
20/02/2016            0
21/02/2016            0
22/02/2016            0
23/02/2016            0
24/02/2016            1
25/02/2016            0
<snip>
29/02/2016            0
01/03/2016            0
02/03/2016            0
03/03/2016            0
04/03/2016            0
<snip>
15/03/2016            0
16/03/2016            0
17/03/2016            0
<snip>
20/03/2016            0
21/03/2016            0
22/03/2016            0
<snip>
30/03/2016            0
31/03/2016            0
01/04/2016            0
02/04/2016            1
03/04/2016            0
<snip>
14/04/2016            0
15/04/2016            1
16/04/2016            0
17/04/2016            0
18/04/2016            0
19/04/2016            0
<snip>
17/05/2016            0
18/05/2016            2
19/05/2016            0
20/05/2016            0
21/05/2016            0
<snip>
18/06/2016            0
19/06/2016            1
20/06/2016            0
21/06/2016            0
22/06/2016            0
23/06/2016            0
24/06/2016            0
<snip>
22/07/2016            0
23/07/2016            1
24/07/2016            0
<snip>
18/01/2017            0
19/01/2017            0
20/01/2017            0
<snip>
08/02/2017            0

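This takes your data and uses an analytic COUNT function to work out the number of rows for the same customer that fall within the 30 days following (but not including) the current row's date.

Then we apply a CASE expression: if the row's date is within 30 days of today, we treat it as not yet lapsed. Otherwise, if the count returned is 0, the row is considered lapsed and we output the lapse date as order_date plus 30 days. Any other count means the row has not lapsed.

All of the above is worked out in the lapsed_info subquery.

Then all we need to do is list the dates (see the dates subquery), outer join the lapsed_info subquery to it on lapsed_date, and count the lapsed dates for each day.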
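The CASE logic in lapsed_info can be mirrored in a few lines of Python as a sanity check. This is a sketch under assumptions: the function name `lapsed_date` is invented, "today" is passed in explicitly instead of being read from the clock, and the value 2017-02-08 is taken from the last row of the results listing. The data is customer 456 from the sample rows.

```python
from datetime import date, timedelta

def lapsed_date(order_date, customer_dates, today, window_days=30):
    """Mirror the CASE expression: return the lapse date for one order, or None."""
    # Orders placed within the last `window_days` days cannot have lapsed yet.
    if (today - order_date).days <= window_days:
        return None
    # Equivalent of COUNT(*) OVER (... RANGE BETWEEN 1 FOLLOWING AND 30 FOLLOWING):
    # the customer's orders dated 1 to `window_days` days after this one.
    following = sum(
        1 for other in customer_dates
        if 1 <= (other - order_date).days <= window_days
    )
    if following == 0:
        return order_date + timedelta(days=window_days)
    return None

# Order dates for customer 456 from the sample data above.
customer_456 = [
    date(2016, 2, 20), date(2016, 3, 1), date(2016, 3, 3), date(2016, 4, 18),
    date(2016, 5, 20), date(2016, 6, 23), date(2017, 1, 19),
]
today = date(2017, 2, 8)  # assumed run date, see the lead-in

for d in customer_456:
    print(d.isoformat(), lapsed_date(d, customer_456, today))
```

For this customer the sketch reports lapses on 2016-04-02, 2016-05-18, 2016-06-19 and 2016-07-23, and none for the 2017-01-19 order because it is still inside the 30-day window on the assumed run date.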