SQL - Get counts based on a rolling window per unique ID

Time: 2020-03-25 23:24:47

Tags: sql google-bigquery rolling-computation

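I'm working with a table that has id and date columns. For each id, there is a 90-day window in which multiple transactions can occur. The 90-day window starts when the first transaction is made, and the clock resets once the 90 days are over. When a new transaction triggers a new 90-day window, I want the count to start over from one. I'd like to produce something like this in SQL, with two additional columns (window and count):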

id      date        window  count   
name1   7/7/2019    first   1
name1   12/31/2019  second  1
name1   1/23/2020   second  2
name1   1/23/2020   second  3
name1   2/12/2020   second  4 
name1   4/1/2020    third   1
name2   6/30/2019   first   1
name2   8/14/2019   first   2 

I think a CASE expression with MIN(date) OVER (PARTITION BY id) could be used to rank the windows. This is what I have in mind:

CASE WHEN date = MIN(date) OVER (PARTITION BY id) THEN 'first'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 90 THEN 'first'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) > 90 AND DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 180 THEN 'second'
WHEN DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) > 180 AND DATEDIFF(day, MIN(date) OVER (PARTITION BY id), date) <= 270 THEN 'third'
ELSE NULL END

The running count within each window would then be ROW_NUMBER() OVER (PARTITION BY id, window ORDER BY date)

2 answers:

Answer 0: (score: 1)

Window functions alone cannot solve this. You need to iterate through the dataset, which can be done with a recursive query:

with  -- some databases (e.g. PostgreSQL, MySQL) require "with recursive" here
    tab as (
        select t.*, row_number() over(partition by id order by date) rn
        from mytable t
    ),
    cte as (
        select id, date, rn, date date0 from tab where rn = 1
        union all
        select t.id, t.date, t.rn, case when t.date > c.date0 + interval '90' day then t.date else c.date0 end
        from cte c
        inner join tab t on t.id = c.id and t.rn = c.rn + 1
    )
select
    id,
    date,
    dense_rank() over(partition by id order by date0) grp,
    row_number() over(partition by id, date0 order by rn) cnt
from cte

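The first query in the with clause ranks records that share the same id by increasing date; the recursive query then walks through the dataset and computes the start date of each group. The final step numbers the groups and computes the count within each window.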
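For anyone who wants to sanity-check a query's output against the table in the question, here is a minimal procedural sketch of the same row-by-row walk in Python. It is not part of the original answer: the function name label_windows and the in-memory rows list are made up for illustration, and it assumes a transaction exactly 90 days after the window start still belongs to that window.

```python
from datetime import date, timedelta

WINDOW = timedelta(days=90)  # assumed window length, taken from the question


def label_windows(rows):
    """rows: iterable of (id, date). Returns a list of (id, date, window_no, count)."""
    out = []
    state = {}  # id -> (window_start, window_no, count)
    for key, d in sorted(rows):  # order by id, then date
        if key not in state:
            # First transaction for this id opens window 1.
            state[key] = (d, 1, 1)
        elif d > state[key][0] + WINDOW:
            # Current window has elapsed: open a new one and restart the count.
            state[key] = (d, state[key][1] + 1, 1)
        else:
            # Still inside the current window: keep its start, bump the count.
            start, window_no, count = state[key]
            state[key] = (start, window_no, count + 1)
        out.append((key, d, state[key][1], state[key][2]))
    return out


# Sample data copied from the question.
rows = [
    ("name1", date(2019, 7, 7)),
    ("name1", date(2019, 12, 31)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 1, 23)),
    ("name1", date(2020, 2, 12)),
    ("name1", date(2020, 4, 1)),
    ("name2", date(2019, 6, 30)),
    ("name2", date(2019, 8, 14)),
]

for row in label_windows(rows):
    print(row)
```

Each row only needs the state left behind by the previous row for the same id, which is exactly the dependency the recursive CTE expresses through its self-join on rn.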

Answer 1: (score: 1)

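GMB is absolutely right that a recursive CTE is needed. I offer this alternative form for two reasons. First, it uses SQL Server syntax, which appears to be the database used in the question. Second, it computes window and count directly, without window functions: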

with t as (
      select t.*, row_number() over (partition by id order by date) as seqnum
      from tbl t
     ),
     cte as (
      select t.id, t.date, dateadd(day, 90, t.date) as window_end, 1 as window, 1 as count, seqnum
      from t
      where seqnum = 1
      union all
      select t.id, t.date,
             (case when t.date > cte.window_end then dateadd(day, 90, t.date)
                   else cte.window_end
              end) as window_end,
             (case when t.date > cte.window_end then window + 1 else window end) as window,
             (case when t.date > cte.window_end then 1 else cte.count + 1 end) as count,
             t.seqnum
      from cte join
           t
           on t.id = cte.id and
              t.seqnum = cte.seqnum + 1
     )
select id, date, window, count
from cte
order by 1, 2;

Here is a db<>fiddle.
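To illustrate why fixed 90-day buckets measured from the first date do not give the desired result, here is a small Python sketch using the name1 dates from the question. The helper fixed_bucket is hypothetical and simply mirrors the thresholds sketched in the question (up to 90 days is bucket 1, 91 to 180 is bucket 2, and so on); the desired list is read off the question's sample table.

```python
from datetime import date


def fixed_bucket(first, d, size=90):
    """Bucket number when windows are fixed multiples of `size` days after `first`."""
    days = (d - first).days
    return max(1, (days - 1) // size + 1)


# name1 dates copied from the question.
name1 = [
    date(2019, 7, 7),
    date(2019, 12, 31),
    date(2020, 1, 23),
    date(2020, 1, 23),
    date(2020, 2, 12),
    date(2020, 4, 1),
]

buckets = [fixed_bucket(name1[0], d) for d in name1]
desired = [1, 2, 2, 2, 2, 3]  # first, second x4, third, per the question's table

print(buckets)
print(desired)
```

The fixed buckets drift away from the desired labels because the second window should start on 12/31/2019, the date of the transaction that opens it, not at a multiple of 90 days after 7/7/2019. That dependence on where the previous window started is what forces the row-by-row approach.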