ROW_NUMBER() with a greater-than condition

Time: 2018-12-06 00:17:55

Tags: sql google-bigquery standard-sql

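I would really appreciate your help. I have a dataset of tour purchases. Each tour has a Purchaser_Email and an Event_Date, along with other columns that are not relevant here. I want to add a Trip column that identifies whether a purchase starts a new trip or belongs to the same trip. For a purchase to count as a new trip, the difference between two Event_Dates must be more than 30 days; otherwise it is considered part of the same trip. In the end, I need to know how many trips each customer made and to group purchases by trip. I wrote a query with ROW_NUMBER() that calculates the date_diff between the first purchase and the next one. I feel I am close, but I need some help adding the Trip column.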

I need something like this: (image: Desired Column)

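The sample dataset and the column I need are in this file: https://docs.google.com/spreadsheets/d/1ToNFQ9l2-ztDrN2zSlKlgBQk95vO6BnRv6VabWrHBmM/edit?usp=sharing The raw data is in the first tab. In the second tab, the orange columns are the results of my query, and the last column, in red, is the one I am looking for.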

1 answer:

Answer 0 (score: 2)

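You are on the right track. After partitioning by purchaser and assigning row_number() or rank(), you can set a boolean flag on each purchase based on whether it is separated from the previous purchase by at least a given number of days.

Here is one way to achieve this: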

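with data as (
  -- row_number() gives each purchaser's purchases consecutive positions even when dates tie,
  -- which the indx-1 self-join below relies on; rank() would leave gaps on ties.
  select purchaser_email, event_date, row_number() over (partition by purchaser_email order by event_date) as indx from (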
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-10-15') as event_date union all
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-10-12') as event_date union all
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-10-19') as event_date union all
    select 'fgh_xyz@xyz.com' as purchaser_email, date('2018-10-03') as event_date union all
    select 'fgh_xyz@xyz.com' as purchaser_email, date('2018-10-10') as event_date union all
    select 'fgh_xyz@xyz.com' as purchaser_email, date('2018-11-26') as event_date union all
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-11-28') as event_date union all
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-12-30') as event_date union all
    select 'abc_xyz@xyz.com' as purchaser_email, date('2018-12-31') as event_date
  )
)
select purchaser_email, count(1) as order_count from (
  select purchaser_email, 
    d1, new_purchase, sum(case when new_purchase=true then 1 else 0 end) over (partition by purchaser_email order by d1) as purchase_count from (
    select 
      t1.purchaser_email, 
      t1.event_date as d1, 
      t2.event_date as d2, 
      t1.indx as t1i,
      t2.indx as t2i,
      case 
        when t2.event_date is null then true  -- the first purchase always starts a trip
        when abs(date_diff(t1.event_date, t2.event_date, day)) > 30 then true  -- strictly more than 30 days, as the question states
        else false end as new_purchase
      from data t1
      left join data t2 on t1.purchaser_email = t2.purchaser_email and t1.indx-1 = t2.indx
  )
  order by 1,2,3
)
where new_purchase = true
group by 1
order by 1
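
The query above returns the number of trips per purchaser. Since the question also asks for a per-purchase Trip column, the following is a minimal sketch of one way to get it, assuming BigQuery standard SQL. It swaps the self-join for lag(), which is a different technique from the one in the answer. The names purchases, flagged, prev_date and trip are hypothetical and chosen only for illustration, and the comparison is a strict > 30, following the question's wording of "more than 30 days".

```sql
-- Sketch only: `purchases` is a hypothetical stand-in for the real table.
with purchases as (
  select 'abc_xyz@xyz.com' as purchaser_email, date '2018-10-12' as event_date union all
  select 'abc_xyz@xyz.com', date '2018-10-15' union all
  select 'abc_xyz@xyz.com', date '2018-11-28' union all
  select 'fgh_xyz@xyz.com', date '2018-10-03' union all
  select 'fgh_xyz@xyz.com', date '2018-11-26'
),
flagged as (
  select
    purchaser_email,
    event_date,
    -- Date of the same purchaser's previous purchase (null for the first one).
    lag(event_date) over (partition by purchaser_email order by event_date) as prev_date
  from purchases
)
select
  purchaser_email,
  event_date,
  -- Running count of purchases that start a new trip; this is the trip number.
  countif(prev_date is null or date_diff(event_date, prev_date, day) > 30)
    over (partition by purchaser_email order by event_date) as trip
from flagged
order by purchaser_email, event_date
```

On this sample, the two October purchases of abc_xyz@xyz.com fall in trip 1 and the November purchase in trip 2; fgh_xyz@xyz.com gets trip 1 and trip 2. Grouping by purchaser_email and taking max(trip) would then give the number of trips per customer, and grouping by purchaser_email and trip groups the purchases by trip.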