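I'm trying to create a group_id column in a SELECT statement.
A group consists of the rows for one contact_id and product_code combination, taken in time order up to and including the row where event = purchase. If the same contact_id and product_code combination has further events after that, they are assigned to another group. Likewise, a contact_id and product_code combination with no event = purchase forms a group of its own.
There is more than one contact_id/product_code combination, and the data is not sorted in the table.
Below are the table and the expected result of the SELECT: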
contact_id | product_code | timestamp | event |
------------------------------------------------------------------
contact_1 | product_1 | 2018-11-29 11:11:00.000 | view |
contact_1 | product_1 | 2018-11-29 13:10:00.000 | add |
contact_1 | product_1 | 2018-11-30 10:20:00.000 | purchase |
contact_1 | product_1 | 2018-12-03 10:20:00.000 | mail |
contact_1 | product_1 | 2018-12-03 16:00:00.000 | purchase |
contact_2 | product_2 | 2018-12-05 19:01:00.000 | add |
contact_2 | product_2 | 2018-12-05 19:03:00.000 | purchase |
contact_3 | product_3 | 2018-12-05 19:03:00.000 | view |
contact_4 | product_4 | 2018-11-15 19:03:00.000 | mail |
contact_4 | product_4 | 2018-11-15 19:03:00.000 | purchase |
contact_5 | product_5 | 2018-11-20 19:03:00.000 | purchase |
Expected result:
contact_id | product_code | timestamp | event | id_groups |
-----------------------------------------------------------------------------
contact_1 | product_1 | 2018-11-29 11:11:00.000 | view | 1 |
contact_1 | product_1 | 2018-11-29 13:10:00.000 | add | 1 |
contact_1 | product_1 | 2018-11-30 10:20:00.000 | purchase | 1 |
contact_1 | product_1 | 2018-12-03 10:20:00.000 | mail | 2 |
contact_1 | product_1 | 2018-12-03 16:00:00.000 | purchase | 2 |
contact_2 | product_2 | 2018-12-05 19:01:00.000 | add | 3 |
contact_2 | product_2 | 2018-12-05 19:03:00.000 | purchase | 3 |
contact_3 | product_3 | 2018-12-05 19:03:00.000 | view | 4 |
contact_4 | product_4 | 2018-11-15 19:03:00.000 | mail | 5 |
contact_4 | product_4 | 2018-11-15 19:03:00.000 | purchase | 5 |
contact_5 | product_5 | 2018-11-20 19:03:00.000 | purchase | 6 |
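To make the grouping rule concrete, here is a minimal Python sketch (not SQL) that applies it to the sample rows: sort by contact_id, product_code and timestamp, then open a new group at the first row of each combination and at the row right after a purchase. `ROWS` and `assign_groups` are illustrative names. The sketch assumes that rows with equal timestamps keep their input order (Python's `sorted` is stable), which is what the expected output implies for the two contact_4 rows.

```python
# Sample rows from the question: (contact_id, product_code, timestamp, event).
ROWS = [
    ("contact_1", "product_1", "2018-11-29 11:11:00.000", "view"),
    ("contact_1", "product_1", "2018-11-29 13:10:00.000", "add"),
    ("contact_1", "product_1", "2018-11-30 10:20:00.000", "purchase"),
    ("contact_1", "product_1", "2018-12-03 10:20:00.000", "mail"),
    ("contact_1", "product_1", "2018-12-03 16:00:00.000", "purchase"),
    ("contact_2", "product_2", "2018-12-05 19:01:00.000", "add"),
    ("contact_2", "product_2", "2018-12-05 19:03:00.000", "purchase"),
    ("contact_3", "product_3", "2018-12-05 19:03:00.000", "view"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "mail"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "purchase"),
    ("contact_5", "product_5", "2018-11-20 19:03:00.000", "purchase"),
]


def assign_groups(rows):
    """Return (row, group_id) pairs following the grouping rule in the question."""
    # The table is not stored in order, so sort by combination, then by time.
    # sorted() is stable: rows with identical timestamps keep their input order.
    ordered = sorted(rows, key=lambda r: (r[0], r[1], r[2]))
    result = []
    group_id = 0
    prev_key = None
    prev_event = None
    for row in ordered:
        key = (row[0], row[1])
        # Start a new group at the first row of a combination,
        # or at the row right after a purchase in the same combination.
        if key != prev_key or prev_event == "purchase":
            group_id += 1
        result.append((row, group_id))
        prev_key, prev_event = key, row[3]
    return result


if __name__ == "__main__":
    for row, group_id in assign_groups(ROWS):
        print(*row, group_id, sep=" | ")
```

Because the sketch sorts first, the input order of different combinations does not matter; only the relative order of tied rows inside a combination does.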
Answer 0 (score: 2)
Use a correlated subquery that counts the purchase rows with an earlier timestamp, and add 1:
select t1.*,
(select count(*) + 1 from tablename t2
where t2.event = 'purchase' and t2.timestamp < t1.timestamp) as id_groups
from tablename t1
This is core ANSI SQL.
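One quick way to see what the correlated subquery computes is to run it against the sample rows in an in-memory SQLite database from Python. This needs no window functions, so any SQLite version should do. The timestamps are stored as ISO-8601 text, so `<` compares them chronologically. `ROWS`, `QUERY` and `run_query` are illustrative names, and the only change to the query is an added `order by t1.rowid` so the rows come back in insertion order. As written, the subquery counts purchases across the whole table rather than per contact_id/product_code, so on these rows it returns 3, 3, 3, 4, 4, 5, 5, 5, 1, 1, 2 rather than the expected id_groups column.

```python
import sqlite3

# Sample rows from the question: (contact_id, product_code, timestamp, event).
ROWS = [
    ("contact_1", "product_1", "2018-11-29 11:11:00.000", "view"),
    ("contact_1", "product_1", "2018-11-29 13:10:00.000", "add"),
    ("contact_1", "product_1", "2018-11-30 10:20:00.000", "purchase"),
    ("contact_1", "product_1", "2018-12-03 10:20:00.000", "mail"),
    ("contact_1", "product_1", "2018-12-03 16:00:00.000", "purchase"),
    ("contact_2", "product_2", "2018-12-05 19:01:00.000", "add"),
    ("contact_2", "product_2", "2018-12-05 19:03:00.000", "purchase"),
    ("contact_3", "product_3", "2018-12-05 19:03:00.000", "view"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "mail"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "purchase"),
    ("contact_5", "product_5", "2018-11-20 19:03:00.000", "purchase"),
]

# The answer's query; only "order by t1.rowid" is added to fix the output order.
QUERY = """
select t1.*,
       (select count(*) + 1 from tablename t2
        where t2.event = 'purchase' and t2.timestamp < t1.timestamp) as id_groups
from tablename t1
order by t1.rowid
"""


def run_query(rows):
    """Load rows into an in-memory SQLite table and return the query result."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table tablename "
            "(contact_id text, product_code text, timestamp text, event text)"
        )
        con.executemany("insert into tablename values (?, ?, ?, ?)", rows)
        return con.execute(QUERY).fetchall()
    finally:
        con.close()


if __name__ == "__main__":
    for row in run_query(ROWS):
        print(*row, sep=" | ")
```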
Answer 1 (score: 1)
You can do this with a cumulative sum. For just one task:
select t.*,
-- number of purchases strictly before this row, plus 1
coalesce(sum(case when event = 'purchase' then 1 else 0 end) over
(order by contact_id, product_code, timestamp
rows between unbounded preceding and 1 preceding
), 0) + 1 as grp
from t;
Here is a db<>fiddle.
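Edit:
I admit that I don't see why this logic is useful, since the groups are assigned first by product and then by time. It's odd.
The following assigns the groups: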
select t.*,
sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
(order by contact_id, product_code, timestamp) as grp
from (select t.*,
row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
lag(event) over (order by contact_id, product_code, timestamp) as prev_event
from t
) t
order by 1, 2, 3;
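The query above is a gaps-and-islands pattern: flag the rows that start a group (the first row of a combination, `seqnum = 1`, or the row right after a purchase), then take a running sum of those flags. The sketch below runs it verbatim in SQLite through Python's `sqlite3` module; it assumes the bundled SQLite is 3.25 or newer, since `row_number()`, `lag()` and windowed `sum()` need window-function support. `ROWS` and `run_query` are illustrative names. The sample data exposes one subtlety: the two contact_4 rows share a timestamp, so under the default `RANGE` frame of the running sum they are peers and always receive the same value. Depending on how the tie is broken in `row_number()` and `lag()`, both rows may be flagged, which makes the total jump by 2, so the values from contact_4 onward can come out as 5, 5, 6 or as 6, 6, 7.

```python
import sqlite3

# Sample rows from the question: (contact_id, product_code, timestamp, event).
ROWS = [
    ("contact_1", "product_1", "2018-11-29 11:11:00.000", "view"),
    ("contact_1", "product_1", "2018-11-29 13:10:00.000", "add"),
    ("contact_1", "product_1", "2018-11-30 10:20:00.000", "purchase"),
    ("contact_1", "product_1", "2018-12-03 10:20:00.000", "mail"),
    ("contact_1", "product_1", "2018-12-03 16:00:00.000", "purchase"),
    ("contact_2", "product_2", "2018-12-05 19:01:00.000", "add"),
    ("contact_2", "product_2", "2018-12-05 19:03:00.000", "purchase"),
    ("contact_3", "product_3", "2018-12-05 19:03:00.000", "view"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "mail"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "purchase"),
    ("contact_5", "product_5", "2018-11-20 19:03:00.000", "purchase"),
]

# The answer's edited query, unchanged.
QUERY = """
select t.*,
       sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
           (order by contact_id, product_code, timestamp) as grp
from (select t.*,
             row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
             lag(event) over (order by contact_id, product_code, timestamp) as prev_event
      from t
     ) t
order by 1, 2, 3
"""


def run_query(rows):
    """Return (contact_id, event, grp) for each output row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (contact_id text, product_code text, timestamp text, event text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        # Output columns: contact_id, product_code, timestamp, event, seqnum, prev_event, grp.
        return [(r[0], r[3], r[-1]) for r in con.execute(QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    for contact_id, event, grp in run_query(ROWS):
        print(contact_id, event, grp, sep=" | ")
```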
You can use dense_rank() to assign the sequential numbers you want:
select t.*, dense_rank() over (order by _grp) as grp
from (select t.*,
sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
(order by contact_id, product_code, timestamp) as _grp
from (select t.*,
row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
lag(event) over (order by contact_id, product_code, timestamp) as prev_event
from t
) t
) t
order by 1, 2, 3;
Here is a db<>fiddle.
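The `dense_rank()` version can be checked by running it against the sample rows in SQLite through Python's `sqlite3` module, assuming the bundled SQLite is 3.25 or newer (window-function support). Whatever running totals the inner query produces, `dense_rank()` maps the distinct `_grp` values to 1, 2, 3, … without gaps. So even if the tied contact_4 timestamps make the running sum skip a value, the output matches the expected id_groups column. `ROWS` and `run_query` are illustrative names.

```python
import sqlite3

# Sample rows from the question: (contact_id, product_code, timestamp, event).
ROWS = [
    ("contact_1", "product_1", "2018-11-29 11:11:00.000", "view"),
    ("contact_1", "product_1", "2018-11-29 13:10:00.000", "add"),
    ("contact_1", "product_1", "2018-11-30 10:20:00.000", "purchase"),
    ("contact_1", "product_1", "2018-12-03 10:20:00.000", "mail"),
    ("contact_1", "product_1", "2018-12-03 16:00:00.000", "purchase"),
    ("contact_2", "product_2", "2018-12-05 19:01:00.000", "add"),
    ("contact_2", "product_2", "2018-12-05 19:03:00.000", "purchase"),
    ("contact_3", "product_3", "2018-12-05 19:03:00.000", "view"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "mail"),
    ("contact_4", "product_4", "2018-11-15 19:03:00.000", "purchase"),
    ("contact_5", "product_5", "2018-11-20 19:03:00.000", "purchase"),
]

# The answer's dense_rank() query, unchanged.
QUERY = """
select t.*, dense_rank() over (order by _grp) as grp
from (select t.*,
             sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
                 (order by contact_id, product_code, timestamp) as _grp
      from (select t.*,
                   row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
                   lag(event) over (order by contact_id, product_code, timestamp) as prev_event
            from t
           ) t
     ) t
order by 1, 2, 3
"""


def run_query(rows):
    """Return (contact_id, event, grp) for each output row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table t (contact_id text, product_code text, timestamp text, event text)"
        )
        con.executemany("insert into t values (?, ?, ?, ?)", rows)
        # The last output column is grp, the dense rank of the running total _grp.
        return [(r[0], r[3], r[-1]) for r in con.execute(QUERY)]
    finally:
        con.close()


if __name__ == "__main__":
    for contact_id, event, grp in run_query(ROWS):
        print(contact_id, event, grp, sep=" | ")
```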