Question

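I am trying to create a group_id column in a SELECT statement.

A group consists of the events for one contact_id and product_code combination, taken in time order, up to and including the first event = purchase. If there are further events for the same contact_id and product_code combination after that purchase, they are assigned to another group. Likewise, a contact_id and product_code combination with no event = purchase forms a group of its own.

There is more than one contact_id/product_code combination, and the data in the table is not sorted.

The table and the expected result of the SELECT are shown below: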

contact_id | product_code | timestamp               | event      |
------------------------------------------------------------------
contact_1  | product_1    | 2018-11-29 11:11:00.000 |   view     |
contact_1  | product_1    | 2018-11-29 13:10:00.000 |   add      |
contact_1  | product_1    | 2018-11-30 10:20:00.000 |   purchase |
contact_1  | product_1    | 2018-12-03 10:20:00.000 |   mail     |
contact_1  | product_1    | 2018-12-03 16:00:00.000 |   purchase |
contact_2  | product_2    | 2018-12-05 19:01:00.000 |   add      |
contact_2  | product_2    | 2018-12-05 19:03:00.000 |   purchase |
contact_3  | product_3    | 2018-12-05 19:03:00.000 |   view     |
contact_4  | product_4    | 2018-11-15 19:03:00.000 |   mail     |
contact_4  | product_4    | 2018-11-15 19:03:00.000 |   purchase |
contact_5  | product_5    | 2018-11-20 19:03:00.000 |   purchase |

Result:

contact_id | product_code | timestamp               | event      | id_groups|
-----------------------------------------------------------------------------
contact_1  | product_1    | 2018-11-29 11:11:00.000 |   view     |    1     |
contact_1  | product_1    | 2018-11-29 13:10:00.000 |   add      |    1     |
contact_1  | product_1    | 2018-11-30 10:20:00.000 |   purchase |    1     |
contact_1  | product_1    | 2018-12-03 10:20:00.000 |   mail     |    2     |
contact_1  | product_1    | 2018-12-03 16:00:00.000 |   purchase |    2     |
contact_2  | product_2    | 2018-12-05 19:01:00.000 |   add      |    3     |
contact_2  | product_2    | 2018-12-05 19:03:00.000 |   purchase |    3     |
contact_3  | product_3    | 2018-12-05 19:03:00.000 |   view     |    4     |
contact_4  | product_4    | 2018-11-15 19:03:00.000 |   mail     |    5     |
contact_4  | product_4    | 2018-11-15 19:03:00.000 |   purchase |    5     |
contact_5  | product_5    | 2018-11-20 19:03:00.000 |   purchase |    6     |
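
To experiment with the sample data above, a minimal setup sketch follows. The table name `t` and the column types are assumptions, since the question does not give the DDL (Answer 1 calls the table `tablename` instead), and `timestamp` may need quoting as a column name in some databases.

```sql
-- Hypothetical setup: table name `t` and the column types are assumptions.
create table t (
    contact_id   varchar(20),
    product_code varchar(20),
    timestamp    timestamp,
    event        varchar(20)
);

-- Rows are inserted out of order on purpose, because the source table is unsorted.
insert into t (contact_id, product_code, timestamp, event) values
    ('contact_2', 'product_2', '2018-12-05 19:03:00.000', 'purchase'),
    ('contact_1', 'product_1', '2018-11-29 13:10:00.000', 'add'),
    ('contact_5', 'product_5', '2018-11-20 19:03:00.000', 'purchase'),
    ('contact_1', 'product_1', '2018-12-03 16:00:00.000', 'purchase'),
    ('contact_4', 'product_4', '2018-11-15 19:03:00.000', 'mail'),
    ('contact_1', 'product_1', '2018-11-29 11:11:00.000', 'view'),
    ('contact_3', 'product_3', '2018-12-05 19:03:00.000', 'view'),
    ('contact_1', 'product_1', '2018-11-30 10:20:00.000', 'purchase'),
    ('contact_4', 'product_4', '2018-11-15 19:03:00.000', 'purchase'),
    ('contact_2', 'product_2', '2018-12-05 19:01:00.000', 'add'),
    ('contact_1', 'product_1', '2018-12-03 10:20:00.000', 'mail');
```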

Answer 1

Use a correlated subquery that counts the purchase rows with an earlier timestamp and adds 1.

select t1.*,
      (select count(*) + 1 from tablename t2
       where t2.event = 'purchase' and t2.timestamp < t1.timestamp) as id_groups
from tablename t1

This is core ANSI SQL compliant.

Answer 2

You can do this with a cumulative sum. For the group assignment alone:

select t.*,
       coalesce(sum(case when event = 'purchase' then 1 else 0 end) over
                    (order by contact_id, product_code, timestamp desc
                     rows between unbounded preceding and 1 preceding
                    ), 1) as grp
from t;

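Here is a db<>fiddle.

EDIT:

I admit that I simply cannot understand why this logic is useful, because the groups are assigned first by product and then by time. It seems very strange.

The following assigns the groups: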

select t.*,
       sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
                (order by contact_id, product_code, timestamp) as grp
from (select t.*,
             row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
             lag(event) over (order by contact_id, product_code, timestamp) as prev_event
      from t
     ) t
order by 1, 2, 3;

You can use dense_rank() to assign the desired sequential numbers:

select t.*, dense_rank() over (order by _grp) as grp
from (select t.*,
             sum(case when prev_event = 'purchase' or seqnum = 1 then 1 else 0 end) over
                      (order by contact_id, product_code, timestamp) as _grp
      from (select t.*,
                   row_number() over (partition by contact_id, product_code order by timestamp) as seqnum,
                   lag(event) over (order by contact_id, product_code, timestamp) as prev_event
            from t
           ) t
     ) t
order by 1, 2, 3;

Here is a db<>fiddle.
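
One detail worth noting in the sample data: the two contact_4 rows share a timestamp, so ordering by timestamp alone does not fix whether mail or purchase comes first within that pair. The following is a minimal sketch of one way to make the grouping deterministic, not part of the original answer. It assumes the table is called `t`, that `event` is never NULL, and that a purchase should sort after other events with the same timestamp; the `is_purchase` tie-breaker column is a hypothetical helper introduced for this sketch.

```sql
select t.*,
       -- Running count of group starts: a row starts a group when it is the first
       -- row of its contact/product partition (prev_event is null) or when the
       -- previous row in that partition was a purchase.
       sum(case when prev_event = 'purchase' or prev_event is null then 1 else 0 end) over
                (order by contact_id, product_code, timestamp, is_purchase) as grp
from (select t.*,
             -- Assumed tie-breaker: purchases sort last among equal timestamps.
             case when event = 'purchase' then 1 else 0 end as is_purchase,
             lag(event) over (partition by contact_id, product_code
                              order by timestamp,
                                       case when event = 'purchase' then 1 else 0 end) as prev_event
      from t
     ) t
order by contact_id, product_code, timestamp, is_purchase;
```

Because the outer ordering is now unique for the sample rows, the running sum has no tied rows to lump together, and on the sample data it should number the groups 1 through 6 as in the expected result. Partitioning the lag() by contact_id and product_code also means the first row of each combination can be detected from a NULL prev_event, without a separate row_number().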
