Grouping and ranking consecutive values by condition in SQL

Posted: 2019-01-11 04:59:18

Tags: sql presto

I have a table mytable to which I want to add two extra columns, group and rank.

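My goal is to group rows that have the same user_id and mobile_id and a difftime > -600. Each group must be a sequence that is consecutive in created_at (a timestamp) and must carry a rank. If the same user and mobile ID then has a difftime < -600, the sequence ends and the next qualifying row starts a new one. Every separate group is assigned an incrementing value. For example, this input: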
> mytable
            created_at user_id mobile_id   status difftime
1  2019-01-02 22:01:38 1227604     68409 finished      \\N
2  2019-01-03 04:08:29 1227604     68409 finished     -366
3  2019-01-03 15:16:38 1227604     68409  timeout     -668
4  2019-01-04 00:34:40 1227604     68409   failed     -558
5  2019-01-04 00:27:37 1227605     68453   failed      \\N
6  2019-01-04 00:35:56 1227605     68453 finished       -8
7  2019-01-04 01:39:52 1227605     68453 finished      -63
8  2019-01-04 02:05:53 1227605     68453  timeout      -26
9  2019-01-04 02:17:17 1227605     68453  timeout      -11
10 2019-01-04 16:51:39 1227605     68453  timeout     -874

should produce this output:

> output
            created_at user_id mobile_id   status difftime group rank
1  2019-01-02 22:01:38 1227604     68409 finished      \\N    NA   NA
2  2019-01-03 04:08:29 1227604     68409 finished     -366     1    1
3  2019-01-03 15:16:38 1227604     68409  timeout     -668    NA   NA
4  2019-01-04 00:34:40 1227604     68409   failed     -558     2    1
5  2019-01-04 00:27:37 1227605     68453   failed      \\N    NA   NA
6  2019-01-04 00:35:56 1227605     68453 finished       -8     3    1
7  2019-01-04 01:39:52 1227605     68453 finished      -63     3    2
8  2019-01-04 02:05:53 1227605     68453  timeout      -26     3    3
9  2019-01-04 02:17:17 1227605     68453  timeout      -11     3    4
10 2019-01-04 16:51:39 1227605     68453  timeout     -874    NA   NA

Although I'm using Presto SQL, any SQL solution here would help me think about how to restructure the query. When I just try to assign the rank, my query throws this error:

WHERE clause cannot contain aggregations, window functions or grouping operations
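The error arises because window functions are evaluated only after the WHERE clause has filtered the rows, so a window function cannot appear inside WHERE. As a minimal sketch, assuming Python's built-in sqlite3 module (SQLite 3.25 or newer, which has window functions) as a local stand-in for Presto, the snippet below reproduces the same kind of rejection and the usual workaround: compute the window function in a subquery and filter on its result in the outer query. The three rows come from the question; the rn > 1 filter and the names bad_query, good_query and rn are hypothetical.

```python
import sqlite3

# Three rows from the question, keeping only the columns this demo needs.
con = sqlite3.connect(":memory:")
con.execute("create table mytable (user_id integer, created_at text, difftime integer)")
con.executemany(
    "insert into mytable values (?, ?, ?)",
    [
        (1227604, "2019-01-02 22:01:38", None),
        (1227604, "2019-01-03 04:08:29", -366),
        (1227604, "2019-01-03 15:16:38", -668),
    ],
)

# A window function inside WHERE is rejected (SQLite uses its own error
# text, not Presto's, but the restriction is the same).
bad_query = """
    select * from mytable
    where row_number() over (partition by user_id order by created_at) > 1
"""
try:
    con.execute(bad_query)
    where_rejected = False
except sqlite3.Error:
    where_rejected = True

# Workaround: compute the window function in a subquery, filter outside it.
good_query = """
    select user_id, created_at, rn
    from (select t.*,
                 row_number() over (partition by user_id order by created_at) as rn
          from mytable t) s
    where rn > 1
    order by created_at
"""
filtered = con.execute(good_query).fetchall()
con.close()
print(where_rejected, filtered)
```

The same nesting pattern is what allows window-function results to be used as ordinary columns in the outer query.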

1 answer:

Answer 0 (score: 1)

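To identify the groups, take a cumulative sum of the "invalid" rows (difftime NULL or <= -600). The sum only changes at a break, so consecutive valid rows share the same value. Then use dense_rank() to assign each group a value.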
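To make the running-sum idea concrete, here is a plain-Python sketch (not the answer's own code) applied to the question's difftime values. It assumes the rows of each user_id/mobile_id pair are already sorted by created_at and that None stands in for the NULL (\N) values; the helper names and the THRESHOLD constant are illustrative. run_ids is the cumulative sum of invalid rows, and group_and_rank mimics the dense_rank() and row_number() steps.

```python
from itertools import accumulate

THRESHOLD = -600  # rows with difftime > THRESHOLD are "valid"


def is_valid(difftime):
    # None plays the role of SQL NULL: it never satisfies the condition.
    return difftime is not None and difftime > THRESHOLD


def run_ids(difftimes):
    """Running count of invalid rows. It only grows at a break, so each
    stretch of consecutive valid rows shares one value (its group key)."""
    return list(accumulate(0 if is_valid(d) else 1 for d in difftimes))


def group_and_rank(partitions):
    """partitions: one difftime list per (user_id, mobile_id), each sorted by
    created_at, with the partitions in user_id/mobile_id order.
    Returns (group, rank) per row, or (None, None) for invalid rows."""
    result, next_group = [], 0
    for difftimes in partitions:
        groups = {}  # run id -> [group number, rows seen so far]
        for d, run in zip(difftimes, run_ids(difftimes)):
            if not is_valid(d):
                result.append((None, None))
                continue
            if run not in groups:  # first valid row of a run: dense_rank step
                next_group += 1
                groups[run] = [next_group, 0]
            groups[run][1] += 1  # position inside the run: row_number step
            result.append((groups[run][0], groups[run][1]))
    return result


user_a = [None, -366, -668, -558]         # 1227604 / 68409
user_b = [None, -8, -63, -26, -11, -874]  # 1227605 / 68453
print(run_ids(user_a))  # [1, 1, 2, 2]
print(group_and_rank([user_a, user_b]))
```

Because the running count increases only on invalid rows, it works as a group key without any self-join or row-by-row state in SQL.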

I don't know how your query relates to your question, but the logic would look something like this:

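select created_at, user_id, mobile_id, status, difftime,
       -- incrementing group number, given only to "valid" rows
       (case when is_valid = 1
             then dense_rank() over (partition by is_valid order by user_id, mobile_id, run_id)
        end) as grp,
       -- position inside the group; restarts for every group
       (case when is_valid = 1
             then row_number() over (partition by user_id, mobile_id, run_id, is_valid order by created_at)
        end) as rank
from (select t.*,
             (case when difftime > -600 then 1 else 0 end) as is_valid,
             -- running count of "invalid" rows (NULL or <= -600)
             sum(case when difftime > -600 then 0 else 1 end)
                 over (partition by user_id, mobile_id order by created_at
                       rows between unbounded preceding and current row) as run_id
      from mytable t
     ) t
order by user_id, mobile_id, created_at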
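To check the approach end to end, the sketch below loads the question's sample rows into an in-memory SQLite database through Python's sqlite3 module and runs a query that follows the answer's outline: a running sum of invalid rows as a run id, dense_rank() for the group number and row_number() for the rank. SQLite (3.25 or newer) is only a local stand-in for Presto; the window syntax is standard, but this exact query has not been verified on Presto itself. None represents the \N values, and the names ROWS, QUERY and run_query are hypothetical.

```python
import sqlite3

# Sample rows from the question; None stands in for the \N (NULL) difftime values.
ROWS = [
    ("2019-01-02 22:01:38", 1227604, 68409, "finished", None),
    ("2019-01-03 04:08:29", 1227604, 68409, "finished", -366),
    ("2019-01-03 15:16:38", 1227604, 68409, "timeout", -668),
    ("2019-01-04 00:34:40", 1227604, 68409, "failed", -558),
    ("2019-01-04 00:27:37", 1227605, 68453, "failed", None),
    ("2019-01-04 00:35:56", 1227605, 68453, "finished", -8),
    ("2019-01-04 01:39:52", 1227605, 68453, "finished", -63),
    ("2019-01-04 02:05:53", 1227605, 68453, "timeout", -26),
    ("2019-01-04 02:17:17", 1227605, 68453, "timeout", -11),
    ("2019-01-04 16:51:39", 1227605, 68453, "timeout", -874),
]

QUERY = """
select created_at, user_id, mobile_id, status, difftime,
       case when is_valid = 1
            then dense_rank() over (partition by is_valid
                                    order by user_id, mobile_id, run_id)
       end as grp,
       case when is_valid = 1
            then row_number() over (partition by user_id, mobile_id, run_id, is_valid
                                    order by created_at)
       end as rank
from (select t.*,
             case when difftime > -600 then 1 else 0 end as is_valid,
             sum(case when difftime > -600 then 0 else 1 end)
                 over (partition by user_id, mobile_id order by created_at
                       rows between unbounded preceding and current row) as run_id
      from mytable t) t
order by user_id, mobile_id, created_at
"""


def run_query(rows):
    """Load rows into an in-memory table and return (difftime, grp, rank) per row."""
    con = sqlite3.connect(":memory:")
    try:
        con.execute(
            "create table mytable (created_at text, user_id integer, "
            "mobile_id integer, status text, difftime integer)"
        )
        con.executemany("insert into mytable values (?, ?, ?, ?, ?)", rows)
        return [(r[4], r[5], r[6]) for r in con.execute(QUERY)]
    finally:
        con.close()


for row in run_query(ROWS):
    print(row)
```

On this data the groups come out as 1, 2 and 3 with ranks restarting at 1 in each group, and NULL where the question shows NA. Partitioning dense_rank() by is_valid keeps runs made only of invalid rows from consuming group numbers.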