Count consecutive occurrences of a value in Hive / SQL

Posted: 2016-12-07 08:43:15

Tags: sql hive hiveql

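I have a table with 3 columns. For each userid, I want the number of consecutive times value equals B when the rows are ordered by time, i.e. something like the longest sublist of identical values. For example, the following data: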

time        userid  value
2016-01-01  1       A
2016-01-02  1       B
2016-01-03  1       B
2016-01-04  2       C
2016-01-05  2       B
2016-01-06  2       B
2016-01-07  2       B
2016-01-08  2       C
2016-01-09  2       B

should return:

userid  times
1       2
2       3
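To pin down what `times` means, here is a minimal reference sketch in plain Python (not Hive), assuming the rows fit in memory as `(time, userid, value)` tuples. The names `ROWS` and `longest_run` are hypothetical. It sorts each user's rows by time, splits them into runs of identical values with `itertools.groupby`, and keeps the longest run of `B`.

```python
from itertools import groupby

# Sample rows from the question as (time, userid, value); ISO dates sort correctly as strings.
ROWS = [
    ("2016-01-01", 1, "A"), ("2016-01-02", 1, "B"), ("2016-01-03", 1, "B"),
    ("2016-01-04", 2, "C"), ("2016-01-05", 2, "B"), ("2016-01-06", 2, "B"),
    ("2016-01-07", 2, "B"), ("2016-01-08", 2, "C"), ("2016-01-09", 2, "B"),
]


def longest_run(rows, target="B"):
    """Return {userid: length of the longest consecutive run of `target`}."""
    result = {}
    # Sort by userid, then time, so each user's rows are contiguous and chronological.
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    for userid, user_rows in groupby(ordered, key=lambda r: r[1]):
        best = 0
        # groupby on value splits the user's timeline into runs of identical values.
        for value, run in groupby(user_rows, key=lambda r: r[2]):
            if value == target:
                best = max(best, sum(1 for _ in run))
        if best:  # users with no target rows are omitted
            result[userid] = best
    return result


print(longest_run(ROWS))  # {1: 2, 2: 3}
```

User 2 has four `B` rows in total, but the `C` on 2016-01-08 breaks them into runs of 3 and 1, so the expected answer is 3, not 4.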

Is this possible in Hive without user-defined functions? I've looked into LAG and LEAD but couldn't find a way. :(
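One common way LAG can be used here (a sketch of the idea only, not what the answer below does) is to flag every row whose value differs from the previous row's value and take a running sum of those flags as a run id. The Python below simulates that logic in memory, assuming the same hypothetical `(time, userid, value)` tuples; `run_ids` and `longest_run_via_lag` are illustrative names, not Hive functions.

```python
from collections import Counter

# Sample rows from the question as (time, userid, value).
ROWS = [
    ("2016-01-01", 1, "A"), ("2016-01-02", 1, "B"), ("2016-01-03", 1, "B"),
    ("2016-01-04", 2, "C"), ("2016-01-05", 2, "B"), ("2016-01-06", 2, "B"),
    ("2016-01-07", 2, "B"), ("2016-01-08", 2, "C"), ("2016-01-09", 2, "B"),
]


def run_ids(rows):
    """Tag each row with a per-user run id, mimicking a LAG flag plus a running SUM."""
    out = []
    prev_user, prev_value, run_id = None, None, 0
    for _time, userid, value in sorted(rows, key=lambda r: (r[1], r[0])):
        if userid != prev_user:
            prev_value = None  # LAG(value) is NULL on the first row of each partition
            run_id = 0         # the running sum restarts per userid partition
        if value != prev_value:
            run_id += 1        # "value changed" flag = 1 starts a new run
        out.append((userid, value, run_id))
        prev_user, prev_value = userid, value
    return out


def longest_run_via_lag(rows, target="B"):
    # Count rows per (userid, run id) for the target value, then keep the max per user.
    sizes = Counter((u, r) for u, v, r in run_ids(rows) if v == target)
    best = {}
    for (userid, _run), n in sizes.items():
        best[userid] = max(best.get(userid, 0), n)
    return best


print(longest_run_via_lag(ROWS))  # {1: 2, 2: 3}
```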

Answer (score: 1):

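select      value
           ,userid
           ,max(times) as times            -- longest run per (value, userid)

from       (select      value
                       ,userid
                       ,count(*) as times  -- length of one run

            from       (select  value
                               ,userid
                               -- position of the row in the user's whole timeline
                               ,row_number() over
                                (
                                    partition by userid
                                    order by     time
                                ) as rn
                               -- position of the row among the user's rows with the same value
                               ,row_number() over
                                (
                                    partition by userid, value
                                    order by     time
                                ) as rn_val

                        from    t
                        ) t

            -- To count only 'B', filter here, after the row numbers are computed.
            -- Filtering in the innermost query would number only the 'B' rows,
            -- so rn - rn_val would always be 0 and times would become the total count of 'B'.
         -- where       value = 'B'

            group by    value
                       ,userid
                       ,rn - rn_val        -- constant within each run of identical values
            ) t

group by    value
           ,userid

order by    value
           ,userid
;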
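Why `rn - rn_val` works: inside one uninterrupted run of the same value, both row numbers advance by 1 per row, so their difference stays constant; any row with a different value in between advances `rn` but not `rn_val`, so the next run of that value gets a larger difference. The sketch below simulates the two `row_number()` calls in Python, assuming the question's data as hypothetical `(time, userid, value)` tuples; `gaps_and_islands` is an illustrative name, not part of the answer.

```python
from collections import Counter, defaultdict

# Sample rows from the question as (time, userid, value).
ROWS = [
    ("2016-01-01", 1, "A"), ("2016-01-02", 1, "B"), ("2016-01-03", 1, "B"),
    ("2016-01-04", 2, "C"), ("2016-01-05", 2, "B"), ("2016-01-06", 2, "B"),
    ("2016-01-07", 2, "B"), ("2016-01-08", 2, "C"), ("2016-01-09", 2, "B"),
]


def gaps_and_islands(rows, target="B"):
    rn = defaultdict(int)      # row_number() over (partition by userid order by time)
    rn_val = defaultdict(int)  # row_number() over (partition by userid, value order by time)
    runs = Counter()           # count(*) per (userid, rn - rn_val)
    for _time, userid, value in sorted(rows, key=lambda r: (r[1], r[0])):
        rn[userid] += 1
        rn_val[(userid, value)] += 1
        # Filter only after numbering, like the WHERE in the middle query.
        if value == target:
            runs[(userid, rn[userid] - rn_val[(userid, value)])] += 1
    best = {}
    for (userid, _diff), times in runs.items():
        best[userid] = max(best.get(userid, 0), times)  # max(times) per userid
    return best


print(gaps_and_islands(ROWS))  # {1: 2, 2: 3}
```

For user 2 the `B` rows get differences 1, 1, 1 and then 2 (after the `C`), giving runs of 3 and 1. If the target check were moved before the two counters are incremented, every difference would be 0 and user 2 would get 4, which is why the filter belongs after the row numbers are computed.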