SQL: partitioning by time when aggregating

Time: 2016-07-06 23:15:22

Tags: sql

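I'm dealing with a fairly complex SQL statement and am not getting the largest prop_list count when aggregating across users. Here is a sample of my data set: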

user_id, term_id, time_stamp, prop_list
u100, t10, 7:00, (a,b,c)
u100, t10, 7:01, (a,b)
u100, t11, 7:01, (a,b)
u101, t10, 7:00, (a,b,c)
u101, t10, 7:01, (a)
u102, t10, 6:59, (a)

Desired output:

term_id, term_id_distinct_count, prop_list
t10, 3, (a,b,c)
t11, 1, (a,b)

Here is my code so far:

select 
    a.term_id,
    count(distinct user_id) as term_id_distinct_count,
    a.prop_list
from 
    (select 
         user_id, term_id,
         prop_list,
         row_number() over(partition by user_id, term_id order by time_stamp asc) as row_no
     from 
         data_table
    ) a
where 
    a.row_no = 1
group by
    a.term_id, a.prop_list;

Note that when a user_id has multiple rows for a term_id, we only want to use the one that occurred first, which is why I order by time_stamp asc.
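
To make the "keep only the first occurrence" step concrete, here is a minimal sketch that loads the sample rows into an in-memory SQLite database through Python's sqlite3 module and keeps the earliest row per (user_id, term_id). The zero-padded text times and the helper name first_rows_per_user_term are assumptions made for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question. Times are stored as zero-padded text so
# that they sort chronologically (an assumption made for this sketch).
ROWS = [
    ("u100", "t10", "07:00", "(a,b,c)"),
    ("u100", "t10", "07:01", "(a,b)"),
    ("u100", "t11", "07:01", "(a,b)"),
    ("u101", "t10", "07:00", "(a,b,c)"),
    ("u101", "t10", "07:01", "(a)"),
    ("u102", "t10", "06:59", "(a)"),
]

# Number the rows of each (user_id, term_id) pair by time and keep the first.
FIRST_ROW_QUERY = """
select user_id, term_id, time_stamp, prop_list
from (select user_id, term_id, time_stamp, prop_list,
             row_number() over (partition by user_id, term_id
                                order by time_stamp asc) as row_no
      from data_table
     ) a
where a.row_no = 1
order by user_id, term_id;
"""


def first_rows_per_user_term():
    """Return the earliest row for every (user_id, term_id) pair."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table data_table "
        "(user_id text, term_id text, time_stamp text, prop_list text)"
    )
    conn.executemany("insert into data_table values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(FIRST_ROW_QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in first_rows_per_user_term():
        print(row)
```

On the sample data this leaves four rows: u100 and u101 each contribute (a,b,c) for t10, u102 contributes (a) for t10, and u100 contributes (a,b) for t11.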

1 answer:

Answer 0: (score: 0)

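Some databases that support window functions (Oracle, for example) also accept count(distinct) as a window function, while others (PostgreSQL, SQL Server, and SQLite among them) reject it. Where it is supported, you can do: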

select a.term_id, term_id_distinct_count, a.prop_list
from (select user_id, term_id, prop_list,
             row_number() over (partition by term_id order by time_stamp asc) as seqnum,
             count(distinct user_id) over (partition by term_id) as term_id_distinct_count
      from data_table
     ) a
where seqnum = 1;
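
Not every engine accepts count(distinct ...) with an over clause (SQLite, used here, does not), so the following is a rough sketch of the same idea with the distinct count computed in a grouped subquery and joined back. It runs against the question's sample rows in an in-memory SQLite database via Python's sqlite3 module. The zero-padded text times and the helper name run_answer_query are assumptions for illustration, and window functions require SQLite 3.25 or later.

```python
import sqlite3

# Sample rows from the question. Times are stored as zero-padded text so
# that they sort chronologically (an assumption made for this sketch).
ROWS = [
    ("u100", "t10", "07:00", "(a,b,c)"),
    ("u100", "t10", "07:01", "(a,b)"),
    ("u100", "t11", "07:01", "(a,b)"),
    ("u101", "t10", "07:00", "(a,b,c)"),
    ("u101", "t10", "07:01", "(a)"),
    ("u102", "t10", "06:59", "(a)"),
]

# f: earliest row per term_id (same row_number logic as the answer).
# c: distinct users per term_id, replacing count(distinct ...) over (...).
QUERY = """
select f.term_id, c.term_id_distinct_count, f.prop_list
from (select term_id, prop_list,
             row_number() over (partition by term_id
                                order by time_stamp asc) as seqnum
      from data_table
     ) f
join (select term_id, count(distinct user_id) as term_id_distinct_count
      from data_table
      group by term_id
     ) c on c.term_id = f.term_id
where f.seqnum = 1
order by f.term_id;
"""


def run_answer_query():
    """Return one row per term_id: (term_id, distinct users, earliest prop_list)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "create table data_table "
        "(user_id text, term_id text, time_stamp text, prop_list text)"
    )
    conn.executemany("insert into data_table values (?, ?, ?, ?)", ROWS)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for row in run_answer_query():
        print(row)
```

On the sample data the distinct counts come out as 3 for t10 and 1 for t11. The prop_list returned for t10 is (a), because u102's 6:59 row is the earliest one in that partition; this differs from the (a,b,c) shown in the desired output, so the ordering inside row_number may need to change if the largest prop_list is what is wanted.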