Self join with aggregate function

Time: 2017-07-14 03:10:28

Tags: sql

Below are my table and sample data

user_id |   session_id  |   time_stamp  |   source  |   medium  |   new_source  |   new_medium

1       |   1           |   2017-01-01  |   google  |   search
1       |   2           |   2017-01-02  |   google  |   search
1       |   3           |   2017-01-03  |   direct  |   none



2       |   1           |   2017-03-11  |   google  |   search
2       |   2           |   2017-04-21  |   direct  |   none
2       |   3           |   2017-04-22  |   google  |   search

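When a user's last session (the one with the maximum time stamp) has a direct source, I want to update the new_source and new_medium columns for that user. The new source and new medium must be the last non-direct source and medium. The expected result is below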

user_id |   session_id  |   time_stamp  |   source  |   medium  | new_source    |   new_medium

1       |   1           |   2017-01-01  |   google  |   search
1       |   2           |   2017-01-02  |   google  |   search
1       |   3           |   2017-01-03  |   direct  |   none    |   google      |   search



2       |   1           |   2017-03-11  |   google  |   search
2       |   2           |   2017-04-21  |   direct  |   none
2       |   3           |   2017-04-22  |   google  |   search

The query I tried (does not work)

SELECT a.domain_userid,
   a.session_id,
   a.source,
   a.medium,
   b.source AS new_source,
   b.medium AS new_medium
FROM table a
  LEFT JOIN table b ON a.domain_userid = b.domain_userid
  LEFT JOIN (SELECT domain_userid,
           MAX(time_stamp) as time_stamp
    FROM table
    WHERE source != 'direct'
    GROUP BY domain_userid) AS c ON b.time_stamp = c.time_stamp and c.user_id=b.user_id
WHERE a.source = 'direct'

Any help would be greatly appreciated.

Note: join the same table and take the last non-direct value

1 answer:

Answer 0: (score: 1)

You want to use window functions. Provided there are never two "direct" sessions in a row, the simplest approach is lag():

select t.*,
       -- on each user's latest row, a 'direct' source is replaced by the previous row's source
       (case when row_number() over (partition by user_id order by time_stamp desc) = 1 and
                  source = 'direct'
             then lag(source) over (partition by user_id order by time_stamp)
             else source
        end) as new_source,
       -- same rule for the medium
       (case when row_number() over (partition by user_id order by time_stamp desc) = 1 and
                  source = 'direct'
             then lag(medium) over (partition by user_id order by time_stamp)
             else medium
        end) as new_medium
from t;
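
Because lag() only looks one row back, two "direct" sessions in a row would carry "direct" forward. As a rough sketch of one way around that (not part of the original answer), a correlated subquery can fetch the latest earlier non-direct row instead. The example below runs such a query against an in-memory SQLite database through Python's sqlite3 module. The table name sessions and the rows for user 3 are assumptions added for illustration, and the dates are compared as ISO-formatted text. It leaves new_source and new_medium NULL on every other row, which matches the expected result in the question rather than the else branch above.

```python
import sqlite3

# In-memory database; `sessions` is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sessions (
    user_id    INTEGER,
    session_id INTEGER,
    time_stamp TEXT,     -- ISO dates, so text comparison orders them correctly
    source     TEXT,
    medium     TEXT
);
INSERT INTO sessions VALUES
    (1, 1, '2017-01-01', 'google', 'search'),
    (1, 2, '2017-01-02', 'google', 'search'),
    (1, 3, '2017-01-03', 'direct', 'none'),
    (2, 1, '2017-03-11', 'google', 'search'),
    (2, 2, '2017-04-21', 'direct', 'none'),
    (2, 3, '2017-04-22', 'google', 'search'),
    -- user 3 is NOT in the question: added to show two 'direct' rows in a row
    (3, 1, '2017-05-01', 'google', 'search'),
    (3, 2, '2017-05-02', 'direct', 'none'),
    (3, 3, '2017-05-03', 'direct', 'none');
""")

QUERY = """
SELECT s.user_id, s.session_id, s.source, s.medium,
       -- only the user's latest row, and only if it is 'direct'
       CASE WHEN s.source = 'direct'
             AND s.time_stamp = (SELECT MAX(m.time_stamp) FROM sessions m
                                 WHERE m.user_id = s.user_id)
            THEN (SELECT p.source FROM sessions p
                  WHERE p.user_id = s.user_id
                    AND p.source <> 'direct'
                    AND p.time_stamp < s.time_stamp
                  ORDER BY p.time_stamp DESC LIMIT 1)
       END AS new_source,
       CASE WHEN s.source = 'direct'
             AND s.time_stamp = (SELECT MAX(m.time_stamp) FROM sessions m
                                 WHERE m.user_id = s.user_id)
            THEN (SELECT p.medium FROM sessions p
                  WHERE p.user_id = s.user_id
                    AND p.source <> 'direct'
                    AND p.time_stamp < s.time_stamp
                  ORDER BY p.time_stamp DESC LIMIT 1)
       END AS new_medium
FROM sessions s
ORDER BY s.user_id, s.time_stamp
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

Under these assumptions, only session 3 of user 1 and session 3 of the hypothetical user 3 receive google / search. User 2 is left untouched because that user's latest session is not direct.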