Below is my table with some sample data:
user_id | session_id | time_stamp | source | medium | new_source | new_medium
1 | 1 | 2017-01-01 | google | search
1 | 2 | 2017-01-02 | google | search
1 | 3 | 2017-01-03 | direct | none
2 | 1 | 2017-03-11 | google | search
2 | 2 | 2017-04-21 | direct | none
2 | 3 | 2017-04-22 | google | search
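I want to update the new_source and new_medium columns for each user whose last session (the one with the maximum time_stamp) has a direct source. The new_source and new_medium must be the last non-direct source and medium for that user. Here is the expected result: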
user_id | session_id | time_stamp | source | medium | new_source | new_medium
1 | 1 | 2017-01-01 | google | search
1 | 2 | 2017-01-02 | google | search
1 | 3 | 2017-01-03 | direct | none | google | search
2 | 1 | 2017-03-11 | google | search
2 | 2 | 2017-04-21 | direct | none
2 | 3 | 2017-04-22 | google | search
The query I tried (it does not work):
SELECT a.domain_userid,
a.session_id,
a.source,
a.medium,
b.source AS new_source,
b.medium AS new_medium
FROM table a
LEFT JOIN table b ON a.domain_userid = b.domain_userid
LEFT JOIN (SELECT domain_userid,
MAX(time_stamp) as time_stamp
FROM table
WHERE source != 'direct'
GROUP BY domain_userid) AS c ON b.time_stamp = c.time_stamp and
c.user_id=b.user_id
WHERE a.source = 'direct'
Any help would be greatly appreciated.
Note: the idea is to join the table to itself and take the last non-direct value.
Answer 0 (score: 1)
You want to use window functions. If a user never has two "direct" rows in a row, the simplest approach is lag():
select t.*,
       (case when row_number() over (partition by user_id order by time_stamp desc) = 1 and
                  source = 'direct'
             then lag(source) over (partition by user_id order by time_stamp)
             else source
        end) as new_source,
       (case when row_number() over (partition by user_id order by time_stamp desc) = 1 and
                  source = 'direct'
             then lag(medium) over (partition by user_id order by time_stamp)
             else medium
        end) as new_medium
from t;
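If a user can end with several "direct" sessions in a row, lag() only looks one row back and would return 'direct' again. Below is a minimal sketch of one way to handle that case with correlated subqueries instead. It is not the query above, just an alternative under some assumptions: the table is named t as in the answer, time_stamp is an ISO date string so it sorts correctly as text, and everything runs in an in-memory SQLite database through Python's sqlite3 module only so the example can be executed (your actual database may differ). User 3 is hypothetical and is added only to show two trailing direct sessions. Rows that do not qualify get NULL, matching the blanks in the expected result from the question.

```python
import sqlite3

# Sample rows from the question, plus a hypothetical user 3 whose last two
# sessions are both 'direct' (the case a single lag() does not cover).
SAMPLE_ROWS = [
    (1, 1, "2017-01-01", "google", "search"),
    (1, 2, "2017-01-02", "google", "search"),
    (1, 3, "2017-01-03", "direct", "none"),
    (2, 1, "2017-03-11", "google", "search"),
    (2, 2, "2017-04-21", "direct", "none"),
    (2, 3, "2017-04-22", "google", "search"),
    (3, 1, "2017-05-01", "bing", "search"),
    (3, 2, "2017-05-02", "direct", "none"),
    (3, 3, "2017-05-03", "direct", "none"),
]

# For each user's latest row, if it is 'direct', look up the most recent
# earlier row that is not 'direct'. A CASE without ELSE yields NULL otherwise.
QUERY = """
SELECT cur.user_id,
       cur.session_id,
       cur.source,
       cur.medium,
       CASE WHEN cur.source = 'direct'
             AND cur.time_stamp = (SELECT MAX(m.time_stamp)
                                   FROM t AS m
                                   WHERE m.user_id = cur.user_id)
            THEN (SELECT p.source
                  FROM t AS p
                  WHERE p.user_id = cur.user_id
                    AND p.source <> 'direct'
                    AND p.time_stamp < cur.time_stamp
                  ORDER BY p.time_stamp DESC
                  LIMIT 1)
       END AS new_source,
       CASE WHEN cur.source = 'direct'
             AND cur.time_stamp = (SELECT MAX(m.time_stamp)
                                   FROM t AS m
                                   WHERE m.user_id = cur.user_id)
            THEN (SELECT p.medium
                  FROM t AS p
                  WHERE p.user_id = cur.user_id
                    AND p.source <> 'direct'
                    AND p.time_stamp < cur.time_stamp
                  ORDER BY p.time_stamp DESC
                  LIMIT 1)
       END AS new_medium
FROM t AS cur
ORDER BY cur.user_id, cur.time_stamp
"""


def fill_last_non_direct(rows):
    """Load rows into an in-memory table t and return the query result."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t (user_id INTEGER, session_id INTEGER, "
        "time_stamp TEXT, source TEXT, medium TEXT)"
    )
    conn.executemany("INSERT INTO t VALUES (?, ?, ?, ?, ?)", rows)
    result = conn.execute(QUERY).fetchall()
    conn.close()
    return result
```

Calling fill_last_non_direct(SAMPLE_ROWS) returns one tuple per session in the order (user_id, session_id, source, medium, new_source, new_medium). Only the SQL in QUERY matters for your own database; the Python wrapper is there just to make the sketch runnable.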