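I have user-input tracking data for two tools. Users can enter data on the "Inputs" page and/or the "Results" page.
I want the average age per user_id and page, i.e. avg(age) over (partition by user_id, page_name), but in its current form the same age often appears under both Inputs and Results, so I would like to remove those duplicates before averaging.
A (simplified) snippet of the data in its current form: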
page_name  page_type  user_id  age
Tool 2     Inputs     2174246  53
Tool 2     Inputs     2174246  50
Tool 2     Results    2174246  53
Tool 1     Inputs     2425226  65
Tool 1     Results    2425226  65
Tool 1     Results    2425226  50
Tool 2     Inputs     2427115  50
Tool 2     Results    2427115  55
Tool 1     Results    620071   65
Tool 2     Inputs     2427536  55
Here is what I have in mind (per user ID and tool), but I don't know how to write it:
case when Results age = Inputs age then return Results age
when Results age is not null and Inputs age is null then return Results age
when Inputs age is not null and Results age is null then return Inputs age
when Results age is not null and Inputs age is not null then return each
The case statement should cover every scenario, unless I am missing something, resulting in:
select user_id, page_name, avg(case statement for age) over (partition by user_id, page_name) as age
page_name  user_id  age
Tool 2     2174246  51.5
Tool 1     2425226  57.5
Tool 2     2427115  52.5
Tool 1     620071   65
Tool 2     2427536  55
The data is in Hive, but standard SQL should work here too.
Thanks in advance for your help!
Answer 0 (score: 0)
You seem to want:
select user_id, page_name, avg(age) as age
from (select distinct user_id, page_name, age  -- keep each age once per user and tool
      from t
     ) t
group by user_id, page_name;
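To check the distinct-then-average idea against the sample rows from the question, here is a minimal sketch that runs the same query through Python's built-in sqlite3 module. SQLite is only a stand-in for Hive here, and the function name average_distinct_ages, the column types, and the added order by are assumptions made for this illustration. The table name t comes from the query above.

```python
import sqlite3

# Sample rows from the question: (page_name, page_type, user_id, age).
ROWS = [
    ("Tool 2", "Inputs", 2174246, 53),
    ("Tool 2", "Inputs", 2174246, 50),
    ("Tool 2", "Results", 2174246, 53),
    ("Tool 1", "Inputs", 2425226, 65),
    ("Tool 1", "Results", 2425226, 65),
    ("Tool 1", "Results", 2425226, 50),
    ("Tool 2", "Inputs", 2427115, 50),
    ("Tool 2", "Results", 2427115, 55),
    ("Tool 1", "Results", 620071, 65),
    ("Tool 2", "Inputs", 2427536, 55),
]

# Same shape as the query above; "order by" is added only so the
# output order is deterministic for this check.
QUERY = """
select user_id, page_name, avg(age) as age
from (select distinct user_id, page_name, age
      from t
     ) d
group by user_id, page_name
order by user_id, page_name
"""


def average_distinct_ages(rows):
    """Load rows into an in-memory SQLite table t and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "create table t ("
            "page_name text, page_type text, user_id integer, age integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for user_id, page_name, age in average_distinct_ages(ROWS):
        print(page_name, user_id, age)
```

On the sample rows, user 2425226 has the distinct ages 65 and 50 for Tool 1, so the query yields 57.5 for that user. Note that distinct also collapses an age repeated within the same page type, not only across Inputs and Results, which matches the goal of counting each age once per user and tool.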