Difficult case statement formulation

Time: 2017-11-28 20:06:43

Tags: sql hive

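I have user-tracking data for user inputs across two tools. Users can enter data on the "Inputs" page and/or the "Results" page.

I want to average age by user_id per page, i.e. avg(age) over (partition by user_id, page_name), but in its current form the data often contains duplicates between Inputs and Results, so I'd like to clean those up before averaging.

(Simplified) snippet of the current form: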

page_name   page_type   user_id   age
Tool 2      Inputs      2174246   53
Tool 2      Inputs      2174246   50
Tool 2      Results     2174246   53
Tool 1      Inputs      2425226   65
Tool 1      Results     2425226   65
Tool 1      Results     2425226   50
Tool 2      Inputs      2427115   50
Tool 2      Results     2427115   55
Tool 1      Results     620071    65
Tool 2      Inputs      2427536   55

Here is my thinking (per user ID and tool), but I'm not sure how to write it:

case when Results age = Inputs age then return Results age  
when Results age is not null and Inputs age is null then return Results age          
when Inputs age is not null and Results age is null then return Inputs age       
when Results age is not null and Inputs age is not null then return each

The case statement should take care of all scenarios, unless I'm missing something, resulting in:

select user_id, page_name, avg(case statement for age) over (partition by user_id, page_name) as age

page_name   user_id   age
Tool 2      2174246   51.5
Tool 1      2425226   57.5
Tool 2      2427115   52.5
Tool 1      620071    65
Tool 2      2427536   55

The data is in Hive, but standard SQL should work here too.

Thanks in advance for your help!

1 Answer:

Answer 0: (score: 0)

You seem to want:

select user_id, page_name, avg(age) as age
from (select distinct user_id, page_name, age
      from t
     ) t
group by user_id, page_name;
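To sanity-check the idea against the sample rows from the question, here is a minimal sketch that loads them into an in-memory SQLite database and runs the same "distinct, then average" query. SQLite standing in for Hive, the column types, and the helper name `dedup_averages` are assumptions made for illustration; the table name `t` follows the query above, and the query groups on `page_name`, the column shown in the question's data.

```python
import sqlite3

# Sample rows copied from the question: (page_name, page_type, user_id, age).
ROWS = [
    ("Tool 2", "Inputs", 2174246, 53),
    ("Tool 2", "Inputs", 2174246, 50),
    ("Tool 2", "Results", 2174246, 53),
    ("Tool 1", "Inputs", 2425226, 65),
    ("Tool 1", "Results", 2425226, 65),
    ("Tool 1", "Results", 2425226, 50),
    ("Tool 2", "Inputs", 2427115, 50),
    ("Tool 2", "Results", 2427115, 55),
    ("Tool 1", "Results", 620071, 65),
    ("Tool 2", "Inputs", 2427536, 55),
]

# Leaving page_type out of the DISTINCT list is what collapses an
# Inputs row and a Results row that carry the same age into one row.
QUERY = """
select user_id, page_name, avg(age) as age
from (select distinct user_id, page_name, age
      from t
     ) d
group by user_id, page_name
"""


def dedup_averages(rows):
    """Hypothetical helper: return {(user_id, page_name): average age}."""
    conn = sqlite3.connect(":memory:")
    try:
        # Column types are an assumption; the question does not state them.
        conn.execute(
            "create table t (page_name text, page_type text, "
            "user_id integer, age integer)"
        )
        conn.executemany("insert into t values (?, ?, ?, ?)", rows)
        return {(u, p): a for u, p, a in conn.execute(QUERY)}
    finally:
        conn.close()


averages = dedup_averages(ROWS)
for (user_id, page_name), age in sorted(averages.items()):
    print(page_name, user_id, age)
```

For user 2425226 the distinct ages are 65 and 50, so the deduplicated average is 57.5. Note that this approach also collapses repeated ages within a single page type, which seems to match the intent of averaging over unique values only.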