How to use COLLECT_SET with conditional grouping in different columns

Time: 2017-05-18 17:25:07

Tags: hadoop hive hiveql

I have this table:

╔═════════╦═════════╦══════════════╗
║ user_id ║ item_id ║ date_visited ║
╠═════════╬═════════╬══════════════╣
║ 1       ║ 123     ║ 18/5/2017    ║
║ 1       ║ 234     ║ 11/3/2017    ║
║ 2       ║ 345     ║ 18/5/2017    ║
║ 2       ║ 456     ║ 11/3/2017    ║
╚═════════╩═════════╩══════════════╝

What I want to achieve (with a Hive query) is this result, assuming today is 18 May 2017:

╔═════════╦═══════════════════════════╦═════════════════════════════╗
║ user_id ║ items_visited_last_5_days ║ items_visited_last_100_days ║
╠═════════╬═══════════════════════════╬═════════════════════════════╣
║ 1       ║ 123                       ║ 123, 234                    ║
║ 2       ║ 345                       ║ 345, 456                    ║
╚═════════╩═══════════════════════════╩═════════════════════════════╝

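Basically, I need to group by user_id and produce one column per time interval, each holding the concatenated item_ids the user visited within that interval. Is this possible?

Thanks in advance.

1 answer:

Answer 0: (score: 3)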

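-- Assumes date_visited is a DATE (or a 'yyyy-MM-dd' string), which datediff expects.
-- A CASE with no ELSE yields NULL for rows outside the interval, and collect_set skips NULLs.
select      user_id
           ,collect_set (case when datediff(current_date,date_visited) <= 5   then item_id end) as items_visited_last_5_days
           ,collect_set (case when datediff(current_date,date_visited) <= 100 then item_id end) as items_visited_last_100_days

from        mytable

group by    user_id
;
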
+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
|       1 | [123]                     | [123,234]                   |
|       2 | [345]                     | [345,456]                   |
+---------+---------------------------+-----------------------------+

-- Same aggregation, with each set joined into a comma-separated string.
-- concat_ws takes an array of strings, hence the cast of item_id.
select      user_id
           ,concat_ws (',',collect_set (case when datediff(current_date,date_visited) <= 5   then cast (item_id as string) end)) as items_visited_last_5_days
           ,concat_ws (',',collect_set (case when datediff(current_date,date_visited) <= 100 then cast (item_id as string) end)) as items_visited_last_100_days

from        mytable

group by    user_id
;

+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
|       1 |                       123 | 123,234                     |
|       2 |                       345 | 345,456                     |
+---------+---------------------------+-----------------------------+
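
The sample table in the question shows dates as 18/5/2017. If date_visited is actually stored as a string in that day/month/year form rather than as a DATE, datediff will most likely return NULL for it and both columns will come back empty. The following is a minimal sketch of one way to handle that case, not part of the original answer. The CTE names sample_visits and parsed, the column name visited, and the 'd/M/yyyy' pattern are assumptions made for illustration, and the fixed date '2017-05-18' stands in for current_date only so that the question's "today" can be reproduced.

```sql
-- Sketch only: assumes date_visited is a STRING such as '18/5/2017' (day/month/year).
-- sample_visits is a hypothetical inline stand-in for mytable so the query runs on its own.
with sample_visits as (
    select 1 as user_id, 123 as item_id, '18/5/2017' as date_visited union all
    select 1, 234, '11/3/2017' union all
    select 2, 345, '18/5/2017' union all
    select 2, 456, '11/3/2017'
),
parsed as (
    select  user_id
           ,item_id
            -- unix_timestamp parses the string with the given pattern;
            -- from_unixtime + to_date turn it back into a plain yyyy-MM-dd date
           ,to_date(from_unixtime(unix_timestamp(date_visited, 'd/M/yyyy'))) as visited
    from    sample_visits
)
select      user_id
            -- '2017-05-18' replaces current_date here purely to reproduce the question's scenario
           ,concat_ws(',', collect_set(case when datediff('2017-05-18', visited) <= 5   then cast(item_id as string) end)) as items_visited_last_5_days
           ,concat_ws(',', collect_set(case when datediff('2017-05-18', visited) <= 100 then cast(item_id as string) end)) as items_visited_last_100_days
from        parsed
group by    user_id
;
```

Against a real table, the first CTE would be dropped, parsed would read from mytable, and '2017-05-18' would go back to current_date. Note that collect_set does not guarantee element order, so the items in each list may appear in a different order than in the tables above.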