I have this table:
╔═════════╦═════════╦══════════════╗
║ user_id ║ item_id ║ date_visited ║
╠═════════╬═════════╬══════════════╣
║ 1       ║ 123     ║ 18/5/2017    ║
║ 1       ║ 234     ║ 11/3/2017    ║
║ 2       ║ 345     ║ 18/5/2017    ║
║ 2       ║ 456     ║ 11/3/2017    ║
╚═════════╩═════════╩══════════════╝
What I want to get (with a Hive query) is this result, assuming today is May 18, 2017:
╔═════════╦═══════════════════════════╦═════════════════════════════╗
║ user_id ║ items_visited_last_5_days ║ items_visited_last_100_days ║
╠═════════╬═══════════════════════════╬═════════════════════════════╣
║ 1       ║ 123                       ║ 123, 234                    ║
║ 2       ║ 345                       ║ 345, 456                    ║
╚═════════╩═══════════════════════════╩═════════════════════════════╝
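Basically, I need to group by user_id and produce several columns, one per time interval, each holding the concatenated item_ids the user visited within that interval. Is this possible?
Thanks in advance.
Answer 0 (score: 3)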
-- collect_set skips NULLs, so rows outside the window (where the CASE yields NULL) are dropped
select user_id
      ,collect_set(case when datediff(current_date, date_visited) <= 5   then item_id end) as items_visited_last_5_days
      ,collect_set(case when datediff(current_date, date_visited) <= 100 then item_id end) as items_visited_last_100_days
from mytable
group by user_id;
+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
| 1       | [123]                     | [123,234]                   |
| 2       | [345]                     | [345,456]                   |
+---------+---------------------------+-----------------------------+
Or, to get a comma-separated string instead of an array:
-- concat_ws expects array<string>, hence the cast of item_id
select user_id
      ,concat_ws(',', collect_set(case when datediff(current_date, date_visited) <= 5   then cast(item_id as string) end)) as items_visited_last_5_days
      ,concat_ws(',', collect_set(case when datediff(current_date, date_visited) <= 100 then cast(item_id as string) end)) as items_visited_last_100_days
from mytable
group by user_id;
+---------+---------------------------+-----------------------------+
| user_id | items_visited_last_5_days | items_visited_last_100_days |
+---------+---------------------------+-----------------------------+
| 1       | 123                       | 123,234                     |
| 2       | 345                       | 345,456                     |
+---------+---------------------------+-----------------------------+
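The queries above appear to assume that date_visited is a DATE or a yyyy-MM-dd string. If the column is actually stored as a string in the d/M/yyyy form shown in the question (an assumption, since the question does not state the column type), datediff would likely return NULL for every row. A minimal sketch of one way to handle that case, parsing the string first; the alias days_ago and the lower bound of 0 (which excludes future-dated rows) are illustrative choices, not part of the original answer:

```sql
-- Assumption: date_visited is a STRING such as '18/5/2017' (d/M/yyyy).
-- The inner query converts it to a yyyy-MM-dd date so datediff can read it.
select user_id
      ,concat_ws(',', collect_set(case when days_ago between 0 and 5   then cast(item_id as string) end)) as items_visited_last_5_days
      ,concat_ws(',', collect_set(case when days_ago between 0 and 100 then cast(item_id as string) end)) as items_visited_last_100_days
from (
    select user_id
          ,item_id
          -- days_ago is a hypothetical helper column: days between today and the visit
          ,datediff(current_date
                   ,to_date(from_unixtime(unix_timestamp(date_visited, 'd/M/yyyy')))) as days_ago
    from mytable
) t
group by user_id;
```

Note that collect_set removes duplicate item_ids and does not guarantee element order; if repeated visits should be kept, collect_list may be a better fit.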