Using rank with get_json_object

Date: 2014-06-11 18:20:43

Tags: json hive hiveql

I want to read JSON data with get_json_object, and also add a row number within each group.

Here is what I have tried. This is the select before adding the row number / rank:
select get_json_object(json_data.fullrow, '$.event.type[0]') as player_event,
get_json_object(json_data.fullrow, '$.stringDate[0]') as date,
get_json_object(json_data.fullrow, '$.sessionID[0]') as user_session
from json_data;

I would like to add a row number to player_event, grouped by user_session, along these lines:

player_event,user_session,date,rank
START,1,010114,1
MIDDLE,1,010114,2
FINISH,1,010114,3
START,2,010114,1
FINISH,2,010114,2


SELECT 
get_json_object(json_data.fullrow, '$.event.type[0]') as player_event, 
get_json_object(json_data.fullrow, '$.sessionID[0]') as user_session,
get_json_object(json_data.fullrow, '$.stringDate[0]') as date,
rank() over (PARTITION BY get_json_object(json_data.fullrow, '$.sessionID[0]') as user_session  order by get_json_object(json_data.fullrow, '$.stringDate[0]') as date desc) as rank
FROM json_data

I get the following error:

FAILED: ParseException line 5:80 missing ) at 'as' near 'user_session' in table name
line 5:96 missing FROM at 'order' near 'user_session' in table name

Any help is appreciated.

1 answer:

Answer 0 (score: 0)

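I can't test your specific case, but I suggest restructuring the query so that the get_json_object calls sit in a sub-query, and then ranking over the sub-query's columns.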

select tmp.player_event, tmp.user_session, tmp.date, rank() over (partition by tmp.user_session order by tmp.date) as rnk
from
(
SELECT 
get_json_object(json_data.fullrow, '$.event.type[0]') as player_event, 
get_json_object(json_data.fullrow, '$.sessionID[0]') as user_session,
get_json_object(json_data.fullrow, '$.stringDate[0]') as date
FROM json_data) tmp
;
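
For reference, the ParseException in the question points at the `as user_session` and `as date` aliases inside the over (...) clause: column aliases belong in the select list, not inside PARTITION BY or ORDER BY. Below is a minimal, untested sketch of the question's query with those aliases removed. It assumes the same json_data table and fullrow column as the question, and a Hive version with windowing support (0.11 or later). The `rnk` alias and the backticks around `date` are my own choices, since `date` is a reserved word in some newer Hive versions.

```sql
-- Untested sketch: the question's query with the aliases removed from inside OVER (...).
-- Assumes the json_data table and fullrow column from the question.
SELECT
  get_json_object(json_data.fullrow, '$.event.type[0]') AS player_event,
  get_json_object(json_data.fullrow, '$.sessionID[0]')  AS user_session,
  get_json_object(json_data.fullrow, '$.stringDate[0]') AS `date`,
  rank() OVER (
    -- Expressions are allowed here, but "AS alias" is not.
    PARTITION BY get_json_object(json_data.fullrow, '$.sessionID[0]')
    ORDER BY get_json_object(json_data.fullrow, '$.stringDate[0]') DESC
  ) AS rnk
FROM json_data;
```

Note that rank() gives tied rows the same value, so if every event in a session carries the same stringDate (as in the sample output in the question), all of them would get rank 1. row_number() could be swapped in to get distinct numbers, though the order among tied rows is then not guaranteed unless a finer-grained sort key is available in the JSON.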