Hive join query throws an error

Time: 2017-03-30 13:41:59

Tags: sql hive

If a user_id has multiple records, I want to keep only the latest one according to event_timestamp.

My Hive query:

SELECT 
 a.user_id,
 unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') as  converted_event_timestamp,
 a.user_name,
 a.user_location
 FROM 
 sports_views a
 INNER JOIN
 (SELECT user_id,MAX(unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM')) as max_event_timestamp FROM sports_views GROUP BY user_id )b
 ON( a.user_id =b.user_id AND a.converted_event_timestamp =b.max_event_timestamp)
 LIMIT 10;

When I try to run this Hive query, I get the following error:

SemanticException [Error 10002]: Line 8:43 Invalid column reference 'converted_event_timestamp'

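Can someone tell me what is wrong with this Hive query and how I can fix it?

4 answers:

Answer 0: (score: 1)

I see where you reference a.converted_event_timestamp in your query. You cannot use that alias in the join, because the join condition is probably evaluated before the conversion in the SELECT list, so the alias does not exist yet at that point. Join on the expression itself instead: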

unix_timestamp(a.event_timestamp,'dd/MM/YYYY HH:MM')
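Putting that expression into the question's query gives roughly the following. This is only a sketch, not a tested fix. One assumption differs from the question: the pattern is written as 'dd/MM/yyyy HH:mm', because in the Java-style patterns that unix_timestamp accepts, 'MM' means month and 'mm' means minutes, and 'yyyy' is the calendar year. This assumes the stored values look like 30/03/2017 13:41; if they do not, adjust the pattern.

```sql
-- Sketch: repeat the conversion in the ON clause instead of using the SELECT alias.
-- Assumption: event_timestamp is a string such as '30/03/2017 13:41'.
SELECT
  a.user_id,
  unix_timestamp(a.event_timestamp, 'dd/MM/yyyy HH:mm') AS converted_event_timestamp,
  a.user_name,
  a.user_location
FROM sports_views a
INNER JOIN (
  -- latest converted timestamp per user
  SELECT
    user_id,
    MAX(unix_timestamp(event_timestamp, 'dd/MM/yyyy HH:mm')) AS max_event_timestamp
  FROM sports_views
  GROUP BY user_id
) b
ON (a.user_id = b.user_id
    AND unix_timestamp(a.event_timestamp, 'dd/MM/yyyy HH:mm') = b.max_event_timestamp)
LIMIT 10;
```

If two rows of the same user share the maximum timestamp, this join returns both of them.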

Answer 1: (score: 1)

SELECT a.user_id, a.user_name, a.user_location
 FROM
 (SELECT user_id, unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') as converted_event_timestamp, user_name, user_location FROM sports_views) a
 INNER JOIN
 (SELECT user_id, MAX(unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM')) as max_event_timestamp FROM sports_views GROUP BY user_id) b
 ON (a.user_id = b.user_id AND a.converted_event_timestamp = b.max_event_timestamp)
 LIMIT 10;

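Answer 2: (score: 0)

Hive does not support joining on a SELECT alias. Instead, use unix_timestamp(event_timestamp,'dd/MM/YYYY HH:MM') directly in the join condition, or compute it in a subquery.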

ON( a.user_id =b.user_id AND unix_timestamp(a.event_timestamp,'dd/MM/YYYY HH:MM')=b.max_event_timestamp)

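Answer 3: (score: 0)

-- pattern kept from the original answer; it must match the stored format of event_timestamp
select user_id, user_name, user_location, event_ts as converted_event_timestamp
    from (select user_id, user_name, user_location, event_ts,
            -- latest converted timestamp within each user_id
            first_value(event_ts) over (partition by user_id order by event_ts desc) as max_ts
        from (select user_id, user_name, user_location,
                unix_timestamp(event_timestamp,'yyyy/MM/dd hh:MM') as event_ts
            from sports_views) sv) e
    where event_ts = max_ts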
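A closely related way to express the window-function idea is to number the rows per user and keep the first one. The sketch below swaps in ROW_NUMBER() for first_value; it is not what this answer wrote. The alias names ranked and rn are made up for illustration, and the pattern 'dd/MM/yyyy HH:mm' is an assumption about how event_timestamp is stored (the date layout from the question, with 'yyyy' for year and 'mm' for minutes).

```sql
-- Sketch: keep one row per user_id, the one with the largest converted timestamp.
-- Assumption: event_timestamp is a string such as '30/03/2017 13:41'.
SELECT user_id, converted_event_timestamp, user_name, user_location
FROM (
  SELECT
    user_id,
    unix_timestamp(event_timestamp, 'dd/MM/yyyy HH:mm') AS converted_event_timestamp,
    user_name,
    user_location,
    -- rn = 1 marks the newest row within each user_id
    ROW_NUMBER() OVER (
      PARTITION BY user_id
      ORDER BY unix_timestamp(event_timestamp, 'dd/MM/yyyy HH:mm') DESC
    ) AS rn
  FROM sports_views
) ranked
WHERE rn = 1
LIMIT 10;
```

Unlike the join against MAX(...), this returns exactly one row per user_id even when several rows tie on the latest timestamp, and it reads sports_views only once.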