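How do I do a sub-select in Hive? I think I may be making a really obvious mistake that just isn't obvious to me...
I'm getting this error: FAILED: Parse Error: line 4:8 cannot recognize input 'SELECT' in expression specification
Here are my three source tables: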
aaa_hit -> [SESSION_KEY, HIT_KEY, URL]
aaa_event -> [SESSION_KEY, HIT_KEY, EVENT_ID]
aaa_session -> [SESSION_KEY, REMOTE_ADDRESS]
...and what I'm trying to do is insert the results into a result table like this:
result -> [url, num_url, event_id, num_event_id, remote_address, num_remote_address]
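...where column 1 is the URL, column 3 is the top 1 "event" for each URL, and column 5 is the top 1 REMOTE_ADDRESS that visited that URL. (Each even-numbered column is the "count" of the column before it.)
Soooooo... what am I doing wrong here?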
INSERT OVERWRITE TABLE result2
SELECT url,
       COUNT(url) AS access_url,
       (SELECT events.event_id as evt,
               COUNT(events.event_id) as access_evt
        FROM aaa_event events
        LEFT OUTER JOIN aaa_hit hits
          ON ( events.hit_key = hit_key )
        ORDER BY access_evt DESC LIMIT 1),
       (SELECT sessions.remote_address as remote_address,
               COUNT(sessions.remote_address) as access_addr
        FROM aaa_session sessions
        RIGHT OUTER JOIN aaa_hit hits
          ON ( sessions.session_key = session_key )
        ORDER BY access_addr DESC LIMIT 1)
FROM aaa_hit
ORDER BY access_url DESC;
Thank you very much :)
Answer 0: (score: 10)
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
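Hive supports subqueries only in the FROM clause. (Hive 0.13 and later also accept certain subqueries in the WHERE clause, but that does not help here.)
You cannot use a subquery as a "column" in Hive.
To work around this, put the subquery in the FROM clause and JOIN to it. (The following will not run as written, but it shows the idea.)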
SELECT url,
       COUNT(url) AS access_url,
       t2.col1, t2.col2 ...
FROM aaa_hit
-- each subquery in FROM needs its own alias and its own ON condition
JOIN (SELECT events.event_id AS evt,
             COUNT(events.event_id) AS access_evt
      FROM aaa_event events
      LEFT OUTER JOIN aaa_hit hits
        ON ( events.hit_key = hits.hit_key )
      GROUP BY events.event_id
      ORDER BY access_evt DESC LIMIT 1) t1
  ON (aaa_hit.THING = t1.THING)  -- THING is a placeholder for the real join column
JOIN (SELECT sessions.remote_address AS remote_address,
             COUNT(sessions.remote_address) AS access_addr
      FROM aaa_session sessions
      RIGHT OUTER JOIN aaa_hit hits
        ON ( sessions.session_key = hits.session_key )
      GROUP BY sessions.remote_address
      ORDER BY access_addr DESC LIMIT 1) t2
  ON (aaa_hit.THING = t2.THING)
See https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins for more information on using JOIN in Hive.
Answer 1: (score: 0)
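For what it's worth, here is a rough sketch of how the whole result could be built using subqueries in the FROM clause only. It assumes Hive 0.11 or later, because it relies on the ROW_NUMBER() windowing function, and it assumes hit_key and session_key are the right join keys, as in the question. The aliases (u, e, a, ev_counts, ev_ranked, addr_counts, addr_ranked) are made up for illustration, and the query has not been run against real data.

```sql
-- Sketch only: assumes Hive 0.11+ (windowing functions) and the column
-- names from the question. All aliases are illustrative.
SELECT u.url                AS url,
       u.num_url            AS num_url,
       e.event_id           AS event_id,
       e.num_event_id       AS num_event_id,
       a.remote_address     AS remote_address,
       a.num_remote_address AS num_remote_address
FROM (
  -- number of hits per URL
  SELECT url, COUNT(*) AS num_url
  FROM aaa_hit
  GROUP BY url
) u
LEFT OUTER JOIN (
  -- most frequent event per URL: count per (url, event_id), rank, keep rank 1
  SELECT url, event_id, num_event_id
  FROM (
    SELECT url, event_id, num_event_id,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY num_event_id DESC) AS rn
    FROM (
      SELECT h.url AS url, ev.event_id AS event_id, COUNT(*) AS num_event_id
      FROM aaa_hit h
      JOIN aaa_event ev ON (h.hit_key = ev.hit_key)
      GROUP BY h.url, ev.event_id
    ) ev_counts
  ) ev_ranked
  WHERE rn = 1
) e ON (u.url = e.url)
LEFT OUTER JOIN (
  -- most frequent remote address per URL, same pattern
  SELECT url, remote_address, num_remote_address
  FROM (
    SELECT url, remote_address, num_remote_address,
           ROW_NUMBER() OVER (PARTITION BY url ORDER BY num_remote_address DESC) AS rn
    FROM (
      SELECT h.url AS url, s.remote_address AS remote_address,
             COUNT(*) AS num_remote_address
      FROM aaa_hit h
      JOIN aaa_session s ON (h.session_key = s.session_key)
      GROUP BY h.url, s.remote_address
    ) addr_counts
  ) addr_ranked
  WHERE rn = 1
) a ON (u.url = a.url)
ORDER BY num_url DESC;
```

To write the rows into the result table, prefix the statement with INSERT OVERWRITE TABLE and the table name. Note that ROW_NUMBER() breaks ties arbitrarily, so if two events share the top count for a URL, either one may come back.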
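You have no GROUP BY clause, and COUNT is an aggregate. An aggregate works without GROUP BY only when every selected column is aggregated, as in SELECT COUNT(*) FROM aaa_hit. Selecting url next to COUNT(url) requires GROUP BY url.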
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy
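As a minimal sketch of that rule, using the aaa_hit table from the question: url is selected without being aggregated, so it has to be listed in GROUP BY.

```sql
-- Count hits per URL; url is not aggregated, so it must appear in GROUP BY.
SELECT url,
       COUNT(url) AS access_url
FROM aaa_hit
GROUP BY url
ORDER BY access_url DESC;
```

The same applies inside each subquery: event_id and remote_address need their own GROUP BY wherever they are selected next to a COUNT.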