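I am working with a Hadoop database, using Hive as the preferred interface. I want to combine several SELECT statements into one query (similar to a UNION, but with each query populating a different column). The query below returns all the results I need, but stacked in a single column; I would like each subquery to populate its own column instead. Any help on how to achieve this would be great. Perhaps Hive has something similar to VALUES that would do it. Cheers.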
INSERT OVERWRITE TABLE tstr_tmp SELECT * FROM
(SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp asc limit 1) as last_visit_of_day
UNION ALL
SELECT * FROM (SELECT CAST(COUNT(hr) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_hour_bins
UNION ALL
SELECT * FROM (SELECT time_stamp FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' ORDER BY time_stamp desc limit 1) as first_visit_of_day
UNION ALL
SELECT * FROM (SELECT ext_url FROM http WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name
UNION ALL
SELECT * FROM (SELECT CAST(count(*) as string) FROM http WHERE ext_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'http://lucy.info' group by ext_url) as n_http_requests
UNION ALL
SELECT * FROM (SELECT int_ip FROM http WHERE ext_hostname = 'exotichorse' group by int_ip) as internal_ip;
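As the table below requires, each query returns a single string value. For this particular set of queries, the following results are returned: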
00:08:00
2
07:00:00
http://lucy.info
2
192.168.0.22
I am developing a database that tells me about user traffic, so this subset will populate the following table:
CREATE TABLE metric_http_domain_time_summary( last_visit_of_day string, n_hour_bins string, first_visit_of_day string, domain_name string, n_http_requests string, internal_ip string) PARTITIONED BY (dt string, hr string, origin string, cl string, st string);
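I know I need to partition the incoming data, but I am quite confident about that part and will edit it in once I have the unpartitioned query running. The gap in my ability is stringing the subqueries together to populate the table.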
Answer 0 (score: 0)
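After going away and thinking about this for a long time, I found the answer. The UNION is unnecessary and was actually getting in the way. The query below returns the output above as required, with each value in its own column. I am leaving this here in case anyone else runs into the same problem. Because of Stack Overflow reputation limits I had to remove the ext_url values, but the concept works.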
SELECT * FROM
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp asc limit 1) as last_visit_of_day,
(SELECT CAST(COUNT(hr) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by ext_url) as n_hour_bins,
(SELECT time_stamp FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' ORDER BY time_stamp desc limit 1) as first_visit_of_day,
(SELECT ext_url FROM http WHERE ext_url = 'http://lucy.info' group by ext_url) as domain_name,
(SELECT CAST(count(*) as string) FROM http WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015' AND ext_url = 'ext_url_here' group by ext_url) as n_http_requests,
(SELECT int_ip FROM http WHERE referrer_hostname = 'exotichorse' group by int_ip) as internal_ip;
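The comma-separated subqueries above act as a cross join: because each one returns a single row, the result is one row with six columns. As a rough sketch of how that could be loaded into metric_http_domain_time_summary from the question, the version below spells the join out with CROSS JOIN and selects the columns in the table's order. The partition values for hr, origin, cl and st, the short aliases (lv, hb, fv, dn, rq, ip) and the column name n are placeholders I made up for illustration; 'ext_url_here' is the placeholder from the query above. The column-to-subquery mapping is kept exactly as in the original, although ORDER BY time_stamp ASC LIMIT 1 returns the earliest timestamp, so the first/last labels may be worth double-checking.

```sql
-- Sketch only. Assumptions (not from the original post):
--   * the hr, origin, cl and st partition values are hypothetical placeholders
--   * the aliases lv, hb, fv, dn, rq, ip and the column name n are illustrative
INSERT OVERWRITE TABLE metric_http_domain_time_summary
PARTITION (dt = '01/07/2015', hr = 'hr_here', origin = 'origin_here',
           cl = 'cl_here', st = 'st_here')
SELECT
  lv.time_stamp,  -- last_visit_of_day (as labelled in the original query)
  hb.n,           -- n_hour_bins
  fv.time_stamp,  -- first_visit_of_day (as labelled in the original query)
  dn.ext_url,     -- domain_name
  rq.n,           -- n_http_requests
  ip.int_ip       -- internal_ip
FROM
  (SELECT time_stamp FROM http
    WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015'
      AND ext_url = 'ext_url_here'
    ORDER BY time_stamp ASC LIMIT 1) lv
CROSS JOIN
  (SELECT CAST(COUNT(hr) AS string) AS n FROM http
    WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015'
      AND ext_url = 'ext_url_here'
    GROUP BY ext_url) hb
CROSS JOIN
  (SELECT time_stamp FROM http
    WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015'
      AND ext_url = 'ext_url_here'
    ORDER BY time_stamp DESC LIMIT 1) fv
CROSS JOIN
  (SELECT ext_url FROM http
    WHERE ext_url = 'ext_url_here'
    GROUP BY ext_url) dn
CROSS JOIN
  (SELECT CAST(COUNT(*) AS string) AS n FROM http
    WHERE referrer_hostname = 'exotichorse' AND dt = '01/07/2015'
      AND ext_url = 'ext_url_here'
    GROUP BY ext_url) rq
CROSS JOIN
  (SELECT int_ip FROM http
    WHERE referrer_hostname = 'exotichorse'
    GROUP BY int_ip) ip;
```

This only yields a single output row if every subquery returns exactly one row. If any subquery returns no rows the result is empty, and if one returns several rows (for example more than one int_ip for the host) the output is multiplied accordingly. Hive running in strict mode may also refuse a Cartesian product, in which case that check would need to be relaxed for this statement.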