Hive query: SELECT * FROM bookfreq WHERE freq IN (SELECT Max(freq) FROM bookfreq);

Time: 2015-02-07 13:09:11

Tags: hadoop hive hiveql

I am writing a Hive query to fetch the records that have the maximum frequency value.

The table is named bookfreq and has two columns, year and freq:

year  freq
1999  2
2000  4
1989  4
1990  5
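
To reproduce the problem, the sample table can be set up roughly as follows. This is a minimal sketch, not part of the original question: the INT column types are an assumption (the question does not state them), and INSERT ... VALUES needs Hive 0.14 or later, so on older versions the rows would have to be loaded from a file instead.

```sql
-- Assumed schema: the question only gives the column names, not the types.
CREATE TABLE bookfreq (year INT, freq INT);

-- INSERT ... VALUES is available from Hive 0.14 onward.
INSERT INTO TABLE bookfreq VALUES
  (1999, 2),
  (2000, 4),
  (1989, 4),
  (1990, 5);
```

With this data the maximum freq is 5, so the intended result is the single row for 1990.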

Query:

SELECT * FROM bookfreq where freq IN (SELECT Max(freq) FROM bookfreq);

I get an exception like this:
FAILED: ParseException line 1:38 cannot recognize input near 'SELECT' 'Max' '(' in expression specification

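2 answers:

Answer 0: (score: 1)

If you have Hive 0.13 or later, this type of subquery should work (as described in the documentation here). However, the column names must still be fully qualified. So, to do what I think you want in Hive 0.13 or later, that would be: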

SELECT * FROM bookfreq a
WHERE a.freq IN (SELECT max(b.freq) FROM bookfreq b);

If you have an older version of Hive, you can try this notation instead:

SELECT a.* 
FROM bookfreq a JOIN (SELECT max(freq) as max_freq FROM bookfreq) b
  ON a.freq = b.max_freq;

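If that still does not work (which probably means your Hive version is badly out of date), you may have to first create a table containing max(freq) as a concrete object: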

CREATE TABLE b AS SELECT max(freq) AS max_freq FROM bookfreq;

Then run the query above against the plain table b. Something like:

SELECT bookfreq.*
FROM bookfreq JOIN b ON bookfreq.freq = b.max_freq;
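
The helper table b created above persists after the query finishes. If it is needed only for this lookup, it can be dropped afterwards. A minimal sketch, assuming nothing else in the database depends on a table named b:

```sql
-- Remove the one-row helper table once the join has been run.
DROP TABLE IF EXISTS b;
```

Note that b is a snapshot: if bookfreq changes later, the table has to be recreated before the join gives an up-to-date maximum.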

Answer 1: (score: 1)

You need to add an alias to any subquery in Hive.

Try adding an alias to the subquery, for example:

SELECT * FROM bookfreq where freq IN (SELECT Max(freq) FROM bookfreq) a;

This is just a matter of preference, but I prefer to write it like this:

select * from (select max(freq) as max_freq from bookfreq) a join bookfreq b on a.max_freq = b.freq;
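
For comparison, the same rows can also be selected without a join by using a window function. This is a different technique from the ones given in the answers, shown only as a hedged sketch; it assumes a Hive version with windowing support (0.11 or later), and the alias t and the column name max_freq are illustrative.

```sql
-- Attach the table-wide maximum to every row, then keep the rows that match it.
SELECT t.year, t.freq
FROM (
  SELECT year, freq, MAX(freq) OVER () AS max_freq
  FROM bookfreq
) t
WHERE t.freq = t.max_freq;
```

The subquery sits in the FROM clause and carries an alias, which is the form older Hive versions accept.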