I'm new to Hive/SQL and I'm stuck on what should be a fairly simple problem. My data looks like this:
+------------+--------------------+-----------------------+
| carrier_id | meandelay          | meancanceled          |
+------------+--------------------+-----------------------+
| EV         | 13.795802119653473 | 0.028584251044292006  |
| VX         | 0.450591016548463  | 2.364066193853424E-4  |
| F9         | 10.898001378359766 | 0.00206753962784287   |
| AS         | 0.5071547420965062 | 0.0057404326123128135 |
| HA         | 1.2031093279839498 | 5.015045135406214E-4  |
| 9E         | 8.147899230704216  | 0.03876067292247866   |
| B6         | 9.45383857757506   | 0.003162096314343487  |
| UA         | 8.101511665305816  | 0.005467725574605967  |
| FL         | 0.7265068895709532 | 0.0041141513746490044 |
| WN         | 7.156119279121648  | 0.0057419058192869415 |
| DL         | 4.206288692245839  | 0.005123990066804269  |
| YV         | 6.316802855264404  | 0.029304029304029346  |
| US         | 3.2221527095063736 | 0.007984031936127766  |
| OO         | 6.954715814690328  | 0.02596499362466706   |
| MQ         | 9.74568222216328   | 0.025628100708354324  |
| AA         | 8.720522654298968  | 0.019242775597574157  |
+------------+--------------------+-----------------------+
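I want Hive to return the row that has the maximum meandelay value. I have:
SELECT CAST(MAX(meandelay) as FLOAT) FROM flightinfo;
which does return the max (I use a cast because my values are stored as STRING). But with:
SELECT * FROM flightinfo WHERE meandelay = (SELECT CAST(MAX(meandelay) AS FLOAT) FROM flightinfo);
I get the following error: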
FAILED: ParseException line 1:44 cannot recognize input near 'select' 'cast' '(' in expression specification
Answer 0 (score: 9)
Use the windowing and analytics functions:
SELECT carrier_id, meandelay, meancanceled
FROM
(SELECT carrier_id, meandelay, meancanceled,
rank() over (order by cast(meandelay as float) desc) as r
FROM flightinfo) S
WHERE S.r = 1;
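This also handles the case where several rows share the same maximum value: you get all of them in the result. If you only want a single row, change rank() to row_number(), or add another term to the order by.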
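When exactly one row is wanted and ties do not matter, sorting and keeping the first row is another option. This is a different technique from the window function above, shown as a minimal sketch that assumes the flightinfo table and the STRING-typed meandelay column from the question. The alias delay_num is introduced here for illustration only.

```sql
-- Sort numerically (meandelay is stored as STRING) and keep only the top row.
-- delay_num is a hypothetical helper column; it also appears in the output.
SELECT carrier_id, meandelay, meancanceled,
       CAST(meandelay AS DOUBLE) AS delay_num
FROM flightinfo
ORDER BY delay_num DESC
LIMIT 1;
```

Unlike rank(), this returns a single arbitrary row when several carriers tie for the maximum.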
Answer 1 (score: 2)
Use a join instead.
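-- Cast inside MAX so the comparison is numeric; MAX on a STRING column compares lexicographically.
-- Compare as DOUBLE on both sides so the join key matches exactly.
SELECT a.* FROM flightinfo a LEFT SEMI JOIN
(SELECT MAX(CAST(meandelay AS DOUBLE)) AS maxdelay
FROM flightinfo) b ON (CAST(a.meandelay AS DOUBLE) = b.maxdelay);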
Answer 2 (score: 1)
You can use the collect_max UDF from Brickhouse (http://github.com/klout/brickhouse) to solve this. Pass in a value of 1, which means you only want the single max value.
select array_index( map_keys( collect_max( carrier_id, meandelay, 1) ), 0 ) from flightinfo;
Also, I read somewhere that the Hive max UDF does allow you to access other fields on the row, but I think it's easier to just use collect_max.
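Answer 3 (score: 0)
I don't think your subquery is allowed...
A quick look at:
https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
states:
As of Hive 0.13 some types of subqueries are supported in the WHERE clause. Those are queries where the result of the query can be treated as a constant for IN and NOT IN statements (called uncorrelated subqueries because the subquery does not reference columns from the parent query):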
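Applied to the question, the uncorrelated IN form described in that passage might look like the following. This is an untested sketch, assuming Hive 0.13 or later and the flightinfo table from the question; the aliases a and b are introduced here for illustration.

```sql
-- Uncorrelated IN subquery: the inner query does not reference the outer table.
-- Both sides are cast to DOUBLE because meandelay is stored as STRING.
SELECT a.*
FROM flightinfo a
WHERE CAST(a.meandelay AS DOUBLE) IN
      (SELECT MAX(CAST(b.meandelay AS DOUBLE)) FROM flightinfo b);
```

The original query used = with a scalar subquery, which is the form the parser rejected.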