Using ORDER BY with GROUP BY in Impala SQL

Time: 2014-07-23 02:40:35

Tags: sql cloudera impala cloudera-cdh

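As a research project, I decided to try out Cloudera Impala by setting up a complete CDH5 environment, and then started querying some data. For some reason, a simple ORDER BY does not work together with a GROUP BY statement in Impala SQL. Does Impala support this?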

Here is my query, which works without the sort:

SELECT TO_DATE(time) AS dt
FROM wearable_data
GROUP BY dt 

Result:

0   2014-01-01
1   2014-07-15
2   2014-07-20
3   2014-07-17

The following query, however, does not work:

SELECT TO_DATE(time) AS dt
FROM wearable_data
GROUP BY dt
ORDER BY dt 
-- ORDER BY 1

Result:

Query 6e4da94e0c586e34:7077273d6337e893 100% Complete (23 out of 23)

The EXPLAIN output is as follows:

Estimated Per-Host Requirements: Memory=256.00MB VCores=2
WARNING: The following tables are missing relevant table and/or column statistics.
default.wearable_data

04:EXCHANGE [PARTITION=UNPARTITIONED]
|
03:AGGREGATE [MERGE FINALIZE]
|  group by: to_date(time)
|
02:EXCHANGE [PARTITION=HASH(to_date(time))]
|
01:AGGREGATE
|  group by: to_date(time)
|
00:SCAN HDFS [default.wearable_data]
   partitions=1/1 size=1.44KB

Any thoughts on this?

1 answer:

Answer 0: (score: 3)

I think this is your problem:

"Prior to Impala 1.4.0, Impala required any query including an ORDER BY clause to also use a LIMIT clause." Reference here
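If you are on a release older than 1.4.0, a minimal sketch of the workaround is to add a LIMIT clause to the failing query. The value 1000 below is an arbitrary assumption, not something from your setup; any cap at least as large as the number of distinct dates you expect should do.

```sql
-- Same query as in the question, with a LIMIT added so that
-- pre-1.4.0 Impala accepts the ORDER BY.
-- The 1000 is an assumed upper bound on the number of distinct dates.
SELECT TO_DATE(time) AS dt
FROM wearable_data
GROUP BY dt
ORDER BY dt
LIMIT 1000;
```

On Impala 1.4.0 or later the LIMIT should no longer be needed, so the original ORDER BY query would be expected to run as written.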