I have a large data file that looks like this:
1 6
1 6
2 7
3 2
3 6
1 7
1 9
2 9
1 5
3 9
3 1
2 8
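I want to group the data by the first column, compute the average of the second column for each first-column value, and then sort the groups by that average in descending order. So the output should be: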
2 8
1 6.6
3 4.5
My code currently looks like this, and it does not work:
CREATE EXTERNAL TABLE as (a STRING, b INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://myfolder/hive';
CREATE EXTERNAL TABLE output(a STRING, avgb DOUBLE)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION 's3n://myfolder/hive';
load data inpath "s3n://myfolder/file.txt" into TABLE as;
insert overwrite output select a, avg(b) from as group by a order by avg(b) DESC limit 1000;
I should note that the following works fine, so it is the ORDER BY and INSERT steps of the SQL that are failing for me:
select a, avg(b) from as group by a;
When I try:
select a, avg(b) from as group by a order by avg(b);
I get "FAILED: Error in semantic analysis: line 1:66 Invalid table alias or column reference 'b': (possible column names are: _col0, _col1)".
Answer 0: (score: 4)
Just move the aggregation into a subquery and order by its alias:
select a, avgb
from (select a, avg(b) as avgb from as group by a) as t
order by avgb desc;
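
To check the numbers the query is expected to return, the same group, average, and sort steps can be sketched outside Hive. The following is a minimal Python sketch over the sample rows from the question; the names `rows` and `grouped_averages` are made up for illustration and are not part of the Hive job.

```python
from collections import defaultdict

# Sample rows from the question: (first column, second column).
rows = [
    ("1", 6), ("1", 6), ("2", 7), ("3", 2), ("3", 6), ("1", 7),
    ("1", 9), ("2", 9), ("1", 5), ("3", 9), ("3", 1), ("2", 8),
]


def grouped_averages(pairs):
    """Group by key, average the values, and sort by average, highest first."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    averages = [(key, sum(vals) / len(vals)) for key, vals in groups.items()]
    return sorted(averages, key=lambda kv: kv[1], reverse=True)


result = grouped_averages(rows)
for key, avg in result:
    print(key, avg)
```

This only confirms the arithmetic of the expected output (averages of 8, 6.6, and 4.5 for keys 2, 1, and 3); it says nothing about how Hive parses the query.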