Below is the dataset I loaded into a Hive table named temp_stat:
COUNTRY CITY TEMP
---------- -------------------- -----
US Arizona 51.7
US California 56.7
US Bullhead City 51.1
India Jaisalmer 42.4
Libya Aziziya 57.8
Iran Lut Desert 70.7
India Banda 42.4
When I view the data with a select statement, I get the following result:
US,Arizona,51.7 NULL NULL
US,California,56.7 NULL NULL
US,Bullhead City,51.1 NULL NULL
India,Jaisalmer,42.4 NULL NULL
Libya,Aziziya,57.8 NULL NULL
Iran,Lut Desert,70.7 NULL NULL
India,Banda,42.4 NULL NULL
Next, I want to group these records by country and get the highest temperature for each country along with the city name, so I ran the following query:
select country,city,temp
from (
select country,city,temp,
row_number() over (partition by country order by temp desc) as part
from temp_stat
) a
where part = 1
order by country, city;
After running the above query in the hive shell, I get the following result:
US,Arizona,51.7 NULL NULL
US,California,56.7 NULL NULL
US,Bullhead City,51.1 NULL NULL
India,Jaisalmer,42.4 NULL NULL
Libya,Aziziya,57.8 NULL NULL
Iran,Lut Desert,70.7 NULL NULL
India,Banda,42.4 NULL NULL
Even when I run only the inner query that generates row_number, I get the same row number for every record, like this:
India,Banda,42.4 NULL NULL 1
India,Jaisalmer,42.4 NULL NULL 1
Iran,Lut Desert,70.7 NULL NULL 1
Libya,Aziziya,57.8 NULL NULL 1
US,Arizona,51.7 NULL NULL 1
US,Bullhead City,51.1 NULL NULL 1
US,California,56.7 NULL NULL 1
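I also tried dense_rank() and rank(), with no change in the results. Is something wrong with the table definition, or is it something else?
Any help would be appreciated!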
Answer 0 (score: 1)
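The fields in your data are terminated by ',', but the table was presumably created without that delimiter. Hive's default field delimiter is Ctrl-A (\001), so each whole line ends up in the first column, the other two columns are NULL, and every row falls into its own partition with row number 1.
Your table definition should look like this: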
create external table temp_stat
(
country string
,city string
,temp decimal(11,1)
)
row format delimited
fields terminated by ','
;
select * from temp_stat;
+---------+---------------+------+
| country | city | temp |
+---------+---------------+------+
| US | Arizona | 51.7 |
| US | California | 56.7 |
| US | Bullhead City | 51.1 |
| India | Jaisalmer | 42.4 |
| Libya | Aziziya | 57.8 |
| Iran | Lut Desert | 70.7 |
| India | Banda | 42.4 |
+---------+---------------+------+
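
Once the columns parse correctly, the query from the question should work unchanged. The sketch below repeats it against the corrected table and adds a variant; it has not been run against your cluster, and the rank() variant is an assumption about what you want when temperatures tie. In this data India has two cities at 42.4, so row_number() keeps only one of them (which one is not guaranteed), while rank() keeps both.

```sql
-- Same query as in the question, run against the corrected temp_stat table.
-- row_number() returns exactly one row per country, even when there is a tie.
select country, city, temp
from (
    select country, city, temp,
           row_number() over (partition by country order by temp desc) as part
    from temp_stat
) a
where part = 1
order by country, city;

-- Variant (assumption: tied cities should all be reported).
-- rank() assigns 1 to every row sharing the highest temp in its country.
select country, city, temp
from (
    select country, city, temp,
           rank() over (partition by country order by temp desc) as part
    from temp_stat
) a
where part = 1
order by country, city;
```

With the sample data, the first query should return one row each for India, Iran, Libya and US (California at 56.7 for US), and the second should return both Banda and Jaisalmer for India.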