Question

Please bear with me if this is very basic:

test.txt

1 ravi 100 hyd
2 krishna 200 hyd
3 fff 300 sec

I created a table with a partition in Hive and loaded the data as follows:

create external table temp(id int, name string, sal int) 
partitioned by(city string) 
location '/testing';

load data inpath '/test.txt' into table temp partition(city='hyd');

In HDFS, the structure is /testing/temp/city=hyd/test.txt

When I query the table with "select * from temp;"

Output:

temp.id temp.name temp.sal temp.city  
    1   ravi    100 hyd  
    2   krishna 200 hyd  
    3   fff     300 hyd

My question is: why does the city name "sec" in the third row become "hyd" in the output?

Is there something wrong on my side?

Thanks in advance!!!

Answer 1

Your problem is:

load data inpath '/test.txt' into table temp partition(city='hyd');

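All the data you load with this statement goes into this partition with city='hyd'. If you are doing static partitioning, it is your responsibility to put the correct values into each partition.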
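To make the point concrete, here is a plain-Python simulation (not Hive code) of what the static-partition LOAD above produces. It assumes the file's fields are split the way the question's output shows, and the helper name static_partition_load is hypothetical. Only the first three fields fill id, name and sal; the city written in the file is ignored, and every row receives the value from the PARTITION clause.

```python
# Hypothetical simulation of a static-partition LOAD; plain Python, not Hive.
# Assumes the fields are split as displayed in the question's output.


def static_partition_load(lines, partition_value):
    """Return the rows `select * from temp` would show after
    LOAD DATA ... PARTITION(city=partition_value).

    Only the first three fields map to the data columns (id, name, sal);
    any extra field, such as the city written in the file, is ignored.
    Every row gets the same partition value from the PARTITION clause.
    """
    rows = []
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        id_, name, sal = fields[:3]
        rows.append((int(id_), name, int(sal), partition_value))
    return rows


test_txt = ["1 ravi 100 hyd", "2 krishna 200 hyd", "3 fff 300 sec"]
for row in static_partition_load(test_txt, "hyd"):
    print(*row)
```

The third row comes out as "3 fff 300 hyd", matching the output in the question.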

Just remove the last line from the txt file, put it into test2.txt, and execute:

load data inpath '/test.txt' into table temp partition(city='hyd');
load data inpath '/test2.txt' into table temp partition(city='sec');

Yes, it's not very convenient, but that is how static partitioning works.
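Splitting the file by hand gets tedious with more cities. Below is a minimal sketch of a hypothetical Python helper that groups the lines of a local test.txt by their last field and writes one file per city (named test_<city>.txt here, an assumed naming scheme). It also builds one LOAD statement per file, assuming the files are first uploaded to the HDFS root, for example with hdfs dfs -put.

```python
# Hypothetical helper (plain Python, not part of Hive): split a local copy of
# test.txt into one file per city so each file can go into its own static partition.
import tempfile
from collections import defaultdict
from pathlib import Path


def split_by_city(src, out_dir):
    """Group lines by their last field (the city); write one file per city.

    Returns {city: Path of the written file}.
    """
    groups = defaultdict(list)
    for line in Path(src).read_text().splitlines():
        fields = line.split()
        if fields:  # skip blank lines
            groups[fields[-1]].append(line)
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = {}
    for city, lines in groups.items():
        path = out / f"test_{city}.txt"  # assumed naming scheme
        path.write_text("\n".join(lines) + "\n")
        written[city] = path
    return written


def load_statements(written, hdfs_dir="/"):
    """Build one static-partition LOAD DATA statement per city file,
    assuming the files were uploaded to hdfs_dir beforehand."""
    prefix = hdfs_dir.rstrip("/")
    return [
        f"load data inpath '{prefix}/{path.name}' "
        f"into table temp partition(city='{city}');"
        for city, path in sorted(written.items())
    ]


# Usage example on a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    src = Path(tmp) / "test.txt"
    src.write_text("1 ravi 100 hyd\n2 krishna 200 hyd\n3 fff 300 sec\n")
    for statement in load_statements(split_by_city(src, Path(tmp) / "out")):
        print(statement)
```

Each generated statement still uses a literal city, so the correctness of the partition value now rests on the split, not on the file contents.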

Answer 2

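I don't think partitioning works properly with a load statement on a single file. Instead, we need to write the data into a staging table in Hive (stat_parti), and from there into another, partitioned table (stat_test).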

For example:

create external table stat_test(id int, name string, sal int)
partitioned by(city string) 
row format delimited fields 
terminated by ' ' 
location '/user/test/stat_test';

You can then use either static or dynamic partitioning.

1) Static partitioning

insert into table stat_test partition(city='hyd') select id,name,sal from stat_parti where city='hyd';  
insert into table stat_test partition(city='sec') select id,name,sal from stat_parti where city='sec';
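The two statements above can be mimicked in plain Python to show which part supplies which value: the WHERE filter picks the rows, and the literal in PARTITION(...) supplies the city. This is a simulation only, assuming stat_parti already holds (id, name, sal, city) rows; static_insert is a hypothetical name.

```python
# Hypothetical simulation (plain Python, not Hive) of the static INSERT ... SELECT above.
# Assumption: the staging table stat_parti holds (id, name, sal, city) rows.
stat_parti = [(1, "ravi", 100, "hyd"), (2, "krishna", 200, "hyd"), (3, "fff", 300, "sec")]


def static_insert(staging, city):
    """Mirror: insert into table stat_test partition(city='<city>')
               select id,name,sal from stat_parti where city='<city>';
    The WHERE filter selects the rows; the PARTITION literal supplies the city."""
    return [(i, n, s, city) for (i, n, s, c) in staging if c == city]


stat_test = static_insert(stat_parti, "hyd") + static_insert(stat_parti, "sec")
for row in stat_test:
    print(*row)
```

Without the WHERE filter, every staged row would be labelled with the literal city, which is exactly what happened in the question.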

2) Dynamic partitioning

We need to enable:

set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

-- the partition column (city) must come last in the select list
insert overwrite table stat_test partition(city) select id,name,sal,city from stat_parti;
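In dynamic mode Hive takes the partition value from the last column of the SELECT, which is why city has to be selected last. The sketch below imitates that routing in plain Python (not Hive), assuming the same stat_parti rows; dynamic_insert is a hypothetical name, and its keys mimic partition directory names.

```python
# Hypothetical simulation (plain Python, not Hive) of dynamic partitioning:
# the LAST column of each selected row decides which partition it lands in.
from collections import defaultdict

stat_parti = [(1, "ravi", 100, "hyd"), (2, "krishna", 200, "hyd"), (3, "fff", 300, "sec")]


def dynamic_insert(select_rows):
    """Mirror: insert overwrite table stat_test partition(city)
               select id,name,sal,city from stat_parti;
    Returns {partition directory name: [data rows without city]}."""
    partitions = defaultdict(list)
    for *data, city in select_rows:
        partitions[f"city={city}"].append(tuple(data))
    return dict(partitions)


for directory, rows in dynamic_insert(stat_parti).items():
    print(directory, rows)
```

Unlike the static version, no literal city appears anywhere; each row carries its own value, so "sec" ends up in its own partition.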

Answer 3

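You placed the data file test.txt at the HDFS path '/testing/temp/city=hyd/test.txt', so all of its data goes into the partition "city=hyd".

Hive uses the directory name to retrieve the partition value. So the city field value comes from the directory name, hyd, not from the file contents.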
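The rule can be illustrated with a small plain-Python function (a hypothetical helper, not Hive's implementation) that reads key=value directory names from a file path. It ignores Hive's escaping of special characters in partition directory names.

```python
# Hypothetical illustration (plain Python, not Hive's code): partition values
# come from key=value directory names in the file's path, not from its contents.
# Hive's escaping of special characters in directory names is ignored here.
from pathlib import PurePosixPath


def partition_spec(file_path):
    """Return {partition column: value} parsed from the directories of file_path."""
    spec = {}
    for part in PurePosixPath(file_path).parent.parts:
        if "=" in part:
            key, value = part.split("=", 1)
            spec[key] = value
    return spec


print(partition_spec("/testing/temp/city=hyd/test.txt"))  # {'city': 'hyd'}
```

Whatever test.txt contains, a file under city=hyd is read with city = 'hyd'.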
