Apache Hive MSCK REPAIR TABLE not adding new partitions

Date: 2015-08-03 07:46:44

Tags: hadoop mapreduce hive apache-hive

I am new to Apache Hive. While working with external table partitions, I found that if I add new partition directories directly to HDFS, the new partitions are not added after running MSCK REPAIR TABLE. Below is what I tried:

- Create the external table

hive> create external table factory(name string, empid int, age int) partitioned by(region string)  
    > row format delimited fields terminated by ','; 

- Detailed table information (from `describe formatted factory;`)

Location:  hdfs://localhost.localdomain:8020/user/hive/warehouse/factory     
Table Type:             EXTERNAL_TABLE           
Table Parameters:        
    EXTERNAL                TRUE                
    transient_lastDdlTime   1438579844  

- Create directories in HDFS to hold data for table factory

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Table data

cat factory1.txt
emp1,500,40
emp2,501,45
emp3,502,50

cat factory2.txt
EMP10,200,25
EMP11,201,27
EMP12,202,30

- Copy from local to HDFS

[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory1.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory1'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory2.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory2'

- Alter the table so the metastore is updated

hive> alter table factory add partition(region='southregion') location '/user/hive/testing/testing1/factory2';
hive> alter table factory add partition(region='northregion') location '/user/hive/testing/testing1/factory1';            
hive> select * from factory;                                                                      
OK
emp1    500 40  northregion
emp2    501 45  northregion
emp3    502 50  northregion
EMP10   200 25  southregion
EMP11   201 27  southregion
EMP12   202 30  southregion

Now I created a new file, factory3.txt, to add as a new partition of table factory:

cat factory3.txt
user1,100,25
user2,101,27
user3,102,30

- Create the path and copy the table data

[cloudera@localhost ~]$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'
[cloudera@localhost ~]$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/testing/testing1/factory3'

Now I executed the following statement to update the metastore with the newly added partition:

MSCK REPAIR TABLE factory;

The table still does not show the contents of the new partition from the factory3 file. May I know where I am going wrong in adding the partition to table factory?

However, if I run the alter command instead, the new partition data does show up:

hive> alter table factory add partition(region='eastregion') location '/user/hive/testing/testing1/factory3';

May I know why the MSCK REPAIR TABLE command is not working?

2 answers:

Answer 0 (score: 10)

For MSCK to work, the directory naming convention /partition_name=partition_value/ must be used under the table location.
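The convention can be illustrated with a small simulation. This is only a sketch of the idea, not Hive's actual implementation: MSCK scans the table location and registers only subdirectories whose names follow the key=value pattern, which is why a plain directory like `factory3` is ignored.

```python
import os
import re
import tempfile

def discover_partitions(table_dir, partition_key):
    """Simplified sketch of MSCK REPAIR TABLE's partition discovery:
    only subdirectories named <key>=<value> under the table location
    are recognized; anything else is skipped."""
    values = []
    for name in sorted(os.listdir(table_dir)):
        m = re.fullmatch(re.escape(partition_key) + r"=(.+)", name)
        if m and os.path.isdir(os.path.join(table_dir, name)):
            values.append(m.group(1))
    return values

# Toy table location: one correctly named partition directory,
# and one plain directory like the question's 'factory3'.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "region=eastregion"))
os.makedirs(os.path.join(root, "factory3"))

print(discover_partitions(root, "region"))  # ['eastregion']
```

Only `eastregion` is discovered; the `factory3` directory does not match the pattern and is invisible to the repair scan, exactly as in the question.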

Answer 1 (score: 0)

You have to put the data in a directory named "region=eastregion" inside the table location directory:

$ hadoop fs -mkdir 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
$ hadoop fs -copyFromLocal '/home/cloudera/factory3.txt' 'hdfs://localhost.localdomain:8020/user/hive/warehouse/factory/region=eastregion'
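Once the file sits under the key=value directory, re-running the repair command should register the partition. A sketch, assuming the table definition from the question:

```sql
MSCK REPAIR TABLE factory;

-- Verify the partition was picked up
SELECT * FROM factory WHERE region = 'eastregion';
```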