Suppose I have two local files, file1.txt and file2.txt.
Contents of file1.txt:
1,a
3,c
Contents of file2.txt:
2,b
4,d
I put these files on Hadoop like this:
hadoop fs -rm -r /user/cloudera/repart2/*
hadoop fs -mkdir -p /user/cloudera/repart2/20150401
hadoop fs -put file1.txt /user/cloudera/repart2/20150401/
hadoop fs -mkdir -p /user/cloudera/repart2/20150402
hadoop fs -put file2.txt /user/cloudera/repart2/20150402/
Then I created a Hive table:
-- Select a test database
use training;
-- Create the table
create external table repart (
col1 int, col2 string)
PARTITIONED BY (Test int)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
STORED AS TEXTFILE
location '/user/cloudera/repart2';
-- Add partitions
ALTER TABLE repart ADD PARTITION (Test='20150401') LOCATION '/user/cloudera/repart2/20150401/';
ALTER TABLE repart ADD PARTITION (Test='20150402') LOCATION '/user/cloudera/repart2/20150402/';
When I run a select statement
select * from repart;
it shows:
1 a 20150401
3 c 20150401
2 b 20150402
4 d 20150402
I want my table to end up looking like this:
1 a 20150401
2 b 20150401
3 c 20150401
4 d 20150401
2 b 20150402
4 d 20150402
But when I try this insert query:
INSERT INTO TABLE repart PARTITION (Test='20150401') select col1, col2 FROM repart where Test = 20150402;
the table ends up looking like this. The original data in partition 20150401 has been overwritten instead of appended to.
2 b 20150401
4 d 20150401
2 b 20150402
4 d 20150402
The "hive --version" command returns 0.12.0-cdh5.0.0. I noticed this JIRA, but my table name is already all lowercase, so I'm not sure what the problem is.
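Answer 0 (score: 0)
When I ran the same code on Hive 1.1.0-cdh5.4.0, it worked without problems. It must have been broken around 0.12. I will use the newer version. If anyone knows why 0.12.0 breaks, I would still be interested.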
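For anyone who cannot upgrade, one possible workaround is to avoid INSERT INTO and copy the data file itself. This is only a sketch and has not been verified on 0.12.0. It assumes the table is external and stored as plain text, as in the question, so Hive reads every file found in the partition directory. It also assumes the commands are run from the Hive CLI, where dfs passes its arguments to the Hadoop file system shell. It only fits when the whole source partition should be appended.

```sql
-- Sketch only: run from the Hive CLI, using the paths from the question.
-- Copy the data file of partition 20150402 into the directory of partition 20150401.
dfs -cp /user/cloudera/repart2/20150402/file2.txt /user/cloudera/repart2/20150401/;

-- Confirm that the directory now holds both file1.txt and file2.txt.
dfs -ls /user/cloudera/repart2/20150401/;

-- The partition should now return the rows of both files.
SELECT col1, col2 FROM repart WHERE Test = 20150401;
```

The same dfs -ls check, run right after the failing INSERT INTO, shows whether file1.txt was actually removed from the partition directory or is still there.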