Question

I created a Hive table with this query:

create table studpart4(id int, name string) partitioned by (course string, year int) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

The table was created successfully.

I then loaded data with the following command:

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2);

My input data file looks like this:

 101    student1    cse 1

 102    student2    cse 2

 103    student3    eee 3

 104    student4    eee 4

 105    student5    cse 1

 106    student6    cse 2

 107    student7    eee 3

 108    student8    eee 4

 109    student9    cse 1

 110    student10   cse 2

But the output of select * from studpart4 is:

 101    student1    cse 2

 102    student2    cse 2

 103    student3    eee 2

 104    student4    eee 2

 105    student5    cse 2

 106    student6    cse 2

 107    student7    eee 2

 108    student8    eee 2

 109    student9    cse 2

 110    student10   cse 2

Why is the last column 2 in every row? Why was it changed and loaded incorrectly?

Answer 1

The result you are seeing is exactly what you told Hive to do with your data.

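In the first command you create a partitioned table studpart4 with two columns, id and name, and two partition keys, course and year (which, once created, behave like regular columns in queries). Then, in your second command, you run:

load data local inpath '/scratch/hive_inputs/student_input_1.txt' overwrite into table studpart4 partition(course='cse',year=2)

This basically means "copy all the data in student_input_1.txt into the table studpart4, and set the value of course to 'cse' and the value of year to 2 for every row". A static partition spec like this is not a filter: Hive does not read the partition values from the file. Internally, Hive creates a directory structure based on the partition keys, and your data is stored in a directory like this:

.../studpart4/course=cse/year=2/

I suspect that what you actually want is for Hive to detect the values of course and year in your .txt file and set the right partition values for you. To do that, you have to use dynamic partitioning: first load the data into an external table in which course and year are ordinary columns, then use an INSERT ... SELECT statement to store the data in your studpart4 table. The link posted by BigDataLearner in the comments describes this strategy.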
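As a rough illustration of that strategy, here is a minimal sketch in HiveQL. The staging table name studpart4_staging and its LOCATION path are assumptions made up for this example, and the sketch assumes the input file really is tab-delimited, as the table definition declares.

```sql
-- Staging table (hypothetical name and location): course and year are
-- ordinary columns here, so Hive reads them from the file.
CREATE EXTERNAL TABLE studpart4_staging (id INT, name STRING, course STRING, year INT)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t' LINES TERMINATED BY '\n'
STORED AS TEXTFILE
LOCATION '/tmp/studpart4_staging';

-- Load the raw file without any partition spec.
LOAD DATA LOCAL INPATH '/scratch/hive_inputs/student_input_1.txt'
OVERWRITE INTO TABLE studpart4_staging;

-- Allow every partition key to be resolved dynamically.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;

-- The partition columns must come last in the SELECT list,
-- in the same order as in the PARTITION clause.
INSERT OVERWRITE TABLE studpart4 PARTITION (course, year)
SELECT id, name, course, year
FROM studpart4_staging;

-- Inspect which partitions were created.
SHOW PARTITIONS studpart4;
```

With the sample data from the question, this should produce one partition per distinct (course, year) pair found in the file, rather than placing every row under course=cse/year=2.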

I hope this helps.

Hive - partitioned table
