Question

I have a file. It contains 4 fields, and the last two fields are meant to be an array. So I created this table in Hive:

create table testtable(f1 string, f2 string, f3 array<string>) row format delimited fields terminated by ',' collection items terminated by ',';

Data:

a,b,c,d
1,sdf,2323,sdaf
1,sdf,34,wer
1,sdf,223,daf
1,sdf,233,af

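When I load the data into the table with the query below, the load succeeds but the result is wrong. Instead of putting the last two fields into the array, it loads only one of them: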

load data inpath 'data/file.txt' into table testtable;

Result:

hive> select * from testtable;
OK
a       b       ["c"]
1       sdf     ["2323"]
1       sdf     ["34"]
1       sdf     ["223"]
1       sdf     ["233"]

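So the question is: how do I load data into an array field when the collection delimiter is the same as the field delimiter? My source file will always use that same delimiter.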

Answer 1

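Hive interprets every delimiter as a field delimiter, so it sees the input as 4 columns. Because you defined the table with 3 columns, it simply ignores the 4th. I think you need to read the data into a temporary 4-column table and then build the table you want from it: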

create table temptesttable(f1 string, f2 string, f3 string, f4 string) 
row format delimited fields terminated by ',';

load data inpath 'data/file.txt' into table temptesttable;

create table testtable as select f1, f2, array(f3, f4) as f3 from temptesttable;
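
To check that the rebuilt table really holds both values in the array, a query along these lines may help. It is only a sketch: it assumes testtable is the one produced by the create table ... as select statement above, and the column aliases first_item, second_item and item_count are made up for illustration.

```sql
-- Assumes testtable was built from temptesttable with array(f3, f4) as f3.
-- The aliases below are illustrative names, not part of the original answer.
select f1,
       f2,
       f3[0]    as first_item,   -- value that came from the 3rd field
       f3[1]    as second_item,  -- value that came from the 4th field
       size(f3) as item_count    -- number of elements in the array
from testtable;
```

Since array(f3, f4) always builds a two-element array, item_count should be 2 for every row.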
