I am trying to load the table below into Hive. It has two array-type columns.
Base table:
Array<int> col1 Array<string> col2
[1,2] ['a','b','c']
[3,4] ['d','e','f']
I created the table in Hive as follows:
create table base(col1 array<int>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
Then I loaded the data as follows:
load data local inpath '/home/hduser/Desktop/batch/hiveip/basetable' into table base;
I then ran the following query:
select * from base;
and got the following output:
[null,null] ["['a'","'b'","'c']"]
[null,null] ["['d'","'e'","'f']"]
The data is not returned in the correct format.
Please help me find my mistake.
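Answer 0 (score: 0)
You can change the data type of col1 from array<int> to array<string>. Hive will then return the col1 data instead of nulls.
Using array<string> as the col1 data type: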
hive> create table base(col1 array<string>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
hive> select * from base;
+--------------+------------------------+--+
| col1 | col2 |
+--------------+------------------------+--+
| ["[1","2]"] | ["['a'","'b'","'c']"] |
| ["[3","4]"] | ["['d'","'e'","'f']"] |
+--------------+------------------------+--+
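The nulls in the question appear because Hive cannot read the array items as integers: the input file wraps the values 1,2 in [], so the items are "[1" and "2]", neither of which is a valid int.
Accessing the col1 elements:
hive> select col1[0],col1[1] from base;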
+------+------+--+
| _c0 | _c1 |
+------+------+--+
| [1 | 2] |
| [3 | 4] |
+------+------+--+
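To illustrate why the original array<int> column came back as [null,null], here is a minimal Python sketch of the splitting described above. It only imitates the behaviour (split the line on the tab, split each field on the comma, then try to convert each col1 item to an integer). It is not Hive's actual implementation, and parse_row and to_int are hypothetical names used for illustration.

```python
import re
from typing import List, Optional, Tuple

FIELD_DELIM = "\t"  # fields terminated by '\t'
ITEM_DELIM = ","    # collection items terminated by ','


def to_int(text: str) -> Optional[int]:
    """Return the integer, or None (standing in for NULL) if text is not a plain integer."""
    if re.fullmatch(r"[+-]?[0-9]+", text):
        return int(text)
    return None


def parse_row(line: str) -> Tuple[List[Optional[int]], List[str]]:
    """Split one input line into (col1 as ints, col2 as strings)."""
    col1_raw, col2_raw = line.rstrip("\n").split(FIELD_DELIM)
    col1 = [to_int(item) for item in col1_raw.split(ITEM_DELIM)]
    col2 = col2_raw.split(ITEM_DELIM)
    return col1, col2


# Line as stored in the question's file: brackets end up inside the items.
print(parse_row("[1,2]\t['a','b','c']"))
# ([None, None], ["['a'", "'b'", "'c']"])

# Same line without brackets around col1: the items convert cleanly.
print(parse_row("1,2\t['a','b','c']"))
# ([1, 2], ["['a'", "'b'", "'c']"])
```

The first call mirrors the file from the question, where "[1" and "2]" fail the integer conversion. The second mirrors a file with the brackets around col1 removed.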
(OR)
Using array<int> as the col1 data type:
If you do not want to change the data type, the input file must store the array values (i.e. col1) without the [] square brackets, as below:
1,2 ['a','b','c']
3,4 ['d','e','f']
Then create the same table as in the question and load this file into it. Hive can now read 1,2 as array elements of type int.
hive> create table base(col1 array<int>,col2 array<string>) row format delimited fields terminated by '\t' collection items terminated by ',';
hive> select * from base;
+--------+------------------------+--+
| col1 | col2 |
+--------+------------------------+--+
| [1,2] | ["['a'","'b'","'c']"] |
| [3,4] | ["['d'","'e'","'f']"] |
+--------+------------------------+--+
Accessing the array elements:
hive> select col1[0] from base;
+------+--+
| _c0 |
+------+--+
| 1 |
| 3 |
+------+--+
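If the input file is generated elsewhere and already contains the brackets, one option is to clean it before running load data. The sketch below is only an assumption about how that could be done, and clean_line is a hypothetical helper name. Note that it also strips the brackets and single quotes from col2, which goes one step further than the answer above (the outputs shown still contain "['a'" and "'c']" in col2). It would damage any value that legitimately contains [, ] or '.

```python
import re


def clean_line(line: str) -> str:
    """Remove square brackets and single quotes, keeping tabs, commas and values."""
    return re.sub(r"[\[\]']", "", line)


# The two rows from the question, with a tab between the columns.
raw_lines = ["[1,2]\t['a','b','c']", "[3,4]\t['d','e','f']"]

for raw in raw_lines:
    # Each cleaned line has the form: 1,2<TAB>a,b,c
    print(clean_line(raw))
```

Written to a file, the cleaned lines match the delimiters declared in the create table statement: a tab between the columns and a comma between the array items.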