Hive: Converting array<string> to array<int> in a query

Asked: 2015-09-30 16:38:13

Tags: arrays hadoop hive

I have two tables:

create table a (
`1` array<string>);

create table b (
`1` array<int>);

I want to insert the contents of table a into table b (table b is empty):

insert into table b
select * from a;

When I do this, I get the following error:

FAILED: SemanticException [Error 10044]: Line 1:18 Cannot insert into
target table because column number/types are different 'b': Cannot
convert column 0 from array<string> to array<int>.

I do not get this error when the fields are plain string and int types rather than arrays.

Is there a way to cast with arrays?

3 answers:

Answer 0 (score: 2)

Use explode() and collect_list() to rebuild the array.

Example of an initial string array:

hive> select array('1','2','3') string_array;
OK
string_array
["1","2","3"]
Time taken: 1.109 seconds, Fetched: 1 row(s)

Convert the array:

hive> select collect_list(cast(array_element as int)) int_array --cast and collect array
       from( select explode(string_array) array_element         --explode array
               from (select array('1','2','3') string_array     --initial array
                    )s 
           )s;

Result:

OK
int_array
[1,2,3]
Time taken: 44.668 seconds, Fetched: 1 row(s)

If you need more columns in your insert + select query, use lateral view [outer]:

select col1, col2, collect_list(cast(array_element as int)) int_array
 from
(
select col1, col2 , array_element         
  from table
       lateral view outer explode(string_array) a as array_element         
)s
group by col1, col2
;

Answer 1 (score: 1)

The Brickhouse jar runs much faster than casting each element and collecting the results back into a list. Download the Brickhouse jar and add it to an HDFS location.

add jar hdfs://hadoop-/pathtojar/brickhouse-0.7.1.jar;
create temporary function cast_array as 'brickhouse.udf.collect.CastArrayUDF';
-- call the registered function; Hive's built-in CAST does not take a type name as a second argument
select cast_array(columns, 'int') AS columname from table;
select cast_array(columns, 'string') AS columname from table;
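
Applied to the tables from the question, this could look like the following. It is a sketch only: it assumes the jar has been added and the function registered under the name cast_array in the current session, as in the statements above, and it has not been checked against a specific Brickhouse version.

```sql
-- Sketch: copy table a into table b, converting array<string> to array<int>.
-- Assumes cast_array was registered from brickhouse.udf.collect.CastArrayUDF in this session.
insert into table b
select cast_array(`1`, 'int')
from a;
```

Because the UDF works on one row at a time, each row's array stays intact and no group by is needed.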

Answer 2 (score: 0)

  

"Is there a way to cast with arrays?"

Not easily. If you know the size of the array, you can cast it manually, but if you don't, you may have to resort to structs. See my answer to this similar question.
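
A minimal sketch of the manual cast for the known-size case, using the tables from the question. It assumes every array in a holds exactly three elements; the length 3 is an illustrative choice, not something stated in the question.

```sql
-- Sketch: cast element by element and rebuild the array with array().
-- Assumes each array in a.`1` has exactly 3 entries (illustrative assumption).
insert into table b
select array(
           cast(`1`[0] as int),
           cast(`1`[1] as int),
           cast(`1`[2] as int)
       )
from a;
```

This keeps one output array per input row and preserves element order, but the query has to be rewritten whenever the array length changes.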


Also: I can't downvote the other answer, but it fails for nested selects with multiple arrays.

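Instead of casting the array elements and rebuilding each original array, it casts them and then combines all the elements into a single array. Example: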

hive> select id, my_array from array_table limit 3;
OK
10023307    ["0.20296966","0.17753501","-0.03543373"]
100308007   ["0.16155224","0.1945944","0.09167781"]
100384207   ["0.025892768","0.023214806","-0.003712816"]

hive> select
    >     collect_list(cast(array_element as double)) int_array
    > from (
    >     select
    >         explode(my_array) array_element
    >     from (
    >         select
    >             my_array
    >         from array_table limit 3
    >     ) X
    > ) s;
OK
[0.20296966,0.17753501,-0.03543373,0.16155224,0.1945944,0.09167781,0.025892768,0.023214806,-0.003712816]
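
One possible way to keep each row's array separate is to group by a column that identifies the row. The sketch below assumes id is unique per row of array_table; the alias double_array is illustrative. collect_list is not documented to preserve the original element order, so treat the ordering as an assumption to verify on your data.

```sql
-- Sketch: rebuild one array per id instead of one array for the whole result.
-- Assumes id uniquely identifies a row of array_table.
select
    id,
    collect_list(cast(array_element as double)) as double_array
from array_table
    lateral view explode(my_array) t as array_element
group by id;
```

Rows whose array is empty or NULL produce no output with a plain lateral view; use lateral view outer if those rows must be kept.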