Hive table array columns - using array_index

Time: 2016-05-22 04:01:34

Tags: arrays hadoop hive getjson hiveql

Hi, I have a Hive table:

select a,b,c,d from riskfactor_table 
In the above table, the b, c and d columns are array columns. Below is my Hive DDL:
Create external table riskfactor_table 
(a string, 
b array<string>, 
c array<double>, 
d array<double> ) 
ROW FORMAT DELIMITED FIELDS TERMINATED BY '~'  
stored as textfile location 'user/riskfactor/data'; 

Here is my table data:

ID400S,["jms","jndi","jaxb","jaxn"],[100,200,300,400],[1,2,3,4]
ID200N,["one","two","three"],[212,352,418],[6,10,8]

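If I want to split the array columns, how can I do that? If I use the explode function, I can split the array values of only one column:

select explode(b) as b from riskfactor_table;

Output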

jms  
jndi  
jaxb  
jaxn  
one  
two  
three

But I want to populate all the columns with a single select statement like the one below -

Query - select a,b,c,d from riskfactor_table;

Output

row1-  ID400S    jms    100    1  
row2-  ID400S    jndi   200    2  
row3-  ID400S    jaxb    300    3  
row4-  ID400S    jaxn    400    4  

How can I populate all the data?

3 Answers:

Answer 0: (score: 1)

You can achieve this using LATERAL VIEW:

       SELECT a, Mycolumnb, Mycolumnc, Mycolumnd
                 FROM  riskfactor_table
             LATERAL VIEW explode(b) myTableb AS Mycolumnb
             LATERAL VIEW explode(c) myTablec AS Mycolumnc
             LATERAL VIEW explode(d) myTabled AS Mycolumnd ;

See the Hive LATERAL VIEW documentation for more detail.
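
Chaining several LATERAL VIEW explode(...) clauses produces every combination of elements from the arrays rather than matching them position by position. Below is a minimal sketch of one way to keep the positions aligned. It assumes Hive 0.13 or later, where the built-in posexplode UDTF is available, and the riskfactor_table from the question. The aliases tb, tc, td and the pos_*/val_* column names are made up for illustration.

```sql
-- posexplode emits (position, value) pairs; keeping only the rows where the
-- three positions agree pairs b[i], c[i] and d[i] together.
SELECT a, val_b, val_c, val_d
FROM riskfactor_table
LATERAL VIEW posexplode(b) tb AS pos_b, val_b
LATERAL VIEW posexplode(c) tc AS pos_c, val_c
LATERAL VIEW posexplode(d) td AS pos_d, val_d
WHERE pos_b = pos_c
  AND pos_b = pos_d;
```

If the arrays in a row differ in length, only the positions present in all three survive the filter.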

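Answer 1: (score: 1)

I was also looking for a solution to the same problem. Thanks, Jerome, for this Brickhouse solution.

I had to make a small change (adding the alias "n1 as n"), as shown below, to make it work in my case: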

hive> describe test;
OK
id              string
animals     array<string>
cnt         array<bigint>

hive> select * from test;
OK
abc     ["cat","dog","elephant","dolphin","snake","parrot","ant","frog","kuala","cricket"]      [10597,2027,1891,1868,1804,1511,1496,1432,1305,1299]

hive> select `id`, array_index(`animals`,n), array_index(`cnt`,n) from test lateral view numeric_range(0,10) n1 as n;
OK
abc     cat             10597
abc     dog             2027
abc     elephant        1891
abc     dolphin         1868
abc     snake           1804
abc     parrot          1511
abc     ant             1496
abc     frog            1432
abc     kuala           1305
abc     cricket         1299

The only catch is that I have to know in advance that there are 10 elements to explode.
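
One possible way around knowing the length in advance, sketched here on the assumption that Hive 0.13 or later is in use so that the built-in posexplode is available: explode one array together with its positions, then use each position to index the other array. The alias t and the names n, animal and cnt_value are illustrative only.

```sql
-- posexplode yields one row per element of animals, with its 0-based position n;
-- cnt[n] then picks the element at the same position in the second array.
SELECT id, animal, cnt[n] AS cnt_value
FROM test
LATERAL VIEW posexplode(animals) t AS n, animal;
```

The number of output rows follows the length of animals in each row, so no range has to be hard-coded and no extra UDF jar is needed.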

Answer 2: (score: -1)

Use Brickhouse's 'numeric_range' UDF. Here is a blog post describing the details.

https://brickhouseconfessions.wordpress.com/2013/03/07/exploding-multiple-arrays-at-the-same-time-with-numeric_range/

In your case, your query would be something like

SELECT a, 
       array_index( b, i ),
       array_index( c, i ),
       array_index( d, i )
FROM riskfactor_table
 LATERAL VIEW numeric_range( 0, 3 ) n1 AS i;
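
Neither numeric_range nor array_index is built into Hive, so both have to be registered before a query like this will run. Here is a rough sketch of that setup. The jar path is hypothetical, and the class names are the ones I recall from Brickhouse's own brickhouse.hql script, so treat them as assumptions and verify them against the version you build.

```sql
-- Hypothetical jar location; point this at wherever your Brickhouse build lives.
ADD JAR /path/to/brickhouse.jar;

-- Class names assumed from Brickhouse's brickhouse.hql; verify for your version.
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';
CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
```

Temporary functions last only for the current session, so these statements need to run again in each new Hive session.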