Hive array posexplode problem

Time: 2017-06-17 21:12:39

Tags: hadoop hive

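Hi, I have a scenario that uses explode on Hive arrays.

I have a Hive table that contains several array columns (five, as the schema below shows):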

desc manager;
OK
supervisorid            int                                         
reportee_name           array<string>                               
reportee_age            array<string>                               
reportee_address        array<string>                               
reportee_occupation     array<string>                               
reportee_salary         array<string>

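The arrays above are related: the nth element of each array maps to the nth element of every other array.

In some cases, however, one array may have no nth element even though the other related arrays do. Example: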

select * from manager;
OK
101 ["richie","tom","jack"] ["36","28","29"]    ["abc","xyz"]   ["SE","JSE","SE"]   ["5000","3000","6000"]

Here the array columns reportee_name, reportee_age, reportee_occupation and reportee_salary each have 3 elements, while reportee_address has only 2 elements.

So when I do a lateral view with posexplode, I get only 2 rows:

select supervisorid,
       reportee_name_val,
       reportee_age_val,
       reportee_address_val,
       reportee_occupation_val,
       reportee_salary_val
from manager
lateral view posexplode(reportee_name) reportee_name as reportee_name_pos, reportee_name_val
lateral view posexplode(reportee_age) reportee_age as reportee_age_pos, reportee_age_val
lateral view posexplode(reportee_address) reportee_address as reportee_address_pos, reportee_address_val
lateral view posexplode(reportee_occupation) reportee_occupation as reportee_occupation_pos, reportee_occupation_val
lateral view posexplode(reportee_salary) reportee_salary as reportee_salary_pos, reportee_salary_val
where reportee_name_pos <=> reportee_age_pos
  and reportee_age_pos <=> reportee_address_pos
  and reportee_address_pos <=> reportee_occupation_pos
  and reportee_occupation_pos <=> reportee_salary_pos;
OK
101 richie  36  abc SE  5000
101 tom 28  xyz JSE 3000
Time taken: 0.175 seconds, Fetched: 2 row(s)

In the query above, if I remove a position-equality condition (for example reportee_occupation_pos <=> reportee_salary_pos), the result becomes a Cartesian product.
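
The missing third row can be traced to the shortest array. The sketch below explodes only reportee_address, using the table and column names from the question (the alias addr is just illustrative). For the sample row it should return positions 0 and 1 only, so no exploded row has reportee_address_pos = 2 and the equality filter can never keep jack.

```sql
-- Diagnostic sketch: list the positions posexplode emits for the short array.
-- "addr" is an illustrative alias, not part of the original query.
select supervisorid,
       reportee_address_pos,
       reportee_address_val
from manager
lateral view posexplode(reportee_address) addr as reportee_address_pos, reportee_address_val;
```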

Expected output:

101,richie,36,abc,SE,5000
101,tom,28,xyz,JSE,3000
101,jack,29,NULL,SE,6000

Actual output of the query shown above:

101,richie,36,abc,SE,5000
101,tom,28,xyz,JSE,3000

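I expect every element of every array to appear; if any array has fewer elements than the others, the missing positions should show up as NULL in the lateral view.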
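
One possible way to get this behavior is to explode a single array and look the others up by position, instead of exploding all five and filtering. This is a minimal sketch, not a confirmed solution. It assumes that reportee_name is never shorter than the other arrays, that the Hive version in use accepts a non-constant array index, and that an out-of-range index returns NULL (as it does in Hive, though not necessarily in other engines). The alias n and the column name pos are illustrative.

```sql
-- Sketch: explode only reportee_name and index the other arrays by position.
-- Assumption: reportee_name is at least as long as every other array.
-- In Hive, indexing past the end of an array yields NULL rather than an error.
select supervisorid,
       n.reportee_name_val,
       reportee_age[n.pos]        as reportee_age_val,
       reportee_address[n.pos]    as reportee_address_val,
       reportee_occupation[n.pos] as reportee_occupation_val,
       reportee_salary[n.pos]     as reportee_salary_val
from manager
lateral view posexplode(reportee_name) n as pos, reportee_name_val;
```

Because only one array is exploded, there is no cross product to filter, and a short array such as reportee_address simply contributes NULL for the positions it lacks. If reportee_name can itself be the shorter array, the positions would have to come from something else, such as the largest size() among the arrays.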

Any help would be appreciated.

0 answers:

No answers