Hi, I have a scenario that involves exploding arrays in Hive.
I have a Hive table that contains multiple array columns (five of them):
desc manager;
OK
supervisorid int
reportee_name array<string>
reportee_age array<string>
reportee_address array<string>
reportee_occupation array<string>
reportee_salary array<string>
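The five arrays above are related: the nth element of each array maps to the nth element of the other arrays.
In some cases, however, one array may have no value at position n even though the other related arrays do have a value there. Example: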
select * from manager;
OK
101 ["richie","tom","jack"] ["36","28","29"] ["abc","xyz"] ["SE","JSE","SE"] ["5000","3000","6000"]
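Here the array columns reportee_name, reportee_age, reportee_occupation and reportee_salary each have 3 elements, while reportee_address has only 2 elements.
As a result, when I use lateral view posexplode, I only get 2 rows back: position 2 never appears for reportee_address, so the position-equality filter in the WHERE clause drops the third row.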
select supervisorid,
       reportee_name_val,
       reportee_age_val,
       reportee_address_val,
       reportee_occupation_val,
       reportee_salary_val
from manager
lateral view posexplode(reportee_name) reportee_name as reportee_name_pos, reportee_name_val
lateral view posexplode(reportee_age) reportee_age as reportee_age_pos, reportee_age_val
lateral view posexplode(reportee_address) reportee_address as reportee_address_pos, reportee_address_val
lateral view posexplode(reportee_occupation) reportee_occupation as reportee_occupation_pos, reportee_occupation_val
lateral view posexplode(reportee_salary) reportee_salary as reportee_salary_pos, reportee_salary_val
where reportee_name_pos <=> reportee_age_pos
  and reportee_age_pos <=> reportee_address_pos
  and reportee_address_pos <=> reportee_occupation_pos
  and reportee_occupation_pos <=> reportee_salary_pos;
OK
101 richie 36 abc SE 5000
101 tom 28 xyz JSE 3000
Time taken: 0.175 seconds, Fetched: 2 row(s)
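In the query above, if I remove one of the position-equality conditions (for example reportee_occupation_pos <=> reportee_salary_pos), the result becomes a Cartesian product.
Expected output: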
101,richie,36,abc,SE,5000
101,tom,28,xyz,JSE,3000
101,jack,29,NULL,SE,6000
Actual output from the posexplode query above:
101,richie,36,abc,SE,5000
101,tom,28,xyz,JSE,3000
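I am expecting all elements of every array in the result. If any array has fewer elements than the others, its missing values should show up as NULL in the lateral view output.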
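One direction I have been considering is sketched below; I have not tested it. It explodes only one array and looks up the other arrays by position, instead of exploding all five and filtering. It assumes that reportee_name is always the longest array in a row (true for my sample data, but not guaranteed in general). It also relies on my understanding that indexing past the end of an array in Hive returns NULL rather than raising an error. The aliases m, n and pos are names I made up for this sketch.

```sql
-- Sketch only: assumes reportee_name is never shorter than the other arrays.
-- Explode just that one array, then index the remaining arrays by position.
-- An out-of-range index such as reportee_address[2] is expected to yield NULL.
select m.supervisorid,
       n.reportee_name_val,
       m.reportee_age[n.pos]        as reportee_age_val,
       m.reportee_address[n.pos]    as reportee_address_val,
       m.reportee_occupation[n.pos] as reportee_occupation_val,
       m.reportee_salary[n.pos]     as reportee_salary_val
from manager m
lateral view posexplode(m.reportee_name) n as pos, reportee_name_val;
```

If this works the way I think it does, the sample row would come back as three rows with NULL for jack's address. However, it would still lose elements whenever some other array is longer than reportee_name, so I am not sure it is the right general solution.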
Any help would be greatly appreciated.