I have two huge tables in Hive, and I need to run a query of the following type. Example query:
select
employee_id,
employee_name,
employee_address,
employee_join_date,
employee_travel_pincodes
from employee
where
employee_join_date = (select join-date from hr_records)
and
employee_travel_pincodes in (select _pincodes from hr_records) -- returns multiple records
What is the best way to achieve this in Hive? I could use subqueries, but that does not look like the cleanest way to get the desired output. I am using Hive 0.13.
Answer 0 (score: 0)
I suggest using a JOIN for better performance:
SELECT
e.employee_id,
e.employee_name,
e.employee_address,
e.employee_join_date,
e.employee_travel_pincodes
FROM employee e
INNER JOIN hr_records hr
-- backticks keep Hive from parsing join-date as "join minus date"
ON e.employee_join_date = hr.`join-date`
AND e.employee_travel_pincodes = hr.`_pincodes`
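One caveat worth checking before swapping IN for an inner join: a join returns one output row per matching hr_records row, so duplicate values in hr_records multiply the employee rows, which IN does not do. Below is a minimal sketch of that difference, using Python's built-in sqlite3 module as a stand-in for Hive. The sample rows and the simplified column name pincodes are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption: the
# IN-versus-JOIN row-count behaviour shown here is standard SQL).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, employee_travel_pincodes TEXT);
    CREATE TABLE hr_records (pincodes TEXT);
    INSERT INTO employee VALUES (1, '560001'), (2, '110001');
    -- '560001' appears twice on purpose
    INSERT INTO hr_records VALUES ('560001'), ('560001'), ('400001');
""")

# IN only asks whether a match exists, so each employee appears at most once.
in_rows = conn.execute("""
    SELECT employee_id FROM employee
    WHERE employee_travel_pincodes IN (SELECT pincodes FROM hr_records)
""").fetchall()

# INNER JOIN emits one row per matching hr_records row.
join_rows = conn.execute("""
    SELECT e.employee_id FROM employee e
    INNER JOIN hr_records hr ON e.employee_travel_pincodes = hr.pincodes
""").fetchall()

print(in_rows)    # [(1,)]
print(join_rows)  # [(1,), (1,)]
```

If hr_records can contain repeated values, the join needs a DISTINCT (or a de-duplicated subquery on the hr_records side) to return the same rows as the IN version.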
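Answer 1 (score: 0)
You can use a LEFT SEMI JOIN, which Hive supports, in place of:
select * from table1 where columnx in (select columnx from table2);
This is equivalent to the following left semi join:
select a.* from table1 a left semi join table2 b on a.columnx=b.columnx;
So, based on the description of the query you need, I would use: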
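select
a.employee_id,
a.employee_name,
a.employee_address,
a.employee_join_date,
a.employee_travel_pincodes
from employee a
-- with LEFT SEMI JOIN, the right-hand table may only be referenced in the ON clause,
-- so the join-date condition goes here rather than in WHERE
left semi join hr_records b
on a.employee_travel_pincodes=b.`_pincodes`
and a.employee_join_date=b.`join-date`
;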
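One thing to check before adopting the semi join: when both conditions sit in the same join condition, a single hr_records row has to match both the pincode and the join date, whereas the two subqueries in the question are evaluated independently of each other. Here is a small sketch of that difference, again using Python's sqlite3 as a stand-in for Hive. SQLite has no LEFT SEMI JOIN, so a correlated EXISTS plays that role here; both conditions are written with IN for simplicity, and the simplified column names and sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (employee_id INTEGER, join_date TEXT, pincode TEXT);
    CREATE TABLE hr_records (join_date TEXT, pincode TEXT);
    INSERT INTO employee VALUES (1, '2014-01-01', '560001');
    -- The date and the pincode both occur in hr_records, but on different rows.
    INSERT INTO hr_records VALUES ('2014-01-01', '110001'), ('2015-06-30', '560001');
""")

# Semi-join style: one hr_records row must satisfy both conditions.
same_row = conn.execute("""
    SELECT employee_id FROM employee a
    WHERE EXISTS (SELECT 1 FROM hr_records b
                  WHERE a.pincode = b.pincode AND a.join_date = b.join_date)
""").fetchall()

# Question style: each condition is checked against hr_records on its own.
independent = conn.execute("""
    SELECT employee_id FROM employee
    WHERE join_date IN (SELECT join_date FROM hr_records)
      AND pincode IN (SELECT pincode FROM hr_records)
""").fetchall()

print(same_row)     # []
print(independent)  # [(1,)]
```

If the two conditions are meant to be independent, two separate semi joins against hr_records (one per condition) would preserve the original meaning.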