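How to run a Hive query where a column value equals the result of another select query

Time: 2015-07-10 14:01:23

Tags: hive hiveql

I have two huge tables in Hive, and I need to run a query of this type. Example query: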

    select
    employee_id,
    employee_name,
    employee_address,
    employee_join_date,
    employee_travel_pincodes
    from employee
    where
    employee_join_date = (select `join-date` from hr_records)
    and
    employee_travel_pincodes in (select `_pincodes` from hr_records) -- returns multiple records

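What is the best way to achieve this in Hive? I could use subqueries, but that does not look like the cleanest way to get the required output. I am using Hive 0.13.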

2 Answers:

Answer 0: (Score: 0)

I suggest using a JOIN for better performance:

    SELECT
        e.employee_id,
        e.employee_name,
        e.employee_address,
        e.employee_join_date,
        e.employee_travel_pincodes
    FROM employee e
    INNER JOIN hr_records hr
        -- backticks are needed because of the hyphen and the leading underscore
        ON e.employee_join_date = hr.`join-date`
        AND e.employee_travel_pincodes = hr.`_pincodes`;
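
One caveat with a plain inner join: if hr_records holds several rows with the same join date and pincode, each matching employee row comes back once per match. A minimal sketch of one way around that, assuming such duplicates can exist, is to reduce hr_records to distinct key pairs before joining. The aliases `join_date` and `pincodes` are illustrative names, not columns from the question.

```sql
SELECT
    e.employee_id,
    e.employee_name,
    e.employee_address,
    e.employee_join_date,
    e.employee_travel_pincodes
FROM employee e
INNER JOIN (
    -- keep one row per (join date, pincode) pair so employees are not repeated
    SELECT DISTINCT
        `join-date` AS join_date,   -- hypothetical alias
        `_pincodes` AS pincodes     -- hypothetical alias
    FROM hr_records
) hr
    ON e.employee_join_date = hr.join_date
    AND e.employee_travel_pincodes = hr.pincodes;
```

The extra DISTINCT adds a reduce stage over hr_records, so it is only worth it when duplicate key pairs are actually expected.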

Answer 1: (Score: 0)

You can use a LEFT SEMI JOIN, which Hive supports, instead of:

    select * from table1 where columnx in (select columnx from table2);

This is equivalent to a left semi join:

    select a.* from table1 a left semi join table2 b on a.columnx=b.columnx;

So, based on the description of your required query, I would use:

    select
      a.employee_id,
      a.employee_name,
      a.employee_address,
      a.employee_join_date,
      a.employee_travel_pincodes
    from employee a
    -- in a left semi join, the right-hand table (b) may only be referenced in the ON clause
    left semi join hr_records b
      on a.employee_travel_pincodes = b.`_pincodes`
      and a.employee_join_date = b.`join-date`
    ;
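
Note that a single semi join with both conditions only keeps an employee when one hr_records row matches the join date and the pincode together. The query in the question checks the two columns independently, so if that is the intent, one possible sketch is to chain two semi joins against hr_records, one per condition. The aliases `b` and `c` are illustrative, and this assumes chained LEFT SEMI JOIN clauses behave as expected on your Hive version, which is worth confirming on a small sample first.

```sql
select
  a.employee_id,
  a.employee_name,
  a.employee_address,
  a.employee_join_date,
  a.employee_travel_pincodes
from employee a
-- keep employees whose join date appears anywhere in hr_records
left semi join hr_records b
  on a.employee_join_date = b.`join-date`
-- and whose pincode appears anywhere in hr_records (possibly on a different row)
left semi join hr_records c
  on a.employee_travel_pincodes = c.`_pincodes`
;
```

Because each semi join only tests for existence, neither one duplicates employee rows, regardless of how many hr_records rows match.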