Hive with multiple subqueries

Time: 2015-11-24 17:18:15

Tags: sql hive hiveql

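I am trying to run multiple subqueries in a where clause, and I get the error below. Does this mean Hive does not support it? If it doesn't, is there a different way to write the query below?

Error while executing Hive query: OK FAILED: SemanticException [Error 10249]: Line 14 Unsupported SubQuery Expression 'adh': Only 1 SubQuery expression is supported.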

select
    first_name, 
    last_name,
    salary,
    title,
    department
from 
    employee_t1 emp
where 
    emp.salary <= 100000
    and (
        (emp.code in (select comp from history_t2 where code_hist <> 10))
        or 
        (emp.adh in (select comp from sector_t3 where code_hist <> 50))
    ) 
    and department = 'Pediatrics';

4 answers:

Answer 0 (score: 0)

Two options. One is joins, the other is union all:

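where emp.salary <= 100000 and
      -- caution: both lookups are matched against emp.code here, whereas the
      -- question matches sector_t3 against emp.adh
      emp.code in (select comp
                   from history_t2 
                   where code_hist <> 10
                   union all
                   select comp
                   from sector_t3
                   where code_hist <> 50
                  ) and
      emp.department = 'Pediatrics';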

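This is normally not recommended, because there are fewer options for optimization. But if Hive has this limitation (I haven't tried this type of query in Hive), then this might be a way around it.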

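The join method is the most appropriate if the comp fields are unique in the two tables. Otherwise, you need to remove duplicates to avoid duplicated rows in the join.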

Answer 1 (score: 0)

I agree with Gordon. Using joins, you can try the query below (not tested):

select
    a.first_name, 
    a.last_name,
    a.salary,
    a.title,
    a.department
from 
    -- filter employees first; this subquery defines no alias emp,
    -- so its columns are referenced unqualified
    (select * from employee_t1
     where salary <= 100000
       and department = 'Pediatrics') a
left outer join (select comp from history_t2 where code_hist <> 10) b
    on a.code = b.comp
left outer join (select comp from sector_t3 where code_hist <> 50) c
    on a.adh = c.comp
-- keep rows that matched at least one of the two lookups
where b.comp is not null
   or c.comp is not null
;
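
One caveat with the join approach, following the remark in answer 0 about duplicates: if a comp value occurs more than once in history_t2 or sector_t3, a left outer join returns one output row per match, so an employee can appear several times. A minimal sketch of a variant that deduplicates each lookup first is shown below. It uses only the table and column names from the question, and it is untested against a real Hive cluster.

```sql
-- Deduplicate each lookup with "select distinct" so that every employee row
-- matches at most one row from b and at most one row from c.
select
    a.first_name,
    a.last_name,
    a.salary,
    a.title,
    a.department
from
    (select * from employee_t1
     where salary <= 100000
       and department = 'Pediatrics') a
left outer join
    (select distinct comp from history_t2 where code_hist <> 10) b
    on a.code = b.comp
left outer join
    (select distinct comp from sector_t3 where code_hist <> 50) c
    on a.adh = c.comp
-- a row qualifies if either lookup found a match
where b.comp is not null
   or c.comp is not null;
```

With both lookups deduplicated, an employee that matches both tables still produces a single row, which mirrors the in ... or in ... condition of the original query.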

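Answer 2 (score: 0)

Just adding a small note here. The error message states that Hive supports only 1 subquery. This is in fact a documented limitation of Hive: "Only one subquery expression is supported for a single query."

You can refer to the official documentation here: https://docs.hortonworks.com/HDPDocuments/HDP2/HDP-2.6.5/bk_data-access/content/hive-013-feature-subqueries-in-where-clauses.html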

Answer 3 (score: 0)

This is exactly what left semi join is for:

select
    distinct main.*
from
(
select
    emp.first_name, 
    emp.last_name,
    emp.salary,
    emp.title,
    emp.department
from 
    employee_t1 emp
left semi join 
        (select distinct comp from history_t2 where code_hist <> 10) emp_code on emp_code.comp=emp.code
where 
    emp.salary <= 100000 and emp.department = 'Pediatrics'
union all
select
    emp.first_name, 
    emp.last_name,
    emp.salary,
    emp.title,
    emp.department
from 
    employee_t1 emp
left semi join 
        (select distinct comp from sector_t3 where code_hist <> 50) emp_adh on emp_adh.comp=emp.adh
where 
    emp.salary <= 100000 and emp.department = 'Pediatrics'
) main

Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
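
For a single condition, the correspondence between in and left semi join can be sketched as follows, using the tables from the question. The Hive joins manual describes left semi join as a way to implement uncorrelated IN/EXISTS semantics, with the restriction that the right-hand table may be referenced only in the on clause. The alias h is introduced here purely for illustration, and the sketch is untested.

```sql
-- IN form: a single subquery expression in the WHERE clause
select emp.first_name, emp.last_name
from employee_t1 emp
where emp.code in (select comp from history_t2 where code_hist <> 10);

-- LEFT SEMI JOIN form of the same filter; columns of h may appear
-- only in the ON clause, not in SELECT or WHERE
select emp.first_name, emp.last_name
from employee_t1 emp
left semi join
    (select comp from history_t2 where code_hist <> 10) h
    on h.comp = emp.code;
```

A semi join should return each employee row at most once regardless of how many comp values match. In the full query above, the outer distinct is still needed because union all repeats any employee that qualifies in both branches.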