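I am trying to run multiple subqueries in a where clause and I get the error below. Does this mean Hive does not support it? If not, is there a different way to write the query below?
Error while executing Hive query: OK FAILED: SemanticException [Error 10249]: Line 14 Unsupported SubQuery Expression 'adh': Only 1 SubQuery expression is supported.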
select
first_name,
last_name,
salary,
title,
department
from
employee_t1 emp
where
emp.salary <= 100000
and (
(emp.code in (select comp from history_t2 where code_hist <> 10))
or
(emp.adh in (select comp from sector_t3 where code_hist <> 50))
)
and department = 'Pediatrics';
Answer 0 (score: 0)
Two options. One is joins, the other is union all:
where emp.salary <= 100000 and
emp.code in (select comp
from history_t2
where code_hist <> 10
union all
select comp
from sector_t3
where code_hist <> 50
) and
emp.department = 'Pediatrics';
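This is generally not recommended because it leaves fewer optimization options. But if Hive has this limitation (I have not tried this type of query in Hive), it may be one way to work around it. Note that this version tests emp.code against both tables, whereas the original query tests emp.adh against sector_t3, so it is not a drop-in replacement unless both columns are meant to hold the same value.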
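The join approach is most appropriate if the comp field is unique in both tables. Otherwise, you need to remove duplicates to avoid duplicated rows in the join.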
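The join approach with duplicates removed might look like the sketch below. It is untested and assumes the table and column names from the question; the aliases h and s are illustrative. Each lookup list is deduplicated with select distinct, so an employee whose code or adh matches several rows is still returned only once.

```sql
-- Sketch only (untested): join-based rewrite of the two IN subqueries.
-- Table and column names come from the question; aliases h and s are illustrative.
select
    emp.first_name,
    emp.last_name,
    emp.salary,
    emp.title,
    emp.department
from employee_t1 emp
left outer join
    -- distinct guarantees at most one match per employee row
    (select distinct comp from history_t2 where code_hist <> 10) h
    on emp.code = h.comp
left outer join
    (select distinct comp from sector_t3 where code_hist <> 50) s
    on emp.adh = s.comp
where emp.salary <= 100000
  and emp.department = 'Pediatrics'
  -- keep the row if either lookup matched
  and (h.comp is not null or s.comp is not null);
```

Because each derived table holds distinct comp values, the two outer joins cannot multiply employee rows, and the final condition reproduces the original OR between the two IN checks.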
Answer 1 (score: 0)
I agree with Gordon. Using joins, you could try the query below (untested):
select
a.first_name,
a.last_name,
a.salary,
a.title,
a.department
from
(select * from employee_t1 emp where
emp.salary <= 100000
and emp.department = 'Pediatrics') a
left outer join (select comp from history_t2 where code_hist <> 10) b
on a.code = b.comp
left outer join (select comp from sector_t3 where code_hist <> 50) c
on a.adh = c.comp
where b.comp is not null
or c.comp is not null
;
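Answer 2 (score: 0)
Just adding a small note here. The error message states that Hive supports only 1 subquery. In fact, this relates to a limitation Hive has: "only one subquery expression is supported for a single query".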
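To make the limitation concrete, here is a minimal pair of queries built from the tables in the question. It is an illustration, not a tested reproduction, and it assumes a Hive version that has this limitation, such as the one that produced the error above.

```sql
-- Accepted: a single subquery expression in the WHERE clause.
select emp.first_name, emp.last_name
from employee_t1 emp
where emp.code in (select comp from history_t2 where code_hist <> 10);

-- Rejected with Error 10249 on Hive versions that have this limitation:
-- two subquery expressions in the same WHERE clause.
select emp.first_name, emp.last_name
from employee_t1 emp
where emp.code in (select comp from history_t2 where code_hist <> 10)
   or emp.adh in (select comp from sector_t3 where code_hist <> 50);
```

The second query has the same shape as the one in the question, which is why the error points at the second subquery expression ('adh').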
Answer 3 (score: 0)
This is exactly what left semi join is for:
select
distinct main.*
from
(
select
emp.first_name,
emp.last_name,
emp.salary,
emp.title,
emp.department
from
employee_t1 emp
left semi join
(select distinct comp from history_t2 where code_hist <> 10) emp_code on emp_code.comp=emp.code
where
emp.salary <= 100000 and emp.department = 'Pediatrics'
union all
select
emp.first_name,
emp.last_name,
emp.salary,
emp.title,
emp.department
from
employee_t1 emp
left semi join
(select distinct comp from sector_t3 where code_hist <> 50) emp_adh on emp_adh.comp=emp.adh
where
emp.salary <= 100000 and emp.department = 'Pediatrics'
) main
Reference: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins