Hive: less-than operator error in subquery

Date: 2018-12-17 09:47:41

Tags: sql hive hiveql

I want to get the latest records from a Hive table using the following query:

WITH lot as (select *
from  to_burn_in as a where a.rel_lot='${Rel_Lot}')
select a.* from lot AS a
where not exists (select 1 from lot as b 
where a.Rel_Lot=b.Rel_Lot and a.SerialNum=b.SerialNum and a.Test_Stage=b.Test_Stage 
and cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP))
order by a.SerialNum

This query throws an error:

Error while compiling statement: FAILED: SemanticException line 0:undefined:-1 Unsupported SubQuery Expression 'Test_Datetime': SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition.

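I tried running it with an equality operator in place of the less-than operator in the subquery, and it ran fine. I read the Hive documentation at https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries and, since subqueries in the WHERE clause are supported, I can't figure out why the error is raised. What could be the problem here?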

1 answer:

Answer 0 (score: 2)

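EXISTS actually works the same way as a join: the correlation predicates inside the subquery become join conditions. Non-equality join conditions were not supported in Hive before Hive 2.2.0 (see HIVE-15211 and HIVE-15251), which is why the query compiles with = but fails with <.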
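To make the join view concrete, here is a minimal sketch that uses Python's built-in sqlite3 module as a stand-in for Hive. SQLite accepts the correlated NOT EXISTS with <, so it can run the question's query as written. The sketch compares that query with a rewrite that uses only equality join conditions (a per-group max() joined back to the table). That rewrite is not from the answer above; it is an illustration of the equi-join form older Hive versions accept, and it is only exercised here in SQLite. The sample rows, the names ROWS, NOT_EXISTS_SQL, MAX_JOIN_SQL and latest_rows, and the choice to compare ISO-8601 strings directly instead of casting to TIMESTAMP are all assumptions made for the demo.

```python
import sqlite3

# Hypothetical sample rows standing in for the to_burn_in table.
# Timestamps are ISO-8601 strings, so plain string comparison orders them
# chronologically (SQLite has no real TIMESTAMP type to cast to).
ROWS = [
    ("L1", "S1", "T1", "2018-12-01 08:00:00"),
    ("L1", "S1", "T1", "2018-12-02 08:00:00"),  # latest for (L1, S1, T1)
    ("L1", "S1", "T2", "2018-12-01 09:00:00"),  # only row for (L1, S1, T2)
    ("L1", "S2", "T1", "2018-12-03 10:00:00"),  # latest for (L1, S2, T1)
    ("L1", "S2", "T1", "2018-11-30 10:00:00"),
]

# The question's query: an anti-join whose correlation includes a '<' predicate.
NOT_EXISTS_SQL = """
SELECT a.* FROM to_burn_in AS a
WHERE NOT EXISTS (
  SELECT 1 FROM to_burn_in AS b
  WHERE a.Rel_Lot = b.Rel_Lot AND a.SerialNum = b.SerialNum
    AND a.Test_Stage = b.Test_Stage
    AND a.Test_Datetime < b.Test_Datetime)
"""

# Equality-only rewrite: compute the max timestamp per group,
# then join back using equality conditions alone.
MAX_JOIN_SQL = """
SELECT a.* FROM to_burn_in AS a
JOIN (
  SELECT Rel_Lot, SerialNum, Test_Stage, max(Test_Datetime) AS max_ts
  FROM to_burn_in
  GROUP BY Rel_Lot, SerialNum, Test_Stage
) AS m
  ON a.Rel_Lot = m.Rel_Lot AND a.SerialNum = m.SerialNum
 AND a.Test_Stage = m.Test_Stage AND a.Test_Datetime = m.max_ts
"""


def latest_rows(sql: str) -> list[tuple]:
    """Load ROWS into an in-memory table, run sql, and return the rows sorted."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE to_burn_in "
        "(Rel_Lot TEXT, SerialNum TEXT, Test_Stage TEXT, Test_Datetime TEXT)"
    )
    conn.executemany("INSERT INTO to_burn_in VALUES (?, ?, ?, ?)", ROWS)
    result = sorted(conn.execute(sql).fetchall())
    conn.close()
    return result


if __name__ == "__main__":
    print(latest_rows(NOT_EXISTS_SQL) == latest_rows(MAX_JOIN_SQL))  # True
```

Both queries keep every row whose timestamp is the maximum of its (Rel_Lot, SerialNum, Test_Stage) group, so rows tied on the latest timestamp are all returned by either form.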

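It looks like you are trying to get the record with the latest timestamp for each (Rel_Lot, SerialNum, Test_Stage) combination. You can rewrite the query using the dense_rank() or rank() function: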

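WITH lot as (select *
from to_burn_in as a where a.rel_lot='${Rel_Lot}'
)

select * from 
(
select a.*,
       -- number rows within each (Rel_Lot, SerialNum, Test_Stage) group, newest first;
       -- rows tied on the latest timestamp all get rnk = 1, like the original NOT EXISTS
       dense_rank() over(partition by Rel_Lot,SerialNum,Test_Stage order by cast(a.test_datetime as TIMESTAMP) desc) as rnk
  from lot AS a
)s 
-- keep only the latest record(s) per group
where rnk=1
order by s.SerialNum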
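The choice between dense_rank(), rank() and row_number() matters when two records share the latest timestamp. The following minimal sketch runs the same window-function query in SQLite through Python's sqlite3 module (window functions need SQLite 3.25 or newer), again as a stand-in for Hive. The sample rows, the table name lot, the helper latest_rows, and comparing ISO-8601 strings instead of casting to TIMESTAMP are assumptions for the demo.

```python
import sqlite3

# Hypothetical rows: (L1, S1, T1) has two records tied on the latest timestamp.
ROWS = [
    ("L1", "S1", "T1", "2018-12-01 08:00:00"),
    ("L1", "S1", "T1", "2018-12-02 08:00:00"),
    ("L1", "S1", "T1", "2018-12-02 08:00:00"),  # tie with the row above
    ("L1", "S2", "T1", "2018-12-03 10:00:00"),
]

ALLOWED = {"dense_rank", "rank", "row_number"}


def latest_rows(rank_fn: str) -> list[tuple]:
    """Keep the rows ranked 1 per group (newest first) using the given window function."""
    if rank_fn not in ALLOWED:  # the name is spliced into SQL, so whitelist it
        raise ValueError(f"unsupported ranking function: {rank_fn}")
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE lot "
        "(Rel_Lot TEXT, SerialNum TEXT, Test_Stage TEXT, Test_Datetime TEXT)"
    )
    conn.executemany("INSERT INTO lot VALUES (?, ?, ?, ?)", ROWS)
    sql = f"""
    SELECT Rel_Lot, SerialNum, Test_Stage, Test_Datetime FROM (
      SELECT a.*, {rank_fn}() OVER (
        PARTITION BY Rel_Lot, SerialNum, Test_Stage
        ORDER BY Test_Datetime DESC) AS rnk
      FROM lot AS a
    ) AS s
    WHERE rnk = 1
    ORDER BY SerialNum
    """
    rows = conn.execute(sql).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    for fn in ("dense_rank", "rank", "row_number"):
        print(fn, latest_rows(fn))
```

With rnk = 1, dense_rank() and rank() behave identically: every record tied on the latest timestamp is kept, which matches the question's NOT EXISTS logic. row_number() keeps exactly one record per group and picks among ties arbitrarily, so use it only if duplicates should be collapsed.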