I want to get the latest records from a Hive table using the following query:
WITH lot as (select *
from to_burn_in as a where a.rel_lot='${Rel_Lot}')
select a.* from lot AS a
where not exists (select 1 from lot as b
where a.Rel_Lot=b.Rel_Lot and a.SerialNum=b.SerialNum and a.Test_Stage=b.Test_Stage
and cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP))
order by a.SerialNum
This query throws the following error:
Error while compiling statement: FAILED: SemanticException line 0:undefined:-1 Unsupported SubQuery Expression 'Test_Datetime': SubQuery expression refers to both Parent and SubQuery expressions and is not a valid join condition.
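I tried running it with an equality operator in place of the less-than operator in the subquery, and it ran fine. I have read the Hive documentation, for example https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries, and since subqueries in the WHERE clause are supported, I cannot figure out why this error is raised. What could be the problem here?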
Answer 0 (score: 2)
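EXISTS works essentially the same way as a join. Non-equality join conditions are not supported in Hive versions before 2.2.0 (see HIVE-15211, HIVE-15251). The correlated predicate cast(a.test_datetime as TIMESTAMP) < cast(b.Test_Datetime as TIMESTAMP) refers to both the outer query and the subquery, so it has to act as a join condition, and with the < operator it is not a valid one. That is also why the query runs once you replace < with =.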
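It looks like you are trying to get, for each combination of Rel_Lot, SerialNum and Test_Stage, the records with the latest timestamp. You can rewrite the query using the dense_rank() or rank() function: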
WITH lot as (select *
from to_burn_in as a where a.rel_lot='${Rel_Lot}'
)
select * from
(
select a.*,
dense_rank() over(partition by Rel_Lot,SerialNum,Test_Stage order by cast(a.test_datetime as TIMESTAMP) desc) as rnk
from lot AS a
)s
where rnk=1
order by s.SerialNum
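If window functions are not an option, the same result can probably be reached with equality-only join conditions, which avoids the limitation described above. The following is only a sketch under assumptions: test_ts, max_ts and the latest CTE are names introduced here for illustration and are not part of the original table, and it assumes your Hive version lets one CTE reference another. Like the dense_rank() version, it keeps all rows that tie on the latest timestamp. Unlike it, rows with NULL in one of the key columns or with a timestamp that cannot be cast would drop out of the join.

```sql
-- Sketch: latest row(s) per (Rel_Lot, SerialNum, Test_Stage) using only equality joins.
WITH lot AS (
  SELECT t.*,
         CAST(t.test_datetime AS TIMESTAMP) AS test_ts  -- hypothetical helper column
  FROM to_burn_in t
  WHERE t.rel_lot = '${Rel_Lot}'
),
latest AS (
  -- one row per group, carrying the newest timestamp of that group
  SELECT Rel_Lot, SerialNum, Test_Stage, MAX(test_ts) AS max_ts
  FROM lot
  GROUP BY Rel_Lot, SerialNum, Test_Stage
)
SELECT a.*
FROM lot a
JOIN latest m
  ON  a.Rel_Lot    = m.Rel_Lot
  AND a.SerialNum  = m.SerialNum
  AND a.Test_Stage = m.Test_Stage
  AND a.test_ts    = m.max_ts   -- equality only, no < or > in the join condition
ORDER BY a.SerialNum
```

Note that a.* here also returns the extra test_ts column; list the columns explicitly if you do not want it in the output.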