How to implement LEFT / RIGHT OUTER JOIN instead of NOT IN in a Hive query?

Time: 2018-12-11 09:39:07

Tags: hive hiveql

I have two tables in Hive, empSrc and empTrg:

> select * from empSrc;
+---------------+--------------+-------------+--------------+--+
| empsrc.empid  | empsrc.dept  | empsrc.ph   | empsrc.role  |
+---------------+--------------+-------------+--------------+--+
| e1            | dev          | 9999911111  | SE           |
| e2            | admin        | 6677889933  | SE           |
+---------------+--------------+-------------+--------------+--+
2 rows selected (0.872 seconds)

> select * from empTrg;
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| emptrg.empid  | emptrg.dept  | emptrg.ph   | emptrg.role  | emptrg.dml_action  | emptrg.active  |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| e1            | dev          | 9999911111  | SE           | I                  | A              |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+

I want to find the records that are in empSrc but missing from empTrg.
My query works fine:

select S.* from empSrc S where S.empid not in (select T.empid from empTrg T);
+----------+---------+-------------+---------+--+
| s.empid  | s.dept  | s.ph        | s.role  |
+----------+---------+-------------+---------+--+
| e2       | admin   | 6677889933  | SE      |
+----------+---------+-------------+---------+--+

The problem is that this query produces a cross product.
Is there an equivalent LEFT / RIGHT OUTER JOIN query I can use?
Would it help performance?
The scenario above is only a demo; in the actual data I have around 12 million records.

1 answer:

Answer 0 (score: 1)

The query select S.* from empSrc S where S.empid not in (select T.empid from empTrg T) does not actually perform a cross join, so it is fine as written.

The same logic can be expressed with not exists:

select s.*
from empSrc s 
where not exists (select 1 from empTrg t where t.empid = s.empid)

Or with a left join that keeps only the unmatched rows:

select s.*
from empSrc s
left join empTrg t on t.empid = s.empid
where t.empid is null -- keep only empSrc rows that have no match in empTrg
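
To check locally that the three forms agree, here is a minimal sketch that loads the demo rows from the question into an in-memory SQLite database. SQLite is only a stand-in for Hive here (an assumption made so the example can run without a cluster), and the `queries`, `results` and `run_all` names are made up for illustration. It also shows one case where the forms can differ: a NULL empid in empTrg.

```python
import sqlite3

# Assumption: SQLite stands in for Hive; only the query logic is being compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table empSrc (empid text, dept text, ph text, role text);
    create table empTrg (empid text, dept text, ph text, role text,
                         dml_action text, active text);
    insert into empSrc values ('e1', 'dev',   '9999911111', 'SE'),
                              ('e2', 'admin', '6677889933', 'SE');
    insert into empTrg values ('e1', 'dev', '9999911111', 'SE', 'I', 'A');
""")

# The three formulations from the question and the answer.
queries = {
    "not in": "select s.empid from empSrc s "
              "where s.empid not in (select t.empid from empTrg t)",
    "not exists": "select s.empid from empSrc s "
                  "where not exists (select 1 from empTrg t where t.empid = s.empid)",
    "left join": "select s.empid from empSrc s "
                 "left join empTrg t on t.empid = s.empid "
                 "where t.empid is null",
}

def run_all():
    """Run every formulation and collect the returned empid values."""
    return {name: [row[0] for row in conn.execute(sql)]
            for name, sql in queries.items()}

results = run_all()
print(results)

# Add a target row whose empid is NULL and compare again.
conn.execute("insert into empTrg (empid) values (null)")
results_with_null = run_all()
print(results_with_null)
```

On the demo data all three forms return e2. Once empTrg contains a NULL empid, standard SQL semantics (which SQLite follows) make the not in form return no rows, while not exists and the left join still return e2. It is worth confirming how your Hive version treats that case before swapping one form for another on the real 12 million rows.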