I have two tables in Hive, empSrc and empTrg:
> select * from empSrc;
+---------------+--------------+-------------+--------------+--+
| empsrc.empid | empsrc.dept | empsrc.ph | empsrc.role |
+---------------+--------------+-------------+--------------+--+
| e1 | dev | 9999911111 | SE |
| e2 | admin | 6677889933 | SE |
+---------------+--------------+-------------+--------------+--+
2 rows selected (0.872 seconds)
> select * from empTrg;
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| emptrg.empid | emptrg.dept | emptrg.ph | emptrg.role | emptrg.dml_action | emptrg.active |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
| e1 | dev | 9999911111 | SE | I | A |
+---------------+--------------+-------------+--------------+--------------------+----------------+--+
I want to find the records that are in empSrc but missing from empTrg.
My query works fine:
select S.* from empSrc S
where S.empid not in (select T.empid from empTrg T);
+----------+---------+-------------+---------+--+
| s.empid | s.dept | s.ph | s.role |
+----------+---------+-------------+---------+--+
| e2 | admin | 6677889933 | SE |
+----------+---------+-------------+---------+--+
The problem is that this query produces a cross product.
Is there an equivalent not exists query I can use?
Would a LEFT / RIGHT OUTER JOIN help with performance?
The scenario above is a demo; my real data has about 12 million records.
Answer 0 (score: 1)
The query
select S.* from empSrc S
where S.empid not in (select T.empid from empTrg T)
does not actually perform a cross join, so it is fine as written.
You can also use not exists:
select s.*
from empSrc s
where not exists (select 1 from empTrg t where t.empid = s.empid)
or a left join:
select s.*
from empSrc s
left join empTrg t on t.empid = s.empid
where t.empid is null -- keep only empSrc rows with no match in empTrg
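To sanity-check that the three forms return the same rows on the demo data, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for Hive. That substitution is an assumption made only so the example runs anywhere: it says nothing about Hive's execution plan or performance on 12 million rows. The names QUERIES and run_all are hypothetical helpers introduced for illustration. The sketch also shows one difference worth knowing in standard SQL: if empTrg.empid can be NULL, the not in form stops returning rows, while not exists and the left join do not.

```python
import sqlite3

# In-memory SQLite database standing in for Hive (assumption, for illustration only).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE empSrc (empid TEXT, dept TEXT, ph TEXT, role TEXT);
    CREATE TABLE empTrg (empid TEXT, dept TEXT, ph TEXT, role TEXT,
                         dml_action TEXT, active TEXT);
    INSERT INTO empSrc VALUES ('e1', 'dev',   '9999911111', 'SE'),
                              ('e2', 'admin', '6677889933', 'SE');
    INSERT INTO empTrg VALUES ('e1', 'dev', '9999911111', 'SE', 'I', 'A');
""")

# The three anti-join forms from the question and the answer.
QUERIES = {
    "not_in": """
        SELECT s.* FROM empSrc s
        WHERE s.empid NOT IN (SELECT t.empid FROM empTrg t)
    """,
    "not_exists": """
        SELECT s.* FROM empSrc s
        WHERE NOT EXISTS (SELECT 1 FROM empTrg t WHERE t.empid = s.empid)
    """,
    "left_join": """
        SELECT s.* FROM empSrc s
        LEFT JOIN empTrg t ON t.empid = s.empid
        WHERE t.empid IS NULL
    """,
}

def run_all(connection):
    """Run every query and return {query name: list of result rows}."""
    return {name: connection.execute(sql).fetchall()
            for name, sql in QUERIES.items()}

# On the demo data all three forms return only the e2 row.
results = run_all(conn)
for name, rows in results.items():
    print(name, rows)

# Add a target row whose key is NULL and run the same queries again.
conn.execute("INSERT INTO empTrg (empid) VALUES (NULL)")
results_with_null = run_all(conn)
for name, rows in results_with_null.items():
    print(name, "(with NULL key)", rows)
```

With a NULL key present in empTrg, the not in form returns an empty result here, because the comparison against NULL is unknown rather than true. If the real empTrg.empid column can contain NULLs, not exists or the left join is likely the safer choice; it is worth confirming the behaviour on your Hive version before relying on it.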