I have two DataFrames: one is MasterList, the other is InsertList.
MasterList:
+--------+--------+
|  ttm_id|audit_id|
+--------+--------+
|       1|      10|
|      15|      10|
+--------+--------+
InsertList:
+--------+--------+
|  ttm_id|audit_id|
+--------+--------+
|       1|      10|
|      15|       9|
+--------+--------+
In Scala, how can I combine the two DataFrames so that the result contains MasterList plus only those InsertList records that satisfy
WHERE MasterList.ttm_id = InsertList.ttm_id AND
MasterList.audit_id != InsertList.audit_id
Expected output:
+--------+--------+
|  ttm_id|audit_id|
+--------+--------+
|       1|      10|
|      15|      10|
|      15|       9|
+--------+--------+
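To make the question's condition concrete, here is a minimal sketch that models each DataFrame as a local Seq of (ttm_id, audit_id) tuples in plain Scala, with no Spark involved. The object and method names are hypothetical. It applies the WHERE clause literally: an InsertList row is kept only if some MasterList row has the same ttm_id and a different audit_id.

```scala
// Plain-Scala model of the question's condition; each tuple is (ttm_id, audit_id).
// LiteralConditionSketch and its methods are hypothetical names, not Spark API.
object LiteralConditionSketch {
  type Row = (Int, Int)

  // InsertList rows that share a ttm_id with some MasterList row
  // but carry a different audit_id (the WHERE clause, read literally).
  def matching(master: Seq[Row], insert: Seq[Row]): Seq[Row] =
    insert.filter { case (ttm, audit) =>
      master.exists { case (mTtm, mAudit) => mTtm == ttm && mAudit != audit }
    }

  // MasterList followed by the matching InsertList rows.
  def appended(master: Seq[Row], insert: Seq[Row]): Seq[Row] =
    master ++ matching(master, insert)

  def main(args: Array[String]): Unit = {
    val master = Seq((1, 10), (15, 10))
    val insert = Seq((1, 10), (15, 9))
    println(appended(master, insert)) // List((1,10), (15,10), (15,9))
  }
}
```

Read literally, this condition would not append an InsertList row whose ttm_id is absent from MasterList (for example a hypothetical (20, 5)). The answers below instead append every InsertList row that is not already in MasterList; for the sample data both readings give the expected output.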
Answer 0 (score: 2)
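You want the rows of insertList that are not already in the masterList DataFrame. You can get them with the except function:
insertList.except(masterList)
Then simply use the union function to merge the DataFrames:
masterList.union(insertList.except(masterList))
and you should get what you want.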
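A minimal plain-Scala sketch of why the except step matters, again modelling the DataFrames as local Seqs of tuples; ExceptUnionSketch is a hypothetical name and this is not Spark code. It assumes Spark's documented semantics: Dataset.except is equivalent to SQL EXCEPT DISTINCT, and Dataset.union is equivalent to UNION ALL, so union does not remove duplicates.

```scala
// Local model of Spark's except/union semantics; each tuple is (ttm_id, audit_id).
object ExceptUnionSketch {
  type Row = (Int, Int)

  // Models Dataset.except: rows of `left` absent from `right`, deduplicated
  // (EXCEPT DISTINCT).
  def except(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val rightSet = right.toSet
    left.filterNot(rightSet).distinct
  }

  // Models Dataset.union: plain concatenation (UNION ALL), no deduplication.
  def union(left: Seq[Row], right: Seq[Row]): Seq[Row] = left ++ right

  def main(args: Array[String]): Unit = {
    val master = Seq((1, 10), (15, 10))
    val insert = Seq((1, 10), (15, 9))
    // A naive union repeats (1,10).
    println(union(master, insert)) // List((1,10), (15,10), (1,10), (15,9))
    // Subtracting masterList first leaves only the new row to append.
    println(union(master, except(insert, master))) // List((1,10), (15,10), (15,9))
  }
}
```

Because union keeps duplicates, calling it directly on the two DataFrames would list (1, 10) twice; subtracting masterList first avoids that.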
Answer 1 (score: 1)
I would use a leftanti join on both columns (in effect a NOT IN) followed by a union:
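// Assumes a SparkSession named `spark` (as in spark-shell); the implicits provide toDF
import spark.implicits._

val masterList = Seq((1, 10), (15, 10)).toDF("ttm_id", "audit_id")
val insertList = Seq((1, 10), (15, 9)).toDF("ttm_id", "audit_id")

insertList
  .join(masterList, Seq("ttm_id", "audit_id"), "leftanti") // insertList rows with no exact match in masterList
  .union(masterList)                                        // append every masterList row
  .show()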
// +------+--------+
// |ttm_id|audit_id|
// +------+--------+
// |    15|       9|
// |     1|      10|
// |    15|      10|
// +------+--------+
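A similar local sketch of the leftanti join on both columns, using plain Scala collections in place of DataFrames; LeftAntiSketch and the repeated input row are hypothetical. Unlike except, an anti join does not deduplicate: every left-hand row without a match on the right is kept, including repeats.

```scala
// Local model of a left anti join on (ttm_id, audit_id); not Spark code.
object LeftAntiSketch {
  type Row = (Int, Int)

  // Keep each left row whose (ttm_id, audit_id) pair has no match on the right.
  // The join key is the whole row here, so a Set of right rows suffices.
  // Repeated left rows are kept as-is (no deduplication).
  def leftAnti(left: Seq[Row], right: Seq[Row]): Seq[Row] = {
    val rightKeys = right.toSet
    left.filterNot(rightKeys)
  }

  def main(args: Array[String]): Unit = {
    val master = Seq((1, 10), (15, 10))
    // Hypothetical input in which the new row appears twice.
    val insert = Seq((1, 10), (15, 9), (15, 9))
    println(leftAnti(insert, master) ++ master) // List((15,9), (15,9), (1,10), (15,10))
  }
}
```

If insertList can contain repeated rows and you want them collapsed, calling .distinct() on the anti-join result before the union, or using the except approach, would do that.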