Case 1: merge
Old DataFrame:
## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ab|  ac|
## |  2| bb|  bc|  bd|
## +---+---+----+----+
New DataFrame:
## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ab|  ad|
## |  2| bb|  bb|  bd|
## |  3| cc|  cc|  cc|
## +---+---+----+----+
Result:
## +---+---+----+----+
## |pk1|pk2|val1|val2|
## +---+---+----+----+
## |  1| aa|  ab|  ad|
## |  2| bb|  bb|  bd|
## |  3| cc|  cc|  cc|
## +---+---+----+----+
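Would an outer join on multiple keys work?
Answer 0 (score: 1)
Based on your sample data, I assume that elements from the new DataFrame are picked over those in the old DataFrame when they differ.
[UPDATE] If the val columns are dynamic, you can apply foldLeft to the list of columns, as follows: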
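import org.apache.spark.sql.functions.when
import spark.implicits._ // assumes a SparkSession named spark is in scope (as in spark-shell)

val dfOld = Seq(
  (1, "aa", "ab", "ac"),
  (2, "bb", "bc", "bd")
).toDF("pk1", "pk2", "val1", "val2")

val dfNew = Seq(
  (1, "aa", "ab", "ad"),
  (2, "bb", "bb", "bd"),
  (3, "cc", "cc", "cc")
).toDF("pk1", "pk2", "val1", "val2")

// Assemble the list of val columns (everything except the two keys)
val valColumns = dfNew.columns.filter(x => x != "pk1" && x != "pk2")

// Left outer join on both keys: keeps every row of dfNew
val dfJoined = dfNew.join(dfOld, Seq("pk1", "pk2"), "left_outer")

// Generate diff columns from the val-column list
val dfDiff = valColumns.foldLeft(dfJoined)((acc, x) =>
  acc
    .withColumn(
      x + "diff",
      // Keep the new value when it differs from the old one,
      // or when the old value is null (e.g. the row exists only in dfNew)
      when(!(dfNew(x) === dfOld(x)) || dfOld(x).isNull, dfNew(x)).otherwise(null)
    )
    .drop(x) // drops both the new and the old copy of column x
)

dfDiff.show()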
+---+---+--------+--------+
|pk1|pk2|val1diff|val2diff|
+---+---+--------+--------+
|  1| aa|    null|      ad|
|  2| bb|      bb|    null|
|  3| cc|      cc|      cc|
+---+---+--------+--------+
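The diff columns above show only what changed. If the goal is the merged table from the question, one possible approach is a full outer join on both keys followed by coalesce, so that the new value wins and the old value is used only as a fallback. This is a minimal sketch, not necessarily what the answer intended. The names keys, mergedCols and dfMerged are made up for illustration, the local SparkSession setup is an assumption, and coalesce means a null in the new DataFrame will not overwrite a non-null old value.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{coalesce, col}

// Assumption: a local session is good enough for this sketch
val spark = SparkSession.builder().master("local[*]").appName("merge-sketch").getOrCreate()
import spark.implicits._

val dfOld = Seq(
  (1, "aa", "ab", "ac"),
  (2, "bb", "bc", "bd")
).toDF("pk1", "pk2", "val1", "val2")

val dfNew = Seq(
  (1, "aa", "ab", "ad"),
  (2, "bb", "bb", "bd"),
  (3, "cc", "cc", "cc")
).toDF("pk1", "pk2", "val1", "val2")

// Hypothetical helper names: the key columns and the remaining val columns
val keys = Seq("pk1", "pk2")
val valColumns = dfNew.columns.filterNot(c => keys.contains(c))

// Full outer join on both keys keeps rows that exist in only one of the frames
val dfJoined = dfNew.join(dfOld, keys, "full_outer")

// For every val column, prefer the new value and fall back to the old one
val mergedCols = keys.map(k => col(k)) ++
  valColumns.map(c => coalesce(dfNew(c), dfOld(c)).as(c))

val dfMerged = dfJoined.select(mergedCols: _*).orderBy("pk1", "pk2")
dfMerged.show()
```

With the sample data this should give back the three rows listed under "Result" in the question. If a null in the new DataFrame has to replace an old value, coalesce is not sufficient and the condition would need to check whether the row exists in dfNew instead.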