Compare DataFrames in Scala and write the mismatched old and new column values to a new DataFrame

Time: 2018-10-11 16:31:18

Tags: scala apache-spark

I have two DataFrames:

df1
ID | BTH_DT | CDC_FLAG | CDC_TS | CNSM_ID
123 | 1986-10-07 | I | 2018-10-10 05:51:24.000000941 | 301634310
124 | 1973-02-15 | I | 2018-10-10 17:12:22.000000254 | 298910234

df2
ID | BTH_DT | CDC_FLAG | CDC_TS | CNSM_ID
123 | 1986-10-07 | I | 2018-10-10 05:51:24.000000941 | \ c
124 | 1973-02-15 | I | 2018-10-10 17:12:22.000000254 | 298910234

How can I compare the two DataFrames and write only the mismatched column values to a separate DataFrame, like this?

ID | CNSM_ID
123 | 301634310
123 | \\ c

I tried

df2.except(df1)

but it does not give me this result.

1 answer:

Answer 0 (score: 0)

How about this:

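// Rows present in one DataFrame but not in the other, in both directions
val diff1 = df1.except(df2)
val diff2 = df2.except(df1)
// union replaces unionAll, which is deprecated since Spark 2.0
val join = diff1.union(diff2)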

Then join.select("ID", "CNSM_ID") keeps only the key and the column that differs.
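
For reference, here is a minimal self-contained sketch of the approach above, run against the sample rows from the question. The object name DiffSketch, the helper mismatched, the local SparkSession settings, and the choice to load every column as a string are assumptions made for illustration only; they are not part of the original post.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DiffSketch {
  // Hypothetical helper: rows that appear in exactly one of the two
  // DataFrames, reduced to the columns the caller wants to inspect.
  def mismatched(df1: DataFrame, df2: DataFrame, cols: String*): DataFrame = {
    val diff1 = df1.except(df2) // old values: in df1 but not in df2
    val diff2 = df2.except(df1) // new values: in df2 but not in df1
    diff1.union(diff2).select(cols.head, cols.tail: _*)
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration; app name and master are assumptions.
    val spark = SparkSession.builder()
      .appName("df-diff-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    val names = Seq("ID", "BTH_DT", "CDC_FLAG", "CDC_TS", "CNSM_ID")

    // Sample rows from the question, all loaded as strings for simplicity.
    val df1 = Seq(
      ("123", "1986-10-07", "I", "2018-10-10 05:51:24.000000941", "301634310"),
      ("124", "1973-02-15", "I", "2018-10-10 17:12:22.000000254", "298910234")
    ).toDF(names: _*)

    // The CNSM_ID of row 123 is copied as shown in the question's df2 table.
    val df2 = Seq(
      ("123", "1986-10-07", "I", "2018-10-10 05:51:24.000000941", "\\ c"),
      ("124", "1973-02-15", "I", "2018-10-10 17:12:22.000000254", "298910234")
    ).toDF(names: _*)

    // Shows the old and the new CNSM_ID for ID 123; row 124 is identical
    // in both inputs and is therefore dropped by except.
    mismatched(df1, df2, "ID", "CNSM_ID").show(false)

    spark.stop()
  }
}
```

Note that this only narrows the result to CNSM_ID because we already know that is the column that changed. except compares whole rows, so if the differing column is not known in advance, the two DataFrames would need to be joined on ID and compared column by column instead.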