Scala - Remove records from DF1 if data in a column matches DF2's column

Asked: 2018-05-03 10:53:36

Tags: scala apache-spark

I have two DFs: railroadGreaterFile and railroadInputFile.

I want to remove records from railroadGreaterFile whenever the data in its MEMBER_NUM column matches the data in railroadInputFile's MEMBER_NUM column.

Below is what I used:

val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile
    .join(columnrailroadInputFile, Seq("MEMBER_NUM"), "left")
    .filter($"check".isNull)
    .drop($"check")

Doing the above does remove the records, but I noticed that railroadGreaterNotInput's schema is a combination of DF1 and DF2, so when I try to write its data out, it gives me the following error:

org.apache.spark.sql.AnalysisException: Reference 'GROUP_NUM' is ambiguous, could be: GROUP_NUM#508, GROUP_NUM#72

What should I do so that railroadGreaterNotInput contains only the fields from my DF1?
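Editor's note: what the question describes is an anti-join — keep only the DF1 rows whose key does not appear in DF2. A minimal in-memory sketch of those semantics, using plain Scala collections and hypothetical sample data (not the actual DataFrames):

```scala
// Hypothetical stand-in records; groupNum mirrors the real GROUP_NUM column.
case class Rec(memberNum: Int, groupNum: String)

val df1 = Seq(Rec(1, "A"), Rec(2, "B"), Rec(3, "C")) // railroadGreaterFile
val df2MemberNums = Set(2)                           // MEMBER_NUMs in railroadInputFile

// Anti-join semantics: keep left rows whose key has no match on the right.
val kept = df1.filterNot(r => df2MemberNums.contains(r.memberNum))
// kept == Seq(Rec(1, "A"), Rec(3, "C"))
```

Note the result contains only df1's fields by construction, which is exactly the schema property the question is asking for.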

1 Answer:

Answer 0 (score: 2)

You can select only MEMBER_NUM (together with the check column) while joining:

val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile.join(
    columnrailroadInputFile.select("MEMBER_NUM", "check"), Seq("MEMBER_NUM"), "left")
   .filter($"check".isNull).drop($"check")

Or, after the join, drop all of columnrailroadInputFile's columns instead of only check:

.drop(columnrailroadInputFile.columns :_*)

but for that, the join condition should be changed to

railroadGreaterFile("MEMBER_NUM") === columnrailroadInputFile("MEMBER_NUM")

(a Seq("MEMBER_NUM") join merges the two key columns into one, which this drop would then remove as well).

Hope this helps!
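Editor's note, not part of the original answer: Spark 2.0+ also has a dedicated left_anti join type that expresses this pattern directly. It returns only the left side's columns, so it avoids both the helper check column and the ambiguous-reference error. A sketch, assuming the question's DataFrames:

```scala
// left_anti keeps the railroadGreaterFile rows whose MEMBER_NUM has no
// match in railroadInputFile; the result has exactly DF1's schema.
val railroadGreaterNotInput =
  railroadGreaterFile.join(railroadInputFile, Seq("MEMBER_NUM"), "left_anti")
```

This is the most direct answer to "how do I keep only DF1's fields": the anti-join never brings DF2's columns into the result in the first place.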