I have two DFs: railroadGreaterFile and railroadInputFile. I want to remove records from railroadGreaterFile if the data in its MEMBER_NUM column matches the data in the MEMBER_NUM column of railroadInputFile.

Below is what I used:

val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile
  .join(columnrailroadInputFile, Seq("MEMBER_NUM"), "left")
  .filter($"check".isNull)
  .drop($"check")

Doing the above removes the records, but I see that the schema of railroadGreaterNotInput is a combination of my two DFs, so when I try to write its data out, it gives me the following error:

org.apache.spark.sql.AnalysisException: Reference 'GROUP_NUM' is ambiguous, could be: GROUP_NUM#508, GROUP_NUM#72

What should I do so that railroadGreaterNotInput only contains the fields from the railroadGreaterFile DF?
Answer 0 (score: 2)
You can select only MEMBER_NUM (plus the check column) while joining:

val columnrailroadInputFile = railroadInputFile.withColumn("check", lit("check"))
val railroadGreaterNotInput = railroadGreaterFile.join(
    columnrailroadInputFile.select("MEMBER_NUM", "check"), Seq("MEMBER_NUM"), "left")
  .filter($"check".isNull).drop($"check")

Or drop all of columnrailroadInputFile's columns from the joined DataFrame:

.drop(columnrailroadInputFile.columns :_*)

but for this, change your join to use an explicit condition instead of Seq("MEMBER_NUM"):

railroadGreaterFile("MEMBER_NUM") === columnrailroadInputFile("MEMBER_NUM")

Hope this helps!
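As an alternative worth noting: Spark (2.0+) also supports a "left_anti" join type, which keeps only the rows of the left DataFrame that have no match on the right and never brings the right-hand columns into the result schema, so the whole "check" column trick becomes unnecessary. A minimal sketch, assuming the two DataFrames from the question:

```scala
// Keep only the railroadGreaterFile rows whose MEMBER_NUM does NOT
// appear in railroadInputFile. The result carries only the columns
// of railroadGreaterFile, so there is no ambiguous-reference error
// when writing it out.
val railroadGreaterNotInput = railroadGreaterFile
  .join(railroadInputFile, Seq("MEMBER_NUM"), "left_anti")
```

Since the anti-join drops all right-hand columns by construction, no select, filter, or drop is needed afterwards.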