Filter a Scala/Spark DataFrame by conditions taken from another DataFrame

Date: 2020-04-23 02:44:44

Tags: scala dataframe apache-spark

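I want to filter a DataFrame (dataDF) using multiple conditions that come from another DataFrame (ruleDF).

I tried the following code, but it doesn't work: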

dataDF.filter(dataDF("App") === ruleDF("app") && dataDF("values") > ruleDF("valuee")).select($"no", $"App", $"values").show(true)

1 Answer

Answer 0 (score: -1)

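You need to join the two DataFrames on your conditions and then select only the columns you need. A filter on dataDF can only reference dataDF's own columns; referencing ruleDF's columns inside it fails Spark's analysis with an AnalysisException, because ruleDF is not part of that query plan. A join brings both DataFrames into one plan so the condition can compare them.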

Example:

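import org.apache.spark.sql.functions.col
import spark.implicits._ // for toDF; assumes a SparkSession named spark, as in spark-shell

val dataDF=Seq(("1","a","c")).toDF("no","App","values")
val ruleDF=Seq(("1","a","b")).toDF("no","App","values")
//inner join: keep dataDF rows whose App matches a rule and whose values exceed that rule's values
val join_df=dataDF.alias("a").join(ruleDF.alias("b"),(col("a.App") === col("b.App")) && (col("a.values") > col("b.values"))).select("a.*")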

join_df.show()
//+---+---+------+
//| no|App|values|
//+---+---+------+
//|  1|  a|     c|
//+---+---+------+
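Two details of the inner join above may matter for the question's data. First, if one dataDF row matches several rules, the inner join returns that row once per match, so select("a.*") can produce duplicates. A left semi join ("left_semi") keeps each matching dataDF row at most once and returns only dataDF's columns. Second, the values columns here are strings, so > compares them lexicographically ("10" > "5" is false). The sketch below is one possible variant, not the answer's own code: it assumes the question's column names (App/values in dataDF, app/valuee in ruleDF), hypothetical sample data, integer-like values that can be cast to int, and a ruleDF small enough for the broadcast hint to make sense. The helper names filterByRules and demo are illustrative only.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.broadcast

object SemiJoinFilter {
  // Keep the dataDF rows for which at least one rule in ruleDF matches.
  // Assumes the question's column names: App/values in dataDF, app/valuee in ruleDF.
  def filterByRules(dataDF: DataFrame, ruleDF: DataFrame): DataFrame =
    dataDF.join(
      broadcast(ruleDF), // hint only: assumes ruleDF is small enough to ship to every executor
      dataDF("App") === ruleDF("app") &&
        // cast so that "10" > "5" is compared numerically, not lexicographically
        dataDF("values").cast("int") > ruleDF("valuee").cast("int"),
      "left_semi" // returns only dataDF's columns, each matching row at most once
    )

  // Builds small hypothetical inputs and returns the filtered rows as tuples.
  def demo(spark: SparkSession): Seq[(String, String, String)] = {
    import spark.implicits._
    val dataDF = Seq(("1", "a", "10"), ("2", "a", "3"), ("3", "b", "7")).toDF("no", "App", "values")
    val ruleDF = Seq(("a", "5"), ("a", "8")).toDF("app", "valuee")
    filterByRules(dataDF, ruleDF)
      .as[(String, String, String)] // tuple encoders map columns by position
      .collect()
      .toSeq
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("semi-join-filter").getOrCreate()
    try demo(spark).foreach(println) // prints (1,a,10)
    finally spark.stop()
  }
}
```

With this data, row 1 matches both rules, so an inner join would return it twice, while the semi join returns it once. Without the casts, "10" > "5" and "10" > "8" are both false as strings, so no rows would survive.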