This is my function:
def TestForeach(dataFrame: DataFrame) = {
  dataFrame.select("user_id").dropDuplicates().foreach(row => {
    dataFrame.filter("user_id == " + row.getString(0))
  })
}
I get this error:
ERROR Executor: Exception in task 2.0 in stage 4.0 (TID 16)
java.lang.NullPointerException
at org.apache.spark.sql.Dataset.filter(Dataset.scala:1318)
How can I get a DataFrame of the rows that share the same user_id?
Answer 0 (score: 2)
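foreach runs its function on the executors, whereas dataFrame is only usable on the driver, which is why calling dataFrame.filter inside the closure throws the NullPointerException. You should call collect before foreach. With that change, the foreach is Scala's foreach on a local array, not Spark's.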
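As a rough sketch of that suggestion (not the asker's actual code), the distinct IDs can be collected to the driver and then iterated with ordinary Scala collection methods. The object name CollectThenIterate, the helper rowsPerUser, the sample data, and the amount column are all hypothetical and only there to make the example self-contained. It assumes user_id is a string column, as row.getString(0) in the question implies.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CollectThenIterate {
  // Hypothetical helper: number of rows for each distinct user_id.
  def rowsPerUser(dataFrame: DataFrame): Map[String, Long] = {
    // collect() returns the distinct IDs to the driver as a local Array[Row].
    val userIds: Array[String] =
      dataFrame.select("user_id").dropDuplicates().collect().map(_.getString(0))
    // This map is Scala's Array.map and runs on the driver,
    // so referring to dataFrame here is safe.
    userIds.map(uid => uid -> dataFrame.filter(col("user_id") === uid).count()).toMap
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("collect-then-iterate")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample data.
    val df = Seq(("u1", 10), ("u1", 20), ("u2", 30)).toDF("user_id", "amount")

    // Scala's foreach on a local Map, not Spark's foreach.
    rowsPerUser(df).foreach { case (uid, n) => println(s"$uid: $n rows") }

    spark.stop()
  }
}
```

Note that this launches one Spark job per user ID, so it is only reasonable when the number of distinct IDs is small enough to collect to the driver.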
Answer 1 (score: 0)
You cannot use dataFrame inside a transformation or an action. You need to collect the user IDs first:
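import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col

def testForeach(dataFrame: DataFrame): Seq[DataFrame] = {
  // Collect the distinct user IDs to the driver first
  val userIds: Array[String] = dataFrame.select("user_id").distinct.collect.map(_.getString(0))
  // Build one filtered DataFrame per user ID; col avoids needing spark.implicits._ for $
  userIds.map(uid => dataFrame.filter(col("user_id") === uid)).toSeq
}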