I have trained a logistic regression model and want to compute recall@25. I am using Spark 3.0.0 and Scala 2.12.3.
val model = pipeline.fit(train)
val predicted = model.transform(test)
val predictionAndLabels = predicted
  .select($"prediction", $"label")
  .as[(Double, Double)]
val Arr = predictionAndLabels.rdd.map(x => (Array(x._1), Array(x._2)))
val matrix = new RankingMetrics(Arr)
Array(1, 25).foreach { k =>
  println(s"Recall at $k = ${matrix.recallAt(k)}")
}
Exception:
org.apache.spark.SparkException: Task not serializable
  at org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:396)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:386)
  at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
  at org.apache.spark.SparkContext.clean(SparkContext.scala:2358)
How can I fix this?