I am using Apache Spark to run logistic regression with LBFGS. I am trying to generate learning curves to see whether my model suffers from high bias or high variance.
Andrew Ng discusses the usefulness of learning curves in his Lecture on Learning Curves in the Machine Learning Coursera course. To build them, I need the loss, AKA cost, AKA error, of the optimization function.
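Concretely, the end result I am after is the optimizer's loss at each iteration, so I can dump it and plot the curve. Something along these lines, where lossHistoryArray is the per-iteration loss array I am trying to obtain:

// lossHistoryArray: Array[Double] of per-iteration losses (this is what I want to get out of Spark)
lossHistoryArray.zipWithIndex.foreach { case (loss, i) =>
  println(s"iteration $i: loss $loss")
}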
The Apache source in LBFGS.scala contains the following:

@DeveloperApi
object LBFGS extends Logging {
  ... some code
  def runLBFGS(...some params...): (Vector, Array[Double]) = {
    val lossHistory = mutable.ArrayBuilder.make[Double]
    ... more code
    var state = states.next()
    while (states.hasNext) {
      lossHistory += state.value
      state = states.next()
    }
    lossHistory += state.value
    val lossHistoryArray = lossHistory.result()
    logInfo("LBFGS.runLBFGS finished. Last 10 losses %s".format(
      lossHistoryArray.takeRight(10).mkString(", ")))
    (weights, lossHistoryArray)
  }
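From that snippet, runLBFGS itself already returns the loss history alongside the weights, so one workaround I am considering is calling the optimizer directly rather than going through LogisticRegressionWithLBFGS, roughly following the L-BFGS example in the MLlib optimization docs. A rough sketch (learningSample is my RDD[LabeledPoint]; the hyperparameter values are just placeholders):

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// runLBFGS expects (label, features) pairs; appendBias adds the intercept column.
val training = learningSample.map(lp => (lp.label, MLUtils.appendBias(lp.features))).cache()
val numFeatures = learningSample.first().features.size

// Placeholder hyperparameters: numCorrections, convergenceTol, maxNumIterations, regParam.
val (weightsWithIntercept, lossHistoryArray) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),
  new SquaredL2Updater(),
  10,
  1e-4,
  100,
  0.01,
  Vectors.dense(new Array[Double](numFeatures + 1)))

// lossHistoryArray now holds the per-iteration losses; the weights can still be
// wrapped in a LogisticRegressionModel to make predictions.
val model = new LogisticRegressionModel(
  Vectors.dense(weightsWithIntercept.toArray.dropRight(1)),
  weightsWithIntercept.toArray.last)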
In the driver logs I can see the losses printed by that logInfo call, but I am not sure how to get them back when using LogisticRegressionWithLBFGS().run().
My attempt:
val (model, lossHistoryArray) = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(learningSample)
However, I get the error:
constructor cannot be instantiated to expected type;
[error]  found   : (T1, T2)
[error]  required: org.apache.spark.mllib.classification.LogisticRegressionModel
The reason is obvious. However, I am not sure how to extract the information I need, since it seems buried inside the API.
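For reference, the call does compile when I only ask for the model, which is all run() returns, but then the loss history never surfaces:

import org.apache.spark.mllib.classification.{LogisticRegressionModel, LogisticRegressionWithLBFGS}

// run() hands back only the fitted model; the per-iteration losses stay internal to the optimizer.
val model: LogisticRegressionModel = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(learningSample)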