Spark - MLlib: getting the loss history from LogisticRegressionWithLBFGS

Date: 2016-03-16 01:27:42

Tags: scala apache-spark logistic-regression apache-spark-mllib

I am performing logistic regression with LBFGS using Apache Spark. I am trying to generate learning curves to see whether my model suffers from high bias or high variance.

Andrew Ng discusses the usefulness of learning curves in his Lecture on Learning Curves from the Machine Learning Coursera course. To plot them, I need the loss (a.k.a. cost, a.k.a. error) of the optimization function.
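Concretely, the kind of computation I have in mind looks like the sketch below. Everything in it is illustrative: the learningCurve helper, the sample fractions, and the 0/1 error used as a stand-in for the optimizer's true loss (which run() does not expose) are my own names, not part of MLlib.

import org.apache.spark.mllib.classification.LogisticRegressionWithLBFGS
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// Illustrative sketch: train on growing fractions of the training set and
// record training/validation error, which is what a learning curve plots.
def learningCurve(
    train: RDD[LabeledPoint],
    validation: RDD[LabeledPoint],
    fractions: Seq[Double]): Seq[(Double, Double, Double)] = {
  fractions.map { f =>
    val subset = train.sample(withReplacement = false, fraction = f, seed = 42L)
    val model = new LogisticRegressionWithLBFGS().setNumClasses(2).run(subset)
    // 0/1 error as a stand-in; what I actually want is the LBFGS loss
    def error(data: RDD[LabeledPoint]): Double =
      data.map(p => if (model.predict(p.features) == p.label) 0.0 else 1.0).mean()
    (f, error(subset), error(validation))
  }
}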

The Apache Spark source contains the following in LBFGS.scala:
@DeveloperApi
object LBFGS extends Logging {

  // ... some code

  def runLBFGS(/* ...some params... */): (Vector, Array[Double]) = {
    val lossHistory = mutable.ArrayBuilder.make[Double]

    // ... more code

    var state = states.next()
    while (states.hasNext) {
      lossHistory += state.value
      state = states.next()
    }
    lossHistory += state.value

    val lossHistoryArray = lossHistory.result()
    logInfo("LBFGS.runLBFGS finished. Last 10 losses %s".format(
      lossHistoryArray.takeRight(10).mkString(", ")))
    (weights, lossHistoryArray)
  }
}

I can see these losses in the log, but I am not sure how to obtain them programmatically when using LogisticRegressionWithLBFGS().run().

My attempt:

val (model, lossHistoryArray) = new LogisticRegressionWithLBFGS()
  .setNumClasses(2)
  .run(learningSample)

However, I get the following error:

constructor cannot be instantiated to expected type;
[error]  found   : (T1, T2)
[error]  required: org.apache.spark.mllib.classification.LogisticRegressionModel

The reason is obvious: run() returns a LogisticRegressionModel, not a tuple. However, I am not sure how to get at the information I need, since it seems to be buried inside the API.
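One possible workaround, sketched below, would be to skip LogisticRegressionWithLBFGS entirely and call the @DeveloperApi method LBFGS.runLBFGS directly, since that is the method that actually returns the (weights, lossHistory) tuple. This assumes learningSample is an RDD[LabeledPoint] with binary labels; the optimizer parameters are placeholders, not tuned values:

import org.apache.spark.mllib.classification.LogisticRegressionModel
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.optimization.{LBFGS, LogisticGradient, SquaredL2Updater}
import org.apache.spark.mllib.util.MLUtils

// runLBFGS takes (label, features) pairs; appendBias folds in the intercept
val training = learningSample
  .map(p => (p.label, MLUtils.appendBias(p.features)))
  .cache()
val numFeatures = learningSample.first().features.size

val (weightsWithIntercept, lossHistoryArray) = LBFGS.runLBFGS(
  training,
  new LogisticGradient(),   // logistic loss
  new SquaredL2Updater(),   // L2 regularization
  10,                       // numCorrections
  1e-4,                     // convergenceTol
  100,                      // maxNumIterations
  0.1,                      // regParam (placeholder)
  Vectors.dense(new Array[Double](numFeatures + 1)))  // zero initial weights

// Rebuild a usable model; the appended last weight is the intercept
val model = new LogisticRegressionModel(
  Vectors.dense(weightsWithIntercept.toArray.dropRight(1)),
  weightsWithIntercept(weightsWithIntercept.size - 1))

// lossHistoryArray now holds the loss at every LBFGS iteration

As far as I can tell, this is the same pattern shown in Spark's own MLlib optimization guide, so it stays on the public (if @DeveloperApi) surface rather than relying on anything private.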

0 Answers:

No answers yet.