How to access the computed metrics for each fold in a CrossValidatorModel

Time: 2016-08-17 09:02:42

Tags: apache-spark apache-spark-ml

How can I get the computed metrics for each fold from a CrossValidatorModel? I know I can get the average metrics from the CrossValidatorModel (avgMetrics), but is it possible to get the raw result for each fold, for example to look at the variance of the results?

I am using Spark 2.0.0.

1 answer:

Answer 0 (score: 0)


For the folds, you can do the iteration yourself, along the lines of the loop inside CrossValidator.fit:

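    import org.apache.spark.ml.{Estimator, Model}
    import org.apache.spark.ml.evaluation.Evaluator
    import org.apache.spark.ml.param.ParamMap
    import org.apache.spark.mllib.util.MLUtils
    import org.apache.spark.sql.DataFrame

    // Adapted from the fit() loop of CrossValidator: instead of summing the metrics,
    // keep one value per (fold, ParamMap). Returns metrics(foldIndex)(paramMapIndex).
    def perFoldMetrics(dataset: DataFrame, est: Estimator[_], eval: Evaluator,
                       epm: Array[ParamMap], numFolds: Int, seed: Long): Array[Array[Double]] = {
      val sparkSession = dataset.sparkSession
      val schema = dataset.schema
      // K-fold split of the underlying RDD[Row] into (training, validation) pairs
      val splits = MLUtils.kFold(dataset.rdd, numFolds, seed)
      splits.zipWithIndex.map { case ((training, validation), splitIndex) =>
        val trainingDataset = sparkSession.createDataFrame(training, schema).cache()
        val validationDataset = sparkSession.createDataFrame(validation, schema).cache()
        // For each fold, one model is trained per ParamMap in the param grid
        val models = est.fit(trainingDataset, epm).asInstanceOf[Seq[Model[_]]]
        trainingDataset.unpersist()
        val foldMetrics = epm.indices.map { i =>
          val metric = eval.evaluate(models(i).transform(validationDataset, epm(i)))
          println(s"Fold $splitIndex: got metric $metric for model trained with ${epm(i)}.")
          metric
        }.toArray
        validationDataset.unpersist()
        foldMetrics
      }
    }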

This is Scala, but the idea should be clear.
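To get at the spread the question asks about, one option is to keep the per-fold values in a matrix metrics(fold)(paramMapIndex) instead of summing them, and then summarize each column. The sketch below is plain Scala with no Spark dependency; FoldStats, Summary and summarize are hypothetical names, and it assumes every fold has a value for every ParamMap. It uses the sample variance (n - 1 denominator), which is a choice on my part, not something Spark prescribes.

```scala
// Hypothetical helper: summarize per-fold metrics for each ParamMap.
// metrics(fold)(paramIndex) holds the evaluator's value for one fold and one ParamMap.
object FoldStats {
  final case class Summary(mean: Double, variance: Double, stdDev: Double)

  def summarize(metrics: Array[Array[Double]]): Array[Summary] = {
    require(metrics.nonEmpty, "need at least one fold")
    val numFolds = metrics.length
    val numParamMaps = metrics.head.length
    require(metrics.forall(_.length == numParamMaps), "every fold needs one value per ParamMap")
    Array.tabulate(numParamMaps) { i =>
      // Column i: the metric of ParamMap i across all folds
      val values = metrics.map(_(i))
      val mean = values.sum / numFolds
      // Sample variance (n - 1 denominator); 0.0 when there is only one fold
      val variance =
        if (numFolds > 1) values.map(v => (v - mean) * (v - mean)).sum / (numFolds - 1)
        else 0.0
      Summary(mean, variance, math.sqrt(variance))
    }
  }
}
```

For example, FoldStats.summarize(Array(Array(0.80, 0.70), Array(0.90, 0.70), Array(0.85, 0.70))) gives one Summary per ParamMap. The mean column is the per-fold average, i.e. the quantity CrossValidator reports in avgMetrics; the variance and stdDev columns are the extra information the averaged value hides.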

Look into the Spark code here, which computes the results for each fold. Hope this helps.
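For intuition about what MLUtils.kFold produces, here is a local, non-Spark illustration of a random k-fold split. It is written from my understanding of the idea, not copied from Spark's implementation, and LocalKFold is a hypothetical name: each element draws one seeded uniform number that picks its validation fold, and the element belongs to the training set of every other fold.

```scala
import scala.util.Random

// Illustrative sketch only: NOT Spark's MLUtils.kFold, just a random k-fold split on a local Seq.
object LocalKFold {
  def kFold[T](data: Seq[T], numFolds: Int, seed: Long): Seq[(Seq[T], Seq[T])] = {
    require(numFolds >= 2, "numFolds must be at least 2")
    val rng = new Random(seed)
    // Materialize once so every element gets exactly one random draw,
    // which fixes its validation fold for all k splits.
    val assigned: Vector[(T, Int)] = data.toVector.map { x =>
      val fold = math.min((rng.nextDouble() * numFolds).toInt, numFolds - 1)
      (x, fold)
    }
    // Split i: elements drawn into fold i validate, all others train
    (0 until numFolds).map { fold =>
      val validation = assigned.collect { case (x, f) if f == fold => x }
      val training = assigned.collect { case (x, f) if f != fold => x }
      (training, validation)
    }
  }
}
```

Because the fold is drawn once per element, the validation sets are disjoint and together cover the input, while the fold sizes vary randomly rather than being exactly n / k.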