How to calculate pValue from GeneralizedLinearRegressionModel using Spark Scala

Time: 2018-12-28 14:59:14

Tags: scala apache-spark data-science apache-spark-mllib

I am trying to compute p-values using GeneralizedLinearRegression and I get the exception below.

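    import org.apache.spark.ml.Pipeline
    import org.apache.spark.ml.feature.VectorAssembler
    import org.apache.spark.ml.regression.{GeneralizedLinearRegression, GeneralizedLinearRegressionModel}

    // Assemble the input columns into a single feature vector.
    val assembler = new VectorAssembler()
      .setInputCols(final_columns)
      .setOutputCol("Feature")

    // Binomial GLM with a logit link (logistic regression), no regularization.
    val glr = new GeneralizedLinearRegression()
      .setFamily("binomial")
      .setLink("logit")
      .setMaxIter(1)
      .setRegParam(0.0)
      .setFeaturesCol("Feature")
      .setLabelCol("LM_2")
      //.setSolver("auto")

    val pipeline = new Pipeline().setStages(Array(assembler, glr))
    val lrModel_general = pipeline.fit(indexedDF)
    // The next line throws the exception shown below.
    val sum = lrModel_general.stages.last.asInstanceOf[GeneralizedLinearRegressionModel].summary.pValues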

    Exception in thread "main" java.lang.UnsupportedOperationException: No p-value available for this GeneralizedLinearRegressionModel
        at org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary.pValues$lzycompute(GeneralizedLinearRegression.scala:1480)
        at org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary.pValues(GeneralizedLinearRegression.scala:1468)
        at com.cvs.scala.ml.model.LR_SqlDB_LocalMessageGrouping$.main(LR_SqlDB_LocalMessageGrouping.scala:172)
        at com.cvs.scala.ml.model.LR_SqlDB_LocalMessageGrouping.main(LR_SqlDB_LocalMessageGrouping.scala)

1 Answer:

Answer 0: (score: 0)

Well, first of all, this is really a question about statistics, so consider reading this answer.

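As for your solution in Spark, I suggest checking the class of the model and not requesting the summary for a Ridge model, since the summary is of little use for that kind of model.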
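One way to avoid the crash while you investigate is to guard the call. The stack trace shows the exception being thrown by the lazy `pValues` value on the training summary, so the access can be wrapped and turned into an `Option`. The sketch below is only an illustration: `SummaryGuard` and `optional` are hypothetical names, not part of Spark, and the commented usage line assumes `glrModel` is the `GeneralizedLinearRegressionModel` stage extracted from the fitted pipeline in the question.

```scala
import scala.util.{Failure, Success, Try}

// Hypothetical helper, not part of Spark.
object SummaryGuard {
  // Evaluates a by-name expression and returns None if it throws
  // UnsupportedOperationException (the exception type in the stack trace above).
  // Any other exception is rethrown unchanged.
  def optional[A](stat: => A): Option[A] =
    Try(stat) match {
      case Success(value)                            => Some(value)
      case Failure(_: UnsupportedOperationException) => None
      case Failure(other)                            => throw other
    }
}

// Assumed usage with the model from the question:
// val pValues: Option[Array[Double]] = SummaryGuard.optional(glrModel.summary.pValues)
```

With this in place, a `None` result means the summary has no p-values for that fit, and the rest of the job can continue or log a warning instead of failing. It does not make the p-values appear; it only makes their absence explicit.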