I am trying to compute p-values with GeneralizedLinearRegression and I get the exception shown below.
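import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.regression.{GeneralizedLinearRegression, GeneralizedLinearRegressionModel}

// Assemble the input columns into a single feature vector.
val assembler = new VectorAssembler()
  .setInputCols(final_columns)
  .setOutputCol("Feature")
// Logistic regression expressed as a binomial GLM with a logit link.
val glr = new GeneralizedLinearRegression()
  .setFamily("binomial")
  .setLink("logit")
  .setMaxIter(1)
  .setRegParam(0.0)
  .setFeaturesCol("Feature")
  .setLabelCol("LM_2")
  //.setSolver("auto")
val pipeline = new Pipeline().setStages(Array(assembler, glr))
val lrModel_general = pipeline.fit(indexedDF)
// The next line throws the exception below.
val sum = lrModel_general.stages.last.asInstanceOf[GeneralizedLinearRegressionModel].summary.pValues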
Exception in thread "main" java.lang.UnsupportedOperationException: No p-value available for this GeneralizedLinearRegressionModel
    at org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary.pValues$lzycompute(GeneralizedLinearRegression.scala:1480)
    at org.apache.spark.ml.regression.GeneralizedLinearRegressionTrainingSummary.pValues(GeneralizedLinearRegression.scala:1468)
    at com.cvs.scala.ml.model.LR_SqlDB_LocalMessageGrouping$.main(LR_SqlDB_LocalMessageGrouping.scala:172)
    at com.cvs.scala.ml.model.LR_SqlDB_LocalMessageGrouping.main(LR_SqlDB_LocalMessageGrouping.scala)
Answer 0 (score: 0)
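Well, first of all, this is really a statistics question, so consider reading this answer.
As for a solution in Spark, I suggest checking the class of the fitted model and not requesting the summary statistics for a Ridge (regularized) model, because p-values are of little use for that kind of model.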
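One way to follow that advice is sketched below. It is only an illustration, not a definitive fix: PValues and pValuesOption are hypothetical names that are not part of Spark, and the sketch assumes that treating any regParam other than 0.0 as "Ridge" is acceptable for your use case. It checks the class of the fitted stage, skips regularized models, and turns the UnsupportedOperationException from the question into None instead of letting it crash the job.

```scala
import scala.util.{Failure, Success, Try}
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.regression.GeneralizedLinearRegressionModel

// Hypothetical helper object (not part of Spark).
object PValues {
  // Returns the p-values when the fitted stage can provide them, None otherwise.
  def pValuesOption(stage: Transformer): Option[Array[Double]] = stage match {
    // Only ask an unregularized GLM that still carries its training summary.
    // Assumption: any regParam other than 0.0 counts as a Ridge model here.
    case glm: GeneralizedLinearRegressionModel
        if glm.hasSummary && glm.getRegParam == 0.0 =>
      Try(glm.summary.pValues) match {
        case Success(p) => Some(p)
        // Spark reports "No p-value available" with this exception type.
        case Failure(_: UnsupportedOperationException) => None
        // Anything else is unexpected, so rethrow it.
        case Failure(other) => throw other
      }
    // Any other stage type, or a regularized GLM: no p-values requested.
    case _ => None
  }
}
```

With the pipeline from the question, the call would be PValues.pValuesOption(lrModel_general.stages.last). A result of None means the p-values were either skipped or not available for that fit, which is the situation the stack trace above reports.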