Pyspark ML: How to get subModels values with CrossValidator()

Time: 2019-07-12 22:46:58

Tags: apache-spark pyspark k-fold

I would like to get the (internal) training accuracy of the cross-validation, using PySpark and the ML library:

from pyspark.ml.classification import LogisticRegression
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

lr = LogisticRegression()
param_grid = (ParamGridBuilder()
              .addGrid(lr.regParam, [0.01, 0.5])
              .addGrid(lr.maxIter, [5, 10])
              .addGrid(lr.elasticNetParam, [0.01, 0.1])
              .build())
evaluator = MulticlassClassificationEvaluator(predictionCol='prediction')
cv = CrossValidator(estimator=lr, estimatorParamMaps=param_grid, evaluator=evaluator, numFolds=5)
model_cv = cv.fit(train)
predictions_lr = model_cv.transform(validation)
predictions = evaluator.evaluate(predictions_lr)

To get the accuracy metric for each c.v. fold, I tried:

print(model_cv.subModels)

but the result of this call is empty (None).

How can I get the accuracy of each fold?

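1 Answer:

Answer 0: (score: 1)

I know this is old, but in case someone is looking: for Spark to keep the non-best models during cross-validation, you need to enable sub-model collection when creating the CrossValidator. Just set collectSubModels to True (it is False by default).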

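# collectSubModels=True keeps every model trained during cross-validation
cv = CrossValidator(estimator=lr,
                    estimatorParamMaps=param_grid,
                    evaluator=evaluator,
                    numFolds=5,
                    collectSubModels=True)
model_cv = cv.fit(train)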
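
Once sub-models are collected, model_cv.subModels is a nested list indexed as subModels[fold][param_map]. The following is only a sketch of one way to turn that into one metric per fold and parameter map. The helper name per_fold_metrics is made up for illustration and is not part of PySpark. It also scores every sub-model on a dataset you pass in, because the fitted model does not expose the internal validation split of each fold, so the numbers are not the exact internal fold scores.

```python
def per_fold_metrics(cv_model, evaluator, dataset):
    """Score every collected sub-model on `dataset`.

    Hypothetical helper (not a PySpark API). Returns a nested list where
    result[i][j] is the metric for fold i and parameter map j, mirroring
    the layout of cv_model.subModels.
    """
    if cv_model.subModels is None:
        # subModels stays None unless collectSubModels=True was set before fit()
        raise ValueError(
            "no sub-models: fit the CrossValidator with collectSubModels=True"
        )
    return [
        [evaluator.evaluate(model.transform(dataset)) for model in fold_models]
        for fold_models in cv_model.subModels
    ]
```

With the objects from the question, per_fold_metrics(model_cv, evaluator, validation) should give 5 rows (one per fold) of 8 values (2 x 2 x 2 parameter combinations). If the average over the internal validation splits is enough, model_cv.avgMetrics already holds one such value per parameter map. Also note that MulticlassClassificationEvaluator uses f1 as its default metric, so pass metricName='accuracy' if accuracy is what you want.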