Question

I'm trying to use grid search to find the best hyperparameters for my model.

Here is my code:

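import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.BinaryClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

val featuresList_RF = Array("Age", "Gender", "Qualifications") // input feature columns
val assembler = new VectorAssembler().setInputCols(featuresList_RF).setOutputCol("features")
val randomForest = new RandomForestClassifier().setLabelCol("label").setFeaturesCol("features")

val pipeline_RF = new Pipeline().setStages(Array(assembler, randomForest))

// 4 x 3 x 3 x 3 = 108 combinations, each refitted in every fold
val paramGrid_RF = new ParamGridBuilder()
  .addGrid(randomForest.numTrees, Array(50, 100, 250, 500))
  .addGrid(randomForest.maxDepth, Array(5, 10, 15))
  .addGrid(randomForest.maxBins, Array(50, 100, 208))
  .addGrid(randomForest.minInstancesPerNode, Array(10, 50, 100))
  .build()

// BinaryClassificationEvaluator uses areaUnderROC by default
val RF = new CrossValidator().setEstimator(pipeline_RF).setEvaluator(new BinaryClassificationEvaluator).setEstimatorParamMaps(paramGrid_RF)
val model_RF = RF.fit(train) // train and test are my own DataFrames
val predictions = model_RF.transform(test).select("probability", "prediction")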
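The `CrossValidator` above scores every candidate with `BinaryClassificationEvaluator`, whose default metric, `areaUnderROC`, is computed from the `rawPrediction` scores rather than from the `prediction` column. AUC only measures how positives rank against negatives, so no cutoff can change it, whereas a cutoff-based metric such as accuracy does change. The plain-Scala sketch below illustrates this on made-up toy data; `AucVsCutoff` and its methods are hypothetical helpers, and the AUC here is the pairwise (Mann-Whitney) formulation, not Spark's implementation.

```scala
object AucVsCutoff {
  // Pairwise (Mann-Whitney) AUC: the share of (positive, negative) pairs in
  // which the positive gets the higher score; ties count as one half.
  // Only the scores are used, never a cutoff.
  def auc(scores: Seq[Double], labels: Seq[Int]): Double = {
    val pos = scores.zip(labels).collect { case (s, 1) => s }
    val neg = scores.zip(labels).collect { case (s, 0) => s }
    val pairs = for (p <- pos; n <- neg)
      yield if (p > n) 1.0 else if (p == n) 0.5 else 0.0
    pairs.sum / (pos.size * neg.size)
  }

  // Accuracy after turning each score into a 0/1 prediction with `cutoff`.
  def accuracy(scores: Seq[Double], labels: Seq[Int], cutoff: Double): Double = {
    val correct = scores.zip(labels).count { case (s, y) =>
      (if (s > cutoff) 1 else 0) == y
    }
    correct.toDouble / labels.size
  }
}
```

With the hypothetical low scores `Seq(0.02, 0.04, 0.08, 0.12, 0.30)` and labels `Seq(0, 0, 1, 0, 1)`, `auc` returns 5/6 (about 0.833) whatever cutoff is applied afterwards, while `accuracy` is 0.6 at a 0.5 cutoff and 0.8 at 0.05.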

This gives me a table with two columns:

probability
prediction

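I understand that:

if the probability of class 1 is below 0.5, my prediction is 0;
if it is above 0.5, I get 1.

Is it possible to use cross-validation to find the best model while also trying thresholds other than 0.5? My model doesn't have very good features and my probabilities are generally low, so maybe 0.5 is not the best choice for me...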
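For reference, Spark ML's documented behaviour: when `thresholds` is unset, `prediction` is the index of the largest entry in the `probability` vector, which for a binary label is a 0.5 cutoff on P(label = 1); when `thresholds` is set, the predicted class is the one with the largest `probability(i) / thresholds(i)`. The plain-Scala sketch below mimics both rules on a single row so they can be compared without a cluster. `CutoffRule` and its methods are hypothetical helpers, not Spark API, and the sketch assumes every threshold is strictly positive.

```scala
object CutoffRule {
  // Binary rule on the probability of class 1: predict 1.0 when p1 > cutoff.
  def predictWithCutoff(p1: Double, cutoff: Double): Double =
    if (p1 > cutoff) 1.0 else 0.0

  // Mirrors the rule described for Spark's `thresholds` param: predict the
  // class i with the largest probability(i) / thresholds(i).
  // Assumes all thresholds are > 0 (no division by zero).
  def predictWithThresholds(probability: Array[Double], thresholds: Array[Double]): Double = {
    val scaled = probability.zip(thresholds).map { case (p, t) => p / t }
    scaled.indexOf(scaled.max).toDouble
  }

  // With no thresholds set: the most probable class (first index on ties),
  // i.e. a 0.5 cutoff on p1 for two classes.
  def defaultPrediction(probability: Array[Double]): Double =
    probability.indexOf(probability.max).toDouble
}
```

For a row with `probability = Array(0.7, 0.3)`, `defaultPrediction` gives 0.0, while `predictWithThresholds` with `Array(0.8, 0.2)` gives 1.0, matching `predictWithCutoff(0.3, 0.2)`. In general `Array(1 - c, c)` reproduces the cutoff `p1 > c`, because p1 / c > (1 - p1) / (1 - c) simplifies to p1 > c.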

Scala: how to modify the threshold that defines the predictions, using cross-validation?
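One possible approach, sketched under assumptions rather than offered as a definitive answer: probabilistic classifiers in Spark ML, `RandomForestClassifier` included, expose a `thresholds` param, so candidate cutoffs can go into `ParamGridBuilder` like any other param. For the cutoff to matter, the evaluator has to score the `prediction` column; `BinaryClassificationEvaluator` with its default `areaUnderROC` ignores it, so this sketch swaps in `MulticlassClassificationEvaluator` with `metricName` `"f1"` (F1 weighted over both classes). The object name `ThresholdTuning`, the cutoff values, the reduced tree grid and `numFolds = 3` are illustrative assumptions.

```scala
import org.apache.spark.ml.Pipeline
import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator
import org.apache.spark.ml.feature.VectorAssembler
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.tuning.{CrossValidator, ParamGridBuilder}

object ThresholdTuning {
  // Candidate cutoffs on P(label = 1); illustrative values only.
  val cutoffs: Seq[Double] = Seq(0.1, 0.2, 0.3, 0.4, 0.5)

  // Array(1 - c, c) makes Spark predict 1 exactly when P(label = 1) > c.
  def thresholdsFor(c: Double): Array[Double] = Array(1.0 - c, c)

  val thresholdGrid: Seq[Array[Double]] = cutoffs.map(c => thresholdsFor(c))

  val assembler = new VectorAssembler()
    .setInputCols(Array("Age", "Gender", "Qualifications"))
    .setOutputCol("features")

  val randomForest = new RandomForestClassifier()
    .setLabelCol("label")
    .setFeaturesCol("features")

  val pipeline = new Pipeline().setStages(Array(assembler, randomForest))

  // thresholds is searched jointly with a deliberately small tree grid:
  // 2 x 2 x 5 = 20 combinations, each refitted in every fold.
  val paramGrid: Array[ParamMap] = new ParamGridBuilder()
    .addGrid(randomForest.numTrees, Array(100, 250))
    .addGrid(randomForest.maxDepth, Array(5, 10))
    .addGrid(randomForest.thresholds, thresholdGrid)
    .build()

  // Scores the thresholded "prediction" column, so different cutoffs can
  // receive different scores (areaUnderROC would rank them all the same).
  val evaluator = new MulticlassClassificationEvaluator()
    .setLabelCol("label")
    .setPredictionCol("prediction")
    .setMetricName("f1")

  val cv = new CrossValidator()
    .setEstimator(pipeline)
    .setEvaluator(evaluator)
    .setEstimatorParamMaps(paramGrid)
    .setNumFolds(3)
}
```

Fitting would look like `val cvModel = ThresholdTuning.cv.fit(train)`, with `cvModel.avgMetrics` aligned index by index with `ThresholdTuning.paramGrid`. Because `thresholds` only changes `prediction` and not the trees, every cutoff here triggers a full refit of the forest; a cheaper variant is to tune the tree params first with AUC, then try cutoffs on held-out data by calling `setThresholds` on the fitted `RandomForestClassificationModel`. If only the positive class matters, Spark 3.0+ also offers `metricName` `"fMeasureByLabel"` together with `setMetricLabel(1.0)`.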
