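We are running a RandomForest model that creates 3 classifiers, and we want to evaluate it with AUC rather than accuracy.
Is there a way to do this with spark.ml? At the moment we call MulticlassClassificationEvaluator with the accuracy metric. Its list of supported metrics does not include AUC, only the following:
param for metric name in evaluation (supports `"f1"` (default), `"weightedPrecision"`, `"weightedRecall"`, `"accuracy"`)
Is there an example of how to compute AUC in Spark?
We are running Spark 2.0, and this is our current setup, which evaluates with the accuracy metric: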
from pyspark.ml.classification import RandomForestClassifier
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

max_depth = model_params['max_depth']
num_trees = model_params['num_trees']

# Configure the RandomForest classifier.
rf = RandomForestClassifier(labelCol="label", featuresCol="features", impurity="gini",
                            featureSubsetStrategy="all", numTrees=num_trees, maxDepth=max_depth)

# Train model. This model fit is used for scoring future packages later.
model_fit = rf.fit(training_data)

# Make predictions.
transformed = model_fit.transform(test_data)

# Calculate and show the accuracy on test data if indicated
if model_params['calc_matrix'] is True:
    # Compare (prediction, true label) pairs and compute accuracy
    evaluator = MulticlassClassificationEvaluator(labelCol="label",
                                                  predictionCol="prediction", metricName="accuracy")
    accuracy = evaluator.evaluate(transformed)
    print("RF Overall Accuracy = {}, numTrees = {}, maxDepth = {}".
          format(accuracy, num_trees, max_depth))
Answer 0 (score: 1)
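Area under the ROC curve (AUC) is, in its standard form, only meaningful for binary classifiers, but you are using MulticlassClassificationEvaluator (which implies that the number of output classes is > 2).
Have a look at BinaryClassificationEvaluator.
If you do want to build a multiclass classifier, though, you need a multiclass metric such as accuracy.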
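If you still want an AUC-style number for a 3-class model, one common workaround (not something the question's setup already does) is one-vs-rest: treat each class in turn as the positive class, score it with that class's predicted probability, and compute a binary AUC per class. Below is a minimal plain-Python sketch of that idea, assuming class labels are the integers 0..k-1 and each prediction is a list of per-class probabilities. The function names `binary_auc` and `one_vs_rest_auc` and the sample data are made up for illustration; in Spark you would instead build a 0/1 label column per class and pass it to BinaryClassificationEvaluator with metricName="areaUnderROC".

```python
def binary_auc(labels, scores):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity.

    labels: sequence of 0/1 values; scores: sequence of floats where a
    higher score means "more likely positive". Tied scores share the
    average rank of their group.
    """
    pairs = sorted(zip(scores, labels))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the end of the group of equal scores starting at i.
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0  # 1-based average rank of the group
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n_pos = sum(1 for _, y in pairs if y == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("AUC needs at least one positive and one negative example")

    rank_sum_pos = sum(r for r, (_, y) in zip(ranks, pairs) if y == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


def one_vs_rest_auc(labels, probabilities):
    """Return ({class: AUC}, macro-averaged AUC) using one-vs-rest relabeling."""
    per_class = {}
    for c in sorted(set(labels)):
        binary_labels = [1 if y == c else 0 for y in labels]
        class_scores = [p[c] for p in probabilities]  # probability of class c
        per_class[c] = binary_auc(binary_labels, class_scores)
    macro = sum(per_class.values()) / len(per_class)
    return per_class, macro


# Hypothetical predictions for a 3-class problem (illustrative data only).
sample_labels = [0, 0, 1, 1, 2, 2]
sample_probabilities = [
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.3, 0.4, 0.3],
    [0.1, 0.2, 0.7],
    [0.2, 0.3, 0.5],
]
per_class_auc, macro_auc = one_vs_rest_auc(sample_labels, sample_probabilities)
print(per_class_auc, macro_auc)
```

The macro average weights every class equally; if your classes are imbalanced, you may prefer to report the per-class values or weight them by class frequency.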