Question

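I am new to Spark. I installed PySpark 2.3.0 on Windows, and I am working on a dataset with 3 classes: "Positive", "Negative", and "Neutral".

I want to run cross-validation with LinearSVC, but for evaluation I want to score each model by the average F1 score of only two classes, "Positive" and "Negative".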

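In the code below, I believe that setting "metricName" to "f1" in "MulticlassClassificationEvaluator" means the best parameters are selected according to the average F1 score over all 3 classes. But as I said above, I want the selection to be based on the average score of only 2 classes: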

from pyspark.ml.tuning import CrossValidator, ParamGridBuilder
from pyspark.ml.classification import LinearSVC
from pyspark.ml.evaluation import MulticlassClassificationEvaluator

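# Note: Spark ML's LinearSVC is a binary classifier, so a 3-class label
# column would likely need a wrapper such as OneVsRest.
LSVC = LinearSVC()
paramGrid = (ParamGridBuilder()
             .addGrid(LSVC.maxIter, [10, 100, 1000])
             .addGrid(LSVC.regParam, [0.01, 0.1, 10.0, 100.0])
             .build())
# metricName="f1" scores each model over all classes in the data
crossval = CrossValidator(estimator=LSVC,
                          # estimator=pipeline,
                          estimatorParamMaps=paramGrid,
                          evaluator=MulticlassClassificationEvaluator(metricName="f1"),
                          numFolds=2)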
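
To make the target metric concrete, the following is a minimal sketch in plain Python (not Spark) of the score described above: the per-class F1 averaged over a chosen subset of classes. The function name mean_f1_for_labels, the sample (prediction, label) pairs, and the numeric label encoding (0.0 = Negative, 1.0 = Neutral, 2.0 = Positive) are all assumptions made for illustration; none of them come from the dataset in the question.

```python
from collections import Counter


def mean_f1_for_labels(pairs, labels):
    """Average the per-class F1 over `labels` only (hypothetical helper).

    pairs:  iterable of (prediction, label) tuples
    labels: class ids to include in the average
    """
    if not labels:
        raise ValueError("labels must contain at least one class id")

    tp, fp, fn = Counter(), Counter(), Counter()
    for pred, true in pairs:
        if pred == true:
            tp[true] += 1      # correct prediction for class `true`
        else:
            fp[pred] += 1      # `pred` was predicted but is wrong
            fn[true] += 1      # `true` was missed

    scores = []
    for c in labels:
        denom = 2 * tp[c] + fp[c] + fn[c]
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the class never appears
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Assumed label encoding, for illustration only
NEGATIVE, NEUTRAL, POSITIVE = 0.0, 1.0, 2.0

# Made-up (prediction, label) pairs
pairs = [(2.0, 2.0), (2.0, 2.0), (0.0, 2.0),
         (0.0, 0.0), (1.0, 0.0), (1.0, 1.0)]

# Neutral is left out of the average on purpose
score = mean_f1_for_labels(pairs, [POSITIVE, NEGATIVE])
print(score)
```

In this toy data the Positive F1 is 0.8 and the Negative F1 is 0.5, so the two-class average is 0.65; the Neutral F1 does not enter the result. In Spark, one possible route (an untested assumption for 2.3.0) would be to perform the same computation on the prediction and label columns inside a custom subclass of pyspark.ml.evaluation.Evaluator and pass that object as the evaluator argument of CrossValidator.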

PySpark: evaluation in cross-validation based on a user-defined metric

0 answers: