Why are the weighted recall and accuracy values exactly the same?

Time: 2018-12-09 18:16:24

Tags: java algorithm apache-spark machine-learning

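I have implemented several machine learning algorithms with the Apache Spark MLLib library, and I use a MulticlassClassificationEvaluator object to get the results. The metrics I want are precision, recall, and accuracy.

The problem is that accuracy and recall are identical for every algorithm I use. For example, both are 98% for random forest and both are 95% for Naive Bayes. The same holds for the other algorithms I tried. Is this normal? Is it related to the way I obtain the results?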

Here is one of the implementations I used, for random forest:

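    import org.apache.spark.ml.Pipeline;
    import org.apache.spark.ml.PipelineModel;
    import org.apache.spark.ml.PipelineStage;
    import org.apache.spark.ml.classification.RandomForestClassifier;
    import org.apache.spark.ml.evaluation.MulticlassClassificationEvaluator;
    import org.apache.spark.ml.feature.IndexToString;
    import org.apache.spark.ml.feature.StringIndexer;
    import org.apache.spark.ml.feature.StringIndexerModel;
    import org.apache.spark.ml.feature.VectorIndexer;
    import org.apache.spark.ml.feature.VectorIndexerModel;
    import org.apache.spark.sql.Dataset;
    import org.apache.spark.sql.Row;

    // Load the data set in libsvm format.
    Dataset<Row> dataFrame = sparkBase
            .getSpark()
            .read()
            .format("libsvm")
            .load(svFilePath);

    // Index the labels and the categorical features on the full data set.
    StringIndexerModel labelIndexer = new StringIndexer()
            .setInputCol("label")
            .setOutputCol("indexedLabel")
            .fit(dataFrame);
    VectorIndexerModel featureIndexer = new VectorIndexer()
            .setInputCol("features")
            .setOutputCol("indexedFeatures")
            .setMaxCategories(categoryCount)
            .fit(dataFrame);

    // Split into training and test sets with a fixed seed.
    Dataset<Row>[] splits = dataFrame.randomSplit(new double[]
            {mainController.getTrainingDataRate(), mainController.getTestDataRate()}, 1234L);
    Dataset<Row> train = splits[0];
    Dataset<Row> test = splits[1];

    RandomForestClassifier rf = new RandomForestClassifier()
            .setLabelCol("indexedLabel")
            .setFeaturesCol("indexedFeatures");

    // Map the predicted indices back to the original label strings.
    IndexToString labelConverter = new IndexToString()
            .setInputCol("prediction")
            .setOutputCol("predictedLabel")
            .setLabels(labelIndexer.labels());

    Pipeline pipeline = new Pipeline()
            .setStages(new PipelineStage[] {labelIndexer, featureIndexer, rf, labelConverter});

    PipelineModel model = pipeline.fit(train);

    Dataset<Row> predictions = model.transform(test);

    // Evaluate the same predictions with three different metrics.
    // accuracy, recall and precision are not declared in this snippet.
    MulticlassClassificationEvaluator evaluator = new MulticlassClassificationEvaluator()
            .setLabelCol("indexedLabel")
            .setPredictionCol("prediction")
            .setMetricName("accuracy");
    accuracy = evaluator.evaluate(predictions);

    evaluator.setMetricName("weightedRecall");
    recall = evaluator.evaluate(predictions);

    evaluator.setMetricName("weightedPrecision");
    precision = evaluator.evaluate(predictions);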

Am I doing something wrong? Regards
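
A note on the observation above: if weighted recall is defined in the usual way, as per-class recall weighted by each class's share of the true labels, then it reduces algebraically to accuracy. With n_c true samples of class c, TP_c of them predicted correctly, and N samples in total, the sum over classes of (n_c / N) * (TP_c / n_c) equals (sum of TP_c) / N, which is the accuracy. The sketch below checks this in plain Java without Spark. The class name `WeightedRecallDemo` and the confusion matrix are hypothetical and chosen only for illustration; they are not taken from the question's data.

```java
public class WeightedRecallDemo {

    // confusion[i][j] = number of samples whose true class is i and predicted class is j.
    public static double accuracy(long[][] confusion) {
        long correct = 0;
        long total = 0;
        for (int i = 0; i < confusion.length; i++) {
            for (int j = 0; j < confusion[i].length; j++) {
                total += confusion[i][j];
                if (i == j) {
                    correct += confusion[i][j];
                }
            }
        }
        return (double) correct / total;
    }

    // Per-class recall, weighted by the fraction of true samples in each class.
    public static double weightedRecall(long[][] confusion) {
        long total = 0;
        for (long[] row : confusion) {
            for (long v : row) {
                total += v;
            }
        }
        double sum = 0.0;
        for (int i = 0; i < confusion.length; i++) {
            long support = 0; // number of true samples of class i
            for (long v : confusion[i]) {
                support += v;
            }
            if (support == 0) {
                continue; // a class with no true samples has weight zero
            }
            double recall = (double) confusion[i][i] / support;
            sum += recall * ((double) support / total);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Hypothetical 3-class confusion matrix, for illustration only.
        long[][] confusion = { {50, 2, 1}, {3, 40, 2}, {0, 1, 30} };
        System.out.println("accuracy       = " + accuracy(confusion));
        System.out.println("weightedRecall = " + weightedRecall(confusion));
    }
}
```

Under this definition the two printed values agree up to floating-point rounding for any confusion matrix, so matching accuracy and weightedRecall values across algorithms would be expected rather than a sign of a mistake in the evaluation code. Weighted precision divides by predicted counts instead of true counts, so it generally differs.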

0 answers:

No answers