Is it possible to set a "threshold" for a scikit-learn ensemble classifier?

Date: 2018-08-16 20:15:02

Tags: python machine-learning scikit-learn

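I have a VotingClassifier made up of 200 individual SVM classifiers. By default, this classifier uses majority-rule voting. I want to set a custom threshold, so that a classification is made only if 60% or more of the SVM classifiers agree.

If 59% of the SVM classifiers give the same classification, I don't want the ensemble model to make a classification.

I don't see a parameter for doing this on the VotingClassifier object, but I assume it must be possible somewhere in scikit-learn. Should I be using a different ensemble class?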

1 Answer:

Answer 0 (score: 1)

Looking at the methods listed at the end of the documentation page, the simplest solution is to use the transform method:

def transform(self, X):
        """Return class labels or probabilities for X for each estimator.
        Parameters
        ----------
        X : {array-like, sparse matrix}, shape = [n_samples, n_features]
            Training vectors, where n_samples is the number of samples and
            n_features is the number of features.
        Returns
        -------
        If `voting='soft'` and `flatten_transform=True`:
          array-like = (n_classifiers, n_samples * n_classes)
          otherwise array-like = (n_classifiers, n_samples, n_classes)
            Class probabilities calculated by each classifier.
        If `voting='hard'`:
          array-like = [n_samples, n_classifiers]
            Class labels predicted by each classifier.
        """

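With voting='hard' and class labels 0 and 1, the sum of a row divided by the number of SVMs is the fraction of classifiers that voted for class 1. Just write a simple function that computes this ratio for each row and then applies your threshold: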

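def apply_threshold(ratio, threshold=0.6):
    # ratio: fraction of classifiers that voted for class 1
    if ratio > threshold:
        return 1
    elif ratio < (1 - threshold):
        return 0
    else:
        # we don't make the prediction
        return -1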
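
The same idea can be sketched end to end without assuming binary 0/1 labels. The snippet below is only an illustration, not part of the original answer: the names threshold_vote and ABSTAIN are hypothetical, it counts votes per row with collections.Counter instead of summing, and it uses >= so that exactly 60% agreement still counts, as the question asks. It expects one row per sample and one predicted label per classifier, which is the layout transform is documented to return for voting='hard'.

```python
from collections import Counter

ABSTAIN = -1  # hypothetical sentinel meaning "no prediction"


def threshold_vote(votes, threshold=0.6, abstain=ABSTAIN):
    """Return the most common label per sample, or `abstain` when too few agree.

    votes: one row per sample, one predicted label per classifier.
    """
    results = []
    for row in votes:
        # Most frequent label in this row and how many classifiers chose it.
        label, count = Counter(row).most_common(1)[0]
        share = count / len(row)
        # >= so that exactly `threshold` agreement is enough to classify.
        results.append(label if share >= threshold else abstain)
    return results


# Toy data: 3 samples, 5 classifiers each.
votes = [
    [1, 1, 1, 0, 0],  # 60% agree on 1: classified
    [1, 1, 0, 0, 2],  # best share is 40%: abstain
    [0, 0, 0, 0, 1],  # 80% agree on 0: classified
]
print(threshold_vote(votes))  # [1, -1, 0]
```

With a fitted hard-voting VotingClassifier (call it clf, a placeholder name), clf.transform(X).tolist() should give rows in this layout. The values in those rows are encoded class indices, so clf.classes_ can map a non-abstained result back to the original label.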