How do I add a custom RuleBasedClassifier to an sklearn pipeline after CalibratedClassifierCV?

Time: 2019-07-29 08:59:02

Tags: python scikit-learn

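I am using CalibratedClassifierCV so that I can call predict_proba on a LinearSVC. Now I want to add a custom classifier to the pipeline that assigns every prediction whose probability is below 10% to an "other" class. All remaining predictions should keep their predicted class unchanged.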
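As a rough sketch of the rule I have in mind (the function name `apply_other_rule` and its parameters are made up for illustration, and it assumes the probability columns follow the order of the class labels):

```python
import numpy as np


def apply_other_rule(proba, classes, threshold=0.10, other_label="other"):
    """Return the most probable class per row, or other_label when the
    top probability falls below threshold (hypothetical helper)."""
    proba = np.asarray(proba, dtype=float)
    # Object dtype so that other_label fits regardless of label length or type.
    classes = np.asarray(classes, dtype=object)
    top_idx = proba.argmax(axis=1)
    top_proba = proba[np.arange(len(proba)), top_idx]
    labels = classes[top_idx]  # fancy indexing returns a copy
    labels[top_proba < threshold] = other_label
    return labels


# A higher threshold is used here only to make the effect visible.
labels = apply_other_rule(
    [[0.7, 0.2, 0.1], [0.34, 0.33, 0.33]], ["a", "b", "c"], threshold=0.5
)
```

Note that with K classes the top probability is never below 1/K, so a 10% cut-off only triggers when there are more than ten classes.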

from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from custom_model import RuleBasedClassifier

pipeline_clf = Pipeline([
    ("MLClassifier", CalibratedClassifierCV(LinearSVC(C=0.6))),
    ("RuleBasedClassifier", RuleBasedClassifier())
])

However, when I try to add the custom classifier to the sklearn pipeline, I get the following error message:

   TypeError: All intermediate steps should be transformers and implement 
   fit and transform or be the string 'passthrough' 
   'CalibratedClassifierCV' doesn't

I don't know how to change CalibratedClassifierCV so that it passes its results on to the custom classifier.
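A quick check seems to confirm what the error message says: every pipeline step except the last must offer fit and transform, and CalibratedClassifierCV exposes predict_proba but no transform method. The variable names below are only for illustration.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

clf = CalibratedClassifierCV(LinearSVC(C=0.6))

# Pipeline validates intermediate steps by looking for fit and transform.
has_transform = hasattr(clf, "transform")
has_predict_proba = hasattr(clf, "predict_proba")
print(has_transform, has_predict_proba)
```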

1 Answer:

Answer 0 (score: 1)

You can build a custom transformer, as shown below.

import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ExtractProbsFromClassifier(TransformerMixin, BaseEstimator):
    """Wrap a classifier so that transform() returns its class probabilities."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y):
        # Fit the wrapped classifier; the pipeline calls transform() afterwards.
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        # One column per class, in the order of self.clf.classes_.
        return self.clf.predict_proba(X)

    def get_feature_names(self):
        # Names such as "Prob_<class>"; str() also covers non-string labels.
        return ["Prob_" + str(c) for c in self.clf.classes_]

It can then be used in a pipeline like this:

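from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from custom_model import RuleBasedClassifier

pipeline = Pipeline([
    ("ExtractProbs", ExtractProbsFromClassifier(clf=CalibratedClassifierCV(LinearSVC(C=0.6)))),
    ("RuleBasedClassifier", RuleBasedClassifier())
], verbose=True)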
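The question does not show RuleBasedClassifier itself, so the following end-to-end sketch uses a hypothetical stand-in. It assumes the final step receives the probability matrix as X, that its columns follow the sorted class labels, and that every class appears in the training labels. The `threshold` and `other_label` parameters, the synthetic data, and the 0.5 cut-off are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


class ExtractProbsFromClassifier(TransformerMixin, BaseEstimator):
    """Emit the wrapped classifier's predict_proba output as features."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        return self.clf.predict_proba(X)


class RuleBasedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical final step: top class, or other_label below threshold."""

    def __init__(self, threshold=0.10, other_label="other"):
        self.threshold = threshold
        self.other_label = other_label

    def fit(self, X, y):
        # np.unique returns sorted labels, matching the assumed column order.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        X = np.asarray(X)
        top_idx = X.argmax(axis=1)
        # Object dtype so other_label can be assigned regardless of its length.
        labels = self.classes_[top_idx].astype(object)
        labels[X.max(axis=1) < self.threshold] = self.other_label
        return labels


# Synthetic three-class data with string labels, for demonstration only.
X, y_num = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
y = np.array(["a", "b", "c"])[y_num]

pipeline = Pipeline([
    ("ExtractProbs", ExtractProbsFromClassifier(clf=CalibratedClassifierCV(LinearSVC(C=0.6)))),
    ("RuleBasedClassifier", RuleBasedClassifier(threshold=0.5)),
])
pipeline.fit(X, y)
preds = pipeline.predict(X)
```

With only three classes the top probability is always at least 1/3, so a 10% threshold would never fire here; that is why the sketch raises it to 0.5. Also note that the rule step is fitted on in-sample probabilities, which does not matter for a fixed threshold but would for a rule step that learns from them.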