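I am using CalibratedClassifierCV so that I can call predict_proba on a LinearSVC. Now I want to add a custom classifier to the pipeline that assigns every prediction with a probability below 10% to an "Other" class. All remaining predictions should keep their predicted class unchanged.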
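from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from custom_model import RuleBasedClassifier

# CalibratedClassifierCV is used here as an intermediate step of the pipeline.
pipeline_clf = Pipeline([
    ("MLClassifier", CalibratedClassifierCV(LinearSVC(C=0.6))),
    ("RuleBasedClassifier", RuleBasedClassifier()),
])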
However, when I try to add the custom classifier to the sklearn pipeline, I get the following error message:
TypeError: All intermediate steps should be transformers and implement
fit and transform or be the string 'passthrough'
'CalibratedClassifierCV' doesn't
I don't know how to change CalibratedClassifierCV so that it passes its results on to the custom classifier.
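The error message can be reproduced in isolation with a quick check: a pipeline only accepts intermediate steps that have a transform method, and CalibratedClassifierCV offers predict_proba but no transform. This is only a small diagnostic sketch; the variable names are illustrative.

```python
from sklearn.calibration import CalibratedClassifierCV
from sklearn.svm import LinearSVC

clf = CalibratedClassifierCV(LinearSVC(C=0.6))

# Intermediate pipeline steps must implement transform; this classifier does not.
has_transform = hasattr(clf, "transform")
# It does expose predict_proba, which is the output we want to pass downstream.
has_predict_proba = hasattr(clf, "predict_proba")

print("transform:", has_transform, "| predict_proba:", has_predict_proba)
```

So the classifier has to be wrapped in something that exposes its probabilities through transform.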
Answer 0 (score: 1)
You can build a custom transformer that wraps the classifier, as shown below.
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class ExtractProbsFromClassifier(BaseEstimator, TransformerMixin):
    """Wrap a classifier so that transform() returns its predict_proba output."""

    def __init__(self, clf):
        self.clf = clf
        self.feature_names_ = None

    def fit(self, X, y):
        # Fit the wrapped classifier; the pipeline passes y through to this step.
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        # One column per class, in the order of self.clf.classes_.
        return self.clf.predict_proba(X)

    def get_feature_names(self):
        # Build names such as "Prob_<class label>" once, after fitting.
        if self.feature_names_ is None:
            classes = np.asarray(self.clf.classes_).astype(str)
            self.feature_names_ = np.char.add("Prob_", classes).tolist()
        return self.feature_names_
It can then be used in a pipeline like this:
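from sklearn.calibration import CalibratedClassifierCV
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC
from custom_model import RuleBasedClassifier

pipeline = Pipeline([
    ("ExtractProbs", ExtractProbsFromClassifier(clf=CalibratedClassifierCV(LinearSVC(C=0.6)))),
    ("RuleBasedClassifier", RuleBasedClassifier()),
], verbose=True)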
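For completeness, here is a rough end-to-end sketch. The question does not show custom_model.RuleBasedClassifier, so the class below is a hypothetical stand-in: its threshold and other_label parameters are my own assumptions, and "probability below 10%" is interpreted as the highest class probability for a sample being below the threshold. The data is synthetic, generated with make_classification.

```python
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin, TransformerMixin
from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.pipeline import Pipeline
from sklearn.svm import LinearSVC


class ExtractProbsFromClassifier(TransformerMixin, BaseEstimator):
    """Expose the wrapped classifier's predict_proba output via transform."""

    def __init__(self, clf):
        self.clf = clf

    def fit(self, X, y):
        self.clf.fit(X, y)
        return self

    def transform(self, X):
        return self.clf.predict_proba(X)


class RuleBasedClassifier(ClassifierMixin, BaseEstimator):
    """Hypothetical stand-in for custom_model.RuleBasedClassifier."""

    def __init__(self, threshold=0.10, other_label="Other"):
        self.threshold = threshold
        self.other_label = other_label

    def fit(self, X, y):
        # Assumption: the upstream probability columns follow the sorted
        # unique labels, which is how scikit-learn classifiers order classes_.
        self.classes_ = np.unique(y)
        return self

    def predict(self, X):
        # X holds class probabilities, one column per entry in self.classes_.
        proba = np.asarray(X)
        labels = self.classes_[proba.argmax(axis=1)].astype(object)
        # Samples whose best probability is below the threshold become "Other".
        labels[proba.max(axis=1) < self.threshold] = self.other_label
        return labels


# Synthetic three-class data, used purely for illustration.
X, y = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=0
)

pipeline = Pipeline([
    ("ExtractProbs", ExtractProbsFromClassifier(clf=CalibratedClassifierCV(LinearSVC(C=0.6)))),
    ("RuleBasedClassifier", RuleBasedClassifier(threshold=0.10)),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Note that with only a few classes the highest probability can never drop below 10%, so the "Other" label only starts to appear with more than ten classes or a higher threshold.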