How do I compose an sklearn estimator out of other estimators?

Time: 2018-11-29 18:38:42

Tags: machine-learning scikit-learn

I want to train LogisticRegression and RandomForestClassifier, and combine their scores using GaussianNB:

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

logit = LogisticRegression(random_state=0)
logit.fit(X, y)

randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)
randf.fit(X, y)

X1 = np.transpose([logit.predict_proba(X)[:,0], randf.predict_proba(X)[:,0]])

nb = GaussianNB()
nb.fit(X1, y)

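How do I use a Pipeline so that I can pass this to cross_validate and GridSearchCV?

PS. I suppose I could define my own class implementing the fit and predict_proba methods, but I would think there should be a standard way to do this...

1 Answer:

Answer 0: (score: 1)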

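No, there is nothing built into sklearn that does what you want without writing some custom code. You can parallelize some parts of your code with FeatureUnion and sequence the whole task with Pipeline, but you need to write a custom transformer that forwards the output of predict_proba to the transform method.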

Something like this:

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline, FeatureUnion

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

# This is the custom transformer that will convert
# predict_proba() to pipeline friendly transform()
class PredictProbaTransformer(BaseEstimator, TransformerMixin):
    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Drop the 2nd column but keep 2d shape
            # because FeatureUnion wants that
            return self.clf.predict_proba(X)[:,[0]]
        return X

    # This method is important for correct working of pipeline
    def fit_transform(self, X, y):
        return self.fit(X, y).transform(X)

logit = LogisticRegression(random_state=0)
randf = RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0)

pipe = Pipeline([
    ('stack', FeatureUnion([
        ('logit', PredictProbaTransformer(logit)),
        ('randf', PredictProbaTransformer(randf)),
        # You can add more classifiers with custom wrapper like above
    ])),
    ('nb', GaussianNB())])

pipe.fit(X, y)

Now you can simply call the usual estimator methods on pipe, and everything will be done correctly.
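Since the question asks specifically about cross_validate and GridSearchCV, here is a minimal sketch of how such a pipeline could be passed to both. It repeats the wrapper so that it runs on its own. The scoring metric, the fold counts and the parameter grid values are assumptions chosen for illustration, not part of the original code. The nested parameter names follow sklearn's step__substep__param convention, so the wrapped random forest's max_depth is reached as stack__randf__clf__max_depth.

```python
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_validate
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import FeatureUnion, Pipeline


class PredictProbaTransformer(TransformerMixin, BaseEstimator):
    """Same idea as the wrapper above: expose clf.predict_proba() as transform()."""

    def __init__(self, clf=None):
        self.clf = clf

    def fit(self, X, y=None):
        if self.clf is not None:
            self.clf.fit(X, y)
        return self

    def transform(self, X):
        if self.clf is not None:
            # Keep only the first probability column, as a 2D array
            return self.clf.predict_proba(X)[:, [0]]
        return X


X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

pipe = Pipeline([
    ("stack", FeatureUnion([
        ("logit", PredictProbaTransformer(LogisticRegression(random_state=0))),
        ("randf", PredictProbaTransformer(
            RandomForestClassifier(n_estimators=100, max_depth=2, random_state=0))),
    ])),
    ("nb", GaussianNB()),
])

# Cross-validate the whole stack: every fold refits both base
# classifiers and the GaussianNB on that fold's training data only.
cv_results = cross_validate(pipe, X, y, cv=5, scoring="accuracy")
print(cv_results["test_score"])

# Tune hyperparameters of the wrapped classifiers through the pipeline.
# The grid values below are illustrative assumptions.
param_grid = {
    "stack__logit__clf__C": [0.1, 1.0],
    "stack__randf__clf__max_depth": [2, 4],
}
search = GridSearchCV(pipe, param_grid, cv=3, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```

Because the wrapper stores its classifier as the constructor argument clf, get_params() and clone() see it as a nested estimator, which is what lets GridSearchCV set parameters on it.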

For more information on FeatureUnion, see my other answer to a similar question.
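For readers on scikit-learn 0.22 or later: those versions include sklearn.ensemble.StackingClassifier, which covers a very similar use case without a custom transformer. The sketch below assumes such a version is installed. It is not an exact equivalent of the pipeline above, because StackingClassifier trains the final estimator on cross-validated predictions of the base classifiers rather than on in-sample probabilities. The cv and scoring values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.naive_bayes import GaussianNB

X, y = make_classification(n_samples=1000, n_features=4,
                           n_informative=2, n_redundant=0,
                           random_state=0, shuffle=False)

stacked = StackingClassifier(
    estimators=[
        ("logit", LogisticRegression(random_state=0)),
        ("randf", RandomForestClassifier(n_estimators=100, max_depth=2,
                                         random_state=0)),
    ],
    final_estimator=GaussianNB(),   # combines the base classifiers' scores
    stack_method="predict_proba",   # feed probabilities to the final estimator
    cv=5,                           # internal CV used to build the meta-features
)

# The stacked model is a regular estimator, so it can be passed
# directly to cross_validate (or GridSearchCV).
scores = cross_validate(stacked, X, y, cv=5, scoring="accuracy")
print(scores["test_score"])

stacked.fit(X, y)
proba = stacked.predict_proba(X)
```

Parameters of the base classifiers are reachable for grid search by their step name, for example randf__max_depth.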