Question

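The official documentation doesn't seem to cover this.

I'd like to know why we can't pass already-trained models to VotingClassifier so that we don't have to train them again, given that VotingClassifier requires us to call fit before predicting.

Does it simply do this: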

for clf in self.clfs:
    clf.fit(X, y)

Or does it use some fancier method based on folds?

Answer 1

This is what VotingClassifier.fit does:

def fit(self, X, y, sample_weight=None):
    ...  # Validates the arguments, estimators, etc.

    self.le_ = LabelEncoder()
    self.le_.fit(y)
    self.classes_ = self.le_.classes_
    self.estimators_ = []

    transformed_y = self.le_.transform(y)

    self.estimators_ = Parallel(n_jobs=self.n_jobs)(
            delayed(_parallel_fit_estimator)(clone(clf), X, transformed_y,
                sample_weight)
                for _, clf in self.estimators)

    return self

...where _parallel_fit_estimator is just a wrapper around the estimator.fit call:

def _parallel_fit_estimator(estimator, X, y, sample_weight):
    if sample_weight is not None:
        estimator.fit(X, y, sample_weight)
    else:
        estimator.fit(X, y)
    return estimator

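As you can see, the method really does just fit the classifiers (in parallel!), with no folds involved, and creates the label encoder self.le_ and the self.estimators_ attribute. Note that every classifier is passed through clone() before fitting, so fit trains fresh copies and any state from earlier training is not reused. The predict() and transform() methods are built on top of these attributes, which is why fit() has to be called first.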
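The cloning behaviour described above can be checked with a small experiment. This is only a sketch under assumptions: the synthetic dataset, the choice of LogisticRegression and DecisionTreeClassifier, and the variable names are illustrative and not part of the original answer, and the presence of a classes_ attribute is used here as a rough proxy for "has been fitted".

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Illustrative synthetic data (an assumption, not from the original answer).
X, y = make_classification(n_samples=200, n_features=5, random_state=0)

# Two unfitted base classifiers.
lr = LogisticRegression(max_iter=1000)
tree = DecisionTreeClassifier(random_state=0)

voter = VotingClassifier(estimators=[("lr", lr), ("tree", tree)], voting="hard")
voter.fit(X, y)

# fit() trained clones, so the objects we passed in are still unfitted.
originals_fitted = [hasattr(est, "classes_") for est in (lr, tree)]

# The fitted copies live in voter.estimators_.
clones_fitted = [hasattr(est, "classes_") for est in voter.estimators_]

# The fitted copies are different objects from the ones we passed in.
same_objects = [a is b for a, b in zip((lr, tree), voter.estimators_)]

# predict() relies on estimators_ and le_, both created by fit().
predictions = voter.predict(X[:5])

print(originals_fitted, clones_fitted, same_objects)
```

Because the originals stay untouched, fitting lr or tree beforehand would not change what the ensemble learns: fit would clone them and train the copies from scratch.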

What method does sklearn's VotingClassifier use to fit?
