Question

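Can someone tell me how to use ensembles in sklearn with partial fit? I don't want to retrain my models. Alternatively, can we pass pre-trained models in for ensembling? I have seen that the voting classifier, for example, does not support training with partial fit.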

Answer 1

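The mlxtend library has a voting ensemble implementation, EnsembleVoteClassifier, that lets you pass in pre-fitted models. For example, if you have three pre-trained models clf1, clf2 and clf3, the following code works.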

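from mlxtend.classifier import EnsembleVoteClassifier

# clf1, clf2, clf3 are assumed to be fitted already.
# Newer mlxtend releases may expect fit_base_estimators=False in place of
# refit=False; check the docs for your installed version.
eclf = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], weights=[1, 1, 1], refit=False)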

When set to False, the refit parameter of EnsembleVoteClassifier ensures that the classifiers are not refit.

In general, when you need more advanced features that scikit-learn does not provide, consider mlxtend as a first point of reference.

Answer 2

Unfortunately, this is currently not possible with scikit-learn's VotingClassifier.

But you can use http://sebastianraschka.com/Articles/2014_ensemble_classifier.html (on which VotingClassifier is based) to try implementing your own voting classifier that accepts pre-fitted models.

We can also look at the VotingClassifier source code and adapt it to our use:

from sklearn.preprocessing import LabelEncoder
import numpy as np

le_ = LabelEncoder()

# When you use partial_fit, the first call on any classifier requires
# all available labels (output classes), so supply that same full set
# of labels here in y.
le_.fit(y)

# Fill the list below with fitted or partially fitted estimators
clf_list = [clf1, clf2, clf3]  # ..., plus any further estimators

# Fill weights -> array-like, shape = [n_classifiers], or None for equal weights
# weights = [clf1_wgt, clf2_wgt, ... ]
weights = None

# Use one of the two voting schemes below.

# For hard voting: encode each prediction to 0..n_classes-1 first, because
# the estimators were fitted on the original labels and bincount needs
# non-negative integers.
pred = np.asarray([le_.transform(clf.predict(X)) for clf in clf_list]).T
pred = np.apply_along_axis(lambda x:
                           np.argmax(np.bincount(x, weights=weights)),
                           axis=1,
                           arr=pred.astype('int'))

# For soft voting: average the class probabilities, then pick the best class
pred = np.asarray([clf.predict_proba(X) for clf in clf_list])
pred = np.average(pred, axis=0, weights=weights)
pred = np.argmax(pred, axis=1)

# Finally, map the encoded class indices back to the original labels
pred = le_.inverse_transform(pred)
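
To tie this back to the partial_fit part of the question, here is a minimal end-to-end sketch of the same idea. The helper name vote, the choice of estimators and the two mini-batches are my own assumptions for illustration, not anything sklearn provides. The estimators are trained incrementally with partial_fit and then combined without any refitting. Soft voting assumes every estimator has seen all classes, so that the predict_proba columns line up.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier
from sklearn.naive_bayes import BernoulliNB, GaussianNB
from sklearn.preprocessing import LabelEncoder


def vote(clf_list, le, X, voting="hard", weights=None):
    """Combine already-fitted classifiers; nothing is refit here."""
    if voting == "soft":
        # Average the class probabilities across classifiers.
        proba = np.asarray([clf.predict_proba(X) for clf in clf_list])
        encoded = np.argmax(np.average(proba, axis=0, weights=weights), axis=1)
    else:
        # Encode labels to 0..n_classes-1, then take the (weighted) majority.
        votes = np.asarray([le.transform(clf.predict(X)) for clf in clf_list]).T
        encoded = np.apply_along_axis(
            lambda row: np.argmax(np.bincount(row, weights=weights)),
            axis=1,
            arr=votes,
        )
    return le.inverse_transform(encoded)


X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([1, 1, 1, 2, 2, 2])
classes = np.unique(y)
le = LabelEncoder().fit(classes)

# Train each estimator incrementally on two mini-batches.
clfs = [GaussianNB(), BernoulliNB(), SGDClassifier(random_state=0)]
for clf in clfs:
    for idx in (np.array([0, 3, 1]), np.array([4, 2, 5])):
        clf.partial_fit(X[idx], y[idx], classes=classes)

hard_pred = vote(clfs, le, X, voting="hard")
# SGDClassifier with its default hinge loss has no predict_proba,
# so only the two naive Bayes models take part in soft voting.
soft_pred = vote(clfs[:2], le, X, voting="soft")
print(hard_pred, soft_pred)
```

Because the estimators are only read at prediction time, you can keep calling partial_fit on them as new batches arrive and call vote again afterwards.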

Answer 3

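The mlxtend implementation works, but you still need to call fit on the EnsembleVoteClassifier. It seems that fit does not actually modify any parameters and only checks the possible label values. In the example below, you must pass eclf2.fit an array containing every value that appears in the original y (1 and 2 in this case); what you pass for X does not matter.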

import numpy as np
from mlxtend.classifier import EnsembleVoteClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB

clf1 = LogisticRegression(random_state=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

# Fit the base classifiers up front
for clf in (clf1, clf2, clf3):
    clf.fit(X, y)

# refit=False keeps the already-fitted classifiers as they are
eclf2 = EnsembleVoteClassifier(clfs=[clf1, clf2, clf3], voting="soft", refit=False)
# fit is still required, but only the labels matter here
eclf2.fit(None, np.array([1, 2]))
print(eclf2.predict(X))

Answer 4

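Workaround:

VotingClassifier checks whether estimators_ has been set to decide whether it has been fitted, and it predicts with the estimators in the estimators_ list. If you already have trained classifiers, you can put them directly into estimators_, as in the code below.

However, it also uses a LabelEncoder, so it assumes the labels look like 0, 1, 2, ..., and you also need to set le_ and classes_ (see below).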

from sklearn.ensemble import VotingClassifier
from sklearn.preprocessing import LabelEncoder

clf_list = [clf1, clf2, clf3]

eclf = VotingClassifier(estimators=[('1', clf1), ('2', clf2), ('3', clf3)], voting='soft')

# Set the fitted attributes by hand instead of calling fit
eclf.estimators_ = clf_list
eclf.le_ = LabelEncoder().fit(y)
eclf.classes_ = eclf.le_.classes_

# Now it will work without calling fit
eclf.predict(X)
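
For completeness, a self-contained version of this workaround that can be run as is. The toy data and the two base models are assumptions picked for illustration, with labels already encoded as 0 and 1 to match the caveat above. It relies on VotingClassifier internals rather than a documented API, so treat it as a sketch that may stop working in other sklearn versions.

```python
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.preprocessing import LabelEncoder

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])  # labels already in 0..n_classes-1

# Stand-ins for the pre-trained models
clf1 = LogisticRegression(random_state=1).fit(X, y)
clf2 = GaussianNB().fit(X, y)

eclf = VotingClassifier(estimators=[("lr", clf1), ("nb", clf2)], voting="soft")

# Set by hand what fit would normally set; nothing is retrained.
eclf.estimators_ = [clf1, clf2]
eclf.le_ = LabelEncoder().fit(y)
eclf.classes_ = eclf.le_.classes_

pred = eclf.predict(X)
print(pred)
```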

Using sklearn voting ensemble with partial fit