How do I use fit_params with RandomizedSearchCV on a VotingClassifier in sklearn?

Date: 2016-02-22 04:15:59

Tags: machine-learning scikit-learn classification grid-search

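Hi, I'm trying to run a RandomizedSearchCV over a VotingClassifier in sklearn while using fit_params to pass sample_weight to a GradientBoostingClassifier, because my dataset is imbalanced. Could someone give me some advice and, ideally, a code example?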

My current code, which does not work:

random_search = RandomizedSearchCV(my_votingClassifier, param_distributions=param_dist,
                                   n_iter=n_iter_search, n_jobs=-1, fit_params={'sample_weight':y_np_array})

Error:

TypeError: fit() got an unexpected keyword argument 'sample_weight'
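The TypeError means that VotingClassifier.fit in the installed scikit-learn version has no sample_weight parameter, so the search has nowhere to send the weights. A quick way to see which estimators accept the parameter is sklearn.utils.validation.has_fit_parameter. The sketch below wraps it in a small helper; the name report_sample_weight_support is only illustrative. On a release whose VotingClassifier.fit lacks the parameter, the check reports False for it, which is the situation that produces the error above.

```python
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.utils.validation import has_fit_parameter


def report_sample_weight_support(estimators):
    """Return {class name: bool} saying whether each estimator's fit() takes sample_weight."""
    return {type(est).__name__: has_fit_parameter(est, "sample_weight") for est in estimators}


voting = VotingClassifier(
    estimators=[("gb", GradientBoostingClassifier()), ("lr", LogisticRegression())]
)
# Result for VotingClassifier depends on the installed scikit-learn version.
support = report_sample_weight_support([voting, GradientBoostingClassifier(), KNeighborsClassifier()])
for name, accepts in support.items():
    print(f"{name}: {accepts}")
```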

1 answer:

Answer 0 (score: 3)

Since there doesn't seem to be a direct way to pass the sample_weight parameter through VotingClassifier, I came up with this small "hack":

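Override the fit method of the underlying classifiers. For example, if you are using DecisionTreeClassifier, you can override its fit method so that it passes the fit parameters you need.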

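class MyDecisionTreeClassifier(DecisionTreeClassifier):
    def fit(self, X, y):
        # Placeholder weights: VotingClassifier label-encodes y to 0..n_classes-1
        # before calling fit, so sample_weight=y gives the first class zero weight.
        # Substitute the per-sample weights you actually want.
        return super(MyDecisionTreeClassifier, self).fit(X, y, sample_weight=y)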

Now the estimators in your VotingClassifier can include your own MyDecisionTreeClassifier.
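For the case in the question (a GradientBoostingClassifier on imbalanced data), the same override idea can compute the weights inside fit, so that each training fold in the search gets weights that match its own rows. This is a minimal sketch assuming that class-balanced weights from sklearn.utils.class_weight.compute_sample_weight("balanced", y) are what you want; the class name BalancedGradientBoostingClassifier and the synthetic data are hypothetical. Because "balanced" weights depend only on class frequencies, they are unaffected by VotingClassifier label-encoding y.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier, VotingClassifier
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils.class_weight import compute_sample_weight


class BalancedGradientBoostingClassifier(GradientBoostingClassifier):
    """Hypothetical subclass: weights samples inversely to class frequency.

    Weights are recomputed from the y passed to each fit() call, so inside
    RandomizedSearchCV every training fold gets weights aligned with its rows.
    """

    def fit(self, X, y, sample_weight=None, monitor=None):
        if sample_weight is None:
            sample_weight = compute_sample_weight("balanced", y)
        return super().fit(X, y, sample_weight=sample_weight, monitor=monitor)


# Synthetic imbalanced data: 170 samples of class 0, 30 of class 1.
rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = np.r_[np.zeros(170, dtype=int), np.ones(30, dtype=int)]
X[y == 1] += 1.0  # shift the minority class so it is learnable

eclf = VotingClassifier(
    estimators=[("gb", BalancedGradientBoostingClassifier(random_state=0)),
                ("rf", RandomForestClassifier(n_estimators=50, random_state=0))],
    voting="soft",
)
param_dist = {"gb__n_estimators": [50, 100], "gb__max_depth": [2, 3]}
search = RandomizedSearchCV(eclf, param_distributions=param_dist, n_iter=4, cv=3, random_state=0)
search.fit(X, y)
print(search.best_params_)
print(search.best_score_)
```

Estimators that already have a class_weight parameter, such as DecisionTreeClassifier and RandomForestClassifier, can simply be given class_weight="balanced" instead of being subclassed.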

Complete working example:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import VotingClassifier
from sklearn.tree import DecisionTreeClassifier
# sklearn.grid_search was removed in 0.20; the class now lives in model_selection
from sklearn.model_selection import RandomizedSearchCV

X = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
y = np.array([1, 1, 1, 2, 2, 2])

class MyDecisionTreeClassifier(DecisionTreeClassifier):
    def fit(self, X, y):
        # Placeholder weights: VotingClassifier label-encodes y to 0..n_classes-1,
        # so sample_weight=y gives the first class zero weight. Use real weights.
        return super(MyDecisionTreeClassifier, self).fit(X, y, sample_weight=y)

clf1 = MyDecisionTreeClassifier()
clf2 = RandomForestClassifier()
params = {'dt__max_depth': [5, 10], 'dt__max_features': [1, 2]}
eclf = VotingClassifier(estimators=[('dt', clf1), ('rf', clf2)], voting='hard')
# cv=3: each class has only 3 samples, so the newer default of 5 folds would fail
random_search = RandomizedSearchCV(eclf, param_distributions=params, n_iter=4, cv=3)
random_search.fit(X, y)
# grid_scores_ was removed; per-candidate scores are in cv_results_
print(random_search.cv_results_['mean_test_score'])
print(random_search.best_score_)
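In later scikit-learn releases the hack should no longer be needed: VotingClassifier.fit accepts sample_weight and forwards it to every estimator (all of which must then support it), and the fit_params constructor argument of the search classes was replaced by passing fit parameters to search.fit, which slices array-valued parameters to each training fold. This is a sketch assuming a recent scikit-learn; the exact release where each change landed is not pinned down here, and the data and balanced weights are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV
from sklearn.utils.class_weight import compute_sample_weight

# Synthetic imbalanced data: 170 samples of class 0, 30 of class 1.
rng = np.random.RandomState(0)
X = rng.randn(200, 4)
y = np.r_[np.zeros(170, dtype=int), np.ones(30, dtype=int)]
X[y == 1] += 1.0

# One weight per row of X; the search passes each fold only its training rows.
weights = compute_sample_weight("balanced", y)

eclf = VotingClassifier(
    estimators=[("gb", GradientBoostingClassifier(random_state=0)),
                ("lr", LogisticRegression(max_iter=1000))],
    voting="soft",
)
param_dist = {"gb__n_estimators": [50, 100], "gb__learning_rate": [0.05, 0.1], "lr__C": [0.1, 1.0]}
search = RandomizedSearchCV(eclf, param_distributions=param_dist, n_iter=5, cv=3, random_state=0)
# Fit parameters go to fit() rather than to a fit_params= constructor argument.
search.fit(X, y, sample_weight=weights)
print(search.best_params_)
print(search.cv_results_["mean_test_score"])
```

Note that the weights are used only for fitting; the default scoring in the search is unweighted.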