Why can't I access XGBClassifier's feature_importances_ from a subclass?

Asked: 2016-04-25 14:53:22

Tags: python oop scikit-learn xgboost

I keep running into this strange behavior of XGBClassifier, which should behave just like RandomForestClassifier:

import xgboost as xgb 
from sklearn.ensemble import RandomForestClassifier

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_         

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_          

c1 = my_rf()
c1.fit(X,y)
c1.important_features(X) #works

This code fails :(

c2 = my_xgb()
c2.fit(X,y)
c2.important_features(X) #fails with AttributeError: 'super' object has no attribute 'feature_importances_'

I've stared at both bits of code and they look identical to me! What am I missing? Sorry if this is a noob question; the mysteries of Python OOP are beyond me.

rf-code

xgb-code

Edit:

If I use vanilla xgb, without any inheritance, everything works fine:

import xgboost as xgb
print "version:", xgb.__version__
c = xgb.XGBClassifier()
c.fit(X_train.as_matrix(), y_train.label)
print c.feature_importances_[:5]            

version: 0.4
[ 0.4039548   0.05932203  0.06779661  0.00847458  0.        ]

2 answers:

Answer 0 (score: 1)

As far as I know, feature_importances_ is not implemented in XGBoost. You can roll your own with something like permutation feature importance:

import random
import numpy as np
from sklearn.cross_validation import cross_val_score

def feature_importances(clf, X, y):
    # baseline cross-validated AUC on the unmodified data
    score = np.mean(cross_val_score(clf, X, y, scoring='roc_auc'))
    importances = {}
    for i in range(X.shape[1]):
        # permute one column at a time and measure how much the AUC drops
        X_perm = X.copy()
        X_perm[:, i] = random.sample(X[:, i].tolist(), X.shape[0])
        perm_score = np.mean(cross_val_score(clf, X_perm, y, scoring='roc_auc'))
        importances[i] = score - perm_score

    return importances
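
For example, assuming the helper above and the iris data from scikit-learn, a usage sketch could look like the following (on scikit-learn 0.18+ cross_val_score lives in sklearn.model_selection rather than sklearn.cross_validation; the two-class subset is only there because roc_auc needs a binary target):

import xgboost as xgb
from sklearn import datasets

# keep only two of the three iris classes so roc_auc scoring is well defined
iris = datasets.load_iris()
mask = iris.target < 2
X, y = iris.data[mask], iris.target[mask]

clf = xgb.XGBClassifier()
# maps column index -> drop in cross-validated AUC after permuting that column
print feature_importances(clf, X, y)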

Answer 1 (score: 1)

Your output shows the version is 0.4, and the repository tree of last stable version of 0.4x (released Jan 15, 2016) shows that the sklearn.py file did not yet have feature_importances_. The feature was actually introduced in this commit on Feb 8, 2016.
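
Before rebuilding, a quick way to confirm this on your own machine is to check whether the attribute exists on the installed class at all; a minimal sketch (the hasattr check is only illustrative):

import xgboost as xgb

print "version:", xgb.__version__
# should be False on the released 0.4 package, True on builds that include the Feb 8, 2016 commit
print hasattr(xgb.XGBClassifier, 'feature_importances_')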

I cloned the current GitHub repository, built and installed xgboost from scratch, and the code works fine:

from sklearn import datasets
from sklearn.ensemble.forest import RandomForestClassifier
import xgboost as xgb
print "version:", xgb.__version__

class my_rf(RandomForestClassifier):
    def important_features(self, X):
        return super(RandomForestClassifier, self).feature_importances_ 

class my_xgb(xgb.XGBClassifier):
    def important_features(self, X):
        return super(xgb.XGBClassifier, self).feature_importances_

iris = datasets.load_iris()
X = iris.data
y = iris.target

c1 = my_rf()
c1.fit(X,y)
print c1.important_features(X)

c2 = my_xgb()
c2.fit(X,y)
print c2.important_features(X)

c3 = xgb.XGBClassifier()
c3.fit(X, y)
print c3.feature_importances_

Output:

version: 0.6
[ 0.11834481  0.02627218  0.57008797  0.28529505]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.17701453  0.11228534  0.41479525  0.29590487]

Edit:

If you are using XGBRegressor, make sure you cloned the repository after Dec 1, 2016, because according to this commit, feature_importances_ was moved to the base XGBModel so that XGBRegressor can access it too.

Add this to the code above:

class my_xgb_regressor(xgb.XGBRegressor):
    def important_features(self, X):
        return super(xgb.XGBRegressor, self).feature_importances_

c4 = my_xgb_regressor()
c4.fit(X, y)
print c4.important_features(X)

Output:

version: 0.6
[ 0.0307026   0.01456868  0.45198349  0.50274523]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.17701453  0.11228534  0.41479525  0.29590487]
[ 0.25        0.17518248  0.34489051  0.229927  ]
