How do I use XGBoost softprob multiclass classification so that I don't get the num_class error?

Time: 2018-10-03 01:01:48

Tags: python xgboost sklearn-pandas

Below is my code before hyperparameter tuning. I split my data into training and test sets with train_test_split:

class_label=repair['PART_NO']
x=repair.drop('PART_NO',1)

X_train, X_test, y_train, y_test=cross_validation.train_test_split(x,class_label, train_size = 0.80)

def modelfit(alg, X_train, y_train, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):

    if useTrainCV:
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(X_train, label=y_train)
        extra = {'num_class': 2182}
        xgb_param.update(extra)
        cvresult = xgb.cv(xgb_param,
                          xgtrain,
                          num_boost_round=alg.get_params()['n_estimators'],
                          nfold=cv_folds,
                          stratified=True,
                          metrics={'merror'},
                          early_stopping_rounds=early_stopping_rounds,
                          seed=0,
                          callbacks=[xgb.callback.print_evaluation(show_stdv=False)]),
        print cvresult
        alg.set_params(n_estimators=cvresult.shape[0])


    #Fit the algorithm on the data
    alg.fit(X_train, y_train, eval_metric='merror')

    #Predict on the test set:
    dtrain_predictions = alg.predict(X_test)
    dtrain_predprob = alg.predict_proba(X_test)

    #Print model report:
    print "\nModel Report"
    print "Accuracy : %.4g" % metrics.accuracy_score(dtrain_predictions, y_test)
    print "Merror Score (Train): %f" % metrics.merror_score(dtrain_predprob, y_test)

    feat_imp = pd.Series(alg.booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')

After that, I select all predictors except the target and determine the number of estimators as follows:

xgb1 = XGBClassifier(
learning_rate =0.1,
n_estimators=280,
max_depth=5,
min_child_weight=1,
gamma=0,
subsample=0.8, 
colsample_bytree=0.8,
objective= 'multi:softprob',
nthread=4,
scale_pos_weight=1,
seed=27)
modelfit(xgb1, X_train, y_train)

However, I still get the error below, even though num_class is set in xgb_param and y_train is of type int. What should I do? The exact error is as follows:

  

---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
 in ()
     12  scale_pos_weight=1,
     13  seed=27)
---> 14 modelfit(xgb1, X_train, y_train)

 in modelfit(alg, X_train, y_train, useTrainCV, cv_folds, early_stopping_rounds)
     14                           early_stopping_rounds=early_stopping_rounds,
     15                           seed=0,
---> 16                           callbacks=[xgb.callback.print_evaluation(show_stdv=False)]),
     17         print cvresult
     18         alg.set_params(n_estimators=cvresult.shape[0])

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/training.pyc in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks, shuffle)
    404                            evaluation_result_list=None))
    405         for fold in cvfolds:
--> 406             fold.update(i, obj)
    407         res = aggcv([f.eval(i, feval) for f in cvfolds])
    408

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/training.pyc in update(self, iteration, fobj)
    216     def update(self, iteration, fobj):
    217         """"Update the boosters for one iteration"""
--> 218         self.bst.update(self.dtrain, iteration, fobj)
    219
    220     def eval(self, iteration, feval):

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/core.pyc in update(self, dtrain, iteration, fobj)
    892         if fobj is None:
    893             ...
--> 894                                                     dtrain.handle))
    895         else:
    896             pred = self.predict(dtrain)

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/core.pyc in _check_call(ret)
    128     """
    129     if ret != 0:
--> 130         raise XGBoostError(_LIB.XGBGetLastError())
    131
    132

XGBoostError: [13:34:08] src/objective/multiclass_obj.cc:78: Check failed: label_error >= 0 && label_error

Stack trace returned 7 entries:
[bt] (0) 0 libxgboost.dylib 0x000000010d0684a0 dmlc::StackTrace() + 288
[bt] (1) 1 libxgboost.dylib 0x000000010d06823f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2 libxgboost.dylib 0x000000010d0dcf9a xgboost::obj::SoftmaxMultiClassObj::GetGradient(xgboost::HostDeviceVector *, xgboost::MetaInfo const&, int, xgboost::HostDeviceVector) + 2218
[bt] (3) 3 libxgboost.dylib 0x000000010d0645f9 xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix) + 1017
[bt] (4) 4 libxgboost.dylib 0x000000010d07ef07 XGBoosterUpdateOneIter + 87
[bt] (5) 5 _ctypes.so 0x0000000103528677 ffi_call_unix64 + 79
[bt] (6) 6 ??? 0x00007ffeefbfa980 0x0 + 140732920736128

  

Googling the error turned up nothing.

1 Answer:

Answer 0 (score: 0)

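Your labels should run from 0 to the total number of classes minus 1. For example, if your class labels are (1,2,3,4,5), you need to convert them to (0,1,2,3,4) before feeding them to the multi:softprob objective. This can be done with y.replace({1:0, 2:1, 3:2, 4:3, 5:4}, inplace=True).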
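With thousands of classes, writing that replacement dictionary by hand is not practical. The following is a minimal sketch of building it programmatically, using only the standard library. The names `build_label_mapping` and `raw_labels` are illustrative and do not come from the question, and the sample labels are made up. It assumes the cause of the error is that some label falls outside the range 0 to num_class - 1, which is the constraint the failed check refers to.

```python
def build_label_mapping(labels):
    """Return (classes, to_index), where to_index maps each distinct
    label to an integer in 0..K-1, following sorted label order."""
    classes = sorted(set(labels))
    to_index = {label: i for i, label in enumerate(classes)}
    return classes, to_index


# Made-up labels mirroring the (1,2,3,4,5) example above.
raw_labels = [1, 2, 3, 4, 5, 3, 1]

classes, to_index = build_label_mapping(raw_labels)

# Encoded labels are what the multiclass objective expects: 0..K-1.
encoded = [to_index[v] for v in raw_labels]

# num_class should match the number of distinct labels, so that every
# encoded label is strictly smaller than num_class.
num_class = len(classes)

# Predicted class indices can be mapped back to the original labels.
decoded = [classes[i] for i in encoded]

print(encoded)
print(num_class)
```

If the labels live in a pandas Series, `y.map(to_index)` should apply the same mapping, and scikit-learn's `LabelEncoder` offers the same encode and decode round trip through `fit_transform` and `inverse_transform`. Build the mapping from the full label column before splitting, so that the training and test sets share one encoding.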