Below is my code before hyperparameter tuning. I use train_test_split to split the data into training and test sets:
```python
from sklearn.model_selection import train_test_split  # sklearn.cross_validation is deprecated

class_label = repair['PART_NO']
x = repair.drop('PART_NO', axis=1)
X_train, X_test, y_train, y_test = train_test_split(x, class_label, train_size=0.80)
```
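For reference, a minimal self-contained sketch of the same split on a hypothetical toy frame standing in for `repair` (the column names here are made up):

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical toy frame standing in for `repair`
df = pd.DataFrame({'f1': range(10), 'f2': range(10, 20), 'PART_NO': [0, 1] * 5})
y = df['PART_NO']
X = df.drop('PART_NO', axis=1)

# train_size=0.8 on 10 rows gives an 8/2 split
X_train, X_test, y_train, y_test = train_test_split(X, y, train_size=0.8, random_state=0)
print(len(X_train), len(X_test))  # 8 2
```

Passing `random_state` makes the split reproducible across runs.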
```python
import xgboost as xgb
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import metrics

def modelfit(alg, X_train, y_train, useTrainCV=True, cv_folds=5, early_stopping_rounds=50):
    if useTrainCV:
        xgb_param = alg.get_xgb_params()
        xgtrain = xgb.DMatrix(X_train, label=y_train)
        xgb_param.update({'num_class': 2182})
        # Note: no trailing comma after this call, or cvresult becomes a tuple
        cvresult = xgb.cv(xgb_param,
                          xgtrain,
                          num_boost_round=alg.get_params()['n_estimators'],
                          nfold=cv_folds,
                          stratified=True,
                          metrics={'merror'},
                          early_stopping_rounds=early_stopping_rounds,
                          seed=0,
                          callbacks=[xgb.callback.print_evaluation(show_stdv=False)])
        print(cvresult)
        alg.set_params(n_estimators=cvresult.shape[0])

    # Fit the algorithm on the training data
    alg.fit(X_train, y_train, eval_metric='merror')

    # Predict on the test set (X_test / y_test come from the enclosing scope)
    dtest_predictions = alg.predict(X_test)
    dtest_predprob = alg.predict_proba(X_test)

    # Print model report (note the argument order: y_true first, then y_pred;
    # sklearn has no merror_score, so multiclass error is 1 - accuracy)
    print("\nModel Report")
    print("Accuracy : %.4g" % metrics.accuracy_score(y_test, dtest_predictions))
    print("Merror (Test): %f" % (1 - metrics.accuracy_score(y_test, dtest_predictions)))

    feat_imp = pd.Series(alg.get_booster().get_fscore()).sort_values(ascending=False)
    feat_imp.plot(kind='bar', title='Feature Importances')
    plt.ylabel('Feature Importance Score')
```
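The `'merror'` metric passed to `xgb.cv` above is multiclass classification error, i.e. the fraction of samples whose predicted class differs from the true class. A minimal standalone sketch of the same quantity (the helper name `merror` is just for illustration):

```python
import numpy as np

def merror(y_true, y_pred):
    """Multiclass classification error: fraction of misclassified samples."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true != y_pred))

# One of four predictions is wrong, so the error is 0.25
print(merror([0, 1, 2, 2], [0, 1, 1, 2]))  # 0.25
```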
After this, I select all predictors except the target and set up the estimator as follows:
```python
xgb1 = XGBClassifier(
    learning_rate=0.1,
    n_estimators=280,
    max_depth=5,
    min_child_weight=1,
    gamma=0,
    subsample=0.8,
    colsample_bytree=0.8,
    objective='multi:softprob',
    nthread=4,
    scale_pos_weight=1,
    seed=27)

modelfit(xgb1, X_train, y_train)
```
However, I still get the following error, even though `num_class` is set in `xgb_param` and `y_train` is of type int. Please advise what to do. The exact error is:
```
---------------------------------------------------------------------------
XGBoostError                              Traceback (most recent call last)
<ipython-input> in <module>()
     12     scale_pos_weight=1,
     13     seed=27)
---> 14 modelfit(xgb1, X_train, y_train)

<ipython-input> in modelfit(alg, X_train, y_train, useTrainCV, cv_folds, early_stopping_rounds)
     14     early_stopping_rounds=early_stopping_rounds,
     15     seed=0,
---> 16     callbacks=[xgb.callback.print_evaluation(show_stdv=False)])
     17 print cvresult
     18 alg.set_params(n_estimators=cvresult.shape[0])

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/training.pyc in cv(params, dtrain, num_boost_round, nfold, stratified, folds, metrics, obj, feval, maximize, early_stopping_rounds, fpreproc, as_pandas, verbose_eval, show_stdv, seed, callbacks, shuffle)
    404             evaluation_result_list=None))
    405     for fold in cvfolds:
--> 406         fold.update(i, obj)
    407     res = aggcv([f.eval(i, feval) for f in cvfolds])
    408

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/training.pyc in update(self, iteration, fobj)
    216     def update(self, iteration, fobj):
    217         """Update the boosters for one iteration"""
--> 218         self.bst.update(self.dtrain, iteration, fobj)
    219
    220     def eval(self, iteration, feval):

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/core.pyc in update(self, dtrain, iteration, fobj)
    892         if fobj is None:
    893             _check_call(_LIB.XGBoosterUpdateOneIter(self.handle, iteration,
--> 894                                                     dtrain.handle))
    895         else:
    896             pred = self.predict(dtrain)

/Users/sayontimondal/anaconda2/lib/python2.7/site-packages/xgboost/core.pyc in _check_call(ret)
    128     """
    129     if ret != 0:
--> 130         raise XGBoostError(_LIB.XGBGetLastError())
    131
    132

XGBoostError: [13:34:08] src/objective/multiclass_obj.cc:78: Check failed: label_error >= 0 && label_error

Stack trace returned 7 entries:
[bt] (0) 0 libxgboost.dylib 0x000000010d0684a0 dmlc::StackTrace() + 288
[bt] (1) 1 libxgboost.dylib 0x000000010d06823f dmlc::LogMessageFatal::~LogMessageFatal() + 47
[bt] (2) 2 libxgboost.dylib 0x000000010d0dcf9a xgboost::obj::SoftmaxMultiClassObj::GetGradient(xgboost::HostDeviceVector*, xgboost::MetaInfo const&, int, xgboost::HostDeviceVector*) + 2218
[bt] (3) 3 libxgboost.dylib 0x000000010d0645f9 xgboost::LearnerImpl::UpdateOneIter(int, xgboost::DMatrix*) + 1017
[bt] (4) 4 libxgboost.dylib 0x000000010d07ef07 XGBoosterUpdateOneIter + 87
[bt] (5) 5 _ctypes.so 0x0000000103528677 ffi_call_unix64 + 79
[bt] (6) 6 ??? 0x00007ffeefbfa980 0x0 + 140732920736128
```
Searching Google for this turned up nothing.
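The failing check in `multiclass_obj.cc` fires when some label falls outside the valid range for the softmax objective, which requires every label to lie in `[0, num_class)`. A quick diagnostic sketch (the label values here are hypothetical stand-ins for `y_train`):

```python
import numpy as np

num_class = 2182
# Hypothetical label vector standing in for y_train; 2182 itself is out of range,
# because valid labels run from 0 to num_class - 1 inclusive
y = np.array([0, 5, 2182, 17])

valid = bool((y >= 0).all() and (y < num_class).all())
print(valid)  # False
```

Running a check like this on the real `y_train` before calling `xgb.cv` pinpoints whether the labels, rather than the parameters, are the problem.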
Answer 0 (score: 0)
Your labels should run from 0 up to the total number of classes minus 1. For example, if your class labels are (1, 2, 3, 4, 5), then to feed them to the multi:softprob objective you need to convert them to the classes (0, 1, 2, 3, 4). This can be done with:

```python
y.replace({1: 0, 2: 1, 3: 2, 4: 3, 5: 4}, inplace=True)
```
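For a large or non-numeric label set like `PART_NO`, building the replacement dict by hand does not scale; scikit-learn's `LabelEncoder` performs the same 0-to-(n_classes - 1) mapping automatically. A minimal sketch on made-up part numbers standing in for `repair['PART_NO']`:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for repair['PART_NO']
y = pd.Series(['P-10', 'P-42', 'P-10', 'P-99'])

le = LabelEncoder()
y_encoded = le.fit_transform(y)   # classes are sorted, then mapped to 0..n-1
print(list(y_encoded))            # [0, 1, 0, 2]
print(list(le.classes_))          # ['P-10', 'P-42', 'P-99']
```

`le.inverse_transform` maps the integer predictions back to the original part numbers after training.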