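I am trying to get my code to work. It used to run without errors until I changed something in my data, and now it produces no output at all. It seems the predictor predicts nan, which I find strange because none of the input values are nan. The error is raised when I run xgb.train on a sample of 5000 rows from the dataset (which has more than 300000 observations). When I run it on a smaller sample of the dataset, the error does not occur.
The code I ran: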
import pandas as pd
import xgboost as xgb
from sklearn.model_selection import train_test_split

Statadata = pd.read_stata('figtemp.dta')
Statadata = Statadata.drop(Statadata[(Statadata['periodf'] == 3) | (Statadata['periodf'] == 4)].index)
Statadata = Statadata.drop(Statadata[(Statadata['periods'] == 3) | (Statadata['periods'] == 4)].index)
Statadata.drop(Statadata[Statadata['zcstscoreela'].isnull()].index, inplace=True)
Statadata.drop(Statadata[Statadata['zcstscoremath'].isnull()].index, inplace=True)
eng = Statadata[Statadata['department']=='english']
eng = eng.drop(eng[eng['zcstscoreelaprior'].isnull()].index)
math = Statadata[Statadata['department']=='math']
math = math.drop(math[math['zcstscoremathprior'].isnull()].index)
y_en_gpa = eng['gpatotal']
y_en_cst = eng['zcstscoreela']
X_en = eng.copy()
del X_en['gpatotal']
del X_en['zcstscoremath']
del X_en['zcstscoreela']
del X_en['pareduccode']
del X_en['cstscoreela']
del X_en['cstscoremath']
y_math_gpa = math['gpatotal']
y_math_cst = math['zcstscoremath']
X_math = math.copy()
del X_math['gpatotal']
del X_math['zcstscoremath']
del X_math['zcstscoreela']
del X_math['pareduccode']
del X_math['cstscoreela']
del X_math['cstscoremath']
# english:
# deleting the columns and rows with missing values:
missing_en=X_en.isnull().sum()
missingbool_en=missing_en<25
selected_en=X_en.columns[missingbool_en]
selected_en=X_en[selected_en]
selected_en=selected_en.dropna(axis=0)
y_en_cst=y_en_cst[selected_en.index]
y_en_gpa=y_en_gpa[selected_en.index]
# math:
# deleting the columns and rows with missing values:
missing_math=X_math.isnull().sum()
missingbool_math=missing_math<25
selected_math=X_math.columns[missingbool_math]
selected_math=X_math[selected_math]
selected_math=selected_math.dropna(axis=0)
y_math_cst=y_math_cst[selected_math.index]
y_math_gpa=y_math_gpa[selected_math.index]
columns_to_overwrite = ['department', 'crsnamef', 'markf', 'crsnames', 'marks', 'cstlevelela', 'cstlevelmath', 'status', 'grade', 'gpaavg']
columns_to_overwrite2 = [ 'markf', 'crsnames', 'marks', 'cstlevelela', 'cstlevelmath', 'status', 'grade']
new_en=pd.get_dummies(selected_en['crsnamef'])
for i in columns_to_overwrite2:
    nieuw_en=pd.get_dummies(selected_en[i])
    new_en=new_en.merge(nieuw_en, left_index=True, right_index=True, suffixes=['_1','_2'])
selected_en=selected_en.drop(labels=columns_to_overwrite, axis="columns")
selected_en=new_en.merge(selected_en,left_index=True, right_index=True)
# math:
# Creating the dummy variables for the categorical string variables
new_math=pd.get_dummies(selected_math['crsnamef'])
for i in columns_to_overwrite2:
    nieuw_math=pd.get_dummies(selected_math[i])
    new_math=new_math.merge(nieuw_math, left_index=True, right_index=True, suffixes=['_1','_2'])
selected_math=selected_math.drop(labels=columns_to_overwrite, axis="columns")
selected_math=new_math.merge(selected_math,left_index=True, right_index=True)
X_train_math_gpa, X_test_math_gpa, y_train_math_gpa, y_test_math_gpa = train_test_split(selected_math, y_math_gpa, random_state=4)
X_train_math_cst, X_test_math_cst, y_train_math_cst, y_test_math_cst = train_test_split(selected_math, y_math_cst, random_state=4)
paramstest2 = {
'max_depth': 8,
'min_child_weight': 3,
'gamma': 0.4,
'subsample': 0.7,
'colsample_bytree': 0.7,
}
data_train = xgb.DMatrix(X_train_math_gpa, label=y_train_math_gpa)
data_test = xgb.DMatrix(X_test_math_gpa, label=y_test_math_gpa)
model=xgb.train(paramstest2, data_train, 5000, evals=[(data_test, "test")], verbose_eval=100, early_stopping_rounds=50)
The error I get is:
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 26 extra nodes, 6 pruned nodes, max_depth=6
[0] test-rmse:nan
Will train until test-rmse hasn't improved in 50 rounds.
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 66 extra nodes, 2 pruned nodes, max_depth=8
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 36 extra nodes, 46 pruned nodes, max_depth=8
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 16 extra nodes, 44 pruned nodes, max_depth=6
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 24 extra nodes, 92 pruned nodes, max_depth=7
[13:24:16] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 20 extra nodes, 80 pruned nodes, max_depth=7
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 50 pruned nodes, max_depth=4
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 12 extra nodes, 92 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 10 extra nodes, 102 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 14 extra nodes, 112 pruned nodes, max_depth=5
[13:24:17] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6
...
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 170 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 206 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 160 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 178 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 6 extra nodes, 142 pruned nodes, max_depth=3
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 154 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 188 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 150 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 160 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 166 pruned nodes, max_depth=0
[13:24:18] src/tree/updater_prune.cc:74: tree pruning end, 1 roots, 0 extra nodes, 182 pruned nodes, max_depth=0
Traceback (most recent call last):
  File "<input>", line 1, in <module>
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/training.py", line 204, in train
    xgb_model=xgb_model, callbacks=callbacks)
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/training.py", line 99, in _train_internal
    evaluation_result_list=evaluation_result_list))
  File "/Users/catlinbruys/PycharmProjects/Bachelor_Thesis/venv/lib/python3.6/site-packages/xgboost/callback.py", line 247, in callback
    best_msg = state['best_msg']
KeyError: 'best_msg'
What can I do to solve this? I really need a solution, because this is a very important project. Thanks.
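
A check that might narrow this down: the code above drops rows with missing features and missing zcstscoreela / zcstscoremath, but the gpatotal label itself is never tested for missing or infinite values. Whether that is the actual cause here is only a guess. The sketch below counts NaN and infinite entries in both the features and the labels before a DMatrix is built. The helper name `report_non_finite` and the toy frames are hypothetical stand-ins for X_train_math_gpa and y_train_math_gpa, and the values are made up.

```python
import numpy as np
import pandas as pd


def report_non_finite(features: pd.DataFrame, labels: pd.Series) -> dict:
    """Count NaN and +/-inf entries in the numeric features and in the labels."""
    # Only numeric columns can hold NaN/inf in the floating-point sense.
    numeric = features.select_dtypes(include=[np.number])
    feature_values = numeric.to_numpy(dtype=float)
    label_values = labels.to_numpy(dtype=float)
    return {
        "feature_nan": int(np.isnan(feature_values).sum()),
        "feature_inf": int(np.isinf(feature_values).sum()),
        "label_nan": int(np.isnan(label_values).sum()),
        "label_inf": int(np.isinf(label_values).sum()),
    }


# Toy stand-ins for the real training data (hypothetical values).
toy_X = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [0.5, np.inf, 1.5]})
toy_y = pd.Series([3.2, np.nan, 2.8])

counts = report_non_finite(toy_X, toy_y)
print(counts)

# Keep only rows whose label is finite, then realign the features by index.
finite_mask = np.isfinite(toy_y.to_numpy(dtype=float))
clean_y = toy_y[finite_mask]
clean_X = toy_X.loc[clean_y.index]
```

If any of the counts is non-zero on the real data, filtering on the label first and then selecting the matching feature rows by index, as in the last three lines, keeps the two aligned before train_test_split is called.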