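I'm currently trying to grid search my first XGBoost model. I have run three grid searches, and on the last one I get the error "Invalid parameter learning_rate for estimator GridSearchCV" (the full traceback is posted further down). My dataframe is one-hot encoded; my X has 144 columns and 150k rows, and y is multiclass.
First grid search: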
[IN]:
n_estimators = range(50, 300, 50)
param_grid = dict(n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=4, shuffle=True)
grid_search_m = GridSearchCV(model_m, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result_m = grid_search_m.fit(X, ym)
[IN]:
print("Best: %f using %s" % (grid_result_m.best_score_, grid_result_m.best_params_))
[OUT]:
Best: -0.969471 using {'n_estimators': 50}
[IN]:
grid_result_m.cv_results_
[OUT]:
{'mean_fit_time': array([201.849, 384.275, 584.447, 781.368, 682.574]),
'std_fit_time': array([1.088, 2.173, 2.087, 4.136, 2.503]),
'mean_score_time': array([0.444, 0.513, 0.562, 0.527, 0.519]),
'std_score_time': array([0.083, 0.132, 0.07 , 0.063, 0.045]),
'param_n_estimators': masked_array(data=[50, 100, 150, 200, 250],
mask=[False, False, False, False, False],
fill_value='?',
dtype=object),
'params': [{'n_estimators': 50},
{'n_estimators': 100},
{'n_estimators': 150},
{'n_estimators': 200},
{'n_estimators': 250}],
'split0_test_score': array([-0.97 , -0.974, -0.978, -0.982, -0.987]),
'split1_test_score': array([-0.97 , -0.974, -0.979, -0.984, -0.989]),
'split2_test_score': array([-0.972, -0.975, -0.98 , -0.984, -0.989]),
'split3_test_score': array([-0.966, -0.967, -0.971, -0.975, -0.98 ]),
'mean_test_score': array([-0.969, -0.973, -0.977, -0.981, -0.986]),
'std_test_score': array([0.002, 0.003, 0.004, 0.004, 0.003]),
'rank_test_score': array([1, 2, 3, 4, 5])}
Then I saved the model using joblib.
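For reference, a minimal sketch of what the joblib save step could look like. The post does not show which object was saved, so this is an assumption: it saves the fitted search's `best_estimator_` rather than the search object itself. `GradientBoostingClassifier` and the synthetic data stand in for `XGBClassifier` and the real `X`/`ym`, and the file name is hypothetical.

```python
import os
import tempfile

import joblib
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

# Small synthetic multiclass data standing in for the real X and ym (assumption).
X_demo, y_demo = make_classification(
    n_samples=200, n_features=10, n_informative=5, n_classes=3, random_state=7
)

# Stand-in estimator (assumption): the post uses XGBClassifier.
demo_model = GradientBoostingClassifier(n_estimators=5, random_state=0)
kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=7)
search = GridSearchCV(
    demo_model, {"max_depth": [2, 3]}, scoring="neg_log_loss", cv=kfold
)
search.fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "best_model.joblib")  # hypothetical file name
    # Save the refitted best classifier, not the GridSearchCV wrapper.
    joblib.dump(search.best_estimator_, path)
    loaded_model = joblib.load(path)

print(type(loaded_model).__name__)  # GradientBoostingClassifier
```

Saving the search object instead would also work with joblib, but loading it back gives a `GridSearchCV`, not a classifier, which matters if the loaded object is later passed to a new search.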
Second grid search:
[IN]:
n_estimators = [50, 100, 150, 200]
max_depth = [2, 4, 6, 8]
param_grid = dict(max_depth=max_depth, n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=7)
grid_search_m = GridSearchCV(model_m, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold, verbose=1)
grid_result_m = grid_search_m.fit(X, ym)
[IN]:
print("Best: %f using %s" % (grid_result_m.best_score_, grid_result_m.best_params_))
[OUT]:
Best: -0.968771 using {'max_depth': 4, 'n_estimators': 100}
[IN]:
grid_result_m.cv_results_
[OUT]:
{'mean_fit_time': array([ 62.35 , 142.499, 222.805, 252.453, 134.58 , 261.158, 391.637,
414.209, 219.397, 319.109, 467.584, 560.963, 287.38 , 470.255,
667.015, 604.39 ]),
'std_fit_time': array([0.456, 0.409, 0.457, 0.793, 0.603, 1.069, 2.226, 1.199, 1.439,
2.719, 2.017, 2.201, 2.772, 4.188, 6.421, 3.708]),
'mean_score_time': array([0.203, 0.19 , 0.196, 0.372, 0.299, 0.413, 0.446, 0.323, 0.239,
0.287, 0.344, 0.476, 0.258, 0.325, 0.406, 0.426]),
'std_score_time': array([0.016, 0.011, 0.022, 0.018, 0.018, 0.053, 0.005, 0.056, 0.022,
0.032, 0.008, 0.059, 0.015, 0.014, 0.048, 0.091]),
'param_max_depth': masked_array(data=[2, 2, 2, 2, 4, 4, 4, 4, 6, 6, 6, 6, 8, 8, 8, 8],
mask=[False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False],
fill_value='?',
dtype=object),
'param_n_estimators': masked_array(data=[50, 100, 150, 200, 50, 100, 150, 200, 50, 100, 150,
200, 50, 100, 150, 200],
mask=[False, False, False, False, False, False, False, False,
False, False, False, False, False, False, False, False],
fill_value='?',
dtype=object),
'params': [{'max_depth': 2, 'n_estimators': 50},
{'max_depth': 2, 'n_estimators': 100},
{'max_depth': 2, 'n_estimators': 150},
{'max_depth': 2, 'n_estimators': 200},
{'max_depth': 4, 'n_estimators': 50},
{'max_depth': 4, 'n_estimators': 100},
{'max_depth': 4, 'n_estimators': 150},
{'max_depth': 4, 'n_estimators': 200},
{'max_depth': 6, 'n_estimators': 50},
{'max_depth': 6, 'n_estimators': 100},
{'max_depth': 6, 'n_estimators': 150},
{'max_depth': 6, 'n_estimators': 200},
{'max_depth': 8, 'n_estimators': 50},
{'max_depth': 8, 'n_estimators': 100},
{'max_depth': 8, 'n_estimators': 150},
{'max_depth': 8, 'n_estimators': 200}],
'split0_test_score': array([-0.98 , -0.973, -0.972, -0.971, -0.969, -0.968, -0.97 , -0.971,
-0.969, -0.971, -0.976, -0.98 , -0.971, -0.98 , -0.991, -1.002]),
'split1_test_score': array([-0.975, -0.968, -0.967, -0.966, -0.966, -0.964, -0.966, -0.967,
-0.965, -0.969, -0.973, -0.979, -0.969, -0.978, -0.988, -0.999]),
'split2_test_score': array([-0.983, -0.976, -0.974, -0.973, -0.973, -0.972, -0.973, -0.974,
-0.974, -0.977, -0.982, -0.986, -0.976, -0.984, -0.994, -1.005]),
'split3_test_score': array([-0.979, -0.972, -0.97 , -0.969, -0.97 , -0.97 , -0.972, -0.973,
-0.97 , -0.974, -0.98 , -0.985, -0.973, -0.982, -0.994, -1.005]),
'mean_test_score': array([-0.979, -0.972, -0.971, -0.97 , -0.97 , -0.969, -0.97 , -0.971,
-0.97 , -0.973, -0.978, -0.982, -0.972, -0.981, -0.992, -1.003]),
'std_test_score': array([0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003, 0.003,
0.003, 0.003, 0.003, 0.002, 0.002, 0.002, 0.002]),
'rank_test_score': array([12, 9, 6, 4, 3, 1, 5, 7, 2, 10, 11, 14, 8, 13, 15, 16])}
I saved the model again, and the third grid search is where things go wrong:
# Tune learning_rate and n_estimators
[IN]:
n_estimators = [50, 100, 150, 200]
learning_rate = [0.0001, 0.001, 0.01, 0.1]
param_grid = dict(learning_rate=learning_rate, n_estimators=n_estimators)
kfold = StratifiedKFold(n_splits=4, shuffle=True, random_state=7)
grid_search_m = GridSearchCV(model_m, param_grid, scoring="neg_log_loss", n_jobs=-1, cv=kfold)
grid_result_m = grid_search_m.fit(X, ym)
[OUT]:
---------------------------------------------------------------------------
_RemoteTraceback Traceback (most recent call last)
_RemoteTraceback:
"""
Traceback (most recent call last):
File "C:\Users\cyrra\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 431, in _process_worker
r = call_item()
File "C:\Users\cyrra\anaconda3\lib\site-packages\joblib\externals\loky\process_executor.py", line 285, in __call__
return self.fn(*self.args, **self.kwargs)
File "C:\Users\cyrra\anaconda3\lib\site-packages\joblib\_parallel_backends.py", line 595, in __call__
return self.func(*args, **kwargs)
File "C:\Users\cyrra\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in __call__
return [func(*args, **kwargs)
File "C:\Users\cyrra\anaconda3\lib\site-packages\joblib\parallel.py", line 262, in <listcomp>
return [func(*args, **kwargs)
File "C:\Users\cyrra\anaconda3\lib\site-packages\sklearn\utils\fixes.py", line 222, in __call__
return self.function(*args, **kwargs)
File "C:\Users\cyrra\anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 581, in _fit_and_score
estimator = estimator.set_params(**cloned_parameters)
File "C:\Users\cyrra\anaconda3\lib\site-packages\sklearn\base.py", line 230, in set_params
raise ValueError('Invalid parameter %s for estimator %s. '
ValueError: Invalid parameter learning_rate for estimator GridSearchCV(cv=StratifiedKFold(n_splits=4, random_state=7, shuffle=True),
estimator=XGBClassifier(base_score=0.5, booster='gbtree',
colsample_bylevel=1, colsample_bynode=1,
colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain',
interaction_constraints='',
learning_rate=0.300000012,
max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan,
monotone_constraints='()',
n_estimators=100, n_jobs=8,
num_parallel_tree=1,
objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1,
scale_pos_weight=None, subsample=1,
tree_method='exact', validate_parameters=1,
verbosity=None),
n_jobs=-1,
param_grid={'max_depth': [2, 4, 6, 8],
'n_estimators': [50, 100, 150, 200]},
scoring='neg_log_loss', verbose=1). Check the list of available parameters with `estimator.get_params().keys()`.
"""
The above exception was the direct cause of the following exception:
ValueError Traceback (most recent call last)
<ipython-input-110-efb3a24d6aec> in <module>
----> 1 grid_result_m = grid_search_m.fit(X, ym)
~\anaconda3\lib\site-packages\sklearn\utils\validation.py in inner_f(*args, **kwargs)
61 extra_args = len(args) - len(all_args)
62 if extra_args <= 0:
---> 63 return f(*args, **kwargs)
64
65 # extra_args > 0
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
839 return results
840
--> 841 self._run_search(evaluate_candidates)
842
843 # multimetric is determined here because in the case of a callable
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in _run_search(self, evaluate_candidates)
1286 def _run_search(self, evaluate_candidates):
1287 """Search all candidates in param_grid"""
-> 1288 evaluate_candidates(ParameterGrid(self.param_grid))
1289
1290
~\anaconda3\lib\site-packages\sklearn\model_selection\_search.py in evaluate_candidates(candidate_params, cv, more_results)
793 n_splits, n_candidates, n_candidates * n_splits))
794
--> 795 out = parallel(delayed(_fit_and_score)(clone(base_estimator),
796 X, y,
797 train=train, test=test,
~\anaconda3\lib\site-packages\joblib\parallel.py in __call__(self, iterable)
1052
1053 with self._backend.retrieval_context():
-> 1054 self.retrieve()
1055 # Make sure that we get a last message telling us we are done
1056 elapsed_time = time.time() - self._start_time
~\anaconda3\lib\site-packages\joblib\parallel.py in retrieve(self)
931 try:
932 if getattr(self._backend, 'supports_timeout', False):
--> 933 self._output.extend(job.get(timeout=self.timeout))
934 else:
935 self._output.extend(job.get())
~\anaconda3\lib\site-packages\joblib\_parallel_backends.py in wrap_future_result(future, timeout)
540 AsyncResults.get from multiprocessing."""
541 try:
--> 542 return future.result(timeout=timeout)
543 except CfTimeoutError as e:
544 raise TimeoutError from e
~\anaconda3\lib\concurrent\futures\_base.py in result(self, timeout)
437 raise CancelledError()
438 elif self._state == FINISHED:
--> 439 return self.__get_result()
440 else:
441 raise TimeoutError()
~\anaconda3\lib\concurrent\futures\_base.py in __get_result(self)
386 def __get_result(self):
387 if self._exception:
--> 388 raise self._exception
389 else:
390 return self._result
ValueError: Invalid parameter learning_rate for estimator GridSearchCV(cv=StratifiedKFold(n_splits=4, random_state=7, shuffle=True),
estimator=XGBClassifier(base_score=0.5, booster='gbtree',
colsample_bylevel=1, colsample_bynode=1,
colsample_bytree=1, gamma=0, gpu_id=-1,
importance_type='gain',
interaction_constraints='',
learning_rate=0.300000012,
max_delta_step=0, max_depth=6,
min_child_weight=1, missing=nan,
monotone_constraints='()',
n_estimators=100, n_jobs=8,
num_parallel_tree=1,
objective='multi:softprob', random_state=0,
reg_alpha=0, reg_lambda=1,
scale_pos_weight=None, subsample=1,
tree_method='exact', validate_parameters=1,
verbosity=None),
n_jobs=-1,
param_grid={'max_depth': [2, 4, 6, 8],
'n_estimators': [50, 100, 150, 200]},
scoring='neg_log_loss', verbose=1). Check the list of available parameters with `estimator.get_params().keys()`.
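I've tried changing the estimator and I've tried changing the learning rate, but I keep getting similar errors. Any input would be appreciated!
Thanks!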
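Following the hint at the end of the traceback, here is a minimal sketch of checking which parameter names an object accepts. It assumes scikit-learn's `GradientBoostingClassifier` as a stand-in for `XGBClassifier`, and the names `base_model` and `previous_search` are hypothetical. The traceback reports the estimator as a `GridSearchCV` whose `param_grid` is the one from the second search, so it may be that `model_m` was pointing at that earlier search object rather than at the classifier when the third search was built.

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import GridSearchCV

# Stand-in classifier (assumption): the post uses XGBClassifier.
base_model = GradientBoostingClassifier(n_estimators=10, random_state=0)

# An earlier search object, similar to the one echoed in the traceback.
previous_search = GridSearchCV(
    base_model,
    {"max_depth": [2, 4, 6, 8], "n_estimators": [50, 100, 150, 200]},
    scoring="neg_log_loss",
)

# Parameter names each object accepts, as the error message suggests checking.
base_keys = set(base_model.get_params().keys())
search_keys = set(previous_search.get_params().keys())

print("learning_rate" in base_keys)               # True
print("learning_rate" in search_keys)             # False
print("estimator__learning_rate" in search_keys)  # True

# Setting learning_rate directly on the search object raises the same kind of error.
try:
    previous_search.set_params(learning_rate=0.1)
    message = ""
except ValueError as exc:
    message = str(exc)

print(message[:40])
```

If that turns out to be the case here, passing the underlying classifier (for example `previous_search.estimator`, or a fitted search's `best_estimator_`) to the new `GridSearchCV` should let `learning_rate` resolve as a valid parameter.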