I have applied GridSearchCV to four estimators (DecisionTreeClassifier, RandomForestClassifier, LogisticRegression, and XGBClassifier) and I use all of them together for ensemble learning.
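On my system and on my friend's system, with the same training and test data, GridSearchCV returns different results for every one of these estimators, and I don't know why.
What do I need to change so that the grid search gives the same result on any system?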
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import GridSearchCV
    from sklearn.tree import DecisionTreeClassifier
    from xgboost import XGBClassifier

    # `scoring` is defined elsewhere in my code (not shown here); it includes a 'recall' entry.

    # Decision tree
    gs_dt = GridSearchCV(estimator=DecisionTreeClassifier(random_state=42, class_weight={1: 10, 0: 1}),
                         param_grid=[{'max_depth': [2, 4, 6, 8, 10],
                                      'criterion': ['gini', 'entropy'],
                                      'max_features': ['auto', None],
                                      'max_leaf_nodes': [10, 20, 30, 40]}],
                         scoring=scoring,
                         cv=10,
                         refit='recall')

    # Random forest
    gs_rf = GridSearchCV(estimator=RandomForestClassifier(n_jobs=-1, oob_score=True, class_weight={1: 10/11, 0: 1/11}),
                         param_grid=[{'max_depth': [4, 6, 8, 10, 12, 16, 20, None],
                                      'max_features': ['auto', 'sqrt'],
                                      'min_samples_leaf': [2, 4, 8],
                                      'min_samples_split': [10, 20]}],
                         scoring=scoring,
                         cv=10,
                         n_jobs=4,
                         refit='recall')

    # Logistic regression
    gs_lr = GridSearchCV(estimator=LogisticRegression(multi_class='ovr', random_state=42, class_weight={1: 10, 0: 1}),
                         param_grid=[{'C': [0.000001, 0.00001, 0.0001, 0.001, 0.01, 0.1, 1],
                                      'penalty': ['l1', 'l2']}],
                         scoring=scoring,
                         cv=10,
                         refit='recall')

    # Gradient boosting (XGBoost)
    gs_gb = GridSearchCV(estimator=XGBClassifier(n_jobs=-1),
                         param_grid=[{'learning_rate': [0.01, 0.05, 0.1, 0.2],
                                      'max_depth': [4, 6, 8, 10, 12, 16, 20],
                                      'min_samples_leaf': [4, 8, 12, 16, 20],
                                      'max_features': ['auto', 'sqrt']}],
                         scoring=scoring,
                         cv=10,
                         n_jobs=4,
                         refit='recall')
For example, the first GridSearchCV (gs_dt) gives this result on my system:

    DecisionTreeClassifier(class_weight={1: 10, 0: 1}, criterion='gini',
                           max_depth=8, max_features=None, max_leaf_nodes=10,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, presort=False, random_state=42,
                           splitter='best')
On my friend's system, it gives:

    DecisionTreeClassifier(class_weight={0: 1, 1: 10}, criterion='gini',
                           max_depth=10, max_features=None, max_leaf_nodes=10,
                           min_impurity_decrease=0.0, min_impurity_split=None,
                           min_samples_leaf=1, min_samples_split=2,
                           min_weight_fraction_leaf=0.0, presort=False,
                           random_state=42, splitter='best')
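
The two outputs above differ only in max_depth (8 versus 10), while max_leaf_nodes is 10 in both. One possibility worth checking is that several parameter combinations tie for the best recall, so that small differences between the two environments (library versions, floating-point behaviour) decide which one is picked. The following is a minimal sketch of how that could be inspected, not a diagnosis. The synthetic data from make_classification, the reduced grid, cv=5, and the one-entry scoring dict are assumptions standing in for the real data and the `scoring` object that are not shown.

```python
import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Print library versions so the two systems can be compared side by side.
print("scikit-learn", sklearn.__version__, "numpy", np.__version__)

# Synthetic, imbalanced stand-in for the real training data (assumption).
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=42)

search = GridSearchCV(
    estimator=DecisionTreeClassifier(random_state=42, class_weight={1: 10, 0: 1}),
    param_grid={"max_depth": [2, 4, 6, 8, 10], "max_leaf_nodes": [10, 20]},
    scoring={"recall": "recall"},  # hypothetical stand-in for `scoring`
    cv=5,
    refit="recall",
)
search.fit(X, y)

# Collect every parameter combination whose mean recall matches the best one.
results = search.cv_results_
best_score = results["mean_test_recall"][search.best_index_]
tied = [
    params
    for params, score in zip(results["params"], results["mean_test_recall"])
    if np.isclose(score, best_score)
]

print("selected:", search.best_params_)
print("tied for best recall:", tied)
```

If the list of tied candidates has more than one entry, comparing the full cv_results_ from both systems is more informative than comparing best_estimator_ alone. The different key order in the class_weight repr (`{1: 10, 0: 1}` versus `{0: 1, 1: 10}`) may also indicate that the two systems run different Python or library versions.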
Similarly, the other grid searches also give different results on my system and on my friend's.
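
In the code above, RandomForestClassifier and XGBClassifier are created without a random_state, and a random forest built without a fixed seed draws different bootstrap samples and feature subsets on every run. A minimal sketch of a seeded search follows. It assumes synthetic data, a small grid, and a hypothetical scoring dict in place of the real ones; SEED and run_search are illustrative names, and the explicit StratifiedKFold is just one way to make the folds explicit. Fixing the seeds removes run-to-run randomness, but results may still differ across library versions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, StratifiedKFold

SEED = 42  # hypothetical seed shared by every random component

# Synthetic, imbalanced stand-in for the real training data (assumption).
X, y = make_classification(n_samples=300, n_features=10,
                           weights=[0.8, 0.2], random_state=SEED)

# Hypothetical stand-in for the `scoring` object that is not shown in the post.
scoring = {"recall": "recall", "precision": "precision"}


def run_search():
    """Run one fully seeded grid search and return the fitted object."""
    # Explicit, seeded splitter so the folds are identical on every run.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=SEED)
    search = GridSearchCV(
        estimator=RandomForestClassifier(n_estimators=20,
                                         random_state=SEED,
                                         class_weight={1: 10, 0: 1}),
        param_grid={"max_depth": [4, 8], "min_samples_leaf": [2, 4]},
        scoring=scoring,
        cv=cv,
        refit="recall",
    )
    search.fit(X, y)
    return search


first = run_search()
second = run_search()

# With all seeds fixed, two runs in the same environment should agree.
print(first.best_params_ == second.best_params_)
```

The same idea would apply to the other estimators: pass random_state to each one that accepts it, and keep the library versions aligned between the two systems.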