While learning Python and neural networks, I ended up with a neural network model that appears to fit the data very well.
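So I decided to split my training data into a test set and a validation set.
That way **train** and its labels **y** become **traintest** and **ytest**, plus **trainval** and **yval**.
I then wrote some test code to try out different model settings: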
```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Settings to try
learning_rates = [0.005, 0.001]
estimaters = [50, 300, 800]
criterion = ['friedman_mse', 'mse', 'mae']  # 'mse'/'mae' are only accepted by older scikit-learn releases

# train (features) and y (labels) are my existing training data, loaded elsewhere
traintest, trainval, ytest, yval = train_test_split(train, y, random_state=4, test_size=0.35)

for crit in criterion:
    for learning_rate in learning_rates:
        for est in estimaters:
            gb = GradientBoostingClassifier(criterion=crit, n_estimators=est, learning_rate=learning_rate,
                                            min_samples_leaf=2, min_samples_split=20, max_features=3,
                                            max_depth=2, random_state=0)
            gb.fit(traintest, ytest)
            s = gb.score(traintest, ytest)  # score on the data the model was fitted on
            if 0.7 < s < 1.0:  # < 1.0 because I don't want overfitted results,
                               # > 0.7 because I don't want to show poor results either
                s2 = gb.score(trainval, yval)  # only computed if s was good enough
                print("Learn rate: {0:.3f} Est: {1:.0f} S: {2:.16f} s2: {2:.16f} ".format(learning_rate, est, s, s2) + crit)
```
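For reference, here is a minimal pure-Python sketch of the idea behind the `train_test_split` call above: shuffle the row indices with a fixed seed, then cut them into two disjoint groups. With `test_size=0.35`, the 35% part goes to the second pair of returned values (`trainval`, `yval`) and the remaining 65% to `traintest`, `ytest`. The helper `split_indices` is hypothetical and only illustrates the concept; it is not scikit-learn's implementation and will not reproduce its exact shuffle.

```python
import math
import random


def split_indices(n_samples, test_size, seed):
    """Hypothetical helper: shuffle row indices and cut them into two disjoint parts.

    Illustrates the idea behind sklearn's train_test_split; it is NOT the library's code.
    """
    n_test = math.ceil(test_size * n_samples)  # size of the held-out part
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # deterministic shuffle for a fixed seed
    held_out = indices[:n_test]  # plays the role of trainval / yval
    fit_part = indices[n_test:]  # plays the role of traintest / ytest
    return fit_part, held_out


fit_part, held_out = split_indices(100, test_size=0.35, seed=4)
print(len(fit_part), len(held_out))   # 65 35
print(set(fit_part) & set(held_out))  # set(): no row appears in both parts
```

Because the two parts never share a row, scores computed on them are computed on genuinely different data.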
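So basically, the code above gives me **s** and **s2** and prints the settings that produced them.
**s** is my score on the training data, and **s2** is my score on the validation data.
Since the code uses a different dataset for each score, I expected **s** and **s2** to come out different.
To my surprise, they are identical. How is that possible?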
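For scikit-learn classifiers, `score` returns the mean accuracy on the data and labels passed to it, so scoring one fitted model on two different datasets would normally give two different numbers. The sketch below illustrates only that arithmetic, using a hypothetical stand-in "model" (the fixed threshold rule `predict`) and toy data invented for illustration; it is not `GradientBoostingClassifier`.

```python
def predict(x):
    """Hypothetical stand-in for a fitted model: a fixed threshold rule."""
    return 1 if x > 0.5 else 0


def mean_accuracy(xs, ys):
    """Fraction of samples whose prediction matches the label (what .score() reports for classifiers)."""
    correct = sum(1 for x, y in zip(xs, ys) if predict(x) == y)
    return correct / len(ys)


# Toy data, invented for illustration only
x_fit, y_fit = [0.1, 0.4, 0.6, 0.9], [0, 0, 1, 1]
x_val, y_val = [0.2, 0.55, 0.7, 0.45], [0, 0, 1, 1]

s = mean_accuracy(x_fit, y_fit)   # 1.0: every fitted sample is classified correctly
s2 = mean_accuracy(x_val, y_val)  # 0.5: two of the four validation samples are wrong
print(s, s2)  # 1.0 0.5
```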
Here is the output of the code:
```
LearnRate: 0.005 Est: 300 S: 0.97063903 s2: 0.97063903 friedman_mse
LearnRate: 0.001 Est: 800 S: 0.94300518 s2: 0.94300518 friedman_mse
LearnRate: 0.005 Est: 300 S: 0.97063903 s2: 0.97063903 mse
LearnRate: 0.001 Est: 800 S: 0.94300518 s2: 0.94300518 mse
LearnRate: 0.005 Est: 300 S: 0.96718480 s2: 0.96718480 mae
```
Is this the result of a mistake in how I wrote the Python code, or is something else going on inside GradientBoostingClassifier?
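One thing worth checking is the `print` call. In `str.format`, a replacement field such as `{2:.16f}` refers to positional argument index 2, and the same index may appear more than once; unused arguments are silently ignored. In the format string above, both the `S:` and the `s2:` fields use index 2, so both would display `s`, while `s2` (index 3) is never shown. This minimal sketch, with made-up values, compares repeating `{2}` against using `{3}` for the fourth argument.

```python
learning_rate, est, s, s2 = 0.005, 300, 0.97, 0.85  # made-up values for illustration

# Index 2 used twice: both fields show s; s2 (index 3) is silently ignored
repeated = "S: {2:.2f} s2: {2:.2f}".format(learning_rate, est, s, s2)

# Index 3 for the second field: each value appears in its own place
separate = "S: {2:.2f} s2: {3:.2f}".format(learning_rate, est, s, s2)

print(repeated)  # S: 0.97 s2: 0.97
print(separate)  # S: 0.97 s2: 0.85
```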