Good test score, but widely scattered scores with cross-validation

Time: 2017-10-16 10:29:11

Tags: machine-learning scikit-learn random-forest decision-tree

I am fitting a random forest regressor on the Titanic dataset.

Here is what the code looks like:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# X, y: Titanic features and target, prepared beforehand
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
reg_rf = RandomForestRegressor(random_state=1)  # 10 trees by default in scikit-learn < 0.22 (100 from 0.22 on)
reg_rf.fit(X_train, y_train)
# For a regressor, score() returns R^2, not classification accuracy
rfc_train_score = reg_rf.score(X_train, y_train)
rfc_test_score = reg_rf.score(X_test, y_test)
print('train accuracy =', rfc_train_score)
print('test accuracy =', rfc_test_score)

I get the following output:

train accuracy = 0.988660049497
test accuracy = 0.942596699112
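Two points may help when reading those numbers: for a regressor, `score()` reports R², and a single 70/30 split is only one draw of many possible splits. The sketch below illustrates both on synthetic data from `make_regression`, which is an assumption standing in for the Titanic `X`, `y` (not shown in the question); `n_estimators=50` and the `ShuffleSplit` settings are arbitrary choices for speed, not values from the original code.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import ShuffleSplit, cross_val_score, train_test_split

# Synthetic stand-in for the Titanic data (assumption: the real X, y are not shown)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=1)

# One 70/30 split, as in the question
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=123)
reg.fit(X_train, y_train)
test_score = reg.score(X_test, y_test)
# score() on a regressor is R^2, so it matches r2_score on the same predictions
same_score = r2_score(y_test, reg.predict(X_test))
print("single split R^2:", round(test_score, 3), round(same_score, 3))

# Ten random 70/30 splits: the same procedure repeated with different shuffles
splitter = ShuffleSplit(n_splits=10, test_size=0.3, random_state=0)
split_scores = cross_val_score(reg, X, y, scoring="r2", cv=splitter)
print("R^2 over 10 splits: min", round(split_scores.min(), 3),
      "max", round(split_scores.max(), 3))
```

Comparing the min and max over repeated splits gives a rough sense of how much a single split's score can move by chance alone.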

But when I run cross-validation on this model:

from sklearn.model_selection import cross_val_score 

scores = cross_val_score(reg_rf, X, y, scoring='r2', cv=5)
print(scores)

It gives me:

[ 0.57775117  0.88199732  0.69066105  0.90320741  0.87953982]
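Summarizing the five fold scores makes the comparison with the single split concrete. This only does arithmetic on the numbers printed in the question; `np.std` uses the population formula (`ddof=0`).

```python
import numpy as np

# R^2 values copied from the two outputs in the question
cv_scores = np.array([0.57775117, 0.88199732, 0.69066105, 0.90320741, 0.87953982])
single_split_test = 0.942596699112

print("CV mean =", round(cv_scores.mean(), 3))  # about 0.787
print("CV std  =", round(cv_scores.std(), 3))   # about 0.130
print("single split above every fold:", single_split_test > cv_scores.max())
```

So the single test score (about 0.943) sits above even the best fold (about 0.903), while the fold scores spread over roughly 0.58 to 0.90.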

As you can see, the fold scores differ a lot from one another. How should I interpret this behavior?
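One possible factor worth checking (a hypothesis, not a diagnosis of this particular data): `train_test_split` shuffles rows by default, but `cv=5` with a regressor means `KFold(n_splits=5)` without shuffling, so each fold is a contiguous block of rows. If the rows are stored in some meaningful order, contiguous folds can differ systematically. The sketch below exaggerates this by sorting synthetic data (from `make_regression`, an assumption) by the target, then compares unshuffled and shuffled folds; `n_estimators=50` and the seeds are arbitrary.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
# Hypothetical worst case: rows sorted by target, like a file stored in a meaningful order
order = np.argsort(y)
X_sorted, y_sorted = X[order], y[order]

reg = RandomForestRegressor(n_estimators=50, random_state=1)

# cv=5 with a regressor: KFold(n_splits=5), no shuffling, contiguous folds
unshuffled = cross_val_score(reg, X_sorted, y_sorted, scoring="r2", cv=5)
# Shuffling before splitting gives every fold a similar mix of rows
shuffled = cross_val_score(reg, X_sorted, y_sorted, scoring="r2",
                           cv=KFold(n_splits=5, shuffle=True, random_state=0))
print("unshuffled:", unshuffled.round(3))
print("shuffled:  ", shuffled.round(3))
```

If passing a shuffled `KFold` to `cross_val_score` on the real data narrows the spread, row order was likely part of the story; if not, the remaining spread may simply reflect how much the score varies between subsets of a small dataset.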

I am running Python 3.6.x.

0 answers:

No answers yet.