Question

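I ran into this problem while using GridSearchCV with 10-fold cross-validation on a linear (ridge regression) model. The dataset contains about 15000 y values, and for each y the corresponding X has 320000 entries. I used train_test_split to hold out 20% of the whole dataset as my test set, then ran 10-fold CV with a grid search on the training set. After the grid search, best_score_ was R2 = -2000, which would mean the model is meaningless.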

Actions

However, when I applied the model (here, grid) to my test set, the R2 score was surprisingly good, about 0.9.

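from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# X_train, y_train: the 80% training split produced earlier by train_test_split
alphas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 20, 50, 100, 500]
# Note: normalize= was deprecated in scikit-learn 1.0 and removed in 1.2
model = Ridge(normalize=True, copy_X=True)
grid = GridSearchCV(estimator=model,
                    param_grid=dict(alpha=alphas), scoring='r2', cv=10)
grid.fit(X_train, y_train)

# summarize the results of the grid search
print(grid.best_score_)
print(grid.best_estimator_.alpha)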
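The workflow described above (20% hold-out, 10-fold grid search, then scoring on the test set) can be sketched end to end as follows. It is a minimal sketch under stated assumptions: the data is a small hypothetical stand-in from make_regression rather than the real ~15000 x 320000 set, the random_state values are arbitrary, and it uses plain Ridge() because normalize= no longer exists in scikit-learn 1.2 and later. The useful part is that best_score_ is the mean of the ten per-fold R2 values, which can be read from cv_results_ to see whether a single fold is dragging the average down.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, train_test_split

# Hypothetical stand-in data, far smaller than the real dataset
X, y = make_regression(n_samples=500, n_features=50, noise=10.0, random_state=0)

# Hold out 20% as the test set
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

alphas = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1, 10, 20, 50, 100, 500]
# Assumption: plain Ridge(), since normalize= was removed in scikit-learn 1.2
grid = GridSearchCV(Ridge(), param_grid=dict(alpha=alphas), scoring="r2", cv=10)
grid.fit(X_train, y_train)

# best_score_ is the mean of the 10 per-fold R^2 values for the best alpha
best = grid.best_index_
fold_scores = np.array(
    [grid.cv_results_[f"split{k}_test_score"][best] for k in range(10)])
print("best alpha:", grid.best_params_["alpha"])
print("mean CV R^2:", grid.best_score_)
print("per-fold R^2:", np.round(fold_scores, 3))
print("worst fold R^2:", fold_scores.min())

# With refit=True (the default), grid.score uses best_estimator_,
# refit on all of X_train, and the same 'r2' scorer
print("test R^2:", grid.score(X_test, y_test))
```

If one entry of the per-fold array is hugely negative while the rest are close to the test score, the mean reported by best_score_ is being dominated by that fold.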

I have tried splitting the dataset many times with different random seeds, so this is not a coincidence. Can anyone help?

How should I interpret a very low cross-validation score (e.g. a negative R2) alongside a good test score?
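One mechanism that can produce this pattern (offered as a possibility, not a diagnosis of this particular dataset) is that R2 is computed per fold against each fold's own target variance, and the cross-validation score is the average of those per-fold values. The toy folds below are hypothetical: every prediction is off by exactly 1.0, but one fold has an almost constant target, so its R2 is hugely negative (about -199), the fold average is roughly -99, while R2 on the pooled predictions stays around 0.98.

```python
def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical toy folds; every prediction is off by exactly 1.0
fold_a_true = [0.0, 10.0, 20.0, 30.0]    # wide spread of targets
fold_b_true = [15.0, 15.1, 14.9, 15.0]   # nearly constant targets
fold_a_pred = [t + 1.0 for t in fold_a_true]
fold_b_pred = [t + 1.0 for t in fold_b_true]

# Averaging per-fold R^2 (what a CV mean score does)
mean_cv = (r2(fold_a_true, fold_a_pred) + r2(fold_b_true, fold_b_pred)) / 2

# One R^2 over all predictions together (like scoring a single test set)
pooled = r2(fold_a_true + fold_b_true, fold_a_pred + fold_b_pred)

print("mean of per-fold R^2:", mean_cv)
print("pooled R^2:", pooled)
```

The same prediction error is "small" relative to the spread of fold A and "enormous" relative to the spread of fold B, which is why per-fold R2 values can be dominated by low-variance folds.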
