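I am trying to cross-validate several regression models, and I want to build a dictionary that maps each model's name to its CV scores. I think the code is correct, but it takes far too long to run (1 hour 30 minutes).
1) Do you know why it takes so long? 2) What can I do to speed it up?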
from sklearn.model_selection import cross_validate
from sklearn.linear_model import LinearRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVR
from sklearn.neural_network import MLPRegressor
estimator_names = ['Linear Regression', 'Random Forest Regressor', 'Ridge Regressor', 'Gradient Boosting Regressor',
'SVM Regressor']
estimators = [LinearRegression, RandomForestRegressor, Ridge, GradientBoostingRegressor, SVR]
scores = dict()
counter = 0
for estimator in estimators:
    regressor = estimator()
    score = cross_validate(regressor, X_train, y_train, scoring='neg_mean_squared_error', cv=4)['test_score']
    scores[estimator_names[counter]] = score
    counter += 1
print(scores)
My expected output is:
scores = {'Linear Regressor': [0.892, 0.895, 0.824, 0.798],
          'Random Forest Regressor': [0.872, 0.495, 0.624, 0.758],
          'Ridge Regressor': [0.892, 0.895, 0.824, 0.798]}
(the values are random)
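One way to narrow down where the time goes is sketched below. It is only a rough illustration under stated assumptions: X_demo and y_demo are small synthetic arrays from make_regression standing in for X_train and y_train (which are not shown here), and n_estimators=20 is an arbitrary value chosen to keep the demo quick. cross_validate already returns a fit_time array next to test_score, so the total fit time per estimator can be collected in the same loop, and n_jobs=-1 asks it to run the folds in parallel. SVR may be worth checking first on a large dataset, since its fit time grows more than quadratically with the number of samples.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_validate
from sklearn.svm import SVR

# Assumption: synthetic data stands in for the real X_train / y_train.
X_demo, y_demo = make_regression(n_samples=200, n_features=10,
                                 noise=0.1, random_state=0)

# Name -> estimator instance, so no separate counter is needed.
estimators = {
    'Linear Regression': LinearRegression(),
    'Random Forest Regressor': RandomForestRegressor(n_estimators=20, random_state=0),
    'Ridge Regressor': Ridge(),
    'SVM Regressor': SVR(),
}

scores = {}
fit_times = {}
for name, regressor in estimators.items():
    # n_jobs=-1 runs the 4 folds in parallel on all available cores.
    result = cross_validate(regressor, X_demo, y_demo,
                            scoring='neg_mean_squared_error', cv=4, n_jobs=-1)
    scores[name] = result['test_score']          # one score per fold
    fit_times[name] = result['fit_time'].sum()   # total seconds spent fitting

for name in scores:
    print(name, scores[name], f'{fit_times[name]:.3f}s')
```

Running the same loop on a subsample of the real data should show which estimator dominates the 1 hour 30 minutes before committing to the full run.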