I am using the standard diabetes dataset for a regression task.
import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the dataset into a DataFrame with the target as the last column
diab = load_diabetes()
df = pd.DataFrame(diab.data, columns=diab.feature_names)
df['target'] = diab.target

X = df.iloc[:, :-1]
y = df['target']  # 1-D target avoids a DataConversionWarning in fit()

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)

# loss='squared_error' was called 'ls' before scikit-learn 1.0
grad_boost = GradientBoostingRegressor(learning_rate=0.001, loss='squared_error',
                                       max_depth=19, max_features=5)
grad_boost.fit(X_train, y_train)

mse = mean_squared_error(y_test, grad_boost.predict(X_test))
print("MSE: %.4f" % mse)  # gives an error of 3400-5000 depending on the parameters
I have checked the shapes of X, y, and the train and test splits. What could be the reason for such a huge MSE?
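
For reference, here is a minimal sketch of the kind of shape check I mean, together with a predict-the-mean baseline that gives the MSE some sense of scale. The use of `return_X_y=True` and `DummyRegressor` is an assumption made for this illustration and is not part of the code above; the split settings are the same.

```python
from sklearn.datasets import load_diabetes
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load features and target directly as NumPy arrays (y is 1-D)
X, y = load_diabetes(return_X_y=True)

# Same split settings as in the question
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# Shape check: rows of X and y must match within each split
shapes = {
    "X": X.shape,
    "y": y.shape,
    "X_train": X_train.shape,
    "y_train": y_train.shape,
    "X_test": X_test.shape,
    "y_test": y_test.shape,
}
for name, shape in shapes.items():
    print(name, shape)

# Baseline for scale: always predict the mean of the training target
baseline = DummyRegressor(strategy="mean").fit(X_train, y_train)
baseline_mse = mean_squared_error(y_test, baseline.predict(X_test))
print("Baseline MSE: %.4f" % baseline_mse)
print("Target range: %.1f to %.1f" % (y.min(), y.max()))
```

Comparing the model's MSE with the baseline MSE shows whether the model is doing any better than predicting a constant, which is more informative than the raw number on its own.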