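I am trying to plot learning curves for two machine learning models. One of the models gives a good MSE value. However, when I plot its learning curve, the training error is always zero. I can't tell whether the mistake is in my code or in my data.
My code is based on the following plot:
Here is my code: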
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve
# 80:20 ratio: with cv = 5, each split trains on 80% of the rows and validates on the other 20%.
train_sizes = [1, 50, 100, 150, 204]
features = ['Sex', 'Age', 'Education', 'Ideology', 'Likeability_pre-debate_AC', 'Proximity_PS', 'Proximity_Other', 'Debate Performance_AC', 'Int_Likeability_pre-debate_debate_performance_AC', 'Int_Likeability_pre-debate_Proximity_PS', 'Int_Likeability_pre-debate_Proximity_other', 'Likeability_post_debate_AC']
target = 'Likeability_post_debate_AC'
train_sizes, train_scores, validation_scores = learning_curve(
    estimator = LinearRegression(),
    X = df2[features],
    y = df2[target],
    train_sizes = train_sizes,
    cv = 5,
    scoring = 'neg_mean_squared_error')
train_scores_mean = -train_scores.mean(axis = 1)
validation_scores_mean = -validation_scores.mean(axis = 1)
plt.style.use('seaborn')
plt.plot(train_sizes, train_scores_mean, label = 'Training error')
plt.plot(train_sizes, validation_scores_mean, label = 'Validation error')
plt.ylabel('MSE', fontsize = 14)
plt.xlabel('Training set size', fontsize = 14)
plt.title('Learning curves for a linear regression model', fontsize = 18, y = 1.03)
plt.legend()
plt.ylim(-0.5,0.5)
Why does my training error stay at zero from the very beginning?
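One thing that may be worth checking: in the code above, the `features` list ends with `'Likeability_post_debate_AC'`, which is also the `target`. If that is not intentional, a linear model can reproduce the target exactly from that column, which would make the training MSE essentially zero at every training set size. The sketch below tries to reproduce that effect on synthetic data. The array names (`X_clean`, `X_leaky`), the helper `mean_train_mse`, and the 255-row sample size are assumptions made for illustration and are not taken from `df2`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the real data (assumption: 255 rows, 3 predictors).
rng = np.random.default_rng(0)
n = 255
X_clean = rng.normal(size=(n, 3))
y = X_clean @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=1.0, size=n)

# Same predictors plus the target itself as an extra column,
# mimicking a features list that also contains the target.
X_leaky = np.column_stack([X_clean, y])

def mean_train_mse(X, y):
    """Return the mean training MSE for each training set size."""
    _, train_scores, _ = learning_curve(
        LinearRegression(), X, y,
        train_sizes=[50, 100, 150, 204],
        cv=5,
        scoring='neg_mean_squared_error')
    # Scores are negated MSE values, so flip the sign back.
    return -train_scores.mean(axis=1)

leaky_mse = mean_train_mse(X_leaky, y)
clean_mse = mean_train_mse(X_clean, y)

print('Training MSE with target among the features:', leaky_mse)
print('Training MSE without it:', clean_mse)
```

If the real data behaves the same way, dropping the target column from `features` should give a non-zero training curve. Separately, a training set of size 1 (the first entry in `train_sizes`) is fit perfectly by any linear model, so that single point would be zero regardless.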