Fitting polynomial learning curves with a Pipeline

Asked: 2021-02-23 13:44:21

Tags: python scikit-learn

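I'm trying to plot a series of learning curves for a linear regression model: one for a plain linear model, one for a degree-2 polynomial, and one for a degree-10 polynomial. I'm not sure what the Pipeline is actually doing. The degree-2 plot looks about right, but the degree-15 plot looks very wrong. I know it will overfit badly, but it still seems far off. Does this code give me a reliable RMSE plot?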

[Figure: degree-15 polynomial plot]

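import numpy as np
from sklearn.metrics import mean_squared_error

def compute_rms(mu_1, mu_2):
    # RMSE of (y_true, y_pred); the result is symmetric in its two arguments.
    # The square root is taken explicitly because the `squared=False` argument
    # is deprecated in recent scikit-learn releases.
    rms = np.sqrt(mean_squared_error(mu_1, mu_2))
    return rms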

import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

def plot_learning_curve(model, X, y):
    # test_size=0.7 holds out 70% of the data for validation, leaving 30% for training
    X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.7, random_state=10)
    train_errors, val_errors = [], []
    sizes = range(1, len(X_train) + 1)  # include the full training set as the last point
    for m in sizes:
        # Refit on the first m training samples, then score on those samples and on the validation set
        model.fit(X_train[:m], y_train[:m])
        y_train_predict = model.predict(X_train[:m])
        y_val_predict = model.predict(X_val)
        train_errors.append(compute_rms(y_train[:m], y_train_predict))
        val_errors.append(compute_rms(y_val, y_val_predict))
    plt.figure(figsize=(8, 4))
    # Plot against the actual training set size so the x-axis starts at 1 rather than 0
    plt.plot(sizes, train_errors, "r-+", linewidth=2, label="Training set")
    plt.plot(sizes, val_errors, "b-", linewidth=3, label="Validation set")
    plt.legend(loc="upper right", fontsize=14)
    plt.xlabel("Training set size", fontsize=14)
    plt.ylabel("RMSE", fontsize=14)
    return train_errors
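from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Plain linear model
lin_reg = LinearRegression()
train_errors = plot_learning_curve(lin_reg, z_sample_reshape, mu_sample_reshape)
rms_sample_lin = train_errors[-1]  # training RMSE at the largest training set size

# Degree-15 polynomial: expand the features, then fit a linear regression on them
polynomial_regression = Pipeline([  # steps are passed as a list of (name, estimator) tuples
    ("poly_features", PolynomialFeatures(degree=15, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
train_errors = plot_learning_curve(polynomial_regression, z_sample_reshape, mu_sample_reshape)
rms_sample_poly = train_errors[-1]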
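
As a rough illustration of what the Pipeline in the question does, the sketch below compares a two-step Pipeline with the same two steps carried out by hand. The data here is synthetic and purely hypothetical (it stands in for `z_sample_reshape` and `mu_sample_reshape`, which are not shown in the question), and degree 2 is used only to keep the example small.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures

# Hypothetical synthetic data, NOT the data from the question
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=(50, 1))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 0] ** 2 + rng.normal(0.0, 0.1, size=50)

# Pipeline version: fit() runs fit_transform on the feature step,
# then fits the final estimator on the transformed features
pipe = Pipeline([
    ("poly_features", PolynomialFeatures(degree=2, include_bias=False)),
    ("lin_reg", LinearRegression()),
])
pipe.fit(X, y)
pipe_pred = pipe.predict(X)

# Manual version of the same two steps
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)            # columns: x, x**2
lin = LinearRegression().fit(X_poly, y)
manual_pred = lin.predict(poly.transform(X))

# Both routes should give the same predictions
same = np.allclose(pipe_pred, manual_pred)
print(same)
```

Under this reading, every `model.fit(X_train[:m], y_train[:m])` call in `plot_learning_curve` refits both the feature expansion and the regression on the first `m` samples. With `degree=15` that means 15 features plus an intercept, so for `m` below 16 the model has at least as many parameters as training samples, which may explain part of what looks wrong at the left edge of the plot.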

0 answers:

No answers yet