Making an L-curve with sklearn's ridge regression

Time: 2019-10-17 12:09:30

Tags: python machine-learning scikit-learn statistics linear-regression

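A popular way to visualize ridge regression solutions is the L-curve, which plots the sum of squared errors against the ridge penalty term (the squared norm of the coefficients) for different choices of the regularization parameter. Can this be done with sklearn?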
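For reference, here is the same idea in standard notation (the symbols are ours, not the asker's). Ridge regression with penalty strength α, as minimized by scikit-learn's `Ridge`, and the L-curve traced out as α varies, usually drawn on log-log axes:

```latex
\hat\beta_\alpha = \arg\min_{\beta} \; \|y - X\beta\|_2^2 + \alpha\,\|\beta\|_2^2,
\qquad
\text{L-curve:}\quad \alpha \;\mapsto\; \Bigl(\,\|y - X\hat\beta_\alpha\|_2^2,\;\; \|\hat\beta_\alpha\|_2^2\,\Bigr)
```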

2 answers:

Answer 0 (score: 1)

Here is a pure sklearn answer:

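import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import Ridge

# X and y must already be defined (predictors and target)
alphas = np.logspace(-10, 10, 1000)
solution_norm = []  # squared L2 norm of the coefficients for each alpha
residual_norm = []  # residual sum of squares for each alpha

for alpha in alphas:
    lm = Ridge(alpha=alpha)
    lm.fit(X, y)
    solution_norm.append((lm.coef_**2).sum())
    residual_norm.append(((lm.predict(X) - y)**2).sum())

# L-curve: coefficient norm against residual norm on log-log axes
plt.loglog(residual_norm, solution_norm, 'k-')
plt.xlabel('residual sum of squares')
plt.ylabel('squared coefficient norm')
plt.show()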

where X and y are your predictors and target, respectively.
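The loop above refits `Ridge` once per alpha. Because ridge regression has a closed form, the whole curve can also be computed from a single SVD of X: with X = U S Vᵀ, the coefficients are V diag(s/(s² + α)) Uᵀy. Below is a minimal sketch, assuming no intercept (equivalent to `Ridge(fit_intercept=False)`) and using synthetic data; `ridge_l_curve_svd` is a hypothetical helper name, not part of scikit-learn.

```python
import numpy as np
from sklearn.linear_model import Ridge


def ridge_l_curve_svd(X, y, alphas):
    """Return (residual_norm, solution_norm) arrays, one entry per alpha.

    Assumes no intercept, i.e. the same fit as Ridge(fit_intercept=False).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    Uty = U.T @ y  # coordinates of y in the column space of X
    # Part of y outside the column space of X: no alpha can fit it
    resid_outside = max(y @ y - Uty @ Uty, 0.0)
    residual_norm, solution_norm = [], []
    for alpha in alphas:
        d = s / (s**2 + alpha)  # ridge filter factors
        coef = Vt.T @ (d * Uty)
        solution_norm.append(coef @ coef)
        # Inside the column space, each component of y is shrunk by alpha / (s^2 + alpha)
        r_in = (alpha / (s**2 + alpha)) * Uty
        residual_norm.append(r_in @ r_in + resid_outside)
    return np.array(residual_norm), np.array(solution_norm)


# Synthetic example data (stand-in for your own X and y)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = X @ np.array([1.0, -2.0, 0.5, 0.0, 3.0]) + 0.1 * rng.normal(size=50)
alphas = np.logspace(-3, 3, 20)
res, sol = ridge_l_curve_svd(X, y, alphas)

# Spot-check one alpha against scikit-learn's own Ridge (no intercept)
lm = Ridge(alpha=alphas[5], fit_intercept=False).fit(X, y)
print(np.isclose(sol[5], (lm.coef_**2).sum()))
```

The two arrays can be passed straight to `plt.loglog(res, sol)`. To mimic the default `fit_intercept=True`, center the columns of X and y before calling the helper; the unpenalized intercept then drops out and the coefficients and residuals are unchanged.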

Answer 1 (score: 0)

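scikit-learn has no such built-in functionality, but the Yellowbrick library (install it with pip or conda) provides it; adapting the LassoCV example from its documentation to your RidgeCV case gives: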

import numpy as np
from sklearn.linear_model import RidgeCV
from yellowbrick.regressor import AlphaSelection
from yellowbrick.datasets import load_concrete

# Load the regression dataset
X, y = load_concrete()

# Create a list of alphas to cross-validate against
alphas = np.logspace(-10, 1, 40)

# Instantiate the linear model and visualizer
model = RidgeCV(alphas=alphas)
visualizer = AlphaSelection(model)
visualizer.fit(X, y)
visualizer.show()
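Note that `AlphaSelection` draws a model error against alpha rather than the residual-norm versus coefficient-norm L-curve. If installing Yellowbrick is not an option, a similar error-versus-alpha curve can be approximated in plain scikit-learn with `cross_val_score`. This is a sketch under stated assumptions: it uses 5-fold cross-validated MSE and synthetic data from `make_regression` instead of `load_concrete`, `cv_error_per_alpha` is a hypothetical helper, and Yellowbrick does not necessarily compute its errors this way, so the values will not match its plot exactly.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score


def cv_error_per_alpha(X, y, alphas, cv=5):
    """Mean k-fold cross-validated MSE of Ridge for each alpha."""
    errors = []
    for alpha in alphas:
        # scikit-learn scorers follow "higher is better", so MSE comes back negated
        scores = cross_val_score(Ridge(alpha=alpha), X, y, cv=cv,
                                 scoring="neg_mean_squared_error")
        errors.append(-scores.mean())
    return np.array(errors)


# Synthetic stand-in for the concrete dataset used above
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)
alphas = np.logspace(-10, 1, 40)
errors = cv_error_per_alpha(X, y, alphas)
best_alpha = alphas[np.argmin(errors)]
print(f"alpha with lowest mean CV MSE: {best_alpha:.3g}")
```

`plt.semilogx(alphas, errors)` then draws the curve.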
