A reproducible example to anchor the discussion:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import scale

dataset = load_boston()
boston = scale(dataset.data)  # standardized features
target = dataset.target

alphas = np.linspace(1.0, 200.0, 5)
fit0 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_
fit0.cv_values_[:,0]
The question: what formula is used to compute fit0.cv_values_?
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be

(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2

where fit1 is a ridge regression with alpha = 1.0, fitted to the dataset from which observation 0 was removed.
Let's see:
1) create new dataset with first row of original dataset removed:
from sklearn.linear_model import Ridge
boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) check the MSE of that model on the first data-point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
it is array([ 37.64650853])
which is not the same as the corresponding entry of fit0.cv_values_[:,0]:

fit0.cv_values_[:,0][0]

is 37.495629960571137.
What gives?
Answer 0 (score: 3)

Quoting from the sklearn documentation:

Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
Since you have not provided any scoring function in the constructor, and have not provided anything for the cv parameter of the constructor either, this attribute should store the mean squared error for each sample, using leave-one-out cross-validation. The general formula for mean squared error is

MSE = (1/n) * sum_{i=1}^{n} (Ŷ_i - Y_i)^2

where Ŷ (Y with a hat) is the regressor's prediction and Y is the true value.

In your case you are doing leave-one-out cross-validation, so there is only 1 test point in each fold and thus n = 1. So fit0.cv_values_[:,0] simply gives you the squared prediction error for every point of your training dataset, measured when that point was the test fold, for an alpha value of 1.0.
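As a quick sanity check on that reading (a sketch reusing fit0 and alphas from your snippet; the 506 in the comment assumes the Boston data's sample count):

print(fit0.cv_values_.shape)                  # (506, 5): one column of per-sample squared errors per alpha
mse_per_alpha = fit0.cv_values_.mean(axis=0)  # average squared LOO error for each alpha
print(alphas[np.argmin(mse_per_alpha)])       # the alpha RidgeCV picks, i.e. fit0.alpha_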
Hope that helps.
Answer 1 (score: 2)
Let's look - it's open source after all
The first call to fit hands off to its parent class, _BaseRidgeCV (line 997 in that implementation). Since we have not provided a cross-validation generator, we make another call, to _RidgeGCV.fit. There is a lot of math in the documentation of that function, but we are so close to the source that I will let you go and read it.
Here is the actual source:
v, Q, QT_y = _pre_compute(X, y)
n_y = 1 if len(y.shape) == 1 else y.shape[1]
cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
C = []

scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
error = scorer is None

for i, alpha in enumerate(self.alphas):
    weighted_alpha = (sample_weight * alpha
                      if sample_weight is not None
                      else alpha)
    if error:
        out, c = _errors(weighted_alpha, y, v, Q, QT_y)
    else:
        out, c = _values(weighted_alpha, y, v, Q, QT_y)
    cv_values[:, i] = out.ravel()
    C.append(c)
Note the exciting _pre_compute function:
def _pre_compute(self, X, y):
    # even if X is very sparse, K is usually very dense
    K = safe_sparse_dot(X, X.T, dense_output=True)
    v, Q = linalg.eigh(K)
    QT_y = np.dot(Q.T, y)
    return v, Q, QT_y
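With those precomputed pieces (the eigendecomposition of K = X X^T and Q.T @ y), all the leave-one-out residuals can be obtained in closed form, which is what the _errors helper called above does. Here is a minimal NumPy sketch of that computation (my own reimplementation for illustration, not the verbatim sklearn source, and loo_squared_errors is a made-up name):

import numpy as np
from scipy import linalg

def loo_squared_errors(X, y, alpha):
    # Dual ridge: c = (K + alpha*I)^{-1} y, with kernel K = X X^T.
    K = X @ X.T
    v, Q = linalg.eigh(K)     # K = Q diag(v) Q^T
    QT_y = Q.T @ y
    w = 1.0 / (v + alpha)     # eigenvalues of G = (K + alpha*I)^{-1}
    c = Q @ (w * QT_y)        # dual coefficients c = G y
    G_diag = (Q ** 2) @ w     # diagonal of G, without forming G itself
    # Classic LOO identity for ridge: residual_i = c_i / G_ii,
    # so the per-sample squared leave-one-out errors are:
    return (c / G_diag) ** 2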
Abhinav has already explained what is going on at the mathematical level - it is simply accumulating the (weighted) mean squared error. The details of the implementation, and where it differs from yours, can be evaluated step by step from the code.
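As a rough check of the sketch above against your numbers (assumptions on my part: with fit_intercept=True, _RidgeGCV centers X and y once before applying the shortcut, and since scale() has already centered X, centering y should be enough here):

errs = loo_squared_errors(boston, target - target.mean(), alpha=1.0)
print(errs[0])   # expected to land near fit0.cv_values_[0, 0], i.e. 37.4956...

Note also that this shortcut centers the data once, on the full dataset, while your manual refit in steps 1)-3) re-centers within each fold when it fits the intercept; that is a plausible source of the small gap between 37.4956 and 37.6465.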