A reproducible example to anchor the discussion:
import numpy as np
from sklearn.datasets import load_boston
from sklearn.linear_model import RidgeCV
from sklearn.preprocessing import scale

dataset = load_boston()
boston = scale(dataset.data)  # standardized features
target = dataset.target

alphas = np.linspace(1.0, 200.0, 5)
fit0 = RidgeCV(alphas=alphas, store_cv_values=True, gcv_mode='eigen').fit(boston, target)
fit0.alpha_
fit0.cv_values_[:,0]
The question: what formula is used to compute fit0.cv_values_?
@Abhinav Arora's answer below seems to suggest that fit0.cv_values_[:,0][0], the first entry of fit0.cv_values_[:,0], would be

(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2

where fit1 is a ridge regression with alpha = 1.0, fitted to the dataset from which observation 0 was removed.
Let's see:
1) create new dataset with first row of original dataset removed:
from sklearn.linear_model import Ridge
boston1 = np.delete(boston, (0), axis=0)
target1 = np.delete(target, (0), axis=0)
2) fit a ridge model with alpha = 1.0 on this truncated dataset:
fit1 = Ridge(alpha=1.0).fit(boston1, target1)
3) check the MSE of that model on the first data-point:
(fit1.predict(boston[0,].reshape(1, -1)) - target[0])**2
it is array([ 37.64650853])
which is not the same as the corresponding entry of fit0.cv_values_[:,0]:

fit0.cv_values_[:,0][0]

is 37.495629960571137.
What gives?
Answer 0 (score: 3)

Quoting from the sklearn documentation:

Cross-validation values for each alpha (if store_cv_values=True and cv=None). After fit() has been called, this attribute will contain the mean squared errors (by default) or the values of the {loss,score}_func function (if provided in the constructor).
Since you have not provided any scoring function in the constructor, and have not provided anything for the cv parameter of the constructor either, this attribute should store the mean squared error for each sample, using leave-one-out cross-validation. The general formula for mean squared error is

MSE = (1/n) * sum_{i=1}^{n} (Ŷ_i - Y_i)^2

where Ŷ (Y with a hat) is the regressor's prediction and Y is the true value.

In your case you are doing leave-one-out cross-validation, so there is only 1 test point in each fold and thus n = 1. So fit0.cv_values_[:,0] simply gives you the squared prediction error for every point of your training dataset, measured when that point was the test fold, for an alpha value of 1.0.
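As a quick sanity check on that reading (a sketch reusing fit0 and alphas from your snippet; the 506 in the comment assumes the Boston data's sample count):

print(fit0.cv_values_.shape)                  # (506, 5): one column of per-sample squared errors per alpha
mse_per_alpha = fit0.cv_values_.mean(axis=0)  # average squared LOO error for each alpha
print(alphas[np.argmin(mse_per_alpha)])       # the alpha RidgeCV picks, i.e. fit0.alpha_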
Hope that helps.
Answer 1 (score: 2)
Let's look - it's open source after all
The first call to fit hands off to its parent class, _BaseRidgeCV (line 997 in that implementation). Since we have not provided a cross-validation generator, we make another call, to _RidgeGCV.fit. There is a lot of math in the documentation of that function, but we are so close to the source that I will let you go and read it.
Here is the actual source:
v, Q, QT_y = _pre_compute(X, y)
n_y = 1 if len(y.shape) == 1 else y.shape[1]
cv_values = np.zeros((n_samples * n_y, len(self.alphas)))
C = []

scorer = check_scoring(self, scoring=self.scoring, allow_none=True)
error = scorer is None

for i, alpha in enumerate(self.alphas):
    weighted_alpha = (sample_weight * alpha
                      if sample_weight is not None
                      else alpha)
    if error:
        out, c = _errors(weighted_alpha, y, v, Q, QT_y)
    else:
        out, c = _values(weighted_alpha, y, v, Q, QT_y)
    cv_values[:, i] = out.ravel()
    C.append(c)
Note the exciting _pre_compute function:
def _pre_compute(self, X, y):
    # even if X is very sparse, K is usually very dense
    K = safe_sparse_dot(X, X.T, dense_output=True)
    v, Q = linalg.eigh(K)
    QT_y = np.dot(Q.T, y)
    return v, Q, QT_y
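With those precomputed pieces (the eigendecomposition of K = X X^T and Q.T @ y), all the leave-one-out residuals can be obtained in closed form, which is what the _errors helper called above does. Here is a minimal NumPy sketch of that computation (my own reimplementation for illustration, not the verbatim sklearn source, and loo_squared_errors is a made-up name):

import numpy as np
from scipy import linalg

def loo_squared_errors(X, y, alpha):
    # Dual ridge: c = (K + alpha*I)^{-1} y, with kernel K = X X^T.
    K = X @ X.T
    v, Q = linalg.eigh(K)     # K = Q diag(v) Q^T
    QT_y = Q.T @ y
    w = 1.0 / (v + alpha)     # eigenvalues of G = (K + alpha*I)^{-1}
    c = Q @ (w * QT_y)        # dual coefficients c = G y
    G_diag = (Q ** 2) @ w     # diagonal of G, without forming G itself
    # Classic LOO identity for ridge: residual_i = c_i / G_ii,
    # so the per-sample squared leave-one-out errors are:
    return (c / G_diag) ** 2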
Abhinav has already explained what is going on at the mathematical level - it is simply accumulating the (weighted) mean squared error. The details of the implementation, and where it differs from yours, can be evaluated step by step from the code.
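As a rough check of the sketch above against your numbers (assumptions on my part: with fit_intercept=True, _RidgeGCV centers X and y once before applying the shortcut, and since scale() has already centered X, centering y should be enough here):

errs = loo_squared_errors(boston, target - target.mean(), alpha=1.0)
print(errs[0])   # expected to land near fit0.cv_values_[0, 0], i.e. 37.4956...

Note also that this shortcut centers the data once, on the full dataset, while your manual refit in steps 1)-3) re-centers within each fold when it fits the intercept; that is a plausible source of the small gap between 37.4956 and 37.6465.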