glmnet (lasso)

Time: 2018-06-04 21:25:42

Tags: r machine-learning comparison

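I'm not sure whether I should post my question here or on Cross Validated, but since it is also a programming question (maybe I did something wrong in my code), I'll ask here as well. I'm trying to understand how the lasso in glmnet standardizes variables. After reading several threads and posts, it seems that glmnet standardizes y and X using a standard deviation computed with N rather than N-1 in the denominator. In addition, lambda is scaled by the standard deviation of y. The following example should therefore produce identical results, but it does not. Is something wrong with my code, or have I forgotten something?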
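As a quick check of the N versus N-1 distinction mentioned above, here is a minimal sketch comparing base R's `sd()` (denominator n - 1) with a standard deviation that divides by n. The helper name `sd_pop` and the toy vector `v` are made up for illustration and are not part of glmnet.

```r
# Hypothetical helper: standard deviation with denominator n (not n - 1)
sd_pop <- function(v) sqrt(sum((v - mean(v))^2) / length(v))

set.seed(1)
v <- rnorm(50)
n <- length(v)

# sd() in base R divides by n - 1, so the two differ by sqrt((n - 1) / n)
ratio <- sd_pop(v) / sd(v)
print(all.equal(ratio, sqrt((n - 1) / n)))

# After centering and scaling with sd_pop, the mean square equals 1
# up to floating point error
v_sc <- (v - mean(v)) / sd_pop(v)
print(mean(v_sc^2))
```

For n = 500, as in the example in this question, the factor sqrt((n - 1) / n) is close to 1, so the choice of denominator alone changes the scaling only slightly.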

library(glmnet)

# Standardize variables as in glmnet (use n instead of n - 1 as the denominator)
mysd <- function(y) sqrt(sum((y - mean(y))^2) / length(y))
set.seed(8675309)
# some data for an example
n = 500
p = 10
X = matrix(rnorm(n*p), ncol=p)
X_sc = scale(X, scale=apply(X, 2, mysd))
b = c(.5, -.5, .25, -.25, .125, -.125, rep(0, 4))
y = X %*% b + rnorm(n, sd=.5)
y_sc = scale(y, scale = mysd(y))

# Lasso with "own" standardization; lambda is divided by the sd of y
# (the original call used an undefined `y1`, replaced here with `y`)
fit1 = coef(glmnet(X_sc, y_sc, intercept = FALSE, lambda = 0.001/mysd(y), thresh = 1e-12,
            standardize = FALSE, standardize.response = FALSE), s = 0.001/mysd(y))
# Standardization used in glmnet
fit2 = coef(glmnet(X, y, intercept = FALSE, lambda = 0.001, thresh = 1e-12), s = 0.001)

fit1[2]*mysd(y)/mysd(X[,1])
fit2[2]
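
The rescaling used in the comparison above (dividing lambda by the standard deviation of y and multiplying the coefficient by sd(y)/sd(x_j)) can be written out as a short derivation. This is a sketch only. It assumes the Gaussian lasso objective given in the glmnet documentation, $\frac{1}{2n}\lVert y - X\beta\rVert_2^2 + \lambda\lVert\beta\rVert_1$, and it treats y and the columns of X as rescaled only. The centering that `scale()` performs in the code, and whatever glmnet does internally when `intercept = FALSE`, are not covered. The symbols $s_y$, $s_j$, $\gamma$ and $\tilde\lambda$ are introduced here for illustration.

```latex
% Rescaled data: \tilde y = y / s_y, \tilde x_j = x_j / s_j
% Lasso on the rescaled data, with coefficients \gamma and penalty \tilde\lambda:
\min_{\gamma}\; \frac{1}{2n}\lVert \tilde y - \tilde X\gamma \rVert_2^2
  + \tilde\lambda \sum_j \lvert \gamma_j \rvert

% Substitute \gamma_j = \beta_j s_j / s_y, so that \tilde X\gamma = X\beta / s_y:
\frac{1}{2n}\lVert \tilde y - \tilde X\gamma \rVert_2^2
  + \tilde\lambda \sum_j \lvert \gamma_j \rvert
= \frac{1}{s_y^2}\left[ \frac{1}{2n}\lVert y - X\beta \rVert_2^2
  + \tilde\lambda\, s_y \sum_j s_j \lvert \beta_j \rvert \right]

% Choosing \tilde\lambda = \lambda / s_y gives the penalty \lambda \sum_j s_j |\beta_j|,
% i.e. a lasso with penalty \lambda on predictors scaled by s_j and an unscaled response.
% The coefficients on the original scale are recovered as
\beta_j = \gamma_j \, \frac{s_y}{s_j}
```

Under these assumptions, `lambda = 0.001/mysd(y)` corresponds to $\tilde\lambda = \lambda / s_y$, and `fit1[2]*mysd(y)/mysd(X[,1])` corresponds to $\beta_1 = \gamma_1 s_y / s_1$.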

0 Answers:

No answers