Lambda问题或交叉验证

时间:2018-07-13 00:14:03

标签: r glmnet

我正在使用glmnet软件包的LASSO进行双重交叉验证,但是,当我绘制结果时,得到的lambda为0-150000,这对我来说是不现实的,不确定我在做什么错,可以有人指出我正确的方向。预先感谢!

calcium = read.csv("calciumgood.csv", header=TRUE)
dim(calcium)
n = dim(calcium)[1]
calcium = na.omit(calcium)
names(calcium)

library(glmnet)  # use LASSO model from package glmnet 
lambdalist = exp((-1200:1200)/100)  # defines models to consider


fulldata.in = calcium
x.in = model.matrix(CAMMOL~. - CAMLEVEL - AGE,data=fulldata.in)
y.in = fulldata.in[,2]
k.in = 10 
n.in = dim(fulldata.in)[1]
groups.in = c(rep(1:k.in,floor(n.in/k.in)),1:(n.in%%k.in))  
set.seed(8)
cvgroups.in = sample(groups.in,n.in)  #orders randomly, with seed (8) 
#LASSO cross-validation
cvLASSOglm.in = cv.glmnet(x.in, y.in, lambda=lambdalist, alpha = 1, nfolds=k.in, foldid=cvgroups.in)
plot(cvLASSOglm.in$lambda,cvLASSOglm.in$cvm,type="l",lwd=2,col="red",xlab="lambda",ylab="CV(10)")
whichlowestcvLASSO.in = order(cvLASSOglm.in$cvm)[1];     min(cvLASSOglm.in$cvm)
bestlambdaLASSO = (cvLASSOglm.in$lambda)[whichlowestcvLASSO.in];     bestlambdaLASSO
abline(v=bestlambdaLASSO)
bestlambdaLASSO  # this is the lambda for the best LASSO model
LASSOfit.in = glmnet(x.in, y.in, alpha = 1,lambda=lambdalist)  # fit the model across possible lambda
LASSObestcoef = coef(LASSOfit.in, s = bestlambdaLASSO); LASSObestcoef # coefficients for the best model fit

1 个答案:

答案 0 :(得分:0)

我找到了您引用的数据集 Calcium, inorganic phosphorus and alkaline phosphatase levels in elderly patients

基本上数据是“脏”的,这可能是算法无法正确收敛的原因。例如。有771岁的患者,男女都有12的性别,有22的性别编码等。

对于您的情况,您仅删除了NA个。

您还需要检查data.frame导入的类型。例如。而不是因素,可以将其导入整数(性别,实验室和年龄组),这会影响模型。

我认为您需要: 1)清理数据; 2)如果不起作用,请提交* .csv文件