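I am running double cross-validation with LASSO from the glmnet package, but when I plot the results the lambda axis runs from 0 to 150000, which seems unrealistic to me. I am not sure what I am doing wrong; could someone point me in the right direction? Thanks in advance!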
calcium = read.csv("calciumgood.csv", header=TRUE)
calcium = na.omit(calcium) # drop incomplete rows first so the counts below describe the data actually modelled
dim(calcium)
n = dim(calcium)[1]
names(calcium)
library(glmnet) # use LASSO model from package glmnet
lambdalist = exp((-1200:1200)/100) # candidate lambdas on a log grid from exp(-12) to exp(12), roughly 6e-06 to 1.6e+05
fulldata.in = calcium
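x.in = model.matrix(CAMMOL~. - CAMLEVEL - AGE,data=fulldata.in)[,-1] # drop the intercept column; glmnet fits its own intercept
y.in = fulldata.in$CAMMOL # response named in the formula, rather than relying on it being column 2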
k.in = 10
n.in = dim(fulldata.in)[1]
groups.in = rep(1:k.in,length.out=n.in) # balanced fold labels; also correct when n.in is an exact multiple of k.in
set.seed(8)
cvgroups.in = sample(groups.in,n.in) #orders randomly, with seed (8)
#LASSO cross-validation
cvLASSOglm.in = cv.glmnet(x.in, y.in, lambda=lambdalist, alpha = 1, nfolds=k.in, foldid=cvgroups.in)
plot(cvLASSOglm.in$lambda,cvLASSOglm.in$cvm,type="l",lwd=2,col="red",xlab="lambda",ylab="CV(10)")
whichlowestcvLASSO.in = which.min(cvLASSOglm.in$cvm); min(cvLASSOglm.in$cvm)
bestlambdaLASSO = (cvLASSOglm.in$lambda)[whichlowestcvLASSO.in]
abline(v=bestlambdaLASSO)
bestlambdaLASSO # this is the lambda for the best LASSO model
LASSOfit.in = glmnet(x.in, y.in, alpha = 1,lambda=lambdalist) # fit the model across possible lambda
LASSObestcoef = coef(LASSOfit.in, s = bestlambdaLASSO); LASSObestcoef # coefficients for the best model fit
Answer 0 (score: 0)
I found the dataset you are referring to: Calcium, inorganic phosphorus and alkaline phosphatase levels in elderly patients.
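Basically, the data are "dirty", which may be why the algorithm does not converge properly. For example, there is a patient aged 771, and besides the codes 1 and 2 for male and female there is a sex code of 22, and so on.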
In your case, you only removed the NAs.
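As a minimal sketch of what going beyond na.omit could look like, the snippet below filters out implausible values as well. The toy data frame, the column name SEX and the plausibility limits are assumptions for illustration (only AGE and CAMMOL appear in the question's code), so they would need adjusting to the real file.

```r
# Toy data standing in for the real file; all values are made up for illustration.
calcium_toy <- data.frame(
  AGE    = c(71, 771, 68, NA, 80),
  SEX    = c(1, 2, 22, 1, 2),
  CAMMOL = c(2.4, 2.3, 2.5, 2.2, 2.6)
)

# Step 1: drop rows with missing values (what the question already does).
complete <- na.omit(calcium_toy)

# Step 2: keep only rows whose values look plausible.
# These limits are assumptions, not rules taken from the dataset documentation.
plausible <- complete$AGE <= 120 & complete$SEX %in% c(1, 2)
calcium_clean <- complete[plausible, ]

print(calcium_clean)
nrow(calcium_clean)
```

Whether suspicious rows should be dropped or corrected (771 might be a typo for 71 or 77) is a judgment call that depends on the data source.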
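You also need to check the column types of the imported data.frame. For example, some variables (sex, lab and age group) may have been imported as integers rather than factors, which affects the model.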
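A small sketch of that type check, on made-up data. The column names SEX, LAB and AGEGROUP are assumed names for the coded variables and may differ in the actual file. It shows how the design matrix changes once the codes are treated as factors.

```r
# Toy data; values and column names are illustrative assumptions.
calcium_toy <- data.frame(
  SEX      = c(1, 2, 2, 1),
  LAB      = c(1, 3, 2, 3),
  AGEGROUP = c(1, 2, 2, 3),
  CAMMOL   = c(2.4, 2.3, 2.5, 2.2)
)

str(calcium_toy)  # the codes arrive as numeric columns, as they would from read.csv

# Design matrix with the codes treated as numbers: one column per variable.
x_numeric <- model.matrix(CAMMOL ~ ., data = calcium_toy)[, -1]

# Convert the coded columns to factors so model.matrix() builds dummy variables.
coded <- c("SEX", "LAB", "AGEGROUP")
calcium_toy[coded] <- lapply(calcium_toy[coded], factor)
x_factor <- model.matrix(CAMMOL ~ ., data = calcium_toy)[, -1]

colnames(x_numeric)
colnames(x_factor)
```

With numeric codes LASSO sees a single slope per variable, while with factors it can select or drop individual levels, so the two encodings can lead to different fits.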
I think you need to: 1) clean the data; 2) if that does not help, post the *.csv file.
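Independently of the data, it may be worth checking how much of the plotted range comes from the candidate grid itself. The sketch below uses only the lambdalist definition from the question. The commented-out plotting call at the end is a suggestion that assumes the question's cvLASSOglm.in object exists.

```r
lambdalist <- exp((-1200:1200) / 100)  # same grid as in the question

range(lambdalist)    # smallest and largest candidate: exp(-12) and exp(12)
length(lambdalist)   # number of candidates

# About half of the candidates are below 1, yet the largest exceeds 160000,
# so a raw lambda axis is dominated by the few very large values.
mean(lambdalist < 1)

# Plotting against log(lambda) spreads the grid evenly, for example:
# plot(log(cvLASSOglm.in$lambda), cvLASSOglm.in$cvm, type = "l",
#      xlab = "log(lambda)", ylab = "CV(10)")
```

If the axis still looks odd on the log scale, the data issues described above become the more likely explanation.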