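I wrote a function that fits polynomial regression models of increasing degree, up to a degree given as input. It also collects all of the fitted models in a list.
After running the function on a given set of inputs, I want to go through the model list and compute the MSE. However, each model refers to the function's argument names rather than to the actual variables.
Question: how can I make the glm objects refer to the actual variables?
Function definition: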
poly.iter = function(dep,indep,dat,deg){ #Function to iterate through polynomial fits up to input degree
  set.seed(1)
  par(mfrow=c(ceiling(sqrt(deg)),ceiling(sqrt(deg)))) #partitioning the plotting window
  MSE.CV = rep(0,deg)
  modlist = list()
  xvar = seq(from=min(indep),to=max(indep),length.out = nrow(dat))
  for (i in 1:deg){
    mod = glm(dep~poly(indep,i),data=dat)
    #MSE.CV[i] = cv.glm(dat,mod,K=10)$delta[2] #Inside of this function, cv.glm is generating warnings. Googling has not helped as it can typically happen with missing obs but we don't have any in Auto data
    modlist = c(modlist,list(mod))
    MSE.CV[i] = mean(mod$residuals^2) #GLM part is giving 5x the error i.e. delta is 5x of MSE. Not sure why
    plot(jitter(indep),jitter(dep),cex=0.5,col="darkgrey")
    preds = predict(mod,newdata=list(indep=xvar),se=T)
    lines(xvar,preds$fit,col="blue",lwd=2)
    matlines(xvar,cbind(preds$fit+2*preds$se.fit,preds$fit-2*preds$se.fit),lty=3,col="blue")
  }
  return(list("models"=modlist,"errors"=MSE.CV))
}
Function call:
output.mpg.disp = poly.iter(mpg,displacement,Auto,9)
Inspecting the degree-3 model:
> output.mpg.disp[[1]][[3]]
Call:  glm(formula = dep ~ poly(indep, i), data = dat)

Coefficients:
    (Intercept)  poly(indep, i)1  poly(indep, i)2  poly(indep, i)3
         23.446         -124.258           31.090           -4.466

Degrees of Freedom: 391 Total (i.e. Null);  388 Residual
Null Deviance:      23820
Residual Deviance:  7392      AIC: 2274
Now I cannot use this object in cv.glm with the Auto data set, because it does not recognize indep, dep, and i.
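The behaviour can be reproduced in a few lines. The sketch below is only an illustration: it uses the built-in mtcars data in place of Auto so that it runs without extra packages, and fit_inside is a hypothetical stand-in for poly.iter reduced to its glm() call. glm() stores its call and formula unevaluated, so the fitted object records the argument names rather than the columns that were passed in.

```r
# Hypothetical stand-in for poly.iter, reduced to the single glm() call.
fit_inside <- function(dep, indep, dat, i) {
  glm(dep ~ poly(indep, i), data = dat)
}

# mtcars replaces Auto here (an assumption, to keep the example self-contained).
m <- fit_inside(mtcars$mpg, mtcars$disp, mtcars, 3)

# The stored formula mentions the argument names, not mpg or disp.
print(formula(m))
vars_in_model <- all.vars(formula(m))
print(vars_in_model)

# Refitting outside the function fails: dep, indep and i are not columns of
# the data and do not exist in the calling environment.
# (Assumes there are no global objects named dep, indep or i.)
refit <- try(update(m, data = mtcars), silent = TRUE)
print(inherits(refit, "try-error"))
```

Any routine that re-evaluates the stored call outside the function runs into the same missing names.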
Answer 0: (score: 1)
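You can use the as.formula() function to convert a string containing your formula before calling glm(). This solves your problem (how to make the glm object refer to the actual variables), but I am not sure whether it is enough for the later call to cv.glm (I could not reproduce your code here without errors). To be clear, you can replace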
mod = glm(dep~poly(indep,i),data = dat)
with something like:
myexp = paste0(dep,"~poly(",indep,",",i,")")
mod = glm(as.formula(myexp),data = dat)
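The variables dep and indep then need to be set to character strings containing the names of the variables you want to refer to (for example, indep="displ"). Any other use of dep or indep inside the function, such as min(indep) or the plot() call, would then have to look the column up by name, for example with dat[[indep]].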
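Putting the suggestion together, a minimal sketch might look like the following. It is not the original poster's function: fit_poly is a hypothetical helper name, the column names are assumed to be passed as strings, and the built-in mtcars data stands in for Auto so that the example runs on its own. One extra step is added beyond the answer above: the formula is written back into the stored call, because the call would otherwise still mention a local variable and could not be re-evaluated elsewhere.

```r
# Hypothetical helper, not part of the original post: fits polynomial models
# of degree 1..deg, with dep and indep given as column names (strings).
fit_poly <- function(dep, indep, dat, deg) {
  lapply(seq_len(deg), function(i) {
    f <- as.formula(paste0(dep, " ~ poly(", indep, ", ", i, ")"))
    mod <- glm(f, data = dat)
    # The stored call would otherwise read glm(formula = f, ...); recording
    # the formula itself lets the model be refitted outside this function.
    mod$call$formula <- f
    mod
  })
}

# mtcars stands in for Auto (an assumption, to avoid needing extra packages).
models <- fit_poly("mpg", "disp", mtcars, 3)

# The formula now names the real columns.
print(formula(models[[3]]))

# In-sample MSE for each degree.
mse <- sapply(models, function(m) mean(residuals(m, type = "response")^2))
print(mse)

# Cross-validated error, attempted only if the boot package is installed.
if (requireNamespace("boot", quietly = TRUE)) {
  set.seed(1)
  cv <- sapply(models, function(m) boot::cv.glm(mtcars, m, K = 8)$delta[2])
  print(cv)
}
```

Patching mod$call is one of several ways to get a self-describing call; building the call with do.call() or bquote() would be alternatives.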