Question

I am writing a model-validation function for my own use.

To do this, I let

 a=entire set of predictor variables in the model-building set
 b=response variable in the model-building set
 c=entire set of predictor variables in the validation set
 d=response variable in the validation set
 e=column numbers of the predictors I am interested in

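This is based on Applied Linear Regression Models by Kutner, so I used library(ALSM).

In my case, the model-building set is SurgicalUnit and the validation set is SurgicalUnitAdditional.

Both data sets contain 10 columns: columns 1 to 8 are the entire set of independent variables, column 9 is the response variable, and column 10 is log(response variable).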

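So I set a=SurgicalUnit[,1:8]; b=SurgicalUnit[,10]; c=SurgicalUnitAdditional[,1:8]; d=SurgicalUnitAdditional[,10]; e=c(1,2,3,8)

because I want to use the logged response variable and regress it on the variables x1, x2, x3 and x8.

(Note that I pass the "entire" set of independent variables together with specific column numbers, rather than passing only the predictors of interest, because I need to compute Mallows' Cp inside my function.)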

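So my regression is asdf=lm(b~as.matrix(a[e])). The problem is that I want to predict the validation set with the model built on the model-building set. I set preds=data.frame(c[e]), but predict(asdf, newdata=preds) comes out equal to predict(asdf), which means it just returns the fitted values of asdf.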
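The behaviour described above can be reproduced with a small self-contained sketch. It assumes the built-in mtcars data, split into two halves, as a stand-in for SurgicalUnit and SurgicalUnitAdditional; the split, the column choice and the name cc (used instead of c) are illustrative only.

```r
# Stand-in data (assumption): mtcars halves replace the two surgical-unit sets.
build <- mtcars[1:16, ]
valid <- mtcars[17:32, ]

a  <- build[, 2:8]        # predictors, model-building set
b  <- log(build$mpg)      # logged response, model-building set
cc <- valid[, 2:8]        # predictors, validation set
e  <- c(1, 2, 3, 5)       # columns of interest (hypothetical choice)

asdf  <- lm(b ~ as.matrix(a[e]))
preds <- data.frame(cc[e])

# Compare predictions on "new" data with the plain fitted values.
same <- all.equal(unname(predict(asdf, newdata = preds)),
                  unname(predict(asdf)))
print(same)
```

Here the two vectors coincide even though preds holds different rows than a.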

Why doesn't predict work? Any help would be appreciated.

Here is my function

mod.valid=function(a,b,c,d,e){
asdf=lm(b~as.matrix(a[e]))              # model what you want
qwer=lm(b~as.matrix(a[1:max(e)]))       # full model in order to get Cp
mat=round(coef(summary(asdf))[,c(-3,-4)],4); mat2=matrix(0,5,2)
mat=rbind(mat,mat2); mat                  # matrix for coefficients and others(model-building)
n=nrow(anova(asdf)); m=nrow(anova(qwer))
nn=length(b)                                  # To get size of sample size
p=asdf$rank                                   # To get parameters p
cp=anova(asdf)$Sum[n] / (anova(qwer)$Mean[m]) - (nn-2*p); cp=round(cp,4)
mat[p+1,1]=p; mat[p+1,2]=cp             # adding p and Cp
rp=summary(asdf)$r.squared; rap=summary(asdf)$adj.r.squared; rp=round(rp,4); rap=round(rap,4)
mat[p+2,1]=rp; mat[p+2,2]=rap           # adding  Rp2 and Rap2
sse=anova(asdf)$Sum[n]; pre=MPV::PRESS(asdf); sse=round(sse,4); pre=round(pre,4)
mat[p+3,1]=sse; mat[p+3,2]=pre        # adding SSE and PRESS
preds=data.frame(c[e]); predd=predict(asdf,newdata=preds)   # I got problem here!
mspr=sum((d-predd)^2) / length(d); mse=anova(asdf)$Mean[n]; mspr=round(mspr,4); mse=round(mse,4)
mat[p+4,1]=mse; mat[p+4,2]=mspr       # adding MSE and MSPR
aic=nn*log(anova(asdf)$Sum[n]) - nn*log(nn) + 2*p; aic=round(aic,4)
bic=nn*log(anova(asdf)$Sum[n]) - nn*log(nn) + log(nn)*p; bic=round(bic,4)
mat[p+5,1]=aic; mat[p+5,2]=bic        # adding AIC and BIC
rownames(mat)[p+1]="p&Cp"; rownames(mat)[p+2]="Rp.sq&Rap.sq"
rownames(mat)[p+3]="SSE&PRESS"; rownames(mat)[p+4]="MSE&MSPR"; rownames(mat)[p+5]="AIC&BIC"

asdf2=lm(d~as.matrix(c[e]))
qwer2=lm(d~as.matrix(c[1:max(e)]))
matt=round(coef(summary(asdf2))[,c(-3,-4)],4); matt2=matrix(0,5,2)
matt=rbind(matt,matt2); matt              # matrix for coefficients and others(validation)
n2=nrow(anova(asdf2)); m2=nrow(anova(qwer2))
nn2=length(d)                                    # To get size of sample size
p2=asdf2$rank                                    # To get parameters p
cp2=anova(asdf2)$Sum[n2] / (anova(qwer2)$Mean[m2]) - (nn2-2*p2); cp2=round(cp2,4)
matt[p2+1,1]=p2; matt[p2+1,2]=cp2           # adding p and Cp
rp2=summary(asdf2)$r.squared; rap2=summary(asdf2)$adj.r.squared; rp2=round(rp2,4); rap2=round(rap2,4)
matt[p2+2,1]=rp2; matt[p2+2,2]=rap2     # adding  Rp2 and Rap2
sse2=anova(asdf2)$Sum[n2]; pre2=MPV::PRESS(asdf2); sse2=round(sse2,4); pre2=round(pre2,4)
matt[p2+3,1]=sse2; matt[p2+3,2]=pre2      # adding SSE and PRESS
mse2=anova(asdf2)$Mean[n2]; mse2=round(mse2,4)
matt[p2+4,1]=mse2; matt[p2+4,2]=NA        # adding MSE; MSPR is not applicable here, so NA
aic2=nn2*log(anova(asdf2)$Sum[n2]) - nn2*log(nn2) + 2*p2; aic2=round(aic2,4)
bic2=nn2*log(anova(asdf2)$Sum[n2]) - nn2*log(nn2) + log(nn2)*p2; bic2=round(bic2,4)
matt[p2+5,1]=aic2; matt[p2+5,2]=bic2      # adding AIC and BIC
mat=cbind(mat,matt); colnames(mat)=c("Estimate","Std.Error","Val.Estimate","Val.Std.Error")
print(mat)

}

This function gives useful statistics for model validation.

It returns a matrix with the coefficients, p, Mallows' Cp, R.squared, R.adj.squared, SSE, PRESS, MSE, MSPR, AIC and BIC.
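For reference, the criteria computed in the function above can be written as follows. The notation is introduced here for illustration: n is the number of model-building cases, p the number of parameters in the fitted model, n* the number of validation cases, and MSE_full the mean squared error of the model on columns 1 to max(e), which the code treats as the full model.

```latex
\begin{aligned}
C_p  &= \frac{SSE_p}{MSE_{\text{full}}} - (n - 2p) \\
MSPR &= \frac{\sum_{i=1}^{n^*} \left(Y_i - \hat{Y}_i\right)^2}{n^*} \\
AIC_p &= n \ln SSE_p - n \ln n + 2p \\
BIC_p &= n \ln SSE_p - n \ln n + (\ln n)\, p
\end{aligned}
```

In MSPR, Y_i is the observed response in the validation set (d in the code) and \hat{Y}_i is its prediction from the model fitted on the model-building set (predd in the code).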

Everything works for general data except MSPR, because the predict function doesn't work! It only returns the fitted values.

Answer 1

Could you try something like this? You must make sure the training and test data have the same column names.
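A minimal sketch of that idea, not necessarily the code originally posted. It assumes the built-in mtcars data as a stand-in for SurgicalUnit and SurgicalUnitAdditional; the names build_df, newdat and cc (used instead of c) are hypothetical. With lm(b~as.matrix(a[e])) the only term in the formula is as.matrix(a[e]), so predict looks for a and e, does not find them in newdata, and falls back to the training objects. If both sets happen to have the same number of rows, R gives no warning. Fitting on a data frame with named columns lets predict match the columns of newdata by name.

```r
# Stand-in data (assumption): mtcars halves replace the two surgical-unit sets.
build <- mtcars[1:16, ]
valid <- mtcars[17:32, ]
a  <- build[, 2:8]; b <- log(build$mpg)   # predictors / response, model-building set
cc <- valid[, 2:8]; d <- log(valid$mpg)   # predictors / response, validation set
e  <- c(1, 2, 3, 5)                       # columns of interest (hypothetical choice)

# Original approach: newdata is effectively ignored, so these "predictions"
# are the fitted values of the model-building set.
fit_bad  <- lm(b ~ as.matrix(a[e]))
pred_bad <- predict(fit_bad, newdata = data.frame(cc[e]))

# Alternative: fit on a data frame so every predictor is a named column.
build_df <- data.frame(y = b, a[e])
fit_ok   <- lm(y ~ ., data = build_df)

# newdata must carry the same column names as the predictors used in the fit.
newdat <- data.frame(cc[e])
stopifnot(all(names(a[e]) %in% names(newdat)))
pred_ok <- predict(fit_ok, newdata = newdat)

# Mean squared prediction error on the validation set.
mspr <- mean((d - pred_ok)^2)
print(mspr)
```

Both fits estimate the same coefficients; only the way the predictors are referenced changes. Inside mod.valid, the same pattern would replace the lm and predict calls that use as.matrix.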

