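To find the best logistic model by AIC, I ran a loop over the Credit Approval (credit-screening) data from the UCI repository, as follows:
1) I saved the data in a data frame called "credit". The columns are named A1 to A16 (A16 is the response; only A2 and A3 are used as predictors).
2) I ran the following code: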
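# crx.data has no header row, so header = FALSE keeps the first record as data
credit <- read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data",
                   header = FALSE, na.strings = "?", col.names = paste0("A", 1:16))
# Remove rows with <NA> in the response or in either predictor
credit <- credit[!is.na(credit$A2) & !is.na(credit$A3) & !is.na(credit$A16), ]
# Since R 4.0, read.csv() keeps strings as character; glm(family = binomial) needs a factor or 0/1 response
credit$A16 <- factor(credit$A16)
print(head(credit))
print(tail(credit))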
k <- 5
pAIC <- c()  # AIC of every fitted model
pd <- c()    # equivalent degrees of freedom of every fitted model
library(splines)
# Fit A16 ~ poly(A2, i) * poly(A3, j) for every i, j in 1:k and keep the model with the lowest AIC
for (i in 1:k) {
  for (j in 1:k) {
    pModel <- glm(A16 ~ poly(A2, i) * poly(A3, j), family = binomial, data = credit)
    tmp <- extractAIC(pModel)[2]
    # The first model initialises the search; later ones replace it only if their AIC is lower
    if ((i == 1 && j == 1) || tmp < bestPAIC) {
      bestPAIC <- tmp
      optPmodel <- pModel
    }
    pd <- c(pd, extractAIC(pModel)[1])
    pAIC <- c(pAIC, tmp)
  }
}
# Prediction grid: every A2 on a step of 1 crossed with every A3 on a step of 1/2
newA2 <- seq(mA2 <- floor(min(credit$A2)), MA2 <- ceiling(max(credit$A2)), by = 1)
newA3 <- seq(mA3 <- floor(min(credit$A3)), MA3 <- ceiling(max(credit$A3)), by = 1/2)
ii <- c()
jj <- c()
for (i in newA2) {
  for (j in newA3) {
    ii <- c(ii, i)
    jj <- c(jj, j)
  }
}
newPts <- data.frame(A2 = ii, A3 = jj)  # one row per (A2, A3) grid point
# Predicted response probability at every grid point
####### This is where the code crashes:
nlogitPredP <- predict(optPmodel, newPts, type = "response")
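The nested loop that fills ii and jj can also be written with base R's expand.grid(). A minimal sketch, using made-up ranges in place of newA2 and newA3 and checking that both constructions give the same rows in the same order: expand.grid() varies its first argument fastest, so A3 is listed first to keep A2 as the outer (slow) variable, and the columns are then reordered. Unlike the loop, this does not leave i and j set to the last grid values, which may matter here because the same names appear in the fitted formula.

```r
# Hypothetical stand-ins for newA2 and newA3 (the real ranges come from credit)
newA2 <- seq(13, 20, by = 1)
newA3 <- seq(0, 3, by = 1/2)

# The question's nested loop (loop variables renamed to a2/a3): A2 outer, A3 inner
ii <- c()
jj <- c()
for (a2 in newA2) {
  for (a3 in newA3) {
    ii <- c(ii, a2)
    jj <- c(jj, a3)
  }
}
loopPts <- data.frame(A2 = ii, A3 = jj)

# expand.grid() varies its first argument fastest, so list A3 first to keep
# A2 as the slow (outer) variable, then put the columns back in A2, A3 order
gridPts <- expand.grid(A3 = newA3, A2 = newA2)[, c("A2", "A3")]

# Same rows in the same order
stopifnot(identical(loopPts$A2, gridPts$A2), identical(loopPts$A3, gridPts$A3))
print(nrow(gridPts))
```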
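The first double for loop runs i and j over 1:k; for each pair it fits the logistic model A16 ~ poly(A2,i)*poly(A3,j) and stores it as optPmodel if its AIC is lower than that of the current optPmodel. When I then try to use optPmodel in predict(), I get the following error:
Error: variables ‘poly(A2, i)’, ‘poly(A3, j)’ were specified with different types from the fit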
In addition: Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred
2: glm.fit: fitted probabilities numerically 0 or 1 occurred
3: glm.fit: fitted probabilities numerically 0 or 1 occurred
4: glm.fit: fitted probabilities numerically 0 or 1 occurred
5: glm.fit: fitted probabilities numerically 0 or 1 occurred
6: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
longer object length is not a multiple of shorter object length
7: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
longer object length is not a multiple of shorter object length
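The failure can likely be reproduced without the credit data. The sketch below uses simulated data with hypothetical columns A2, A3 and A16, and assumes the following mechanism: for poly(), the fitted model stores the call poly(A2, i, coefs = ...) for prediction, so predict() looks up i and j again when it rebuilds the model frame. If i and j have been reassigned in the meantime, poly() is evaluated with the wrong degree, which would also fit the warnings raised from Z/rep(sqrt(norm2[-1L]), ...) inside poly().

```r
set.seed(1)
# Simulated stand-in for the credit data (hypothetical columns A2, A3, A16)
n <- 300
dat <- data.frame(A2 = runif(n, 15, 80), A3 = runif(n, 0, 28))
dat$A16 <- factor(rbinom(n, 1, plogis(-1 + 0.02 * dat$A2 - 0.05 * dat$A3)))

# Fit with the degrees held in variables, as in the question's search loop
i <- 2
j <- 2
fit <- glm(A16 ~ poly(A2, i) * poly(A3, j), family = binomial, data = dat)

newPts <- expand.grid(A2 = c(20, 40, 60), A3 = c(1, 10))

# i and j unchanged since the fit: prediction works
ok <- predict(fit, newPts, type = "response")

# Reuse i and j as loop variables, as the grid-building loop in the question does;
# afterwards i is 60 and j is 10
for (i in c(20, 40, 60)) {
  for (j in c(1, 10)) {
    # loop body irrelevant here
  }
}

# poly() is now re-evaluated with degrees 60 and 10, so the column counts no
# longer match the fit; this may also emit "longer object length" warnings
broken <- tryCatch(predict(fit, newPts, type = "response"),
                   error = function(e) e)
print(conditionMessage(broken))

# Restoring the fit-time values makes the same call work again
i <- 2
j <- 2
again <- predict(fit, newPts, type = "response")
```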
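I should mention that when I replace poly(A2,i)*poly(A3,j) with the B-splines bs(A2,df=i)*bs(A3,df=j), the code works fine. Finally, after inspecting optPmodel and finding that it was the model with i = 5 and j = 5, I ran the following in an interactive session: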
optPmodel1 <- glm(A16 ~ poly(A2, 5) * poly(A3, 5), family = binomial, data = credit)
nlogitPredP <- predict(optPmodel1, newPts, type = "response")
and the prediction worked fine. Any insight would be appreciated.
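Both observations above (bs() works, and poly(A2,5)*poly(A3,5) typed by hand works) may come down to what each model stores for prediction. A sketch with simulated data (hypothetical columns A2 and A16) inspects the "predvars" attribute of each fitted model's terms, which holds the calls that predict() re-evaluates on new data. It compares poly(A2, i), bs(A2, df = i), and a formula in which the value of i has been spliced in as a literal with bquote(). If the same holds for the credit data, building each formula in the search loop with bquote(), or giving the grid loop variables names other than i and j, should let the original predict() call succeed; that is an assumption, not something tested on the real data.

```r
library(splines)

set.seed(2)
# Simulated stand-in for the credit data (hypothetical columns A2, A16)
dat <- data.frame(A2 = runif(200, 15, 80))
dat$A16 <- factor(rbinom(200, 1, plogis(-2 + 0.05 * dat$A2)))

i <- 4  # degree / df held in a variable, as in the question's search loop

fit_poly <- glm(A16 ~ poly(A2, i), family = binomial, data = dat)
fit_bs   <- glm(A16 ~ bs(A2, df = i), family = binomial, data = dat)
# bquote() splices the current value of i into the formula as a literal number,
# which is what typing poly(A2, 5) by hand does
f_lit    <- eval(bquote(A16 ~ poly(A2, .(i))))
fit_lit  <- glm(f_lit, family = binomial, data = dat)

# all.vars() lists the free variables that the stored prediction calls will look up
vars_poly <- all.vars(attr(terms(fit_poly), "predvars"))
vars_bs   <- all.vars(attr(terms(fit_bs),   "predvars"))
vars_lit  <- all.vars(attr(terms(fit_lit),  "predvars"))
print(vars_poly)  # poly(A2, i, coefs = ...) still refers to i
print(vars_bs)    # bs() is rebuilt from its fitted knots, so i is gone
print(vars_lit)   # the degree is the literal 4, so i is gone

# After i changes, the bs() and literal-degree models still predict
i <- 81
newPts <- data.frame(A2 = c(20, 50, 70))
p_bs  <- predict(fit_bs,  newPts, type = "response")
p_lit <- predict(fit_lit, newPts, type = "response")
```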