Question

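To find the best logistic model by AIC, I run a loop over the German credit data from the UCI repository, as follows:
1) I store the data in a data frame named "credit". The columns are named A1 through A16 (A16 is the response; only A2 and A3 are used as predictors).
2) I run the following code: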

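credit <- read.csv(
  "http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data",
  header = FALSE,            # the file has no header row; col.names supplies the names
  na.strings = "?",
  col.names = paste0("A", 1:16),
  stringsAsFactors = TRUE    # keep A16 a factor response for glm() in R >= 4.0
)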

# Remove rows with NA in A2, A3, or A16
credit <- credit[!is.na(credit$A2) & !is.na(credit$A3) & !is.na(credit$A16), ]
print(head(credit))
print(tail(credit))

k<-5
pAIC<-c()
pd<-c()
library(splines)
for (i in 1:k){
  for (j in 1:k){
    if(i==1 & j==1){
      optPmodel<-pModel<-
        glm(A16 ~poly(A2,i)*poly(A3,j),family=binomial, data=credit)
      bestPAIC<-extractAIC(pModel)[2]
      pd<-c(pd,extractAIC(pModel)[1])
      pAIC<-c(pAIC,bestPAIC)
    }else{
      pModel<-
        glm(A16 ~poly(A2,i)*poly(A3,j),family=binomial, data=credit)
      if((tmp<-extractAIC(pModel)[2]) < bestPAIC){
        bestPAIC<-tmp
        optPmodel<-pModel
      }
      pd<-c(pd,extractAIC(pModel)[1])
      pAIC<-c(pAIC,tmp)
    }
  }
}

newA2<-seq(mA2<-floor(min(credit$A2)),MA2<-ceiling(max(credit$A2)),by=1)
newA3<-seq(mA3<-floor(min(credit$A3)),MA3<-ceiling(max(credit$A3)),by=1/2)

ii<-c()
jj<-c()
for (i in newA2){
   for (j in newA3){
       ii<-c(ii,i)
       jj<-c(jj,j)
   }
}

newPts<-data.frame(A2=ii, A3=jj) #add rows
# build the predictor for all the new points
####### This is where the code crashes: 
nlogitPredP<-predict(optPmodel, newPts, type="response")

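The first double for loop runs i and j over 1:k. For each pair it fits the logistic model A16 ~ poly(A2, i) * poly(A3, j) and stores it as optPmodel if its AIC is lower than that of the current optPmodel. When I then try to use optPmodel for prediction, I get the following error:

Error: variables ‘poly(A2, i)’, ‘poly(A3, j)’ were specified with different types from the fit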
In addition: Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 
3: glm.fit: fitted probabilities numerically 0 or 1 occurred 
4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
5: glm.fit: fitted probabilities numerically 0 or 1 occurred 
6: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
  longer object length is not a multiple of shorter object length
7: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
  longer object length is not a multiple of shorter object length
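
One possible reading of this error, offered as an assumption rather than a confirmed diagnosis: the fitted model stores the formula with the symbols i and j, not their values at fit time, and the later loops over newA2 and newA3 reuse those names. The sketch below uses made-up data (the `toy` data frame and its columns `x` and `y` are hypothetical) to show the stored formula and what happens when `i` changes before `predict()` is called.

```r
# Synthetic stand-in data (made up for illustration, not the UCI credit data).
set.seed(1)
toy <- data.frame(x = runif(200, 0, 10))
toy$y <- rbinom(200, 1, plogis(toy$x - 5))

i <- 2
fit <- glm(y ~ poly(x, i), family = binomial, data = toy)

# The stored formula still refers to the symbol `i`, not to the value 2.
print(formula(fit))
stored_vars <- all.vars(formula(fit))
print(stored_vars)

new_x <- data.frame(x = seq(0, 10, by = 0.5))
p_ok <- predict(fit, new_x, type = "response")   # `i` is still 2 here

# Reusing `i` as a loop variable leaves it at 7, so predict() now asks
# poly() for a different degree than the one used in the fit.
for (i in 1:7) {}
p_after <- try(predict(fit, new_x, type = "response"), silent = TRUE)
print(inherits(p_after, "try-error"))
```

If this reading applies to the code above, it would also be consistent with the interactive refit using the literal degree 5 behaving differently from the model fitted inside the loop.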

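I should mention that when I replace poly(A2, i) * poly(A3, j) with the B-splines bs(A2, df = i) * bs(A3, df = j), the code works fine. Finally, after inspecting optPmodel and seeing that the selected degrees were i = 5 and j = 5, I did the following in an interactive session: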

optPmodel1<-glm(A16~poly(A2,5)*poly(A3,5),family=binomial,data=credit)
nlogitPredP<-predict(optPmodel1, newPts, type="response")

Then it works fine. Any insights would be much appreciated.
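
Since the refit with literal degrees predicts without error, one way to get the same effect inside a search is to substitute the numeric degrees into the formula before fitting. This is a minimal sketch under stated assumptions, not a tested fix for the code above: the data are synthetic (only the column names A2, A3, A16 mimic the question), and `grid`, `d2`, `d3`, `fit_one`, and `new_pts` are hypothetical names introduced for illustration.

```r
# Synthetic stand-in data (made up; not the UCI file).
set.seed(42)
n <- 300
toy <- data.frame(A2 = runif(n, 15, 80), A3 = runif(n, 0, 28))
toy$A16 <- rbinom(n, 1, plogis(0.03 * (toy$A2 - 40) + 0.1 * (toy$A3 - 10)))

k <- 3  # smaller than the question's k = 5 to keep the sketch fast
grid <- expand.grid(d2 = 1:k, d3 = 1:k)

fit_one <- function(d2, d3) {
  # bquote() replaces .(d2) and .(d3) with their numeric values, so the
  # stored formula contains literal degrees instead of variable names.
  f <- eval(bquote(A16 ~ poly(A2, .(d2)) * poly(A3, .(d3))))
  glm(f, family = binomial, data = toy)
}

# Fit every degree pair and record its AIC, as extractAIC() does in the question.
fits <- Map(fit_one, grid$d2, grid$d3)
grid$aic <- vapply(fits, function(m) extractAIC(m)[2], numeric(1))
best <- fits[[which.min(grid$aic)]]

# expand.grid() builds the prediction grid without loop variables.
new_pts <- expand.grid(A2 = seq(15, 80, by = 5), A3 = seq(0, 28, by = 2))
pred <- predict(best, new_pts, type = "response")
print(formula(best))
```

The design choice here is that the selected model no longer depends on any variable in the calling environment, so later code is free to reuse names such as i and j.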

Using poly(x, i) in R

0 answers: