Using poly(x, i) in R

Time: 2016-04-17 02:43:30

Tags: r logistic-regression polynomials

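To find the best logistic model by AIC, I loop over the German credit data from the UCI repository (here), as follows:
1) I store the data in a data frame named "credit". The columns are named A1 through A16 (A16 is the response, and only A2 and A3 are used as predictors).
2) I run the following code: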

credit <-
read.csv("http://archive.ics.uci.edu/ml/machine-learning-databases/credit-screening/crx.data", na.strings="?",col.names=paste0('A',1:16))

# Remove rows with NA in the predictors (A2, A3) or the response (A16)
credit <- credit[!is.na(credit$A2) & !is.na(credit$A3) & !is.na(credit$A16),]
print(head(credit))
print(tail(credit))

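k <- 5
pAIC <- c()  # AIC of each fitted model
pd <- c()    # equivalent degrees of freedom of each fitted model
library(splines)
for (i in 1:k) {
  for (j in 1:k) {
    if (i == 1 && j == 1) {
      # the first model initialises the running best
      optPmodel <- pModel <-
        glm(A16 ~ poly(A2, i) * poly(A3, j), family = binomial, data = credit)
      bestPAIC <- extractAIC(pModel)[2]
      pd <- c(pd, extractAIC(pModel)[1])
      pAIC <- c(pAIC, bestPAIC)
    } else {
      pModel <-
        glm(A16 ~ poly(A2, i) * poly(A3, j), family = binomial, data = credit)
      # keep this model if its AIC beats the best so far
      if ((tmp <- extractAIC(pModel)[2]) < bestPAIC) {
        bestPAIC <- tmp
        optPmodel <- pModel
      }
      pd <- c(pd, extractAIC(pModel)[1])
      pAIC <- c(pAIC, tmp)
    }
  }
}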

newA2<-seq(mA2<-floor(min(credit$A2)),MA2<-ceiling(max(credit$A2)),by=1)
newA3<-seq(mA3<-floor(min(credit$A3)),MA3<-ceiling(max(credit$A3)),by=1/2)

ii<-c()
jj<-c()
for (i in newA2){
   for (j in newA3){
       ii<-c(ii,i)
       jj<-c(jj,j)
   }
}

newPts<-data.frame(A2=ii, A3=jj) #add rows
# build the predictor for all the new points
####### This is where the code crashes: 
nlogitPredP<-predict(optPmodel, newPts, type="response")

The first double for loop runs i and j over 1:k. For each pair it fits the logistic model A16 ~ poly(A2, i) * poly(A3, j) and stores it as optPmodel if its AIC is lower than that of the current optPmodel. When I try to use optPmodel for prediction, I get the following error:

Error: variables ‘poly(A2, i)’, ‘poly(A3, j)’ were specified with different types from the fit
In addition: Warning messages:
1: glm.fit: fitted probabilities numerically 0 or 1 occurred 
2: glm.fit: fitted probabilities numerically 0 or 1 occurred 
3: glm.fit: fitted probabilities numerically 0 or 1 occurred 
4: glm.fit: fitted probabilities numerically 0 or 1 occurred 
5: glm.fit: fitted probabilities numerically 0 or 1 occurred 
6: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
  longer object length is not a multiple of shorter object length
7: In Z/rep(sqrt(norm2[-1L]), each = length(x)) :
  longer object length is not a multiple of shorter object length
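
The variable names in the error, poly(A2, i) and poly(A3, j), hint that the fitted model keeps the degree as the symbol i rather than as a number. The sketch below is one way to check this. It uses made-up data (the data frame toy and its columns x and y are hypothetical, not the credit data) and inspects the "predvars" attribute of the model terms, which predict() re-evaluates on new data.

```r
# Made-up data: toy, x and y are hypothetical stand-ins for the credit data
set.seed(42)
toy <- data.frame(x = runif(100))
toy$y <- rbinom(100, 1, plogis(2 * toy$x - 1))

i <- 2
fit <- glm(y ~ poly(x, i), family = binomial, data = toy)

# predict() re-evaluates these stored calls on the new data.
# The degree appears as the symbol `i`, next to the coefs saved at fit time.
pv <- attr(terms(fit), "predvars")
pv_text <- paste(deparse(pv), collapse = "")
print(substr(pv_text, 1, 40))

# If `i` is later reused (as the grid loops in the question do), poly() is
# asked for a different degree than the one the saved coefs belong to.
i <- 7
res <- try(predict(fit, data.frame(x = c(0.2, 0.8)), type = "response"),
           silent = TRUE)
print(class(res))  # a "try-error" here would mirror the crash in the question
```

If the stored call does contain the bare symbol, the outcome of predict() depends on whatever value i holds when it is called, not on the value it held at fit time.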

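I should mention that when I replace poly(A2, i) * poly(A3, j) with the B-splines bs(A2, df = i) * bs(A3, df = j), the code works fine. Finally, after inspecting optPmodel and seeing that i = 5 and j = 5, I did the following in an interactive session: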

optPmodel1<-glm(A16~poly(A2,5)*poly(A3,5),family=binomial,data=credit)
nlogitPredP<-predict(optPmodel1, newPts, type="response")

Then it works fine. Any insight would be greatly appreciated.
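
A possible explanation, not confirmed in the original post: the formula stores the degrees as the symbols i and j, and the later grid loops overwrite i and j before predict() is called, whereas the interactive fit uses the literal 5. One workaround sketch is to substitute the numeric degrees into the formula with bquote(), so the stored call no longer depends on the loop variables. The data here are simulated (dat, its generating coefficients, and the grid ranges are assumptions for illustration), and the degree range is cut to 1:3 to keep the example fast.

```r
# Simulated stand-in for the credit data (hypothetical values, not the UCI file)
set.seed(1)
n <- 300
dat <- data.frame(A2 = runif(n, 15, 80), A3 = runif(n, 0, 28))
dat$A16 <- rbinom(n, 1, plogis(-1 + 0.03 * dat$A2 - 0.05 * dat$A3))

best_aic <- Inf
best_fit <- NULL
for (i in 1:3) {
  for (j in 1:3) {
    # .(i) and .(j) are replaced by their current values, so the formula
    # reads e.g. A16 ~ poly(A2, 2L) * poly(A3, 1L) instead of poly(A2, i)
    f <- eval(bquote(A16 ~ poly(A2, .(i)) * poly(A3, .(j))))
    fit <- glm(f, family = binomial, data = dat)
    aic <- extractAIC(fit)[2]
    if (aic < best_aic) {
      best_aic <- aic
      best_fit <- fit
    }
  }
}

# Build the prediction grid without loops, then deliberately clobber i and j
new_pts <- expand.grid(A2 = seq(15, 80, by = 5), A3 = seq(0, 28, by = 2))
i <- 99
j <- 99

# The prediction no longer depends on the current values of i and j
pred <- predict(best_fit, new_pts, type = "response")
print(formula(best_fit))
```

Building the formula from a string, for example with as.formula(sprintf(...)), should have the same effect. Using loop variables other than i and j for the grid would also sidestep the clash, but leaves the fitted model dependent on global state.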

0 answers:

No answers