Logistic regression: "factor has new levels" error when predicting

Asked: 2019-02-04 06:47:03

Tags: r logistic-regression

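This has already been explained for linear regression, but not for logistic regression. I also have no NA columns, because I impute the data with missForest. My situation is as follows:

  1. The survey variable has 19 levels.
  2. After splitting into test and training sets, a few levels have 0 observations in the training set.
  3. So when the logistic regression is fitted, those levels are dropped automatically because they have no observations in the training data.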
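The three points above can be reproduced on a small scale. The following is a minimal sketch on synthetic data (the data frame, the level names "a", "b", "c" and the row split are made up for illustration, not taken from the real data). It shows that `glm` keeps only the levels that actually occur in the training rows, and that predicting on a row with a dropped level then fails.

```r
# Synthetic data (hypothetical): a factor with three levels,
# where level "c" occurs only in the test rows.
set.seed(1)
df <- data.frame(
  y      = rbinom(60, 1, 0.5),
  survey = factor(rep(c("a", "b", "c"), times = c(28, 28, 4)))
)
training <- df[1:56, ]   # only "a" and "b" occur; "c" has count 0
testing  <- df[57:60, ]  # only "c" occurs

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = training)

# The factor still carries the empty level, but the fitted model does not.
print(levels(training$survey))
print(fit$xlevels$survey)

# Predicting on rows with the dropped level raises an error;
# tryCatch() captures the message instead of stopping the script.
res <- tryCatch(
  predict(fit, newdata = testing, type = "response"),
  error = function(e) conditionMessage(e)
)
print(res)
```

So a level being listed by `levels()` on the training factor does not mean the model knows about it; what matters is `fit$xlevels`.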

How should I handle this?

The LR function is as follows:

LR <- function(df, churnCol){
  # browser();
  CM <- df
  CM$Churn <- churnCol
  str(CM)
  names(CM)
  detach()
  attach(CM)
  CM_LOGIT = subset(CM, select = c(SalonID,Sex,Spl_instruction,category,survey,member_check,Studio_Client,`mean(Total)`,No_Of_Visits,Churn))
  CM_LOGIT <- as.data.frame(CM_LOGIT,stringsAsFactors = T)

  str(CM_LOGIT)

  sapply(CM_LOGIT,function(x) sum(is.na(x)))

  df.CM_LOGIT <- missForest(CM_LOGIT)
  summary(df.CM_LOGIT$ximp)
  df.CM_LOGIT <- df.CM_LOGIT$ximp

  setnames(df.CM_LOGIT, "mean(Total)", "mean_total")
  summary(df.CM_LOGIT)
  sapply(df.CM_LOGIT,function(x) sum(is.na(x)))

  set.seed(2019)  # set the seed before partitioning so the split is reproducible
  intrain<- createDataPartition(df.CM_LOGIT$Churn,p=0.7,list=FALSE)
  training<- df.CM_LOGIT[intrain,]
  testing<- df.CM_LOGIT[-intrain,]

  dim(training); dim(testing);

  LogModel <- glm(Churn ~ .,family=binomial(link="logit"),data=training)
  print(summary(LogModel))

  print(anova(LogModel, test="Chisq"))

  # testing$Churn <- as.character(testing$Churn)
  # testing$Churn[testing$Churn=="No"] <- "0"
  # testing$Churn[testing$Churn=="Yes"] <- "1"
  # testing$Churn <- as.factor(testing$Churn) 
  fitted.results <- predict.glm(object =  LogModel,newdata =  testing, type='response')
  print("Confusion Matrix for Logistic Regression"); table(testing$Churn, fitted.results > 0.5)
  tab.LOGIT <- table(testing$Churn, fitted.results > 0.5)
  print(tab.LOGIT)
  accuracy.LOGIT<-sum(diag(tab.LOGIT))/sum(tab.LOGIT)
  print(accuracy.LOGIT);

  #ROCR Curve
  library(ROCR)
  ROCRpred <- prediction(fitted.results, testing$Churn)
  ROCRperf <- performance(ROCRpred, 'tpr','fpr')
  print(plot(ROCRperf, colorize = TRUE, text.adj = c(-0.2,1.7)));
  print(InformationValue::AUROC(testing$Churn,fitted.results));

  print(exp(cbind(OR=coef(LogModel), confint(LogModel))));
}

Calling the LR function:

LR(CHURN_MODELLING_DATA,CHURN_MODELLING_DATA$Churn30)

After calling the function, the following error message is shown:

Error in model.frame.default(Terms, newdata, na.action = na.action, xlev = object$xlevels) : factor survey has new levels Just Dial

However, when we debug and inspect the data, the levels appear to be available for prediction.

Browse[2]> table(training$survey)

                   Banners                      Corporate                         Events 
                        30                              5                             32 
          EXISTING CLIENTS         Gift Coupon - External         Gift Coupon - Internal 
                      4139                             55                             41 
                 Hoardings                      Just Dial                    Just Dial N 
                       244                              0                              1 
                News Paper No Parking Board / Way Signage                         Others 
                       147                              0                           1259 
                 Pamphlets                     Radio Adds                      Reference 
                        10                              1                           4877 
                   Signage                            SMS                        TV-Adds 
                      2403                              0                              1 
                  Web Site 
                        18 

Browse[2]> table(training$survey)

                   Banners                      Corporate                         Events 
                        30                              5                             32 
          EXISTING CLIENTS         Gift Coupon - External         Gift Coupon - Internal 
                      4139                             55                             41 
                 Hoardings                      Just Dial                    Just Dial N 
                       244                              0                              1 
                News Paper No Parking Board / Way Signage                         Others 
                       147                              0                           1259 
                 Pamphlets                     Radio Adds                      Reference 
                        10                              1                           4877 
                   Signage                            SMS                        TV-Adds 
                      2403                              0                              1 
                  Web Site 
                        18 

The data used for prediction looks fine, yet I still get the new-levels error.
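One thing worth checking: in the `table(training$survey)` output, Just Dial, No Parking Board / Way Signage and SMS are listed as levels but have a count of 0. A rough way to see which levels the fitted model really lacks is to compare the values in the test set with `xlevels` of the model. The helper name `unseen_levels` and the tiny data frames below are hypothetical, for illustration only.

```r
# Hypothetical helper: for each factor predictor, list the levels that occur
# in `newdata` but were not retained by the fitted model.
unseen_levels <- function(model, newdata) {
  out <- lapply(names(model$xlevels), function(v) {
    lv <- unique(as.character(newdata[[v]]))
    setdiff(lv[!is.na(lv)], model$xlevels[[v]])
  })
  names(out) <- names(model$xlevels)
  Filter(length, out)  # keep only variables that have unseen levels
}

# Synthetic illustration (not the real data)
training <- data.frame(
  y      = c(0, 1, 0, 1, 1, 0),
  survey = factor(
    c("Others", "Reference", "Others", "Reference", "Others", "Reference"),
    levels = c("Just Dial", "Others", "Reference")
  )
)
testing <- data.frame(
  y      = c(1, 0),
  survey = factor(c("Just Dial", "Others"), levels = levels(training$survey))
)

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = training)

print(table(training$survey))  # "Just Dial" is a level, but its count is 0
res <- unseen_levels(fit, testing)
print(res)                     # reports "Just Dial" under $survey
```

In the real function this would be called as `unseen_levels(LogModel, testing)` right before `predict.glm`.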

If the code from LR is run at the top level instead of inside a function, it runs without any errors.
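As for handling it, one possible approach (a sketch, not the only option) is to recode test-set levels the model never saw into a catch-all level before predicting. This assumes a catch-all such as "Others" exists in the training data and that lumping rare sources together is acceptable for the analysis. The helper `lump_unseen` and the data below are hypothetical. An alternative is to drop those test rows, or to partition so that every level appears in training.

```r
# Hypothetical helper: recode values of `x` that are not among the model's
# known levels into a catch-all level (assumed to be one of the known levels).
lump_unseen <- function(x, known, other = "Others") {
  stopifnot(other %in% known)
  x <- as.character(x)
  x[!is.na(x) & !(x %in% known)] <- other
  factor(x, levels = known)
}

# Synthetic illustration (not the real data)
training <- data.frame(
  y      = c(0, 1, 0, 1, 1, 0),
  survey = factor(
    c("Others", "Reference", "Others", "Reference", "Others", "Reference"),
    levels = c("Just Dial", "Others", "Reference")
  )
)
testing <- data.frame(
  y      = c(1, 0),
  survey = factor(c("Just Dial", "Others"), levels = levels(training$survey))
)

fit <- glm(y ~ survey, family = binomial(link = "logit"), data = training)

# Align the test factor with the levels the model actually kept.
testing$survey <- lump_unseen(testing$survey, fit$xlevels$survey)
pred <- predict(fit, newdata = testing, type = "response")
print(pred)
```

Note that this changes what the "Others" coefficient is applied to, so predictions for the recoded rows are only as good as that assumption.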

0 answers:

No answers yet