Error when calculating Information Value in logistic regression analysis

Asked: 2018-06-11 18:39:37

Tags: r logistic-regression data-science

I am trying to calculate Information Value for my categorical variables, but I am running into an error.

# Calculate Information Values

# get all categorical variables
factor_vars = c(ITEM_CATEGORY_DESCR, ITEM_DESCR, PRODUCT_SUB_LINE_DESCR,
                MAJOR_CATEGORY_DESCR, CUSTOMER_NAME, CUST_BRANCH_DESCR,
                PROGRAM_LEVEL_DESCR, CUST_STATE_KEY, CUST_REGION_DESCR,
                CUST_CITY)

# init output dataframe
all_iv = data.frame(VARS = factor_vars,
                    IV = numeric(length(factor_vars)),
                    STRENGTH = character(length(factor_vars)),
                    stringsAsFactors = FALSE)

for (factor_var in factor_vars) {
  all_iv[all_iv$VARS == factor_var, "IV"] <-
    InformationValue::IV(X = new_df[, factor_var], Y = new_df$product_category)
  all_iv[all_iv$VARS == factor_var, "STRENGTH"] <-
    attr(InformationValue::IV(X = new_df[, factor_var],
                              Y = new_df$product_category), "howgood")
}

The error message is:

Error in `[.data.frame`(new_df, , factor_var) : 
  undefined columns selected
In addition: Warning messages:
 1: In `[<-.factor`(`*tmp*`, which(Y == valueOfGood), value = 1) :
  invalid factor level, NA generated
 2: In `[<-.factor`(`*tmp*`, which(!(Y == "1")), value = 0) :
  invalid factor level, NA generated
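
Read literally, the two messages point at two separate things. `undefined columns selected` means at least one value of `factor_var` is not a column name of `new_df`, and the `invalid factor level` warnings come from assigning 0/1 into a `Y` that is a factor without those levels. A minimal base-R sketch for checking both is below; the toy `new_df` and its values are hypothetical stand-ins, since the real data is not shown.

```r
# Hypothetical stand-in for new_df; the real data is not shown in the question.
new_df <- data.frame(
  CUST_CITY        = factor(c("A", "B", "A", "C")),
  product_category = factor(c("x", "y", "x", "y"))
)

# Column names written as quoted strings. Unquoted names inside c() are
# looked up as objects, so the vector would hold values, not column names.
factor_vars <- c("CUST_CITY", "CUST_STATE_KEY")

# Any name listed here is absent from new_df and would trigger
# "undefined columns selected" when used as new_df[, factor_var].
missing_vars <- setdiff(factor_vars, names(new_df))
print(missing_vars)

# A factor response is what the "[<-.factor" warnings refer to.
print(class(new_df$product_category))
print(levels(new_df$product_category))
```

If `missing_vars` is non-empty on the real data, the names in `factor_vars` need to be corrected before the loop can run.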

How can I fix this?
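
One possible way to restructure the loop is sketched here, not as a confirmed fix. It assumes the column names are meant to be strings, that Information Value is wanted against a binary 0/1 response, and that one level of `product_category` can be treated as the event of interest. The toy data and the level `"widgets"` are hypothetical. The call to `InformationValue::IV` is guarded so the sketch still runs when the package is not installed.

```r
# Hypothetical toy data standing in for new_df.
set.seed(1)
new_df <- data.frame(
  CUST_CITY        = factor(sample(c("A", "B", "C"), 200, replace = TRUE)),
  CUST_STATE_KEY   = factor(sample(c("NY", "CA"), 200, replace = TRUE)),
  product_category = factor(sample(c("widgets", "gadgets"), 200, replace = TRUE))
)

# Quoted column names, restricted to those that actually exist in new_df.
factor_vars <- intersect(c("CUST_CITY", "CUST_STATE_KEY"), names(new_df))

# Assumption: "widgets" is the event of interest. Recode the factor
# response to integer 0/1 so no factor assignment happens later.
y <- as.integer(new_df$product_category == "widgets")

# Output table, filled with NA until a value is computed.
all_iv <- data.frame(VARS = factor_vars,
                     IV = NA_real_,
                     STRENGTH = NA_character_,
                     stringsAsFactors = FALSE)

if (requireNamespace("InformationValue", quietly = TRUE)) {
  for (factor_var in factor_vars) {
    # Compute once and reuse both the value and its "howgood" attribute.
    iv  <- InformationValue::IV(X = new_df[[factor_var]], Y = y)
    row <- all_iv$VARS == factor_var
    all_iv[row, "IV"]       <- as.numeric(iv)
    all_iv[row, "STRENGTH"] <- attr(iv, "howgood")
  }
}

print(all_iv)
```

Calling `IV` once per variable also avoids computing the same value twice, as the original loop does. If `product_category` has more than two levels, which level counts as the event is a modelling decision that this sketch does not settle.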

0 answers:

No answers yet.