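I am working through an example from http://r-statistics.co/Logistic-Regression-With-R.html and have a problem with the smbinning code. I am trying to get the information value (IV) of each variable using smbinning.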
library(smbinning)
# segregate continuous and factor variables
factor_vars <- c("WORKCLASS", "EDUCATION", "MARITALSTATUS", "OCCUPATION", "RELATIONSHIP", "RACE", "SEX", "NATIVECOUNTRY")
continuous_vars <- c("AGE", "FNLWGT", "EDUCATIONNUM", "HOURSPERWEEK", "CAPITALGAIN", "CAPITALLOSS")
iv_df <- data.frame(VARS=c(factor_vars, continuous_vars), IV=numeric(14)) # init for IV results
# compute IV for categoricals
for (factor_var in factor_vars) {
  smb <- smbinning.factor(trainingData, y = "ABOVE50K", x = factor_var)  # WOE table
  if (class(smb) != "character") {  # check if some error occurred
    iv_df[iv_df$VARS == factor_var, "IV"] <- smb$iv
  }
}
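This is the code as given. I cannot understand the reason for the class check. My general understanding of smbinning is not that good either. I tried the loop without the check: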
for (vars in factor_vars) {
  smb <- smbinning.factor(trainingData, y = "ABOVE50K", x = vars)
  iv_df[iv_df$VARS == vars, "IV"] <- smb["iv"]
}
When I run this code, some of the values come out as NA. So the class check is apparently needed, but why?
Thanks a lot.
Answer 0 (score: 0)
Following the example to the letter, your problem is as follows:
smb <- smbinning.factor(trainingData, y="ABOVE50K", x="EDUCATION")
Printing smb then gives:
[1] "Too many categories"
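To see why the unchecked loop produces NA, here is a minimal base-R sketch. The two objects are mocks standing in for what smbinning.factor can return (a result list on success, a plain message string on failure). The names `smb_ok` and `smb_err` and the IV value are made up for illustration.

```r
# Mocked return values (assumptions for illustration, not real smbinning output)
smb_ok  <- list(iv = 0.1234)            # success: a list carrying the IV
smb_err <- "Too many categories"        # failure: just a character string

class(smb_ok)    # "list"
class(smb_err)   # "character"

# Indexing an unnamed character vector by name finds nothing and yields NA
value_from_err <- smb_err["iv"]
is.na(value_from_err)   # TRUE

# On the success object, $iv retrieves the number
value_from_ok <- smb_ok$iv
```

This is what the `class(smb) != "character"` test guards against: it skips the assignment whenever the function handed back a message instead of a result.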
str(trainingData)
shows: $ EDUCATION: Factor w/ 16 levels ...
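A quick way to spot every factor with this problem is to count levels per column. The sketch below uses a toy data frame with made-up values in place of trainingData. With the real data, the same `sapply` call could be applied to `trainingData[factor_vars]`.

```r
# Toy stand-in for trainingData (made-up values, for illustration only)
toy <- data.frame(
  EDUCATION = factor(paste0("edu", 1:16)),   # 16 distinct levels
  SEX       = factor(rep(c("F", "M"), 8))    # 2 distinct levels
)

# Number of levels in each factor column
level_counts <- sapply(toy, nlevels)

# Columns with more than 10 levels
too_many <- names(level_counts)[level_counts > 10]
too_many   # "EDUCATION"
```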
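From the smbinning.factor documentation: maxcat specifies the maximum number of categories. The default value is 10. The name of x must not contain a dot.
EDUCATION has 16 levels, which is more than the default of 10, so the call returns the message string instead of a result and there is no IV to extract. Raising maxcat avoids that: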
smb <- smbinning.factor(trainingData, y="ABOVE50K", x=factor_var, maxcat=16)
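Putting the two pieces together, one possible version of the loop keeps the check and also reports why a variable was skipped. This is a sketch, not the tutorial's code: `extract_iv` is a hypothetical helper name, and `maxcat = 16` only covers factors with up to 16 levels, so variables with more levels would still come back as a message. The real loop runs only if smbinning and the tutorial's objects exist in the session.

```r
# Hypothetical helper: return the IV on success, NA_real_ otherwise
extract_iv <- function(smb) {
  if (is.list(smb) && !is.null(smb$iv)) smb$iv else NA_real_
}

# Mock checks that need no smbinning installation
extract_iv(list(iv = 0.25))          # 0.25
extract_iv("Too many categories")    # NA

# Real loop, guarded so the sketch runs even without the tutorial's data
if (requireNamespace("smbinning", quietly = TRUE) &&
    exists("trainingData") && exists("factor_vars") && exists("iv_df")) {
  for (factor_var in factor_vars) {
    smb <- smbinning::smbinning.factor(trainingData, y = "ABOVE50K",
                                       x = factor_var, maxcat = 16)
    if (is.character(smb)) message(factor_var, ": ", smb)  # say why it failed
    iv_df[iv_df$VARS == factor_var, "IV"] <- extract_iv(smb)
  }
}
```

Using `is.character(smb)` rather than comparing `class(smb)` to a string is a small design choice: it still works if the object ever carries more than one class.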