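I have a dataset with 8 binary predictors (yes/no) and a numeric outcome. I want to find out which combination of predictors best predicts my outcome, but R's randomForest doesn't seem to like binary predictors: I get a negative % variance explained, and an error when I try to score the predictors with importance().
My code: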
library(randomForest)
#binary predictors
print_size <- c(0,0,0,0,0,1,0)
mid_ridge <- c(1,1,0,0,1,0,0)
classification <- c(1,1,1,1,1,1,0)
ridge_thickness <- c(1,1,1,1,1,1,1)
delta_center_distance <- c(1,0,1,1,1,1,1)
double_loop_size <- c(0,0,0,0,0,0,1)
whorl_length <- c(0,0,0,0,0,0,1)
loop_angle <- c(0,0,0,1,0,0,1)
#numeric result
LR <- c(44,42,34,20,19,11,9)
pred <- cbind(print_size, mid_ridge, classification, ridge_thickness,
              delta_center_distance, double_loop_size,
              whorl_length, loop_angle, LR)
output.forest <- randomForest(LR ~ ., ntree = 1000, data = pred, importance = TRUE)
print(importance(output.forest,type = 1))
Result:
Mean of squared residuals: 210.327
% Var explained: -18.57
Error:
Error in UseMethod("importance") : no applicable method for 'importance' applied to an object of class "c('standardGeneric', 'genericFunction', 'function', 'OptionalFunction', 'PossibleMethod', 'optionalMethod')"