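I am trying to make sure that all features of type factor are fully represented (in terms of all possible factor levels) both in my tree object and in my test set for prediction.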
for (j in 1:length(predictors)) {
  if (is.factor(Test[, j])) {
    ct[[names(predictors)[j]]] <- union(ct$xlevels[[names(predictors)[j]]], levels(Test[, c(names(predictors)[j])]))
  }
}
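However, for the object ct (a ctree from the party package), I can't figure out how to access the factor levels of the features, because I get this error: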
Error in ct$xlevels : $ operator not defined for this S4 class
Answer 0 (score: 0)
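I have run into this problem countless times, and today I came up with a small hack that should make it unnecessary to fix the differences in factor levels at all.
Just fit the model on the whole dataset (train + test) and give the test observations a weight of zero. That way the ctree model will not drop any factor levels.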
library(party); library(magrittr) # ctree() and the %>% pipe
a <- ctree(Y ~ ., data = DF[train.IDs, ]) %>% predict(newdata = DF) # would trigger an error if the data passed to predict() did not match the training data levels
b <- ctree(Y ~ ., data = DF, weights = as.numeric(seq_len(nrow(DF)) %in% train.IDs)) %>% predict(newdata = DF) # passing the IDs as 0/1 weights instead of subsetting the data solves it
mean(a == b) # check that the predictions are equal; should be 1
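To make the weights vector concrete, here is a minimal base-R sketch. The toy `DF`, its columns, and `train.IDs` are hypothetical and only for illustration, and `train.IDs` is assumed to hold row indices. It does not call ctree; it only shows how the 0/1 weights are built and which factor level would be missing from the training rows alone.

```r
# Hypothetical toy data: level "c" occurs only in the last (test) row
DF <- data.frame(
  Y = c(1.2, 0.7, 3.1, 2.4, 1.9),
  X = factor(c("a", "b", "a", "b", "c"))
)
train.IDs <- c(1, 2, 3, 4)  # assumed to be row indices of the training rows

# 1 for training rows, 0 for test rows
w <- as.numeric(seq_len(nrow(DF)) %in% train.IDs)
print(w)

# Levels actually observed in the training rows vs. levels in the full data
train_levels <- levels(droplevels(DF$X[train.IDs]))
all_levels <- levels(DF$X)
print(setdiff(all_levels, train_levels))  # level seen only in the test rows
```

A vector like `w` is what goes into the `weights` argument in the call above, so the full `DF`, with all of its factor levels, is what the model sees.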
Let me know if it works as expected!