data <- data.frame(day_type = c("weekend", "weekend", "weekend", "weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))
library(naivebayes)
model <- naive_bayes(vehicle ~ day_type, data = data)
predict(model, data.frame(day_type = "weekend"))
[1] bus
Levels: bus car
The expected answer here is car, but I get bus instead. Please help me identify the error.
Answer 0 (score: 3)
This should help you understand the problem:
data <- data.frame(day_type = c("weekend", "weekend", "weekend", "weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))
library(naivebayes)
model <- naive_bayes(vehicle ~ day_type, data = data)
dt_test1 = data.frame(day_type = "weekend")
dt_test2 = data.frame(day_type = "weekday")
dt_test3 = data.frame(day_type = c("weekend","weekday"))
predict(model, newdata = dt_test1)
# [1] bus
# Levels: bus car
predict(model, newdata = dt_test2)
# [1] bus
# Levels: bus car
predict(model, newdata = dt_test3)
# [1] car bus
# Levels: bus car
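Test datasets 1 and 2 each contain a single factor level, so they assign the integer code 1 to "weekend" and to "weekday" respectively. The model only understands the codes 1 and 2 (based on what it saw in the original dataset data) and does not care about the labels (weekday/weekend).
In test dataset 3, however, both labels are present, so they receive the same codes as in training (weekday -> 1, weekend -> 2, because factor levels are sorted alphabetically) and the predictions are correct.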
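The encoding described above can be checked in base R without fitting any model. This is only a minimal sketch: the variable names train, single and both are made up for illustration, and the values are wrapped in factor() explicitly so the result does not depend on how data.frame converts strings.

```r
# Factor levels are sorted alphabetically; each value is stored as an
# integer index into the level vector.
train <- factor(c("weekend", "weekend", "weekday", "weekday"))
levels(train)       # "weekday" "weekend"
as.integer(train)   # 2 2 1 1

# A factor built from "weekend" alone has one level, so its code is 1,
# the same code that "weekday" had in the training data.
single <- factor("weekend")
as.integer(single)  # 1

# With both labels present, the codes line up with training again.
both <- factor(c("weekend", "weekday"))
as.integer(both)    # 2 1
```

So "weekend" on its own carries the code that meant "weekday" during training, which is consistent with the bus prediction.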
As an extreme case, check:
dt_test4 = data.frame(day_type = c("January","February"))
predict(model, newdata = dt_test4)
# [1] car bus
# Levels: bus car
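You still get predictions! That is because those values, which the model has never even seen, are encoded as 1 and 2.
So, as @Aaron suggested, make sure the factor levels match, or use character variables instead of factor variables.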
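One way to make the levels match is to build the new data with the training levels spelled out. The sketch below uses base R only. The names train_levels, dt_fixed and dt_unknown are hypothetical, and day_type is created with factor() explicitly, because data.frame only converts strings to factors by default in R versions before 4.0.

```r
# Training data with the predictor stored explicitly as a factor
# (an assumption made here so the example does not depend on the
# stringsAsFactors default of the installed R version).
train_levels <- c("weekday", "weekend")
data <- data.frame(
  day_type = factor(rep(c("weekend", "weekday"), each = 4), levels = train_levels),
  vehicle  = factor(rep(c("car", "bus"), each = 4))
)

# New data reuses the training levels, so "weekend" keeps code 2.
dt_fixed <- data.frame(
  day_type = factor("weekend", levels = levels(data$day_type))
)
as.integer(dt_fixed$day_type)   # 2

# A label that was never seen becomes NA instead of silently taking a code.
dt_unknown <- data.frame(
  day_type = factor("January", levels = levels(data$day_type))
)
is.na(dt_unknown$day_type)      # TRUE
```

Passing dt_fixed as newdata to predict() should then give the model the same code for "weekend" that it saw during training. This has not been verified against every version of naivebayes.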