Why does the prediction not give the expected result?

Asked: 2017-12-05 17:27:25

Tags: r naivebayes

data <- data.frame(day_type = c("weekend", "weekend", "weekend","weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))

library(naivebayes)

model <- naive_bayes(vehicle ~ day_type, data = data)

predict(model, data.frame(day_type = "weekend"))
    [1] bus
Levels: bus car

The expected answer here is car, but I get bus instead. Please help me identify the error.

1 Answer:

Answer 0 (score: 3)

This should help you understand the problem:

data <- data.frame(day_type = c("weekend", "weekend", "weekend","weekend",
                                "weekday", "weekday", "weekday", "weekday"),
                   vehicle = c("car", "car", "car", "car",
                               "bus", "bus", "bus", "bus"))

library(naivebayes)

model <- naive_bayes(vehicle ~ day_type, data = data)

dt_test1 = data.frame(day_type = "weekend")
dt_test2 = data.frame(day_type = "weekday")
dt_test3 = data.frame(day_type = c("weekend","weekday"))

predict(model, newdata = dt_test1)

# [1] bus
# Levels: bus car

predict(model, newdata = dt_test2)

# [1] bus
# Levels: bus car

predict(model, newdata = dt_test3)

# [1] car bus
# Levels: bus car

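Test datasets 1 and 2 each contain a single factor level, so "weekend" and "weekday" are each encoded as the value 1 within their own data frame. The model only understands the values 1 and 2 (based on what was in the original dataset data) and does not look at the labels (weekday/weekend). In test dataset 3, however, both labels are present, so they get the correct values (weekday / weekend -> 1 / 2).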
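The encoding described above can be inspected directly in base R. This is only a minimal sketch: the variable names train_day, test1 and test4 are made up for illustration, and the factors are created explicitly with factor() because data.frame() only converts strings to factors by default in R versions before 4.0.0.

```r
# Factor levels are sorted alphabetically, then stored as integer codes.
train_day <- factor(c("weekend", "weekend", "weekday", "weekday"))
levels(train_day)      # "weekday" "weekend"
as.integer(train_day)  # 2 2 1 1  (weekday = 1, weekend = 2)

# A factor built from "weekend" alone has one level, so its code is 1,
# which is the code "weekday" has in the training data.
test1 <- factor("weekend")
as.integer(test1)      # 1

# Unrelated labels still end up with the codes 1 and 2.
test4 <- factor(c("January", "February"))
levels(test4)          # "February" "January"
as.integer(test4)      # 2 1
```

If the model matches on these integer codes, as the answer suggests, a lone "weekend" is read as the first training level ("weekday"), which would explain the bus prediction.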

As an extreme case, check this:

dt_test4 = data.frame(day_type = c("January","February"))

predict(model, newdata = dt_test4)

# [1] car bus
# Levels: bus car

You still get predictions! That is because values the model has never even seen are encoded as 1 and 2.

So, as @Aaron suggested, make sure the factor levels match, or use character variables instead of factor variables.
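Both suggestions can be sketched in base R as follows. The names train_day, new_day, newdata and newdata_chr are hypothetical, and this only shows how to build the new data; how predict() treats it may depend on the installed naivebayes version.

```r
# Stand-in for the training column data$day_type.
train_day <- factor(c("weekend", "weekend", "weekday", "weekday"))

# Option 1: reuse the training levels so "weekend" keeps its code 2.
new_day <- factor("weekend", levels = levels(train_day))
as.integer(new_day)  # 2
newdata <- data.frame(day_type = new_day)

# Option 2: keep the column as plain character, not a factor.
newdata_chr <- data.frame(day_type = "weekend", stringsAsFactors = FALSE)
is.character(newdata_chr$day_type)  # TRUE
```

Either data frame would then be passed as the newdata argument of predict(model, newdata = ...). For option 2 to be consistent, the training data would presumably need to be built with character columns as well.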