R, Naive Bayes in the e1071 library: the fitted model returns the a-priori probabilities as the prediction for every record

Time: 2015-09-25 11:49:03

Tags: r naivebayes

I am using Naive Bayes from the e1071 library. I have the following toy dataset, named nb0.csv:

N_INQUIRIES_BIN,TARGET
1,0
2,1
2,0
1,0
1,0
1,0
1,1 
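
To reproduce the example without a file on disk, the same seven rows can be read from an inline string. This is only a convenience sketch: csv_text is a name introduced here, and the result is called data to match the code used in the rest of the thread.

```r
# Inline copy of nb0.csv (csv_text is a name introduced for this sketch)
csv_text <- "N_INQUIRIES_BIN,TARGET
1,0
2,1
2,0
1,0
1,0
1,0
1,1"

# read.csv() accepts the file contents directly through its text argument
data <- read.csv(text = csv_text)
str(data)
```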

I then run the following lines of code:

library(e1071)
data = read.csv('d:/nb0.csv')
model <- naiveBayes(as.factor(data[, 'N_INQUIRIES_BIN']), data[, 'TARGET'])

When I print model, I can see that the model has been trained in some way:

> model    
Naive Bayes Classifier for Discrete Predictors

Call:
naiveBayes.default(x = as.factor(data[, "N_INQUIRIES_BIN"]),
    y = data[, "TARGET"])

A-priori probabilities:
data[, "TARGET"]
        0         1
0.7142857 0.2857143

Conditional probabilities:
                x
data[, "TARGET"]   1   2
               0 0.8 0.2
               1 0.5 0.5
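
The two printed tables can be checked by hand in base R, which suggests that the fit itself is sound. A minimal sketch, assuming the seven rows of nb0.csv are typed in directly; the vector names bin and target are made up for illustration.

```r
bin    <- factor(c(1, 2, 2, 1, 1, 1, 1))
target <- factor(c(0, 1, 0, 0, 0, 0, 1))

# A-priori probabilities: relative class frequencies
prior <- prop.table(table(target))

# Conditional probabilities P(bin | target): each row sums to 1
cond <- prop.table(table(target, bin), margin = 1)

print(prior)
print(cond)
```

The result is 5/7 and 2/7 for the priors, and the rows 0.8 0.2 and 0.5 0.5 for the conditional table, matching the model output.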

However, when I predict on the training data, I get the a-priori probabilities as the prediction for every record:

> predict(model, as.factor(data[, 'N_INQUIRIES_BIN']), type='raw')
             0         1
[1,] 0.7142857 0.2857143
[2,] 0.7142857 0.2857143
[3,] 0.7142857 0.2857143
[4,] 0.7142857 0.2857143
[5,] 0.7142857 0.2857143
[6,] 0.7142857 0.2857143
[7,] 0.7142857 0.2857143

Is this an implementation bug, or am I missing something obvious?

P.S. Everything works as expected with the example.

Correct answer

The code

library(e1071)
data = read.csv('d:/nb0.csv')

data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)

model <- naiveBayes(TARGET ~ ., data)
predict(model, data, type='raw')

gives me what I wanted.
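
One plausible explanation for the difference, offered as an assumption about e1071 internals rather than something the thread confirms: if predict looks up predictors by column name, a bare factor passed as new data ends up in a column whose name does not match the predictor name stored in the model (printed as x above), so the predictor is ignored and only the priors remain. The base-R sketch below only illustrates the naming behaviour of as.data.frame(); column_name_seen is a hypothetical helper, not an e1071 function.

```r
# Hypothetical helper mimicking a predict-style function that coerces its
# input with as.data.frame(); for a bare vector or factor, the argument
# name becomes the column name.
column_name_seen <- function(newdata) names(as.data.frame(newdata))

bins <- factor(c(1, 2, 2, 1, 1, 1, 1))

print(column_name_seen(bins))                     # bare factor
print(column_name_seen(data.frame(x = bins)))     # data frame keeps its own name

# A lookup of the model's predictor name "x" only succeeds in the second case
print(match("x", column_name_seen(bins)))
print(match("x", column_name_seen(data.frame(x = bins))))
```

Fitting with a formula on a data frame and predicting on the same data frame avoids the issue, because the column names are identical on both sides.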

1 answer:

Answer 0 (score: 2)

This is too long for a comment, so I am posting it as an answer. I see two or three things that could be changed:

First: I suggest calling as.factor() outside the model call, like this:

data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)

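Second: I am not sure whether this is what you intended, but I do not see a formula in your call (note that the examples you refer to always use one). Note the difference between this: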

model <- naiveBayes(as.factor(data[, 'N_INQUIRIES_BIN']), data[, 'TARGET'])

and this:

#Here I can't claim this is the model you are looking for, but for illustration purposes:
model <- naiveBayes(N_INQUIRIES_BIN ~ ., data = data)

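Note that, besides calling as.factor() beforehand, I also changed the data argument, because passing a single column as data raised an error when I tried it:

Error in naiveBayes.formula(N_INQUIRIES_BIN ~ ., data = data[, 2]) : naiveBayes formula interface handles data frames or arrays only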

The same error occurs when the column is referenced by name:

Error in naiveBayes.formula(N_INQUIRIES_BIN ~ ., data = data[, "TARGET"]) : naiveBayes formula interface handles data frames or arrays only

However, this alternative model outputs the following:

model <- naiveBayes(N_INQUIRIES_BIN ~ ., data = data)
model
#
#Naive Bayes Classifier for Discrete Predictors
#
#Call:
#naiveBayes.default(x = X, y = Y, laplace = laplace)
#
#A-priori probabilities:
#Y
#        1         2 
#0.7142857 0.2857143 
#
#Conditional probabilities:
#   TARGET
#Y   [,1]      [,2]
#  1  0.2 0.4472136
#  2  0.5 0.7071068

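Note again that the conditional and a-priori probabilities computed with this call differ from yours: here N_INQUIRIES_BIN is the class and TARGET is a numeric predictor, so the conditional table lists its per-class mean ([,1]) and standard deviation ([,2]).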

Finally, the prediction (again, following the example in the help file):

#Here, all of the dataset is taken into account
predict(model, data, type='raw')
#             1         2
#[1,] 0.8211908 0.1788092
#[2,] 0.5061087 0.4938913
#[3,] 0.8211908 0.1788092
#[4,] 0.8211908 0.1788092
#[5,] 0.8211908 0.1788092
#[6,] 0.8211908 0.1788092
#[7,] 0.5061087 0.4938913
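
These numbers can be reproduced by hand, which shows what the alternative model is doing: N_INQUIRIES_BIN is the class, and TARGET is scored with a per-class normal density. A base-R sketch under that assumption; all variable names are illustrative and no e1071 call is involved.

```r
bin    <- factor(c(1, 2, 2, 1, 1, 1, 1))
target <- c(0, 1, 0, 0, 0, 0, 1)

prior <- prop.table(table(bin))      # class frequencies of N_INQUIRIES_BIN
mu    <- tapply(target, bin, mean)   # per-class mean of TARGET
sigma <- tapply(target, bin, sd)     # per-class standard deviation of TARGET

# Unnormalised posterior per class: prior * normal density of TARGET
joint <- sapply(levels(bin), function(k) {
  as.numeric(prior[k]) * dnorm(target, mean = mu[k], sd = sigma[k])
})

# Normalise each row so the class probabilities sum to 1
posterior <- joint / rowSums(joint)
print(round(posterior, 7))
```

Rows with TARGET equal to 0 come out at roughly 0.82 / 0.18 and rows with TARGET equal to 1 at roughly 0.51 / 0.49, in line with the predict output.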

For completeness' sake, since the formula in the model above differs from what the OP wanted, here is the actual call:

#Keep the as.factor call outside of the model
data$N_INQUIRIES_BIN <- as.factor(data$N_INQUIRIES_BIN)
#explicitly state the formula in the naivebayes
#note that the specified column is TARGET and not N_INQUIRIES_BIN
model <- naiveBayes(TARGET ~ ., data)
#predict the model, with all the dataset
predict(model, data, type='raw')
#Yields the following:
#       0   1
#[1,] 0.8 0.2
#[2,] 0.5 0.5
#[3,] 0.5 0.5
#[4,] 0.8 0.2
#[5,] 0.8 0.2
#[6,] 0.8 0.2
#[7,] 0.8 0.2
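
As a sanity check, the same posteriors follow from Bayes' rule applied to the a-priori and conditional tables. The sketch below uses base R only and assumes the seven rows of nb0.csv; the names bin, target, prior, cond and posterior are illustrative.

```r
bin    <- factor(c(1, 2, 2, 1, 1, 1, 1))
target <- factor(c(0, 1, 0, 0, 0, 0, 1))

prior <- prop.table(table(target))                    # P(target)
cond  <- prop.table(table(target, bin), margin = 1)   # P(bin | target)

# Bayes' rule per record: prior * likelihood of the observed bin
joint <- sapply(levels(target), function(k) {
  as.numeric(prior[k]) * cond[k, as.character(bin)]
})

# Normalise each row so the class probabilities sum to 1
posterior <- joint / rowSums(joint)
print(posterior)
```

Records in bin 1 get 0.8 / 0.2 and records in bin 2 get 0.5 / 0.5, so the prediction now depends on the predictor instead of collapsing to the priors.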