How to do naive Bayes modelling in Python (using sklearn MultinomialNB)

Time: 2018-07-12 14:31:07

Tags: python scikit-learn naivebayes

I am currently learning how to do naive Bayes modelling and am trying to apply it in both Python and R. However, working through a toy example, I am struggling to reproduce in Python the same numbers I get when doing the calculation in R or by hand.

Any help figuring out why I am getting different numbers would be greatly appreciated!

The toy data are:

Class (y)   A A A A  B B B B B B
  var x1    2 1 1 0  0 1 1 0 0 0
  var x2    0 0 1 0  0 1 1 1 1 1  

That is, my dependent variable y has 2 levels, A and B; the explanatory variable x1 has 3 levels (0, 1, 2) and x2 has 2 levels (0, 1).

My current aim is to use a multinomial naive Bayes model to predict the class probabilities of a new data point with x1 = 1 and x2 = 1.
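In other words, the quantity I am after is the usual naive Bayes posterior for that point (the factorization below is just the standard conditional-independence assumption, nothing specific to either library):

$$
P(y \mid x_1 = 1, x_2 = 1)
  = \frac{P(y)\, P(x_1 = 1 \mid y)\, P(x_2 = 1 \mid y)}
         {\sum_{y' \in \{A, B\}} P(y')\, P(x_1 = 1 \mid y')\, P(x_2 = 1 \mid y')},
\qquad y \in \{A, B\}.
$$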

My current Python code is:

import pandas as pd
from sklearn.naive_bayes import MultinomialNB

dat = pd.DataFrame({
    "class" : ["A", "A","A","A", "B","B","B","B","B","B"],
    "x1" : [2,1,1,0,0,1,1,0,0,0],
    "x2" : [0,0,1,0,1,0,1,1,1,1]
})

mnb = MultinomialNB(alpha=0)  # additive smoothing turned off
x = mnb.fit(dat[["x1", "x2"]], dat["class"])
x.predict_proba(pd.DataFrame([[1, 1]], columns=["x1", "x2"]))

## Out[160]: array([[ 0.34325744,  0.65674256]])
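As a sanity check on the Python side: the number above can be reproduced by hand if the two columns are treated as counts of two event types, which (as far as I understand) is the multinomial event model that MultinomialNB assumes. A minimal sketch of that calculation:

import numpy as np

# Same data as above; each row is one observation, columns are (x1, x2).
X = np.array([[2, 0], [1, 0], [1, 1], [0, 0],                    # class A
              [0, 1], [1, 0], [1, 1], [0, 1], [0, 1], [0, 1]])   # class B
y = np.array(["A"] * 4 + ["B"] * 6)

unnormalised = {}
for c in ["A", "B"]:
    rows = X[y == c]
    prior = len(rows) / len(X)               # P(class)
    theta = rows.sum(axis=0) / rows.sum()    # event probabilities from pooled counts
    # the new point (1, 1) contributes one "count" of each feature
    unnormalised[c] = prior * theta[0] * theta[1]

total = sum(unnormalised.values())
print({c: p / total for c, p in unnormalised.items()})
## roughly {'A': 0.3433, 'B': 0.6567} -- the same split as predict_proba above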

However, when I try to get the same result in R:

library(dplyr)
library(e1071)    

dat = data_frame(
    "class" = c("A", "A","A","A", "B","B","B","B","B","B"),
    "x1" = c(2,1,1,0,0,1,1,0,0,0),
    "x2" = c(0,0,1,0,1,0,1,1,1,1)
)

model <- naiveBayes(class ~ . , data = table(dat) )

predict(
    model, 
    newdata = data_frame(
        x1 = factor(1, levels = c(0,1,2)) ,
        x2 = factor(1, levels = c(0,1))),  
    type = "raw"
)
##             A         B
## [1,] 0.2307692 0.7692308

And then, doing the calculation by hand, I get the following.

The model is

[image: the naive Bayes model]

From the data, we get the following probability estimates

[image: the estimated prior and conditional probabilities]

so plugging the numbers in

[image: the hand-calculated posterior]
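Concretely, writing the estimates as plain relative frequencies with no smoothing: P(A) = 4/10, P(B) = 6/10, P(x1 = 1 | A) = 2/4, P(x1 = 1 | B) = 2/6, P(x2 = 1 | A) = 1/4 and P(x2 = 1 | B) = 5/6. Plugging these into the posterior gives

$$
P(A \mid x_1 = 1, x_2 = 1)
  = \frac{\tfrac{4}{10} \cdot \tfrac{2}{4} \cdot \tfrac{1}{4}}
         {\tfrac{4}{10} \cdot \tfrac{2}{4} \cdot \tfrac{1}{4}
          + \tfrac{6}{10} \cdot \tfrac{2}{6} \cdot \tfrac{5}{6}}
  = \frac{0.05}{0.05 + 0.1\overline{6}}
  \approx 0.2308,
\qquad
P(B \mid x_1 = 1, x_2 = 1) \approx 0.7692.
$$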

This matches the result from R. So once again I am confused about what I am doing wrong in the Python example. Any help would be appreciated.

0 Answers:

There are no answers yet.