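I'm currently learning how to do naive Bayes modelling and am trying to apply it in both Python and R. However, on a toy example, I'm struggling to reproduce in Python the same numbers that I get in R or by working it out by hand.
Any help figuring out why I'm getting different numbers would be greatly appreciated!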
The toy data is:
Class (y) A A A A B B B B B B
var x1 2 1 1 0 0 1 1 0 0 0
var x2 0 0 1 0 0 1 1 1 1 1
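That is, my dependent variable y has 2 levels (A and B), the explanatory variable x1 has 3 levels (0, 1 and 2), and x2 has 2 levels (0 and 1).
My current goal is to use a multinomial naive Bayes model to predict the class probabilities for a new data point with x1 = 1 and x2 = 1.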
My current Python code is:
import pandas as pd
from sklearn.naive_bayes import MultinomialNB
dat = pd.DataFrame({
"class" : ["A", "A","A","A", "B","B","B","B","B","B"],
"x1" : [2,1,1,0,0,1,1,0,0,0],
"x2" : [0,0,1,0,1,0,1,1,1,1]
})
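# alpha=0 requests no additive (Laplace/Lidstone) smoothing
mnb = MultinomialNB(alpha=0)
x = mnb.fit(dat[["x1","x2"]], dat["class"])
# Class probabilities for the new point x1 = 1, x2 = 1 (columns follow mnb.classes_, i.e. A then B)
x.predict_proba(pd.DataFrame([[1,1]], columns=["x1","x2"]))
## array([[ 0.34325744, 0.65674256]])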
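For comparison, the 0.343 figure can be reproduced by hand if x1 and x2 are treated as counts rather than as categories, which is the count-based likelihood the scikit-learn documentation describes for MultinomialNB: each class gets one probability per column, estimated from that column's sum within the class. The sketch below assumes that model with no smoothing; `multinomial_nb_proba` is a hypothetical helper, not part of scikit-learn.

```python
# Toy data, copied from the pandas DataFrame in the question.
classes = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
x1 = [2, 1, 1, 0, 0, 1, 1, 0, 0, 0]
x2 = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def multinomial_nb_proba(new_point, alpha=0.0):
    """Hypothetical helper: class posteriors when x1 and x2 are treated as counts."""
    features = [x1, x2]
    scores = {}
    for c in sorted(set(classes)):
        rows = [i for i, y in enumerate(classes) if y == c]
        prior = len(rows) / len(classes)
        # Column sums within the class: a value of 2 counts twice, not as its own level.
        totals = [sum(f[i] for i in rows) for f in features]
        denom = sum(totals) + alpha * len(features)
        # One probability per column; these sum to 1 across columns.
        theta = [(t + alpha) / denom for t in totals]
        # The multinomial coefficient is the same for every class, so it cancels.
        score = prior
        for th, count in zip(theta, new_point):
            score *= th ** count
        scores[c] = score
    norm = sum(scores.values())
    return {c: s / norm for c, s in scores.items()}


probs = multinomial_nb_proba([1, 1])
print(probs)
```

Under these assumptions the result agrees with the predict_proba output above (about 0.343 for A and 0.657 for B).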
However, when I try to get the same result in R:
library(dplyr)
library(e1071)
dat = data_frame(
"class" = c("A", "A","A","A", "B","B","B","B","B","B"),
"x1" = c(2,1,1,0,0,1,1,0,0,0),
"x2" = c(0,0,1,0,1,0,1,1,1,1)
)
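# Passing table(dat) makes naiveBayes treat x1 and x2 as categorical (numeric columns of a data frame would be modelled as Gaussian)
model <- naiveBayes(class ~ . , data = table(dat) )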
predict(
model,
newdata = data_frame(
x1 = factor(1, levels = c(0,1,2)) ,
x2 = factor(1, levels = c(0,1))),
type = "raw"
)
## A B
## [1,] 0.2307692 0.7692308
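Working it out by hand, I get the following.
The model is the standard naive Bayes model, in which x1 and x2 are conditionally independent given the class.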
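In symbols, a sketch of that factorization, with the class posterior normalized over the two levels A and B:

$$
P(y \mid x_1, x_2) = \frac{P(y)\,P(x_1 \mid y)\,P(x_2 \mid y)}{\sum_{y' \in \{A, B\}} P(y')\,P(x_1 \mid y')\,P(x_2 \mid y')}
$$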
From the data, I get the probability estimates as relative frequencies within each class.
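Counting within each class of the toy data gives these estimates (only the levels needed for the new point x1 = 1, x2 = 1 are shown):

$$
\begin{aligned}
P(A) &= \tfrac{4}{10}, & P(B) &= \tfrac{6}{10},\\
P(x_1 = 1 \mid A) &= \tfrac{2}{4}, & P(x_1 = 1 \mid B) &= \tfrac{2}{6},\\
P(x_2 = 1 \mid A) &= \tfrac{1}{4}, & P(x_2 = 1 \mid B) &= \tfrac{5}{6}.
\end{aligned}
$$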
Plugging the numbers in gives P(y = A | x1 = 1, x2 = 1) ≈ 0.2308.
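Worked through with the within-class relative frequencies from the toy data, the posterior for class A is:

$$
P(A \mid x_1 = 1, x_2 = 1)
= \frac{\tfrac{4}{10}\cdot\tfrac{2}{4}\cdot\tfrac{1}{4}}
       {\tfrac{4}{10}\cdot\tfrac{2}{4}\cdot\tfrac{1}{4} + \tfrac{6}{10}\cdot\tfrac{2}{6}\cdot\tfrac{5}{6}}
= \frac{\tfrac{1}{20}}{\tfrac{1}{20} + \tfrac{1}{6}}
= \frac{3}{13} \approx 0.2307692
$$

and correspondingly P(B | x1 = 1, x2 = 1) = 10/13 ≈ 0.7692308.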
This matches R's result, so once again I'm confused about what I'm doing wrong in the Python example. Any help would be much appreciated.
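To double-check the hand calculation in Python without scikit-learn, here is a minimal pure-Python sketch that treats x1 and x2 as categorical and estimates each conditional probability as a within-class relative frequency, with no smoothing. `categorical_nb_proba` is a hypothetical helper written for illustration, not a library function.

```python
# Toy data, copied from the pandas DataFrame in the question.
classes = ["A", "A", "A", "A", "B", "B", "B", "B", "B", "B"]
x1 = [2, 1, 1, 0, 0, 1, 1, 0, 0, 0]
x2 = [0, 0, 1, 0, 1, 0, 1, 1, 1, 1]


def categorical_nb_proba(new_point):
    """Hypothetical helper: class posteriors when x1 and x2 are treated as categories."""
    features = [x1, x2]
    scores = {}
    for c in sorted(set(classes)):
        rows = [i for i, y in enumerate(classes) if y == c]
        score = len(rows) / len(classes)  # P(y = c)
        for f, value in zip(features, new_point):
            # P(x_j = value | y = c): share of class-c rows where this feature equals value.
            matches = sum(1 for i in rows if f[i] == value)
            score *= matches / len(rows)
        scores[c] = score
    norm = sum(scores.values())  # the evidence P(x1, x2)
    return {c: s / norm for c, s in scores.items()}


probs = categorical_nb_proba([1, 1])
print(probs)
```

This gives about 0.2308 for A and 0.7692 for B, in line with the R output.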