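I am trying to run a naive Bayes classifier on a simple dataset. The three variables I have are weight (continuous), BP (continuous), and disease (binary).
When I run the naive Bayes command, some of the results appear to be probabilities greater than one. I have tried both the 'e1071' and 'klaR' packages.
Please see the code below: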
install.packages("e1071")
library(e1071)
# the response must be a factor for classification
mydata$disease <- as.factor(mydata$disease)
classifier <- naiveBayes(disease ~ weight + BP, data = mydata, laplace = 0, na.action = na.pass)
Please see my results below:
A-priori probabilities:
Y
   0    1
0.47 0.53

Conditional probabilities:
   weight
Y        [,1]     [,2]
  0  69.10638 27.22869
  1 131.22642 39.47377

   BP
Y       [,1]     [,2]
  0 44.78723 21.73350
  1 35.81132 13.55623
As shown above, one of the "probabilities" is 44.78723. Is that right? I also tried klaR, and it gave me very similar results. Please help.
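Answer 0 (score: 0)
Promoted from a comment:
For numeric predictors, the output does not contain probabilities. It gives the parameters of a normal distribution (mean in column [,1], standard deviation in column [,2]) for each level of the class variable, so values greater than one are expected. From the ?naiveBayes help: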
For each numeric variable, a table giving, for each target class, mean and
standard deviation of the (sub-)variable.
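To make the connection to actual probabilities concrete, here is a minimal sketch of how these means and standard deviations can be turned into posterior class probabilities with Bayes' rule. It assumes Gaussian class-conditional densities, as the help text describes. The numbers are copied from the question's output; the patient values `new_weight` and `new_bp` are hypothetical and chosen only for illustration.

```r
# Parameters copied from the question's output
prior  <- c("0" = 0.47, "1" = 0.53)
weight <- rbind("0" = c(mean = 69.10638,  sd = 27.22869),
                "1" = c(mean = 131.22642, sd = 39.47377))
bp     <- rbind("0" = c(mean = 44.78723, sd = 21.73350),
                "1" = c(mean = 35.81132, sd = 13.55623))

# Hypothetical new patient (not from the original data)
new_weight <- 100
new_bp     <- 40

# Class-conditional likelihood: product of the two normal densities
# (the "naive" independence assumption)
lik <- dnorm(new_weight, mean = weight[, "mean"], sd = weight[, "sd"]) *
       dnorm(new_bp,     mean = bp[, "mean"],     sd = bp[, "sd"])

# Bayes' rule: prior times likelihood, normalised over the classes
posterior <- prior * lik / sum(prior * lik)
posterior        # two values between 0 and 1
sum(posterior)   # equals 1 up to floating-point error

# Note: a density value itself can exceed 1 when the sd is small;
# only the normalised posterior is a probability
dnorm(0, mean = 0, sd = 0.1)
```

Only the normalised posterior is bounded by one; the tabulated means, standard deviations, and even the density values are not.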
Using the iris dataset:
library(e1071)
# load iris dataset and set some values to missing
data(iris)
iris$Sepal.Length[1] <- NA
iris$Petal.Width[2] <- NA
iris$Species[3] <- NA
# run naive Bayes model
(m <- naiveBayes(Species ~ Sepal.Length + Petal.Width , data = iris, na.action=na.omit))
This produces the output:
# Naive Bayes Classifier for Discrete Predictors
#
# Call:
# naiveBayes.default(x = X, y = Y, laplace = laplace)
#
# A-priori probabilities:
# Y
#     setosa versicolor  virginica
#  0.3197279  0.3401361  0.3401361
#
# Conditional probabilities:
#             Sepal.Length
# Y                [,1]      [,2]
#   setosa     5.012766 0.3603241
#   versicolor 5.936000 0.5161711
#   virginica  6.588000 0.6358796
#
#             Petal.Width
# Y                 [,1]      [,2]
#   setosa     0.2489362 0.1080908
#   versicolor 1.3260000 0.1977527
#   virginica  2.0260000 0.2746501
Check that the tables give the mean and standard deviation for each Species:
aggregate(cbind(Sepal.Length, Petal.Width) ~ Species, data=iris,
function(i) c(mean(i), sd(i)))
#      Species Sepal.Length.1 Sepal.Length.2 Petal.Width.1 Petal.Width.2
# 1     setosa      5.0127660      0.3603241     0.2489362     0.1080908
# 2 versicolor      5.9360000      0.5161711     1.3260000     0.1977527
# 3  virginica      6.5880000      0.6358796     2.0260000     0.2746501
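If what you actually want are probabilities, a short sketch along these lines may help: ask the fitted model for posterior class probabilities with `predict(..., type = "raw")`. For simplicity this fits on the unmodified iris data (no values set to missing), so the stored parameters differ slightly from the output above. The `new_flowers` data frame is a hypothetical example, not part of the original post.

```r
library(e1071)

data(iris)
m <- naiveBayes(Species ~ Sepal.Length + Petal.Width, data = iris)

# The per-class mean ([,1]) and sd ([,2]) are stored in m$tables
m$tables$Petal.Width

# Hypothetical new observations to classify
new_flowers <- data.frame(Sepal.Length = c(5.0, 6.5),
                          Petal.Width  = c(0.2, 2.0))

# Posterior probabilities: one row per observation, one column per class
post <- predict(m, new_flowers, type = "raw")
post
rowSums(post)   # each row sums to 1

# Predicted class labels (the default, type = "class")
predict(m, new_flowers)
```

Each row of `post` lies between 0 and 1 and sums to one, unlike the parameter tables printed by the model.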