Difference between NLTK and scikit-learn Naive Bayes

Asked: 2019-03-14 03:10:02

Tags: python scikit-learn nltk

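Which variant of Naive Bayes does NLTK implement? Is it Bernoulli, multinomial, Gaussian, or something else? I read through the documentation, but it seems too general to tell.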

I know that scikit-learn has 4 versions of Naive Bayes, of which only two are suitable for text processing.

When doing text processing, I found significant differences between the results of NLTK's Naive Bayes and scikit-learn's.

1 answer:

Answer 0: (score: 1)

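NLTK's Naive Bayes is of the multinomial variety (typical for classification). The clue is that Gaussian Naive Bayes is usually applied to continuous data, which is not typical of text classification.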

The official documentation for NLTK's Naive Bayes (the module source with its docstring) can be found here: https://www.nltk.org/_modules/nltk/classify/naivebayes.html

Key excerpt:

A classifier based on the Naive Bayes algorithm.  In order to find the
probability for a label, this algorithm first uses the Bayes rule to
express P(label|features) in terms of P(label) and P(features|label):

|                       P(label) * P(features|label)
|  P(label|features) = ------------------------------
|                              P(features)

The algorithm then makes the 'naive' assumption that all features are
independent, given the label:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                                         P(features)

Rather than computing P(features) explicitly, the algorithm just
calculates the numerator for each label, and normalizes them so they
sum to one:

|                       P(label) * P(f1|label) * ... * P(fn|label)
|  P(label|features) = --------------------------------------------
|                        SUM[l]( P(l) * P(f1|l) * ... * P(fn|l) )
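
To make the last step of the excerpt concrete, here is a minimal sketch in plain Python of computing the numerator for each label and then normalizing so the results sum to one. It is not NLTK's actual code: the function name `posterior`, the labels, and all probability values are made up for illustration, and real implementations typically work with log probabilities to avoid underflow.

```python
def posterior(prior, likelihood, features):
    """Return P(label | features) for every label.

    prior:      {label: P(label)}
    likelihood: {label: {feature: P(feature | label)}}
    features:   list of observed feature names
    """
    numerators = {}
    for label, p_label in prior.items():
        # Naive assumption: features are independent given the label,
        # so the joint likelihood is a product of per-feature terms.
        value = p_label
        for f in features:
            value *= likelihood[label][f]
        numerators[label] = value

    # Normalize instead of computing P(features) explicitly.
    total = sum(numerators.values())
    return {label: value / total for label, value in numerators.items()}


# Hypothetical probabilities, chosen only for illustration.
prior = {"spam": 0.4, "ham": 0.6}
likelihood = {
    "spam": {"free": 0.5, "meeting": 0.1},
    "ham": {"free": 0.1, "meeting": 0.4},
}

result = posterior(prior, likelihood, ["free", "meeting"])
for label, p in result.items():
    print(label, round(p, 4))
```

The denominator here is exactly the SUM[l] term from the excerpt: the sum of the per-label numerators. That is why the returned values always add up to one, whatever P(features) would have been.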