Question

I'm training a bigram model with a Laplace estimator in NLTK. The constructor of NgramModel is:

def __init__(self, n, train, pad_left=True, pad_right=False,
             estimator=None, *estimator_args, **estimator_kwargs):

After some research, I found that the following syntax works:

bigram_model = NgramModel(2, my_corpus, True, False, lambda f, b:LaplaceProbDist(f))

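Although it seems to work fine, I'm confused about the last two arguments. Mainly, why is the "estimator" argument a lambda function, and how does it interact with LaplaceProbDist?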
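One way to see where each positional argument in the call above ends up is to bind it against a stub with the same parameter list. This is a minimal sketch: `ngram_model_init` is a hypothetical stand-in with NgramModel's signature (minus `self`), and the lambda returns a placeholder tuple instead of a real LaplaceProbDist so it runs without NLTK.

```python
import inspect


# Hypothetical stub with the same parameters as NgramModel.__init__ (minus self).
def ngram_model_init(n, train, pad_left=True, pad_right=False,
                     estimator=None, *estimator_args, **estimator_kwargs):
    pass


# Placeholder estimator with the question's (f, b) shape; not NLTK code.
est = lambda f, b: ("laplace", f)

# Bind the same positional arguments the question passes to NgramModel.
bound = inspect.signature(ngram_model_init).bind(2, ["a", "b"], True, False, est)
# Fill in *estimator_args / **estimator_kwargs with their empty defaults.
bound.apply_defaults()

for name, value in bound.arguments.items():
    print(name, "->", value)
```

The lambda lands in `estimator`, while `*estimator_args` and `**estimator_kwargs` stay empty; judging by their names, they are only there to carry extra arguments to the estimator when you supply them.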

Answer 1

Currently, you can pass a lambda function that takes a FreqDist and returns a probability distribution built from it, e.g.

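from nltk.model import NgramModel
from nltk.corpus import brown
from nltk.probability import LaplaceProbDist

# The estimator receives a FreqDist and returns a probability distribution
# built from it. Some NLTK versions also pass a bin count as a second
# argument (hence `lambda f, b: ...` in the question); accepting an optional
# `bins` keeps this lambda working either way. It is ignored here, so
# LaplaceProbDist falls back to fdist.B() bins.
est = lambda fdist, bins=None: LaplaceProbDist(fdist)

# Trigram model over the first 100 words of the Brown "news" category
corpus = brown.words(categories='news')[:100]
lm = NgramModel(3, corpus, estimator=est)

# Python 2 print statements
print lm
print (corpus[8], corpus[9], corpus[12])
# P(corpus[12] | corpus[8], corpus[9])
print lm.prob(corpus[12], [corpus[8], corpus[9]])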

[OUT]:

<NgramModel with 100 3-grams>
(u'investigation', u'of', u'primary')
0.0186667723526
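The reason the estimator is a function rather than a ready-made distribution is that the model only has its frequency counts after it has split the training text by context, and it then needs one distribution per context. The sketch below mimics that pattern in plain Python without NLTK: `laplace_prob_dist` and `build_conditional_model` are hypothetical helpers, padding and backoff are omitted, and the bin count passed to the estimator is assumed to be the number of contexts.

```python
from collections import Counter, defaultdict


def laplace_prob_dist(fdist, bins=None):
    """Hypothetical stand-in for LaplaceProbDist (add-one smoothing):
    P(w) = (count(w) + 1) / (N + bins), where bins defaults to the
    number of distinct samples seen."""
    total = sum(fdist.values())
    if bins is None:
        bins = len(fdist)
    return lambda word: (fdist[word] + 1) / (total + bins)


def build_conditional_model(words, estimator):
    """Count next-word frequencies per context, then call `estimator`
    once per context to turn each FreqDist-like Counter into a distribution."""
    cfd = defaultdict(Counter)
    for prev, word in zip(words, words[1:]):
        cfd[prev][word] += 1
    bins = len(cfd)  # assumed bin count: number of contexts
    # The estimator acts as a factory: FreqDist in, distribution out.
    return {context: estimator(fdist, bins) for context, fdist in cfd.items()}


# Same shape as the question's lambda: accept (fdist, bins), ignore bins.
est = lambda f, b: laplace_prob_dist(f)

words = "the cat sat on the mat the cat ran".split()
model = build_conditional_model(words, est)
print(model["the"]("cat"))  # (2 + 1) / (3 + 2) = 0.6
```

Because the lambda ignores `b`, each context's Laplace distribution uses only its own counts, which matches what `lambda f, b: LaplaceProbDist(f)` does in the question.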

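Note, however, that the `model` package in NLTK, which contains the LanguageModel object, is still "under construction", so the code above may no longer work once a stable version is released.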

To keep up with problems related to the `model` package, check these issues regularly:

#792
#800

How do I pass an estimator to NLTK's NgramModel?
