Question

I am computing the frequency distribution of certain words across different genres of the Brown corpus.

My code:

import nltk
from nltk.corpus import brown

cfd = nltk.ConditionalFreqDist(
    (genre, word)
    for genre in brown.categories()
    for word in brown.words(categories=genre))

genres = ['news', 'religion', 'hobbies', 'science_fiction', 'romance', 'humor']
modals = ['can', 'could', 'may', 'might', 'must', 'will']

cfd.tabulate(conditions=genres, samples=modals)

Output of the code above:

                 can could  may might must will
           news   93    86   66    38   50  389
       religion   82    59   78    12   54   71
        hobbies  268    58  131    22   83  264
science_fiction   16    49    4    12    8   16
        romance   74   193   11    51   45   43
          humor   16    30    8     8    9   13

But when I replace 'samples' with 'sample' in the last line of the code above, it gives me the FreqDist for every word in the corpus.

I don't understand the difference between 'sample' and 'samples'.

Thanks.

Answer 1

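cfd.tabulate() simply ignores any keyword argument that its implementation never looks up. 'sample' is not one of the keywords it reads, so sample=modals has no effect and you still get the full table covering every word in the FreqDist. Leaving the argument out entirely should give exactly the same result.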
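To make the mechanism concrete, here is a minimal sketch in plain Python. tabulate_sketch is a hypothetical stand-in, not NLTK's actual code: it assumes that tabulate() reads its options from **kwargs with dict.get() and falls back to all conditions and all samples when a key is absent, which matches the behavior described above. The counts are toy data, not Brown corpus figures.

```python
from collections import Counter


def tabulate_sketch(table, **kwargs):
    """Hypothetical stand-in for ConditionalFreqDist.tabulate().

    Only the 'conditions' and 'samples' keys are ever read from kwargs,
    so any other keyword (e.g. a misspelled 'sample') is silently ignored.
    """
    # Default to every condition when 'conditions' is not supplied.
    conditions = kwargs.get("conditions", sorted(table))
    # Default to every sample seen under any condition.
    all_samples = sorted(set().union(*table.values()))
    samples = kwargs.get("samples", all_samples)
    # A Counter returns 0 for missing keys (FreqDist is a Counter subclass).
    return {c: [table[c][s] for s in samples] for c in conditions}


# Toy counts, not taken from the Brown corpus.
table = {
    "news": Counter({"can": 2, "will": 3, "said": 5}),
    "humor": Counter({"can": 1, "will": 1}),
}

print(tabulate_sketch(table, samples=["can", "will"]))
# {'humor': [1, 1], 'news': [2, 3]}
print(tabulate_sketch(table, sample=["can", "will"]))
# {'humor': [1, 0, 1], 'news': [2, 5, 3]}
```

The second call behaves exactly like tabulate_sketch(table): the misspelled keyword lands in kwargs and is never read, so the defaults apply.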

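This behavior is not specific to NLTK; it applies to any Python function or method that accepts arbitrary keyword arguments (**kwargs). I recommend reading the relevant section of the Python Tutorial; I find it very clear.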
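The general rule can be seen by comparing two hypothetical functions (the names are made up for illustration): one that collects extra keywords with **kwargs, and one that declares its keyword-only parameters explicitly and has no catch-all.

```python
def accepts_anything(*args, **kwargs):
    """Collects every keyword argument, expected or not, into kwargs."""
    return sorted(kwargs)


def strict(*, conditions=None, samples=None):
    """Declares its keywords explicitly and has no **kwargs catch-all."""
    return conditions, samples


# The misspelled keyword is accepted and just sits unused in kwargs.
print(accepts_anything(sample=["can"]))  # ['sample']

# Without **kwargs, Python rejects the unknown keyword at call time.
try:
    strict(sample=["can"])
except TypeError as exc:
    print("TypeError:", exc)
```

So a typo like sample goes unnoticed by a function that takes **kwargs, but would fail loudly against a signature like strict().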
