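I wrote this small script to find the contexts of the 10 most frequent words in a corpus. It doesn't work, and I can't tell what I'm doing wrong. The tien_frequentste(mijn_corpus) function works on its own.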
tienfrequentste = tien_frequentste(mijn_corpus)

def context(corpus, most_freq):
    for category in corpus.categories():
        print "Context voor", category, ":"
        for word in most_freq:
            print nltk.Text(corpus.words(categories=category)).concordance(word)
Update:
The traceback points to the lines context(corpus, most_freq), for category in corpus.categories(), and self._init(), and ends inside _init. It finishes with AttributeError: 'NoneType' object has no attribute 'group'. I don't know what these errors mean...
Traceback (most recent call last):
  File "/Users/...document.py", line 92, in <module>
    context (mijn_corpus, tienfrequentste)
  File "/Users/...document.py", line 87, in context
    for category in corpus.categories():
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk.corpus.reader.api.py", line 317, in categories
    self._init()
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/site-packages/nltk.corpus.reader.api.py", line 289, in _init
    category = re.match(self._pattern, file_id).group(1)
AttributeError: 'NoneType' object has no attribute 'group'
Answer 0 (score: 0)
Does your corpus have categories, and is most_freq a list of strings? The following example works:
import nltk
from nltk.corpus import reuters

for category in reuters.categories():
    print "context voor", category, " : "
    for word in ["get", "have", "do"]:
        print nltk.Text(reuters.words(categories=category)).concordance(word)
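Answer 1 (score: 0)
The error comes from the regular expression that assigns corpus files to categories. It is tripping over a file name that does not match the pattern, so re.match returns None and the call to .group(1) fails. If you are using a standard NLTK corpus with categories, you must have dropped an extra file into the corpus directory. If you are using your own corpus, it is misconfigured.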
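To find the offending file, it may help to run the category pattern over the file names yourself. The following is a minimal sketch using only the standard re module. The pattern and the file names are made up for illustration (the stray .DS_Store is only a guess at what might be in a corpus directory on a Mac). In practice you would substitute the pattern you passed as cat_pattern and the file IDs of your own corpus.

```python
import re

# Hypothetical category pattern and file names, for illustration only.
CAT_PATTERN = r"(\w+)/.*"
FILE_IDS = ["news/a.txt", "sports/b.txt", ".DS_Store"]


def find_unmatched(pattern, file_ids):
    """Return the file IDs for which re.match(pattern, file_id) is None."""
    return [fid for fid in file_ids if re.match(pattern, fid) is None]


def categorize(pattern, file_ids):
    """Map each matching file ID to its category (capture group 1)."""
    categories = {}
    for fid in file_ids:
        match = re.match(pattern, fid)
        if match is None:
            # Calling .group(1) on None is what raises the AttributeError.
            continue
        categories[fid] = match.group(1)
    return categories


unmatched = find_unmatched(CAT_PATTERN, FILE_IDS)
by_file = categorize(CAT_PATTERN, FILE_IDS)
print("unmatched: %s" % unmatched)
print("categories: %s" % by_file)
```

Any name reported as unmatched should either be removed from the corpus directory or be excluded by the file ID pattern given to the corpus reader.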
By the way, concordance() prints its output itself and returns None. If you wrap the call in print, you will also see a whole bunch of None values.
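The effect is easy to reproduce without NLTK. In this small sketch, show_matches is a hypothetical stand-in that behaves like concordance() in the one respect that matters here: it prints its results and has no return value.

```python
def show_matches(words, target):
    """Hypothetical stand-in for concordance(): prints hits, returns nothing."""
    for index, word in enumerate(words):
        if word == target:
            print("%d: %s" % (index, word))


words = ["get", "the", "book", "and", "get", "out"]

# Called on its own line, the matches are printed once.
result = show_matches(words, "get")

# There is no return statement, so the result is None. Wrapping the call
# in print would emit that None after the matches.
print(result is None)
```

Applied to the script in the question, this suggests calling nltk.Text(...).concordance(word) as a statement of its own, without print in front of it.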