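I'm getting an error I don't understand when trying to run some Python code. I'm trying to learn the Natural Language Toolkit by working through the excellent NLTK textbook. When I run the following code (a modification of Figure 2.1, adapted to my own data), I get the error below.
The code I ran: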
import os, re, csv, string, operator
import nltk
from nltk.corpus import PlaintextCorpusReader
dir = '/Dropbox/hearings'
corpus_root = dir
text = PlaintextCorpusReader(corpus_root, ".*")
cfd = nltk.ConditionalFreqDist(
    (target, fileid[:3])
    for fileid in text.fileids()
    for w in text.words(fileid)
    for target in ['budget','appropriat']
    if w.lower().startswith(target))
cfd.plot()
The error I get (full traceback):
In [6]: ---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-6-abc9ff8cb2f1> in <module>()
----> 1 execfile(r'/Dropbox/hearings/hearings_ingest.py') # PYTHON-MODE
/Dropbox/hearings/hearings_ingest.py in <module>()
14 cfd = nltk.ConditionalFreqDist(
15 (target, fileid[:3])
---> 16 for fileid in text.fileids()
17 for w in text.words(fileid)
18 for target in ['budget','appropriat']
/Users/ian/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/nltk/probability.pyc in __init__(self, cond_samples)
1727 defaultdict.__init__(self, FreqDist)
1728 if cond_samples:
-> 1729 for (cond, sample) in cond_samples:
1730 self[cond].inc(sample)
1731
/Dropbox/hearings/hearings_ingest.py in <genexpr>((fileid,))
15 (target, fileid[:3])
16 for fileid in text.fileids()
---> 17 for w in text.words(fileid)
18 for target in ['budget','appropriat']
19 if w.lower().startswith(target))
/Users/ian/Library/Enthought/Canopy_64bit/User/lib/python2.7/site-packages/nltk/corpus/reader/util.pyc in iterate_from(self, start_tok)
341
342 # If we reach this point, then we should know our length.
--> 343 assert self._len is not None
344
345 # Use concat for these, so we can use a ConcatenatedCorpusView
AssertionError:
In [7]:
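I've included the next IPython prompt to show that this is the complete error. (In other questions I've read, "AssertionError:" is usually followed by more information. In my error it is blank.)
I'd appreciate any help in understanding what's wrong with my code. Thanks!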
Answer 0 (score: 1)
I can reproduce the error by creating an empty file foo and then calling text.words('foo'):
In [18]: !touch 'foo'
In [19]: text = corpus.PlaintextCorpusReader('.', "foo")
In [20]: text.words('foo')
AssertionError:
So, to skip empty files, you could do this:
cfd = nltk.ConditionalFreqDist(
    (target, fileid[:3])
    for fileid in text.fileids()
    # skip empty files; fileids are relative to corpus_root, so join them first
    if os.path.getsize(os.path.join(corpus_root, fileid)) > 0
    for w in text.words(fileid)
    for target in ['budget', 'appropriat']
    if w.lower().startswith(target))
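If you would rather do the size check once, before building the distribution, a small helper along these lines may be easier to read. This is only a sketch: nonempty_fileids is a name made up for illustration, not part of NLTK, and it assumes every fileid is a path relative to the corpus root, as PlaintextCorpusReader reports them.

```python
import os


def nonempty_fileids(corpus_root, fileids):
    """Return the fileids whose files under corpus_root are larger than 0 bytes.

    Hypothetical helper (not an NLTK function). Assumes each fileid is a
    path relative to corpus_root.
    """
    kept = []
    for fileid in fileids:
        # Resolve the fileid against the corpus root before asking for its size.
        path = os.path.join(corpus_root, fileid)
        if os.path.getsize(path) > 0:
            kept.append(fileid)
    return kept
```

With the reader from the question, this would presumably be called as nonempty_fileids(corpus_root, text.fileids()), and the resulting list would replace text.fileids() in the generator expression.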