NLTK TaggedCorpusReader error

Time: 2018-03-29 21:48:13

Tags: python nlp nltk

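I am building a TaggedCorpusReader in NLTK (in an IPython notebook) to read some POS-tagged files from the ANC (http://www.anc.org/). I want to get all adjectives from the tagged corpus. This is what I tried: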

anc = nltk.corpus.reader.tagged.TaggedCorpusReader(anc_root, r".*\.txt", sep='_')
tagged_words = anc.tagged_words()
anc_adj = {word.lower() for word, pos in tagged_words if pos =='JJ'}
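For reference, `sep='_'` tells the reader that each token is stored as `word_TAG`. Below is a minimal sketch of what the comprehension is meant to collect, using only the standard library. The function name and the sample string are made up for illustration and are not part of NLTK or the ANC; it also assumes tokens are separated by whitespace.

```python
def adjectives_from_tagged_text(text, sep="_", tag="JJ"):
    """Collect lowercased words whose tag equals `tag` from 'word_TAG' text."""
    adjectives = set()
    for token in text.split():
        # Split on the last separator so words containing `sep` stay intact.
        word, found, pos = token.rpartition(sep)
        # Tokens without a separator carry no tag and are skipped.
        if found and pos.upper() == tag:
            adjectives.add(word.lower())
    return adjectives


# Invented sample line in the assumed word_TAG format.
sample = "The_DT quick_JJ brown_JJ fox_NN jumps_VBZ over_IN the_DT Lazy_JJ dog_NN"
print(sorted(adjectives_from_tagged_text(sample)))
```

Parsing the raw text this way bypasses NLTK's stream reader entirely, so it can also serve as a cross-check on individual files.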

All the methods (tagged_words(), words(), sents(), etc.) work fine. But when I try the set comprehension, I get the following assertion error:

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-70-4ba2a8ab817a> in <module>()
      2 tagged_words = anc.tagged_words()
      3 print(tagged_words)
----> 4 anc_adj = {word.lower() for word, pos in tagged_words if pos =='JJ'}

<ipython-input-70-4ba2a8ab817a> in <setcomp>(.0)
      2 tagged_words = anc.tagged_words()
      3 print(tagged_words)
----> 4 anc_adj = {word.lower() for word, pos in tagged_words if pos =='JJ'}

C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py in iterate_from(self, start_tok)
    400 
    401             # Get everything we can from this piece.
--> 402             for tok in piece.iterate_from(max(0, start_tok-offset)):
    403                 yield tok
    404 

C:\Program Files\Anaconda3\lib\site-packages\nltk\corpus\reader\util.py in iterate_from(self, start_tok)
    299                 self.read_block.__name__)
    300             num_toks = len(tokens)
--> 301             new_filepos = self._stream.tell()
    302             assert new_filepos > filepos, (
    303                 'block reader %s() should consume at least 1 byte (filepos=%d)' %

C:\Program Files\Anaconda3\lib\site-packages\nltk\data.py in tell(self)
   1364             check1 = self._incr_decode(self.stream.read(50))[0]
   1365             check2 = ''.join(self.linebuffer)
-> 1366             assert check1.startswith(check2) or check2.startswith(check1)
   1367 
   1368         # Return to our original filepos (so we don't have to throw

AssertionError:

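I have no idea what this means! Can someone help me understand what the problem is here? The same set comprehension works fine on the Brown corpus... what is going on?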
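One thing that may help narrow this down: tagged_words() returns a lazy view, so printing it only reads the first few tokens, while the set comprehension reads every file to the end. The sketch below walks the corpus one file at a time and records which files raise during full iteration. The helper name `find_failing_files` is hypothetical; it relies only on the reader's fileids() and tagged_words(fileids=...) methods. Catching UnicodeDecodeError as well is an assumption on my part that an encoding mismatch might be involved; the traceback does not confirm that.

```python
def find_failing_files(reader):
    """Return (fileid, error) pairs for files that cannot be iterated fully.

    `reader` is any object exposing fileids() and tagged_words(fileids=...),
    such as the `anc` reader from the question.
    """
    failures = []
    for fileid in reader.fileids():
        try:
            # Exhaust the lazy view so the whole file is actually read.
            for _ in reader.tagged_words(fileids=fileid):
                pass
        except (AssertionError, UnicodeDecodeError) as err:
            failures.append((fileid, err))
    return failures


# Hypothetical usage with the reader from the question:
# for fileid, err in find_failing_files(anc):
#     print(fileid, repr(err))
```

If only a handful of files show up, comparing their encoding and line endings with a file that reads cleanly would be a reasonable next step.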

0 answers:

No answers