NLTK - UnigramTagger: TypeError: unhashable type: 'list'

Date: 2018-02-13 13:35:11

Tags: python-3.x nltk

I'm just getting started with NLTK and am following the NLTK book. In Chapter 5 (N-Gram Tagging), there is the following code:

>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'), ('type', 'NN'),
(',', ','), ('being', 'BEG'), ('on', 'IN'), ('the', 'AT'), ('ground', 'NN'),
('floor', 'NN'), ('so', 'QL'), ('that', 'CS'), ('entrance', 'NN'), ('is', 'BEZ'),
('direct', 'JJ'), ('.', '.')]
>>> unigram_tagger.evaluate(brown_tagged_sents)
0.9349006503968017
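A unigram tagger learns, for each word in the training data, the tag it was given most often, and tagging a sentence then means looking each token up in that table. The following is a simplified pure-Python model of that idea, not NLTK's actual implementation; `train_unigram`, `tag` and `tag_sents` are hypothetical helpers and the tiny training set is made up:

```python
from collections import Counter, defaultdict

# Simplified model of a unigram tagger (illustrative, not NLTK's code):
# each word maps to the tag it carried most often in the training data.
def train_unigram(tagged_sents):
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, pos in sent:
            counts[word][pos] += 1
    # Keep only the most frequent tag for each word
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag(model, tokens):
    # `tokens` is ONE sentence: a list of word strings.
    # Unknown words get None (a tagger with no backoff has no guess).
    return [(tok, model.get(tok)) for tok in tokens]

def tag_sents(model, sents):
    # `sents` is a list of sentences; each one is tagged separately
    return [tag(model, sent) for sent in sents]

training = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
model = train_unigram(training)

print(tag(model, ["the", "cat", "barks"]))
# [('the', 'AT'), ('cat', 'NN'), ('barks', 'VBZ')]
print(tag_sents(model, [["the", "dog"], ["a", "cat"]]))
# [[('the', 'AT'), ('dog', 'NN')], [('a', None), ('cat', 'NN')]]
```

NLTK's taggers make the same split: `tag()` takes a single list of tokens, while `tag_sents()` takes a list of such lists.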

I'm trying to do the same thing, but I want to train the unigram tagger on the whole Brown corpus. To do that, I tried:

import nltk
from nltk.corpus import brown

brown_tagged_sents = brown.tagged_sents()
brown_sents = brown.sents()

unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.tag(brown_sents)
unigram_tagger.evaluate(brown_tagged_sents)

But for some reason I get this error:

Traceback (most recent call last):
  File "/Users/missogra/PycharmProjects/try/POS-Tagger-nltk.py", line 9, in <module>
    unigram_tagger.tag(brown_sents)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 63, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 83, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 142, in choose_tag
    return self._context_to_tag.get(context)
TypeError: unhashable type: 'list'

Process finished with exit code 1
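The last frame of the traceback is a dictionary lookup, `self._context_to_tag.get(context)`. For a unigram tagger the context is the current token, so when `tag()` receives the whole corpus (a list of sentences) instead of one sentence, each "token" is itself a list, and Python cannot use a list as a dictionary key. This standalone snippet reproduces the same error with a plain `dict` standing in for the tagger's lookup table (the table contents are made up):

```python
# Plain dict standing in for the tagger's word -> tag lookup table
context_to_tag = {"The": "AT", "jury": "NN"}

one_sentence = ["The", "jury"]            # what tag() expects
whole_corpus = [["The", "jury"], ["It"]]  # shape of brown.sents()

# Looking up a word (a str) works
found = context_to_tag.get(one_sentence[0])
print(found)  # AT

# Looking up a "token" that is really a whole sentence (a list) fails
try:
    context_to_tag.get(whole_corpus[0])
    error_message = None
except TypeError as err:
    error_message = str(err)
print(error_message)  # message includes: unhashable type: 'list'
```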

I'd really appreciate any hint as to why this happens.

Python version: 3.5

Thanks in advance.

1 Answer

Answer 0 (score: 2)

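Thanks to Patrick Artner's answer, I managed to solve the problem:

import nltk
from nltk.corpus import brown

brown_tagged_sents = brown.tagged_sents()
brown_sents = brown.sents()

unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
# Tuples are hashable, so this avoids the TypeError, but tag() still treats
# each whole sentence as one token (tagged None); to tag word by word,
# use unigram_tagger.tag_sents(brown_sents) instead
data = [tuple(sent) for sent in brown_sents]

unigram_tagger.tag(data)
print(unigram_tagger.evaluate(brown_tagged_sents))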
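Converting each sentence with `tuple(sent)` makes the `TypeError` disappear only because tuples are hashable. `tag()` still iterates over its argument, so every whole sentence is looked up as if it were a single word; no such key exists, and each "token" comes back tagged `None`. The `evaluate()` line is unaffected, since it re-tags the gold sentences itself. To tag word by word, pass the list of sentences to `unigram_tagger.tag_sents(brown_sents)`, which calls `tag()` once per sentence. The sketch below shows the difference with a plain dict in place of the trained tagger (hypothetical data and helpers, not NLTK code):

```python
# Plain dict standing in for a trained unigram tagger's lookup table
model = {"The": "AT", "jury": "NN", "It": "PPS"}

def tag(tokens):
    # Tags ONE sentence: looks up every element of `tokens` as a word
    return [(tok, model.get(tok)) for tok in tokens]

def tag_sents(sents):
    # Tags a list of sentences, one tag() call per sentence
    return [tag(sent) for sent in sents]

sents = [["The", "jury"], ["It"]]

# Tuple workaround: no TypeError, but each whole sentence is treated
# as a single token and gets None
workaround = tag([tuple(s) for s in sents])
print(workaround)
# [(('The', 'jury'), None), (('It',), None)]

# Word-by-word tagging
proper = tag_sents(sents)
print(proper)
# [[('The', 'AT'), ('jury', 'NN')], [('It', 'PPS')]]
```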