I am getting started with NLTK and following the NLTK book. Chapter 5 (N-Gram Tagging) contains the following code:
>>> import nltk
>>> from nltk.corpus import brown
>>> brown_tagged_sents = brown.tagged_sents(categories='news')
>>> brown_sents = brown.sents(categories='news')
>>> unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
>>> unigram_tagger.tag(brown_sents[2007])
[('Various', 'JJ'), ('of', 'IN'), ('the', 'AT'), ('apartments', 'NNS'),
('are', 'BER'), ('of', 'IN'), ('the', 'AT'), ('terrace', 'NN'), ('type', 'NN'),
(',', ','), ('being', 'BEG'), ('on', 'IN'), ('the', 'AT'), ('ground', 'NN'),
('floor', 'NN'), ('so', 'QL'), ('that', 'CS'), ('entrance', 'NN'), ('is', 'BEZ'),
('direct', 'JJ'), ('.', '.')]
>>> unigram_tagger.evaluate(brown_tagged_sents)
0.9349006503968017
I am trying to do the same thing, but I want to train the unigram tagger on the entire Brown corpus. To do that, I tried:
brown_tagged_sents = brown.tagged_sents()
brown_sents = brown.sents()
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
unigram_tagger.tag(brown_sents)
unigram_tagger.evaluate(brown_tagged_sents)
But for some reason I get this error:
Traceback (most recent call last):
  File "/Users/missogra/PycharmProjects/try/POS-Tagger-nltk.py", line 9, in <module>
    unigram_tagger.tag(brown_sents)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 63, in tag
    tags.append(self.tag_one(tokens, i, tags))
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 83, in tag_one
    tag = tagger.choose_tag(tokens, index, history)
  File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/nltk/tag/sequential.py", line 142, in choose_tag
    return self._context_to_tag.get(context)
TypeError: unhashable type: 'list'
Process finished with exit code 1
I would greatly appreciate any hints as to why this happens.
Python version: 3.5
Thanks in advance.
Answer 0 (score: 2)
Thanks to Patrick Artner's answer, I managed to solve my problem:
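import nltk
from nltk.corpus import brown

brown_tagged_sents = brown.tagged_sents()
brown_sents = brown.sents()
unigram_tagger = nltk.UnigramTagger(brown_tagged_sents)
# Converting each sentence to a tuple makes it hashable, which avoids the TypeError.
# Note: tag() still treats each tuple as one token; tag_sents(brown_sents) tags sentence by sentence.
data = [tuple(sent) for sent in brown_sents]
unigram_tagger.tag(data)
print(unigram_tagger.evaluate(brown_tagged_sents))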
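
A note on why the error occurs, as far as I can tell: `tag()` expects a single sentence (a list of tokens) and looks each token up in a dictionary, as the `self._context_to_tag.get(context)` line in the traceback suggests. Passing `brown.sents()` hands it a list of sentences, so each "token" is itself a list, and lists cannot be used as dictionary keys. The sketch below reproduces this with a toy tagger in plain Python. `ToyUnigramTagger` and its sample data are hypothetical stand-ins, not NLTK's implementation. It also suggests that converting sentences to tuples only silences the error, because each tuple is then looked up as one unknown token.

```python
from collections import Counter, defaultdict


class ToyUnigramTagger:
    """Hypothetical stand-in: maps each word to its most frequent training tag."""

    def __init__(self, tagged_sents):
        counts = defaultdict(Counter)
        for sent in tagged_sents:
            for word, tag in sent:
                counts[word][tag] += 1
        self._word_to_tag = {
            word: tag_counts.most_common(1)[0][0]
            for word, tag_counts in counts.items()
        }

    def tag(self, tokens):
        # One sentence in, one list of (token, tag) pairs out.
        # The dict lookup needs a hashable key, so a list "token" raises TypeError.
        return [(token, self._word_to_tag.get(token)) for token in tokens]

    def tag_sents(self, sentences):
        # Many sentences in: apply tag() to each sentence separately.
        return [self.tag(sent) for sent in sentences]


# Made-up training data and input sentences, for illustration only.
train = [
    [("the", "AT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "AT"), ("cat", "NN"), ("sleeps", "VBZ")],
]
sents = [["the", "dog", "sleeps"], ["the", "cat", "barks"]]
tagger = ToyUnigramTagger(train)

# 1) Passing a list of sentences to tag() fails: every "token" is a list.
try:
    tagger.tag(sents)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# 2) Tuples are hashable, so no error, but each whole sentence is one unknown token.
workaround = tagger.tag([tuple(sent) for sent in sents])

# 3) Tagging sentence by sentence gives per-word tags.
tagged = tagger.tag_sents(sents)

print(error_message)
print(workaround)
print(tagged)
```

With the real tagger, the equivalent of the last step would be `unigram_tagger.tag_sents(brown_sents)`, which applies `tag()` to each sentence in turn.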