POS tagging error in NLTK

Time: 2015-07-02 16:42:41

Tags: python nltk

I just want to POS-tag some text, but I get an error.

    import nltk

    text = open('News/article.txt')
    t = text.read()
    print t
    text = nltk.word_tokenize(t)
    posTagged = nltk.pos_tag(text)
    print posTagged

This is what I got:

    Maybe that is why whenever we go to watch any live sport in India they lock us within cages. Thanks to the Cricket Lovers in the Barabati Stadium in Cuttack, this is probably only going to get worse.
    But right now you, the Cricket Lovers at the Barabati Stadium, have a bigger problem to deal with. I hope you realize what you have done. You didn’t just disrupt a game last evening, you may have just ensured you won’t get international cricket in your city. So much for your love!
    Thanks to a bunch of hooligans, every Indian fan has been blackened. We are all hanging our heads in shame. This feeling is far worse than losing just a cricket match.

    Traceback (most recent call last):
      File "C:\Python27\TestProj1.py", line 12, in <module>
        posTagged=nltk.pos_tag(text)
      File "C:\Python27\lib\site-packages\nltk\tag\__init__.py", line 106, in pos_tag
        return tagger.tag(tokens)
      File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 61, in tag
        tags.append(self.tag_one(tokens, i, tags))
      File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 81, in tag_one
        tag = tagger.choose_tag(tokens, index, history)
      File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 634, in choose_tag
        featureset = self.feature_detector(tokens, index, history)
      File "C:\Python27\lib\site-packages\nltk\tag\sequential.py", line 736, in feature_detector
        'prevtag+word': '%s+%s' % (prevtag, word.lower()),
    UnicodeDecodeError: 'ascii' codec can't decode byte 0x92 in position 4: ordinal not in range(128)

But it works fine for some other text files. How can I solve this problem?
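A likely reading of the traceback, offered as a sketch rather than a confirmed diagnosis: byte 0x92 is the curly apostrophe (U+2019) in Windows-1252, and in a token like "didn’t" it sits at index 4, which matches "position 4" in the error. Files without such characters are pure ASCII, which would explain why other files work. Assuming the article really was saved as Windows-1252 (a guess, not something the post states), decoding the file explicitly before tokenizing should avoid the implicit ASCII decode. The helpers `read_text` and `tag_file` below are hypothetical names introduced for illustration.

```python
import io

# Assumption: the file is Windows-1252 encoded, where byte 0x92 is the
# curly apostrophe (U+2019) found in words such as "didn't".
raw = b"didn\x92t"
position = raw.index(b"\x92")   # index of the offending byte within the token
decoded = raw.decode("cp1252")  # unicode text with U+2019 in place of 0x92


def read_text(path, encoding="cp1252"):
    """Hypothetical helper: read a file and return decoded (unicode) text."""
    with io.open(path, encoding=encoding) as handle:
        return handle.read()


def tag_file(path, encoding="cp1252"):
    """Hypothetical helper: tokenize and POS-tag the decoded text."""
    import nltk  # imported here so read_text can be used without NLTK
    tokens = nltk.word_tokenize(read_text(path, encoding))
    return nltk.pos_tag(tokens)
```

With this sketch, `tag_file('News/article.txt')` would replace the read, tokenize, and tag steps from the question. If the file turns out to use another encoding, pass that name as `encoding` instead.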

0 answers:

No answers