nltk.pos_tag() hangs the program, no error displayed

Date: 2016-03-12 01:35:17

Tags: python nltk pos-tagger

(Python 3.5) I have run into a strange problem: when I run the code below, it stalls at "pos = nltk.pos_tag(words)". I have been trying for several days, restarting the program repeatedly. I hit this new issue after fixing an earlier one by downgrading to NLTK 3.1, and now nothing happens: the program keeps running without producing any output, stuck in nltk.pos_tag(), and it never reports an error until I decide to kill it. I have not seen this problem before and I do not know what causes it. I have tried changing almost everything in the part-of-speech tagging loops, but the result is always the same.

import nltk
import random
from nltk.classify.scikitlearn import SklearnClassifier
import pickle
from nltk.classify import ClassifierI
from statistics import mode
from nltk.tokenize import word_tokenize

# Ensemble classifier that returns the majority vote of several NLTK-style classifiers
class VoteClassifier(ClassifierI):
    def __init__(self, *classifiers):
        self._classifiers = classifiers

    def classify(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)
        return mode(votes)

    def confidence(self, features):
        votes = []
        for c in self._classifiers:
            v = c.classify(features)
            votes.append(v)

        choice_votes = votes.count(mode(votes))
        conf = choice_votes / len(votes)
        return conf
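# Usage sketch (hypothetical classifier names, shown only for clarity):
# wrap several trained classifiers and take a majority vote over them.
#   voted = VoteClassifier(nb_clf, svc_clf, logreg_clf)
#   label = voted.classify(some_features)     # most common vote
#   conf = voted.confidence(some_features)    # fraction of classifiers that agree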

# Read the short movie-review datasets (one review per line)
short_pos = open("short_reviews/positive.txt","r").read()
short_neg = open("short_reviews/negative.txt","r").read()

all_words = []
documents = []

# Keep only adjectives (Penn Treebank tags starting with "J")
allowed_word_types = ["J"]

# Tokenize and POS-tag each positive review, collecting the allowed word types
for p in short_pos.split('\n'):
    documents.append( (p, "pos") )
    words = word_tokenize(p)
    pos = nltk.pos_tag(words)
    for w in pos:
        if w[1][0] in allowed_word_types:
            all_words.append(w[0].lower())

# Same processing for the negative reviews
for p in short_neg.split('\n'):
    documents.append( (p, "neg") )
    words = word_tokenize(p)
    pos = nltk.pos_tag(words)
    for w in pos:
        if w[1][0] in allowed_word_types:
            all_words.append(w[0].lower())

# Build a frequency distribution and take the first 5,000 words as features
all_words = nltk.FreqDist(all_words)

word_features = list(all_words.keys())[:5000]
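
A quick way to narrow this down is to check whether the tagger stalls on its own, outside the review-processing loops. Below is a minimal, self-contained check (assuming NLTK with its punkt and averaged_perceptron_tagger data packages is installed); if this single call also hangs under NLTK 3.2, the problem lies in the tagger itself rather than in the code above:

import nltk
from nltk.tokenize import word_tokenize

# A single short sentence: if nltk.pos_tag() also hangs here,
# the stall is in the tagger, not in the review loops.
sample = word_tokenize("This is a simple test sentence.")
print(nltk.pos_tag(sample))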

If anyone has any clue about what might be causing this, I would be very grateful. Thanks in advance; I have been struggling with this problem for over a week.

1 Answer:

Answer 0 (score: 1)

See NLTK v3.2: Unable to nltk.pos_tag()

With NLTK 3.2, without downgrading to NLTK v3.1, you can use this "hack":

>>> from nltk.tag import PerceptronTagger
>>> from nltk.data import find
>>> PICKLE = "averaged_perceptron_tagger.pickle"
>>> AP_MODEL_LOC = 'file:'+str(find('taggers/averaged_perceptron_tagger/'+PICKLE))
>>> tagger = PerceptronTagger(load=False)
>>> tagger.load(AP_MODEL_LOC)
>>> pos_tag = tagger.tag
>>> pos_tag('The quick brown fox jumps over the lazy dog'.split())
[('The', 'DT'), ('quick', 'JJ'), ('brown', 'NN'), ('fox', 'NN'), ('jumps', 'VBZ'), ('over', 'IN'), ('the', 'DT'), ('lazy', 'JJ'), ('dog', 'NN')]
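
For reference, here is a sketch of how this workaround could be wired into the loops from the question, so the rest of the script keeps calling a pos_tag-like function (this assumes the averaged_perceptron_tagger model has already been downloaded, e.g. via nltk.download('averaged_perceptron_tagger')):

from nltk.tag import PerceptronTagger
from nltk.data import find
from nltk.tokenize import word_tokenize

# Load the averaged perceptron model directly, bypassing nltk.pos_tag()
PICKLE = "averaged_perceptron_tagger.pickle"
AP_MODEL_LOC = 'file:' + str(find('taggers/averaged_perceptron_tagger/' + PICKLE))
tagger = PerceptronTagger(load=False)
tagger.load(AP_MODEL_LOC)
pos_tag = tagger.tag

# Then, inside the question's loops, replace nltk.pos_tag(words) with pos_tag(words):
for p in short_pos.split('\n'):
    documents.append((p, "pos"))
    words = word_tokenize(p)
    pos = pos_tag(words)   # was: nltk.pos_tag(words)
    for w in pos:
        if w[1][0] in allowed_word_types:
            all_words.append(w[0].lower())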