I am running the following code on Python 2.7 with NLTK 3.2.5.
My code:
import codecs
import random

import nltk
from nltk.tokenize import sent_tokenize

# q, i and get_features are defined elsewhere (not shown)
testing = codecs.open("Desktop/sample.txt", 'r', 'utf-8').read()
sentences = sent_tokenize(testing)
labeled_sentence = ([(name, "question") for name in sent_tokenize(q)] +
                    [(name, "notquestion") for name in sent_tokenize(i)])
random.shuffle(labeled_sentence)
feature_sets = [(get_features(n), label) for (n, label) in labeled_sentence]
train_set, test_set = feature_sets[500:], feature_sets[:500]
#print(test_set)
classifier = nltk.NaiveBayesClassifier.train(train_set)
for sent in sentences:
    print(str(sent) + "" + str(classifier.classify(get_features(sent))))
I get the following error:
UnicodeEncodeError Traceback (most recent call last)
<ipython-input-10-c12d2b078e87> in <module>()
     13 #nltk.classify.accuracy(classifier, test_set)
     14 for sent in sentences:
---> 15     print(str(sent) + "" + str(classifier.classify(get_features(sent))))
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 69: ordinal not in range(128)
Please help.
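
A likely cause, assuming sent is a unicode object (which it would be after codecs.open(..., 'utf-8').read()): on Python 2, str(sent) implicitly encodes with the ascii codec, and that codec cannot represent u'\u201c' (a left curly quotation mark). A minimal sketch of one way around it is to build the output line as unicode and encode it explicitly to UTF-8 before printing. The helper names format_prediction and to_printable and the sample sentence are hypothetical, introduced only for illustration, and the label is a stand-in for classifier.classify(get_features(sent)). The sketch also joins the sentence and label with a space, where the original code joins them with an empty string.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

import sys


def format_prediction(sentence, label):
    # Build the line as unicode text. On Python 2, str(sentence) would
    # implicitly call sentence.encode('ascii') and fail on u'\u201c'.
    return u"{0} {1}".format(sentence, label)


def to_printable(text):
    # Python 2 only: encode explicitly to UTF-8 bytes so print never
    # falls back to the ascii codec. Python 3 prints text as-is.
    if sys.version_info[0] == 2:
        return text.encode("utf-8")
    return text


if __name__ == "__main__":
    # Hypothetical sentence containing the character from the traceback.
    sent = u"\u201cIs this a question?\u201d"
    label = "question"  # stand-in for classifier.classify(get_features(sent))
    print(to_printable(format_prediction(sent, label)))
```

In the loop from the question, the print line would then become print(to_printable(format_prediction(sent, classifier.classify(get_features(sent))))).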