UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 69: ordinal not in range(128)

Asked: 2018-03-28 06:38:10

Tags: python-2.7 unicode nltk

I am running the following code on Python 2.7 with NLTK 3.2.5.
My code:

import codecs
import random

import nltk
from nltk.tokenize import sent_tokenize

# Read the sample file and decode it from UTF-8 into unicode text.
testing = codecs.open("Desktop/sample.txt", 'r', 'utf-8').read()
sentences = sent_tokenize(testing)
# q, i and get_features are not defined in this excerpt.
labeled_sentence = ([(name, "question") for name in sent_tokenize(q)] +
                    [(name, "notquestion") for name in sent_tokenize(i)])
random.shuffle(labeled_sentence)
feature_sets = [(get_features(n), label) for (n, label) in labeled_sentence]
train_set, test_set = feature_sets[500:], feature_sets[:500]
#print(test_set)
classifier = nltk.NaiveBayesClassifier.train(train_set)

# Classify every sentence from the file; this is the line that raises the error.
for sent in sentences:
    print(str(sent) + "" + str(classifier.classify(get_features(sent))))

I get the following error:

UnicodeEncodeError                        Traceback (most recent call last)
<ipython-input-10-c12d2b078e87> in <module>()
     13 #nltk.classify.accuracy(classifier, test_set)
     14 for sent in sentences:
---> 15     print(str(sent) + "" + str(classifier.classify(get_features(sent))))

UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 69: ordinal not in range(128)

Please help.
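For context, the traceback points at `str(sent)`. On Python 2, calling `str()` on a unicode object encodes it with the ASCII codec, which cannot represent U+201C (a left curly quote), and that matches the message above. The following is only a minimal sketch of that behaviour and one possible workaround, not a confirmed fix for the code in the question. The sample sentence, the `label` value, and the helper `format_result` are hypothetical stand-ins for one element of `sentences` and the classifier output.

```python
# Hypothetical stand-ins for one sentence from the file and its predicted label.
sent = u"He said \u201chello\u201d to me."
label = "notquestion"

# Reproduce the failure explicitly: the ASCII codec cannot encode U+201C.
# This is what str(sent) attempts implicitly on Python 2.
try:
    sent.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False


def format_result(sentence, predicted_label):
    """Build the output line as unicode text, without calling str()."""
    return u"{} {}".format(sentence, predicted_label)


# Keep the text as unicode while formatting, and encode explicitly
# (UTF-8 is assumed here) only where bytes are actually needed.
line = format_result(sent, label)
encoded = line.encode("utf-8")
```

In the loop from the question, this would presumably correspond to dropping `str(sent)` and printing something like `u"{} {}".format(sent, classifier.classify(get_features(sent)))`, assuming the console encoding can display the curly quotes.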

0 answers:

No answers yet.