I am using the following code in Python 2.7 to read data from a UTF-8 encoded CSV file. I open the file with codecs.open and encoding='utf-8', but I still get the same encoding error.
import csv
import gensim
import nltk
from nltk.corpus import stopwords
import codecs
...
with codecs.open('data_techsupport.csv', encoding='utf-8', errors='ignore') as csvfile:
    spamreader = csv.reader(csvfile, delimiter=',', quotechar='"')
    for row in spamreader:
        vectors.append([])
        x = list(row)
        x[0].encode('utf8')
        sentence = nltk.word_tokenize(x[0])
...
The error I get is:
for row in spamreader:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2013' in position 89: ordinal not in range(128)
Can anyone help?
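
A note on the likely cause, offered as a sketch rather than a confirmed fix: the Python 2 csv module does not support Unicode input, and codecs.open yields unicode lines, so csv.reader appears to fall back to an implicit ASCII encode, which fails on u'\u2013' (an en dash). One common workaround is to hand csv.reader raw bytes and decode each cell after parsing. The helper name read_rows below is hypothetical, and the Python 3 branch is only there so the same snippet runs on both versions.

```python
import csv
import io
import sys


def read_rows(path, encoding='utf-8'):
    """Yield each CSV row as a list of text (unicode) cells.

    read_rows is a hypothetical helper name, not part of any library.
    """
    if sys.version_info[0] == 2:
        # Python 2: csv.reader expects byte strings, so open in binary
        # mode and decode every cell only after the row has been parsed.
        with open(path, 'rb') as csvfile:
            reader = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in reader:
                yield [cell.decode(encoding) for cell in row]
    else:
        # Python 3: csv.reader works on text, so decode while reading.
        with io.open(path, encoding=encoding, newline='') as csvfile:
            reader = csv.reader(csvfile, delimiter=',', quotechar='"')
            for row in reader:
                yield row
```

With this in place, the loop would become `for row in read_rows('data_techsupport.csv'):` and row[0] could be passed to nltk.word_tokenize directly. The line x[0].encode('utf8') in the original code has no effect in any case, because its return value is discarded.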