When I try to convert UTF-8 to Unicode, I get this error:
```
Traceback (most recent call last):
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 1596, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "C:\Program Files (x86)\JetBrains\PyCharm Community Edition 2016.3.2\helpers\pydev\pydevd.py", line 974, in run
    pydev_imports.execfile(file, globals, locals) # execute the script
  File "C:/Users/Administrator/Desktop/new 1.py", line 64, in <module>
    view = changeListCode(train[1])
  File "C:/Users/Administrator/Desktop/new 1.py", line 33, in changeListCode
    a.append(i.decode('utf8'))
  File "C:\Anaconda2\lib\encodings\utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xcb in position 7: invalid continuation byte
```
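The traceback names the failing byte (0xcb at position 7) but not which CSV cell contains it. A minimal diagnostic sketch, assuming each cell is still a byte string as returned by the Python 2 `csv` module; `describe_decode_error` is a hypothetical helper, not part of any library. It only relies on the standard `start`, `end`, `object` and `reason` attributes of `UnicodeDecodeError` and runs under both Python 2.7 and 3.

```python
import binascii


def describe_decode_error(raw, encoding='utf-8'):
    """Try to decode the byte string `raw`.

    Returns None if it decodes cleanly; otherwise returns a short text
    giving the offset, the offending byte(s) in hex and the codec's reason.
    """
    try:
        raw.decode(encoding)
    except UnicodeDecodeError as e:
        # e.object is the input bytes; e.start:e.end is the bad slice
        bad = e.object[e.start:e.end]
        hex_bytes = binascii.hexlify(bad).decode('ascii')
        return 'position %d: %s (%s)' % (e.start, hex_bytes, e.reason)
    return None
```

Looping over `train[1]` and printing the row index whenever the helper returns something other than None would show which rows of the `view` column are not valid UTF-8.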
My code:
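```python
import csv
import jieba  # third-party Chinese word segmentation library

def readtrain():
    # 'rb' is the mode the Python 2 csv module expects; every cell comes back as a byte string (str)
    with open('Train.csv', 'rb') as csvfile:
        reader = csv.reader(csvfile)
        column1 = [row for row in reader]
    # skip the first (header) row; columns 1, 2 and 3 hold content, view and opinion
    content_train = [i[1] for i in column1[1:]]
    view_train = [i[2] for i in column1[1:]]
    opinion_train = [i[3] for i in column1[1:]]
    print 'The training set has %s sentences' % len(content_train)
    train = [content_train, view_train, opinion_train]
    return train

def changeListCode(b):
    # decode each byte string to unicode; this is the line that raises UnicodeDecodeError
    a = []
    for i in b:
        a.append(i.decode('utf8'))
    return a

def segmentWord2(cont):
    # split each sentence into a list of words with jieba
    c = []
    for i in cont:
        a = list(jieba.cut(i))
        c.append(a)
    return c

def transLabel(labels):
    # map sentiment labels to integers in place: pos -> 2, neu -> 1, neg -> 0
    for i in range(len(labels)):
        if labels[i] == 'pos':
            labels[i] = 2
        elif labels[i] == 'neu':
            labels[i] = 1
        elif labels[i] == 'neg':
            labels[i] = 0
        else:
            print "Invalid label:", labels[i]
    return labels

train = readtrain()
content = segmentWord2(train[0])
view = changeListCode(train[1])
opinion = transLabel(train[2])
```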
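One possible cause, not verified against this particular file: the byte 0xcb starts a sequence that is not valid UTF-8 but falls within the GBK lead-byte range (0x81 to 0xFE), so the CSV may actually be saved as GBK/GB18030, which is common for CSVs written by Excel on Chinese-locale Windows. A minimal sketch under that assumption; `detect_encoding`, `change_list_code` and the candidate list `('utf-8', 'gb18030')` are all hypothetical choices for illustration. It works on byte strings and runs under Python 2.7 as well as 3.

```python
def detect_encoding(cells, candidates=('utf-8', 'gb18030')):
    """Return the first candidate encoding that decodes every cell.

    Assumes all cells come from one file and therefore share one encoding.
    """
    for enc in candidates:
        try:
            for c in cells:
                c.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue  # at least one cell failed; try the next candidate
    raise ValueError('none of %r decodes all cells' % (candidates,))


def change_list_code(cells, encoding):
    """Hypothetical replacement for changeListCode with an explicit encoding."""
    return [c.decode(encoding) for c in cells]
```

Usage would be `enc = detect_encoding(train[1])` followed by `view = change_list_code(train[1], enc)`. Picking one encoding for the whole column, rather than falling back cell by cell, avoids mis-decoding short GBK strings that happen to also be valid UTF-8.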
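I copied the code and the file into Notepad++, and it shows UTF-8. I have looked at other questions, but none of them solved my problem. I can't use Python 3.x right now.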
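The encoding Notepad++ displays is a guess, and for text pasted into a new document it may only reflect that document's default rather than the bytes of `Train.csv` on disk. A minimal sketch that checks the file itself, assuming it is read in binary mode; `first_non_utf8_line` is a hypothetical helper name. Splitting on lines is safe here because the newline byte 0x0A never occurs inside a multi-byte UTF-8 or GBK character.

```python
import io


def first_non_utf8_line(path):
    """Return (line_number, UnicodeDecodeError) for the first line of the
    file at `path` that is not valid UTF-8, or None if every line decodes."""
    with io.open(path, 'rb') as f:
        for lineno, raw in enumerate(f, 1):  # line numbers start at 1
            try:
                raw.decode('utf-8')
            except UnicodeDecodeError as e:
                return lineno, e
    return None
```

Calling `first_non_utf8_line('Train.csv')` would return None only if the whole file really is UTF-8.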
My file looks like this:
I don't know how to upload the file, so here is a screenshot of it instead. The file is a CSV.