I have a CSV file with four columns: tweet_id, label, topic, and text. In one of the rows, the value of the "text" column is:
I'm wit chu!! “@ShayDiddy: Officially boycotting @ups!!! Calling @apple to curse them out next for using them wasting my time!”
I am importing the data with this code:
def createTrainingCorpus(corpusFile):
    import csv
    corpus=[]
    with open(corpusFile,'rb') as csvfile:
        lineReader = csv.reader(csvfile,delimiter=',')
        r=1
        for row in lineReader:
            if r<257:
                corpus.append({"tweet_id":row[2],"label":row[1],"topic":row[0],"text":row[4]})
            r=r+1
    return corpus

corpusFile= "/Users/name/Desktop/corpus.csv"
TrainingData= createTrainingCorpus(corpusFile)
This row is not added to the TrainingData list. Instead, I get this error:
UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 6: ordinal not in range(128)
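The TrainingData list contains all the expected elements up to the point where the loop reaches the row with the "text" value shown above. I searched for this error but could not find a solution that works for me. Please help.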
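In case it helps, here is a minimal sketch of what I suspect is going on, assuming the file is UTF-8 encoded (I have not verified that). The sample string and the `try_decode` helper are made up for illustration and are not part of my script. Decoding the bytes of a curly quote as ASCII fails on byte 0xe2, while decoding the same bytes as UTF-8 works:

```python
# -*- coding: utf-8 -*-
# Hypothetical sample: UTF-8 bytes of a short string containing a curly quote.
# This is an illustration only, not the actual row from corpus.csv.
raw = u"wit chu!! \u201c@ShayDiddy".encode("utf-8")


def try_decode(data, encoding):
    """Return the decoded text, or the UnicodeDecodeError raised while decoding."""
    try:
        return data.decode(encoding)
    except UnicodeDecodeError as err:
        return err


# ASCII cannot represent byte 0xe2, the first byte of the encoded curly quote.
ascii_result = try_decode(raw, "ascii")
# The same bytes decode cleanly when treated as UTF-8.
utf8_result = try_decode(raw, "utf-8")

print(repr(ascii_result))
print(repr(utf8_result))
```

I do not know where in my code the implicit ASCII decode happens, so this only shows the kind of failure, not where it comes from.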