This is the content I am trying to read:
with open('u.item', 'w') as demofile:
    demofile.write(
        "543|Mis\xe9rables, Les (1995)|01-Jan-1995||"
        "http://us.imdb.com/M/title-exact?Mis%E9rables%2C%20Les%20%281995%29|"
        "0|0|0|0|0|0|0|0|1|0|0|0|1|0|0|0|0|0|0\n"
    )
This is how I read it:
import unicodecsv as csv

def moviesToRDF(csvFilePath):
    with open(csvFilePath, 'rU') as csvFile:
        reader = csv.reader(csvFile, encoding='utf-8', delimiter='|')
        for row in reader:
            print row

moviesToRDF("u.item")
This is the error I get:
UnicodeDecodeError: 'utf8' codec can't decode byte 0xe9 in position 3: invalid continuation byte
The value that triggers the error is:
Misérables, Les
What am I doing wrong?
(I am using Python 2.7.)
Answer 0 (score: 1)
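I found the problem.
The file is encoded as latin-1, not UTF-8, so the byte 0xe9 (é in latin-1) cannot be decoded as UTF-8.
Passing the correct encoding to the reader fixed it: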
reader = csv.reader(csvFile, encoding='latin-1', delimiter= '|')
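To see why the encoding matters, here is a minimal sketch that decodes the start of the sample line by hand, without unicodecsv. The names `raw`, `utf8_ok`, and `title` are made up for illustration, and the snippet is written so that it should run on both Python 2.7 and Python 3. In UTF-8, 0xe9 announces a three-byte sequence, but the next byte is a plain `r`, which is why the codec reports an invalid continuation byte. In latin-1, every byte maps directly to one character, so 0xe9 becomes é.

```python
# -*- coding: utf-8 -*-
# Beginning of the sample line from u.item, as raw bytes.
raw = b"543|Mis\xe9rables, Les (1995)"

# Decoding as UTF-8 fails: 0xe9 starts a multi-byte sequence,
# but the following byte ('r') is not a continuation byte.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Decoding as latin-1 succeeds: byte 0xe9 maps to U+00E9 (e with acute accent).
title = raw.decode("latin-1").split(u"|")[1]

print(utf8_ok)
print(repr(title))
```

If the encoding of a file is unknown, trying UTF-8 first and falling back to latin-1 is a common heuristic, but note that latin-1 accepts any byte sequence, so a successful decode does not prove the guess was right.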