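Please forgive my ugly newbie code, I'm still learning. I'm pulling movie data from the OMDB API, but when I move it into a CSV I get a UnicodeEncodeError for many of the movies, probably because some actor names contain accents. I'd like to 1) identify which movies have the problem, 2) skip them, and/or 3) ideally, fix the error. Right now, when the error occurs, I just skip the whole thing. I'm looking for a simple solution since I'm new to this.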
import csv
import os
import json
import omdb
movie_list = ['A Good Year', 'A Room with a View', 'Anchorman', 'Amélie', 'Annie Hall', 'Before Sunrise']
data_list = []
textdoc = open('textdoc.txt','w')
for w in movie_list:
    x = omdb.request(t=w, fullplot=True, tomatoes=True, r='json')
    y = x.content
    z = json.loads(y)
    data_list.append([z["Title"], z["Year"], z["Actors"], z["Awards"], z["Director"], z["Genre"], z["Metascore"], z["Plot"], z["Rated"], z["Runtime"], z["Writer"], z["imdbID"], z["imdbRating"], z["imdbVotes"], z["tomatoRating"], z["tomatoReviews"], z["tomatoFresh"], z["tomatoRotten"], z["tomatoConsensus"], z["tomatoUserMeter"], z["tomatoUserRating"], z["tomatoUserReviews"]])
try:
    with open('Films.csv', 'w') as g:
        a = csv.writer(g, delimiter=',')
        a.writerow(["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"])
        a.writerows(data_list)
except UnicodeEncodeError:
    print("fail")
Answer 0 (score: 1)
Python 2.x: instead of with open("Films.csv", 'w') as g:, you can try codecs.open so that the csv output is opened with UTF-8 encoding.
import codecs
with codecs.open('Films.csv', 'w', encoding='UTF-8') as g:
# rest of code
Python 3.x: try opening g with UTF-8 encoding:
with open('Films.csv', 'w', encoding='UTF-8') as g:
# rest of code.
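A self-contained sketch of the Python 3 approach, using hypothetical sample rows in place of live OMDB data (the file name films_demo.csv is also made up). It writes rows containing accented names and reads them back with the same encoding; newline='' is included because the csv documentation recommends it for files handed to csv.writer.

```python
import csv

# Hypothetical sample rows standing in for OMDB results; note the accented characters.
rows = [
    ["Amélie", "2001", "Audrey Tautou, Mathieu Kassovitz"],
    ["A Good Year", "2006", "Russell Crowe, Marion Cotillard"],
]

# An explicit encoding lets any Unicode text be written to the file.
with open("films_demo.csv", "w", encoding="UTF-8", newline="") as g:
    writer = csv.writer(g, delimiter=",")
    writer.writerow(["Title", "Year", "Actors"])
    writer.writerows(rows)

# Read the file back with the same encoding to confirm the text survived intact.
with open("films_demo.csv", encoding="UTF-8", newline="") as f:
    read_back = list(csv.reader(f))

print(read_back[1][0])  # Amélie
```

The field containing a comma ("Audrey Tautou, Mathieu Kassovitz") is quoted automatically by csv.writer, so it round-trips as a single column.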
Answer 1 (score: 0)
Try smart_str:
from django.utils.encoding import smart_str
# Run every field through smart_str before it reaches the csv writer
data_list.append(map(smart_str, [z['element1'], z['element2']]))
# csv writer methods are writerow/writerows (no underscore)
a.writerow(map(smart_str, ["Title", "Year", "Actors", "Awards", "Director", "Genre", "Metascore", "Plot", "Rated", "Runtime", "Writer", "imdbID", "imdbRating", "imdbVotes", "tomatoRating", "tomatoReviews", "tomatoFresh", "tomatoRotten", "tomatoConsensus", "tomatoUserMeter", "tomatoUserRating", "tomatoUserReviews"]))
a.writerows(data_list)
Answer 2 (score: 0)
If you are using Python 2, csv.writer doesn't really support Unicode, but the csv documentation has an example that works around it. An example is this answer.
If you are using Python 3, make the following changes:
y = x.content.decode('utf8')
and
with open('Films.csv', 'w', encoding='utf8', newline='') as g:
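With these changes, the text is decoded to Unicode for processing inside the Python script and encoded back to UTF-8 when it is written to the file. This is the recommended way to handle Unicode.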
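A small sketch of that decode-then-encode round trip, with a hypothetical byte string standing in for x.content (no network call is made):

```python
import json

# Hypothetical stand-in for x.content: the raw bytes of an HTTP response body,
# here a UTF-8 encoded JSON object with an accented title.
content = '{"Title": "Amélie", "Year": "2001"}'.encode("utf8")

# Decode bytes -> str once, where the data enters the program.
y = content.decode("utf8")
z = json.loads(y)
print(z["Title"])  # Amélie

# Encoding happens again on the way out; this is what open(..., encoding='utf8')
# does for you when the csv writer writes to the file.
encoded = z["Title"].encode("utf8")
print(encoded)  # b'Am\xc3\xa9lie'
```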
newline='' is the correct way to open a file for use with csv. See this answer and the csv documentation.
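A quick way to see what newline='' does is to inspect the raw bytes of a small file (newline_demo.csv is a hypothetical name). csv.writer emits its own '\r\n' line terminator; per the csv documentation, without newline='' platforms that write '\r\n' line endings add an extra '\r', which often shows up as blank rows in Excel.

```python
import csv

# Write two short rows with newline='' as the csv documentation recommends.
with open("newline_demo.csv", "w", encoding="utf8", newline="") as g:
    writer = csv.writer(g)
    writer.writerow(["Title", "Year"])
    writer.writerow(["Amélie", "2001"])

# Read the raw bytes: each record ends with exactly one '\r\n' because
# newline='' stops the file object from translating the terminator again.
with open("newline_demo.csv", "rb") as f:
    raw = f.read()

print(raw)
```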
You can also remove the try/except. It only suppresses the useful traceback.
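If you still want points 1 and 2 from the question (report which movies fail and skip them), a narrower try/except around each writerow does that without hiding unrelated errors. This is a minimal sketch with hypothetical rows; the file is opened with encoding="ascii" only so the error reproduces on any machine, whereas real code would normally use UTF-8 and avoid the error altogether.

```python
import csv

# Hypothetical rows; the second contains a character ASCII cannot represent.
data_list = [
    ["A Good Year", "2006", "Russell Crowe"],
    ["Amélie", "2001", "Audrey Tautou"],
    ["Annie Hall", "1977", "Woody Allen"],
]

failed = []   # titles whose rows could not be encoded
written = []  # titles that were written successfully

# encoding="ascii" deliberately forces UnicodeEncodeError for non-ASCII text.
with open("films_ascii.csv", "w", encoding="ascii", newline="") as g:
    writer = csv.writer(g)
    writer.writerow(["Title", "Year", "Actors"])
    for row in data_list:
        try:
            writer.writerow(row)
        except UnicodeEncodeError as exc:
            # Record which movie failed and why, then move on to the next one.
            failed.append(row[0])
            print(f"Skipping {row[0]!r}: {exc.reason}")
        else:
            written.append(row[0])

print("written:", written)
print("failed:", failed)
```

Because only the failing row's writerow is inside the try, the other movies are still written and you get a list of exactly which titles need attention.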
Answer 3 (score: -1)
In Python 2, the solution that worked for me was to add this at the beginning of the export process:
import sys
reload(sys)
sys.setdefaultencoding('utf8')
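That snippet only runs under Python 2. A quick check under Python 3, offered as a sketch, shows why it has no counterpart there: the default encoding is already UTF-8, the sys module has no setdefaultencoding, and reload is no longer a builtin (it moved to importlib).

```python
import sys

# In Python 3, str is already Unicode and the default encoding is fixed at UTF-8,
# so there is nothing for the reload(sys)/setdefaultencoding hack to change.
print(sys.getdefaultencoding())  # utf-8

# Python 3's sys module does not provide setdefaultencoding at all.
print(hasattr(sys, "setdefaultencoding"))  # False
```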