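tl;dr: I have a complex SQL query whose results get written out in .csv format. Unfortunately, one downside of being Canadian is that a lot of people have accented characters in their names.
This is what I have at the moment. I did not write the script originally; I am just trying to get it to pull the data. The SQL is correct, it is just truncated by nano.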
def getCommentsByGUID(reportDay):
    conn = mysql.connector.connect(host = dbServer, user = dbUser, passwd = dbPass, db = tscDBName)
    sqlresult = ''
    sql = 'select d.documentId, dn.created, dn.notes as Comment, u.login, d.fileName from DocumentInfo d \
    inner join documentnotes dn on d.documentId=dn.documentId \
    inner join User u on dn.userId=u.userID \
    where d.companyId=%d and dn.created>\'%s 00:00:00\' and dn.created <\'%s 23:59:59\';' % (companyID, reportDay, reportDa$
    cursor = conn.cursor()
    cursor.execute (sql)
    sqlresult = cursor.fetchall()
    cursor.close ()
    reportWriter = csv.writer(open('%sFilename_comments_%s.csv' % (outputDir, reportDay), 'w'), delimiter=',', quotec$
    for results in sqlresult:
        reportWriter.writerow(results)
Running it as-is produces a Unicode error such as:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)
After a lot of research, I came up with something like this:
for results in sqlresult:
    try:
        reportWriter.writerow(results)
    except UnicodeEncodeError:
        s = list(results)
        for item in s:
            if isinstance(item, basestring) == True:
                a = s.index(item)
                unicodedata.normalize('NFKD', item).encode('ascii', 'ignore')
                s[a] = item
                print item
        s = tuple(s)
        print s
        reportWriter.writerow(s)
However, I still get the same error:
UnicodeEncodeError: 'ascii' codec can't encode character u'\u2019' in position 366: ordinal not in range(128)
Any ideas about what I am doing wrong, or what else I could try? Thanks!
Answer 0: (score: 1)
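The MySQL connection is apparently returning strings as unicode objects, while the file handle passed to csv.writer was not opened with a specific encoding and therefore expects str objects representing raw bytes.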
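To make the cause concrete, here is a small sketch (illustrative only, not taken from the question's script; the sample string is made up). It shows that U+2019 cannot be encoded as ASCII, that it encodes fine as UTF-8, and that `normalize` and `encode` return new objects rather than changing the original. The last point is probably why the retry loop in the question had no effect: it discards the encoded result and writes the unmodified item back.

```python
import unicodedata

comment = u"Andr\u00e9\u2019s note"  # hypothetical sample value

# ASCII cannot represent U+2019, so this raises the error from the question.
try:
    comment.encode('ascii')
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True

# UTF-8 can represent every Unicode code point, so this succeeds.
utf8_bytes = comment.encode('utf-8')

# normalize() and encode() return new objects; the result must be kept.
# 'ignore' silently drops whatever still has no ASCII equivalent.
stripped = unicodedata.normalize('NFKD', comment).encode('ascii', 'ignore')
```

Note that the ASCII route loses data (the apostrophe disappears and the accent is stripped), so encoding to UTF-8 is usually the better choice for a report.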
Solution 1: Encode the strings with your preferred encoding (probably UTF-8) before passing them to writerow:
for results in sqlresult:
    results = [x.encode('utf-8') if isinstance(x, unicode) else x for x in results]
    reportWriter.writerow(results)
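The same idea can be wrapped in a small helper so the encoding step is easy to reuse and check in isolation. This is only a sketch: `encode_row` is a hypothetical name, not part of the original script, and it targets the Python 2 setup from the question, where csv.writer expects byte strings.

```python
def encode_row(row, encoding='utf-8'):
    """Return a copy of row with text values encoded to bytes.

    Non-text values (numbers, dates, None) are passed through unchanged.
    """
    text_type = type(u'')  # unicode on Python 2, str on Python 3
    return [x.encode(encoding) if isinstance(x, text_type) else x for x in row]


# Made-up sample row: an id, a name with an accent, and a NULL column.
sample = encode_row([42, u'Ren\u00e9e', None])
```

In the loop above this would be used as `reportWriter.writerow(encode_row(results))`.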
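Solution 2: Open the output file handle in a Unicode-aware mode. Note that this depends on csv.writer accepting text, which holds on Python 3; the Python 2 csv module does not support Unicode input, so on Python 2 Solution 1 is the safer choice: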
import io
reportWriter = csv.writer(io.open('%sFilename_comments_%s.csv' % (outputDir, reportDay), 'w', encoding='utf-8'), delimiter=',', quotec$
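For comparison, a minimal self-contained sketch of this approach, assuming Python 3, where csv.writer works with text and the file object does the encoding. The function name `write_report`, the file name, and the sample row are made up for illustration.

```python
import csv
import io
import os
import tempfile


def write_report(path, rows):
    # newline='' lets the csv module control line endings itself.
    with io.open(path, 'w', encoding='utf-8', newline='') as handle:
        writer = csv.writer(handle, delimiter=',')
        writer.writerows(rows)


rows = [(1, u'Ren\u00e9e', u'It\u2019s done')]  # made-up sample data
path = os.path.join(tempfile.mkdtemp(), 'comments.csv')
write_report(path, rows)

# Read the file back as bytes to see what was actually written.
with io.open(path, 'rb') as handle:
    raw = handle.read()
```

Using a `with` block also closes the file reliably, which the inline `open(...)` call passed straight to csv.writer does not guarantee.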