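I am having trouble reading utf-8 data from a MySQL database using Python. My database contains a table named Videos,
and that table has at least one row containing Unicode characters, namely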
[KR]三星Galaxy Beam 2간단리뷰[4K]
The table's collation is utf8_general_ci, and so is the collation of the fields in the table.
This is the code I wrote to fetch all the data from the table:
import MySQLdb

# Open database connection
db = MySQLdb.connect("localhost", "matan", "pass", "youtube", charset='utf8', use_unicode=True)
# Prepare a cursor object using the cursor() method
cursor = db.cursor()
# Prepare the SQL query to SELECT all records from the table
sql = "SELECT * FROM VIDEOS"
try:
    # Execute the SQL command
    cursor.execute(sql)
    # Fetch all the rows
    results = cursor.fetchall()
    for row in results:
        title = row[0]
        link = row[1]
        # Now print the fetched result
        print ("title=%s\nlink=%s\n\n" % \
               (title, link))
except:
    print "Error: unable to fetch data"
# Disconnect from server
db.close()
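When I run the code above, it prints every row that contains only ASCII characters, but when it reaches the row containing Unicode characters (the row mentioned above), it prints: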
  File "C:\Users\Matan\Dropbox\Code\Python\youtube.py", line 28, in printall
    (title, link))
  File "C:\Python27\lib\encodings\cp862.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 33-34: character maps to <undefined>
and it does not continue to the next row.
I am using phpMyAdmin version 4.1.14, MySQL version 5.6.17, and Python version 2.7.8.
EDIT: I removed the except clause and updated the error I get.
Answer 0 (score: 3)
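Your problem is with the encoding of your terminal (sys.stdout), which here is cp862 (cf. http://en.wikipedia.org/wiki/Code_page_862) and depends on your system settings. The data is read from MySQL correctly; the failure happens when print tries to encode the unicode strings for the terminal. The best solution (as explained here: https://stackoverflow.com/a/15740694/41316) is to explicitly encode your unicode data before printing it to sys.stdout.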
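The failure can be reproduced without a database or a terminal, which helps confirm that the codec is the culprit rather than MySQL. This is a minimal sketch, assuming a sample title similar to the one in the question; the `can_encode` helper is a hypothetical name used only for illustration:

```python
# -*- coding: utf-8 -*-
# Sample title with Hangul characters, written with escapes so the
# source file itself stays ASCII (illustrative data, not from the database).
title = u"[KR] Galaxy Beam 2 \uac04\ub2e8\ub9ac\ubdf0 [4K]"


def can_encode(text, encoding):
    """Return True if every character of `text` can be encoded with `encoding`."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# cp862 is a Hebrew DOS code page and has no Hangul characters,
# whereas utf-8 can encode them.
cp862_ok = can_encode(title, "cp862")
utf8_ok = can_encode(title, "utf-8")
print("cp862: %s, utf-8: %s" % (cp862_ok, utf8_ok))
```

If `cp862_ok` is false for your real titles as well, any plain print of those rows to a cp862 console is expected to fail in the same way.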
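If you cannot use a more useful encoding (utf-8 comes to mind, since it was designed to handle all unicode characters), you can at least use an alternative error handler such as "replace" (replaces non-encodable characters with '?') or "ignore" (drops non-encodable characters).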
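To make the difference between the error handlers concrete, here is a small sketch using a made-up string that ends with two Hangul characters. It only uses the standard `encode` method; the variable names are illustrative:

```python
# -*- coding: utf-8 -*-
text = u"Beam 2 \uac04\ub2e8"  # made-up sample: two Hangul characters at the end

# 'replace': each character missing from cp862 becomes '?'
replaced = text.encode("cp862", "replace")
# 'ignore': characters missing from cp862 are silently dropped
ignored = text.encode("cp862", "ignore")

print(repr(replaced))
print(repr(ignored))

# 'strict' (the default) raises instead of producing output
strict_error = None
try:
    text.encode("cp862", "strict")
except UnicodeEncodeError as exc:
    strict_error = exc
print(strict_error)
```

Both "replace" and "ignore" lose information, so they are best seen as a fallback for consoles that cannot be switched to a richer encoding.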
Here is a corrected version of your code. You can play with the encoding and on_error settings to find out which solution works for you:
import sys
import MySQLdb

# set the desired output encoding here
# it looks like your default encoding is "cp862"
# but you may want to try 'utf-8' first
# encoding = "cp862"
encoding = "utf-8"

# what to do when we can't encode to the desired output encoding
# options are:
# - 'strict' : raises a UnicodeEncodeError (default)
# - 'replace': replaces missing characters with '?'
# - 'ignore' : suppresses missing characters
on_error = "replace"

db = MySQLdb.connect(
    "localhost", "matan", "pass", "youtube",
    charset='utf8',
    use_unicode=True
)
cursor = db.cursor()
sql = "SELECT * FROM VIDEOS"
try:
    cursor.execute(sql)
    for i, row in enumerate(cursor):
        try:
            # encode unicode data to the desired output encoding
            title = row[0].encode(encoding, on_error)
            link = row[1].encode(encoding, on_error)
        except UnicodeEncodeError as e:
            # only happens if on_error='strict'
            print >> sys.stderr, "failed to encode row #%s - %s" % (i, e)
        else:
            print "title=%s\nlink=%s\n\n" % (title, link)
finally:
    cursor.close()
    db.close()
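The encoding step inside the loop can also be factored into a small helper that does not need the database, which makes it easier to try different settings. This is only a sketch: `encode_for_output` is a hypothetical name, and the row below is made-up sample data rather than anything read from the Videos table:

```python
# -*- coding: utf-8 -*-
import sys


def encode_for_output(value, encoding=None, errors="replace"):
    """Encode a unicode string to bytes for output.

    When no encoding is given, fall back to sys.stdout.encoding, and
    then to utf-8 if the stream reports no encoding at all.
    """
    if encoding is None:
        encoding = getattr(sys.stdout, "encoding", None) or "utf-8"
    return value.encode(encoding, errors)


# Made-up row standing in for (title, link) from the database.
row = (u"\uac04\ub2e8\ub9ac\ubdf0 [4K]", u"http://example.com/watch")
title = encode_for_output(row[0], "cp862")
link = encode_for_output(row[1], "cp862")
```

On Python 2.7, as used in the question, the returned byte strings can be passed straight to print. On Python 3, print accepts str directly, so a helper like this would mainly be useful when writing bytes to a binary stream.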
Note: you may also want to read this article (especially the comments), http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/, for more on Python, strings, unicode, encodings, sys.stdout
and terminal issues.