How do I fetch and print utf-8 data from a MySQL database using Python?

Asked: 2014-12-03 15:45:10

Tags: python mysql unicode utf-8

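I'm having trouble reading utf-8 data from a MySQL database using Python. My database contains a table named Videos, and at least one row in that table contains Unicode characters, namely: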

  

[KR]三星Galaxy Beam 2간단리뷰[4K]

The table's collation is utf8_general_ci, and so is the collation of the fields in the table.

This is the code I wrote to fetch all the data from the table:

import MySQLdb

# Open database connection
db = MySQLdb.connect("localhost","matan","pass","youtube", charset = 'utf8',use_unicode=True)

# prepare a cursor object using cursor() method
cursor = db.cursor()

# Prepare SQL query to SELECT all records from the table.
sql = "SELECT * FROM VIDEOS"
try:
   # Execute the SQL command
   cursor.execute(sql)
   # Fetch all the rows in a list of lists.
   results = cursor.fetchall()
   for row in results:
      title = row[0]
      link = row[1]
      # Now print fetched result
      print ("title=%s\nlink=%s\n\n" % \
            (title, link))
except:
   print "Error: unable to fetch data"

# disconnect from server
db.close()

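When I run the code above, it prints every row that contains only ASCII characters, but when it reaches a row containing Unicode characters (i.e. the row mentioned above), it prints: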

  File "C:\Users\Matan\Dropbox\Code\Python\youtube.py", line 28, in printall
    (title, link))
  File "C:\Python27\lib\encodings\cp862.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode characters in position 33-34: character maps to <undefined>

and it does not continue to the next row.

I am using PhpMyAdmin version 4.1.14, MySQL version 5.6.17, and Python version 2.7.8.

Edit: I removed the except clause and updated the error I get.

1 Answer:

Answer 0: (score: 3)

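Your problem lies with the encoding of your terminal (sys.stdout) (cf. http://en.wikipedia.org/wiki/Code_page_862), which depends on your system settings. The best solution (as explained here: https://stackoverflow.com/a/15740694/41316) is to explicitly encode your unicode data before printing it to sys.stdout.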
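To check which encoding Python will use when printing, you can inspect sys.stdout.encoding directly. The following is a minimal sketch meant to run on both Python 2.7 and Python 3; the helper name stdout_encoding and the "ascii" fallback are assumptions made for illustration and are not part of the original answer:

```python
import sys


def stdout_encoding(default="ascii"):
    """Return the encoding attached to sys.stdout, or `default` if unknown."""
    # sys.stdout.encoding can be None or missing, for example when the
    # output is piped or when stdout has been replaced by another object.
    return getattr(sys.stdout, "encoding", None) or default


# On the asker's console this would be expected to report cp862.
print("stdout encoding: %s" % stdout_encoding())
```

If this reports a legacy code page such as cp862, any character outside that code page will fail to print unless it is encoded explicitly first.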

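If you cannot use a more suitable encoding (utf-8 comes to mind, since it was designed to handle all Unicode characters), you can at least use an alternative error handler such as "replace" (replaces non-encodable characters with '?') or "ignore" (drops non-encodable characters).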
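The effect of each error handler can be seen without a database. The sketch below is illustrative only: the sample string (ASCII text followed by four Hangul syllables written as escapes) and the helper name encode_for_output are assumptions, and the code is written to run on both Python 2.7 and Python 3:

```python
# Hypothetical sample: ASCII text followed by four Hangul syllables,
# none of which exist in code page 862.
title = u"Beam 2 \uac04\ub2e8\ub9ac\ubdf0"


def encode_for_output(text, encoding, on_error):
    """Encode unicode text to bytes with the given codec and error handler."""
    return text.encode(encoding, on_error)


# 'replace' substitutes '?' for every character the codec cannot represent.
replaced = encode_for_output(title, "cp862", "replace")
# 'ignore' silently drops those characters.
ignored = encode_for_output(title, "cp862", "ignore")
# utf-8 can represent every Unicode character, so 'strict' is safe here.
as_utf8 = encode_for_output(title, "utf-8", "strict")

print(repr(replaced))
print(repr(ignored))
print(len(as_utf8))
```

With "strict" and cp862, the same call raises a UnicodeEncodeError like the one shown in the question.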

Here is a corrected version of your code; you can play with the encoding and on_error settings to find out which solution works for you:

import sys
import MySQLdb

# set desired output encoding here
# it looks like your default encoding is "cp862"
# but you may want to try 'utf-8' first
# encoding = "cp862"
encoding = "utf-8" 

# what to do when we can't encode to the desired output encoding
# options are:
# - 'strict' : raises a UnicodeEncodeError (default)
# - 'replace': replaces missing characters with '?'
# - 'ignore' : suppress missing characters
on_error = "replace" 

db = MySQLdb.connect(
   "localhost","matan","pass","youtube", 
   charset='utf8',
   use_unicode=True
   )
cursor = db.cursor()
sql = "SELECT * FROM VIDEOS"
try:
   cursor.execute(sql)
   for i, row in enumerate(cursor):
      try:
         # encode unicode data to the desired output encoding
         title = row[0].encode(encoding, on_error)
         link = row[1].encode(encoding, on_error)
      except UnicodeEncodeError as e:
         # only if on_error='strict'
         print >> sys.stderr, "failed to encode row #%s - %s" % (i, e)
      else:
         print "title=%s\nlink=%s\n\n" % (title, link)
finally:
   cursor.close()
   db.close()

Note: you may also want to read this article (especially the comments), http://drj11.wordpress.com/2007/05/14/python-how-is-sysstdoutencoding-chosen/, for more on Python, strings, unicode, encodings, sys.stdout and terminal issues.