How do I print UTF-8 characters from an SQL database in Python?

Time: 2015-11-24 19:58:56

Tags: python mysql encoding utf-8 mariadb

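I have data in an SQL database (MariaDB), some of which contains UTF-8 characters (mostly ÄÅ). When I print this data in Python, I do not get the correct characters. However, if I print UTF-8 characters directly (for example print("ÖÖ ää öö")), it works.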

In my .py I have # -*- coding: utf-8 -*- and in my .sql I have SET character_set_server = "utf8";

1 answer:

Answer 0 (score: 0)

http://mysql.rjweb.org/doc.php/charcoll#python

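In the 1st or 2nd line of the source code: # -*- coding: utf-8 -*-

Python code for dumping the hex (etc.) of each character in a string 'u':

    import unicodedata

    # Python 2 syntax; u is a unicode string
    for i, c in enumerate(u):
        # index, code point in hex, Unicode category (the trailing comma keeps the line open)
        print i, '%04x' % ord(c), unicodedata.category(c),
        print unicodedata.name(c)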
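The dump above uses the Python 2 print statement. For Python 3, a rough equivalent might look like the sketch below. The function name `dump_codepoints` and the `<unnamed>` placeholder are made up for illustration; only the `unicodedata` calls come from the answer.

```python
import unicodedata


def dump_codepoints(text):
    """Return one line per character: index, hex code point, category, name."""
    lines = []
    for i, ch in enumerate(text):
        # unicodedata.name() raises ValueError for unnamed characters
        # (e.g. control characters), so a default is supplied here.
        name = unicodedata.name(ch, "<unnamed>")
        lines.append("%d %04x %s %s" % (i, ord(ch), unicodedata.category(ch), name))
    return lines


if __name__ == "__main__":
    # "\u00c4\u00f6" is the two-character string "Äö"
    for line in dump_codepoints("\u00c4\u00f6"):
        print(line)
```

If a value fetched from the database shows two code points such as 00c3 followed by 00a4 where a single 00e4 was expected, the text was probably decoded with the wrong charset somewhere along the way.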

Miscellaneous notes on coding for utf8:

⚈  db = MySQLdb.connect(host=DB_HOST, user=DB_USER, passwd=DB_PASS, db=DB_NAME, charset="utf8", use_unicode=True)
⚈  conn = MySQLdb.connect(host="localhost", user='root', password='', db='', charset='utf8')
⚈  cursor.execute("SET NAMES utf8mb4;") -- not as good as using `charset`
⚈  db.set_character_set('utf8'), implies use_unicode=True
⚈  Literals should be u'...'
⚈  MySQL-python 1.2.4 fixes a bug wherein varchar(255) CHARACTER SET utf8 COLLATE utf8_bin is treated like a BLOB.
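
As a minimal sketch of the first bullet, the connection settings can be collected in one place so that `charset` and `use_unicode` are never forgotten. The helper name `utf8_connect_kwargs` and the host, user, password, and database values are hypothetical placeholders; the keyword names are the ones shown in the `MySQLdb.connect()` bullet above.

```python
def utf8_connect_kwargs(host, user, passwd, db):
    """Collect MySQLdb.connect() keyword arguments with the UTF-8 settings."""
    return {
        "host": host,
        "user": user,
        "passwd": passwd,
        "db": db,
        "charset": "utf8",    # same value as in the bullet above
        "use_unicode": True,  # return unicode strings instead of raw bytes
    }


if __name__ == "__main__":
    # Placeholder credentials, for illustration only.
    kwargs = utf8_connect_kwargs("localhost", "app_user", "secret", "app_db")
    print(kwargs)
    # With the MySQLdb module installed and a server running, one would then call:
    #     import MySQLdb
    #     db = MySQLdb.connect(**kwargs)
```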

Checklist:

⚈  `# -*- coding: utf-8 -*-` -- (you have that)
⚈  `charset='utf8'` in `connect()` call -- Is that buried in `bottle_mysql.Plugin`? (Note: Try 'utf-8' and 'utf8')
⚈  Text encoded in utf8.
⚈  No need for encode() or decode() if you are willing to accept utf8 everywhere.
⚈  `u'...'` for literals
⚈  `<meta charset="utf-8">` near start of html page
⚈  Content-Type: text/html; charset=UTF-8 (in HTTP response header)
⚈  header('Content-Type: text/html; charset=UTF-8'); (in PHP to get that response header)
⚈  `CHARACTER SET utf8 COLLATE utf8_general_ci` on column (or table) definition in MySQL.
⚈  utf8 all the way through
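
To see why every item on the checklist matters, here is a small Python 3 sketch of what happens when one link in the chain assumes a different charset. It assumes the mismatch is latin-1, which is only one possibility; the function names `misdecode` and `repair` are invented for this example.

```python
def misdecode(text, wrong_encoding="latin-1"):
    """Encode text as UTF-8, then decode the bytes with the wrong charset."""
    return text.encode("utf-8").decode(wrong_encoding)


def repair(garbled, wrong_encoding="latin-1"):
    """Undo misdecode(): recover the original bytes and decode them as UTF-8."""
    return garbled.encode(wrong_encoding).decode("utf-8")


if __name__ == "__main__":
    original = "\u00e4\u00e4 \u00f6\u00f6"  # the string "ää öö"
    garbled = misdecode(original)
    print(garbled)                      # each letter now appears as two characters
    print(repair(garbled) == original)  # the round trip restores the text
```

Repairing after the fact only works when no bytes were lost, so it is usually better to fix the connection charset than to patch strings afterwards.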

References:

⚈  https://docs.python.org/2/howto/unicode.html#the-unicode-type
⚈  http://stackoverflow.com/questions/9154998/python-encoding-mysql
⚈  http://dev.mysql.com/doc/connector-python/en/connector-python-connectargs.html

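The Python language environment officially used only UCS-2 internally from version 2.0 on, but the UTF-8 decoder to "Unicode" produces correct UTF-16. Since Python 2.2, "wide" builds of Unicode are supported, which use UTF-32 instead; these are primarily used on Linux. Python 3.3 no longer ever uses UTF-16. Instead, strings are stored in one of ASCII/Latin-1, UCS-2, or UTF-32, depending on which code points are in the string, and a UTF-8 version is also kept so that repeated conversions to UTF-8 are fast.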
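
The per-string storage choice in Python 3.3 and later can be observed roughly with `sys.getsizeof`. This is a sketch that assumes CPython 3.3 or newer; the helper name `bytes_per_char` is hypothetical, and other Python implementations may report different sizes.

```python
import sys


def bytes_per_char(ch, n=1000):
    """Estimate storage per character by comparing strings of length n and 2n."""
    # The fixed object header cancels out in the difference.
    return (sys.getsizeof(ch * 2 * n) - sys.getsizeof(ch * n)) // n


if __name__ == "__main__":
    samples = [
        ("ASCII", "a"),
        ("Latin-1", "\u00e4"),        # ä
        ("UCS-2 range", "\u20ac"),    # euro sign
        ("beyond UCS-2", "\U0001F600"),
    ]
    for label, ch in samples:
        print(label, bytes_per_char(ch))
```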