I have a MySQL database. I set its character set to utf8:
...
PRIMARY KEY (`username`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 |
...
I connect to the database from Python using MySQLdb:
import MySQLdb

conn = MySQLdb.connect(host="localhost",
                       passwd="12345",
                       db="db",
                       charset='utf8',
                       use_unicode=True)
When I run a query, the response comes back as if it had been decoded with "windows-1254". An example:
curr = conn.cursor(MySQLdb.cursors.DictCursor)
select_query = 'SELECT * FROM users'
curr.execute(select_query)
for ret in curr.fetchall():
    username = ret["username"]
    print "repr-username; ", repr(username)
    print "username; ", username.encode("utf-8")
...
The output is:
repr-username; u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username; ÅŸÃ¼krÃ¼Ã§aÄŸlÃ¼li
When I encode the username with "windows-1254" before printing, it works fine:
...
print "repr-username; ", repr(username)
print "username; ", username.encode("windows-1254")
...
The output is:
repr-username; u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
username; şükrüçağlüli
When I try other characters (Cyrillic, for example), the encoding used for decoding changes dynamically. How can I prevent this?
Answer 0 (score: 3)
I think the items were encoded incorrectly when they were INSERTed into the database.
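To illustrate that diagnosis, here is a minimal sketch using only the standard library. It assumes the failure mode was "UTF-8 bytes interpreted as windows-1254 before being stored", which is a guess based on the repr in the question, not something the question confirms. The variable names are made up for the illustration.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function

# The intended name (şükrüçağlüli), written with escapes so the
# encoding of this source file does not matter.
correct = u'\u015f\xfckr\xfc\xe7a\u011fl\xfcli'

# Assumed failure mode: the client produced UTF-8 bytes, but they were
# read as windows-1254 somewhere on the way into the table.
utf8_bytes = correct.encode('utf-8')
mojibake = utf8_bytes.decode('windows-1254')

# The value reported in the question.
observed = u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
print(mojibake == observed)   # True

# Undoing the two steps in reverse order recovers the intended text,
# which is why encoding to windows-1254 "works" when printing.
recovered = observed.encode('windows-1254').decode('utf-8')
print(recovered == correct)   # True
```

If this matches what happened, the stored rows are already damaged, so changing the connection settings on the reading side alone would not repair them.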
I recommend python-ftfy (from https://github.com/LuminosoInsight/python-ftfy), which helped me with a similar problem:
import ftfy
username = u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
print ftfy.fix_text(username) # outputs şükrüçağlüli
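If installing ftfy is not an option, a much cruder fallback can be sketched with the standard library. The helper name `repair_mojibake` and the list `CANDIDATE_CODECS` are hypothetical. The sketch assumes each value went through exactly one wrong decode of UTF-8 bytes, and it does not reproduce ftfy's heuristics. It tries each candidate code page in order and keeps the first one that round-trips to valid UTF-8, which is one way to cope with rows (for example Cyrillic ones) that were misread with a different code page.

```python
from __future__ import print_function

# Hypothetical candidate list: code pages the UTF-8 bytes may have been
# misread as. Order matters; the first successful round trip wins.
CANDIDATE_CODECS = ('windows-1254', 'windows-1251', 'windows-1252')


def repair_mojibake(text, codecs=CANDIDATE_CODECS):
    """Undo one layer of 'UTF-8 read as a legacy code page' damage.

    Returns the text unchanged when no candidate yields valid UTF-8.
    """
    for codec in codecs:
        try:
            repaired = text.encode(codec).decode('utf-8')
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this code page cannot explain the damage
        if repaired != text:
            return repaired
    return text


# The value from the question, and a Cyrillic word damaged the same
# way but via windows-1251 (an assumed scenario for illustration).
turkish = u'\xc5\u0178\xc3\xbckr\xc3\xbc\xc3\xa7a\xc4\u0178l\xc3\xbcli'
cyrillic = u'\u041f\u0440\u0438\u0432\u0435\u0442'
damaged_cyrillic = cyrillic.encode('utf-8').decode('windows-1251')

print(repr(repair_mojibake(turkish)))
print(repair_mojibake(damaged_cyrillic) == cyrillic)
```

Text that was never damaged usually fails the round trip and is returned as-is, but short or unusual strings can still be misjudged, so treat the result as a suggestion to review rather than a guaranteed fix.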