Why is Python trying to encode my MySQLdb unicode strings as ASCII?

Time: 2016-10-17 03:19:30

Tags: python-2.7 unicode scrapy mysql-python

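Yes, yet another question about unicode and Python. I think I have read everything and adopted good programming habits since I started paying attention to Unicode, but this error keeps coming back to me: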

'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)

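I'm using scrapy with Python 2.7, so what comes in from the "outside world" is web pages, which are correctly decoded and processed with xpath. This is confirmed by the empty error log from these checks, which run just before the exception: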

if not isinstance(key, unicode):
    logging.error(u"Key is not unicode: %s" % key)

if not isinstance(value, unicode):
    logging.error(u"value is not unicode: %s" % value)

if not isinstance(item['listingId'], int):
    logging.error(u"item['listingId'] is not an int:%s" % item['listingId'])

However, when the MySQL transaction starts on the next line:

d = txn.execute("INSERT IGNORE INTO `listingsDetails` VALUE (%s, %s, %s);", (item['listingId'], key, value))

I still get this exception from time to time (on about 1% of pages). Note line 403 of "pipelines.py", which is the MySQL INSERT.

2016-10-16 22:22:10 [Listings] ERROR: [Failure instance: Traceback:     <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
/usr/lib/python2.7/threading.py:801:__bootstrap_inner
/usr/lib/python2.7/threading.py:754:run
/usr/lib/python2.7/dist-packages/twisted/_threads/_threadworker.py:46:work
/usr/lib/python2.7/dist-packages/twisted/_threads/_team.py:190:doWork
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:246:inContext
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:262:<lambda>
/usr/lib/python2.7/dist-packages/twisted/python/context.py:118:callWithContext
/usr/lib/python2.7/dist-packages/twisted/python/context.py:81:callWithContext
/usr/lib/python2.7/dist-packages/twisted/enterprise/adbapi.py:445:_runInteraction
/home/mrme/git/rep/scrapy_prj/firstproject/firstproject/pipelines.py:403:_do_upsert
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:228:execute
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:127:_warning_check
/usr/lib/python2.7/logging/__init__.py:1724:_showwarning
/usr/lib/python2.7/warnings.py:50:formatwarning

I opened the MySQL connection with:
dbargs = dict(
    host=settings['MYSQL_HOST'],
    db=settings['MYSQL_DBNAME'],
    user=settings['MYSQL_USER'],
    passwd=settings['MYSQL_PASSWD'],
    charset='utf8',
    use_unicode=True
)
dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)

Following the MySQL documentation, I also tried adding this after connecting:

self.dbpool.runOperation("SET NAMES 'utf8'", )
self.dbpool.runOperation("SET CHARSET 'utf8'",)

I confirmed that my database settings are correct:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%';

character_set_client:     utf8mb4
character_set_connection: utf8mb4
character_set_database:   utf8
character_set_filesystem: binary
character_set_results:    utf8mb4
character_set_server:     latin1
character_set_system:     utf8

So who exactly is trying to encode in ASCII?

Every 2 cents welcome ;-)

If it helps, all other accented characters are fine in the database. Only this one, u'\xc0' = À, causes the problem.

3 answers:

Answer 0 (score: 0)

If you get this exception, you can replace the non-ASCII characters with the following code:

varName = ''.join([i if ord(i) < 128 else ' ' for i in strName])
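Wrapped as a function, the same comprehension looks like this; the name `strip_non_ascii` is hypothetical. It is lossy by design: every non-ASCII character, including the À from the question, becomes a space before the value reaches MySQL, so the stored data changes.

```python
def strip_non_ascii(text):
    """Replace every character outside the ASCII range (ord >= 128) with a space."""
    return u''.join([c if ord(c) < 128 else u' ' for c in text])


# The leading u'\xc0' (À) is replaced by a space; ASCII characters pass through.
cleaned = strip_non_ascii(u"\xc0 la carte")
```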

Here, strName is the string containing non-ASCII characters.

Answer 1 (score: 0)

Assuming dbpool is your connection variable, try the following (overkill, but see if it works):

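def set_utf8mb4(conn):
    # cp_openfun runs on every raw MySQLdb connection the pool opens
    conn.set_character_set('utf8mb4')
    cursor = conn.cursor()
    cursor.execute('SET NAMES utf8mb4;')
    cursor.execute('SET CHARACTER SET utf8mb4;')
    cursor.execute('SET character_set_connection=utf8mb4;')
    cursor.close()

dbpool = adbapi.ConnectionPool('MySQLdb', cp_openfun=set_utf8mb4, **dbargs)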

If that doesn't help, see below.

Side note

If you run:

SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';

you should get the following rows in addition to the ones shown in the question:

| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |

If not, take a quick look at my answer to the question Manipulating utf8mb4 data from MySQL with PHP, and edit your MySQL configuration accordingly.

Answer 2 (score: 0)

Thanks everyone for the great help, but I finally found the source of the exception.

Python was trying to convert MySQL's duplicate-entry warning to ASCII for logging, so that it could print this warning (which contains unicode) to stdout.
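The mechanism described above can be sketched as follows, assuming Python 2 semantics. In Python 2.7, `warnings.formatwarning` interpolates the Warning object into a byte string with `%s`, which calls `str()` on it, and `str()` encodes a unicode message with the default `'ascii'` codec. The helper `ascii_error_position` is hypothetical and only reproduces that encode step; the sample message is also made up, since the real MySQL warning text is not shown in the question.

```python
def ascii_error_position(text):
    """Return the index where an ASCII encode of `text` fails
    (what Python 2's implicit str() on a unicode Warning does),
    or None if the text is pure ASCII."""
    try:
        text.encode('ascii')
    except UnicodeEncodeError as exc:
        return exc.start  # same position reported in "... in position N"
    return None


# Hypothetical duplicate-entry warning containing u'\xc0' (À).
sample = u"Duplicate entry '\xc0' for key 'PRIMARY'"
position = ascii_error_position(sample)
```

Only warnings whose text contains a non-ASCII character fail this way, which is consistent with the error showing up on a small fraction of pages.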

Replacing INSERT IGNORE with INSERT ... ON DUPLICATE KEY UPDATE makes this exception go away for good.
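A minimal sketch of that change, assuming it runs inside `dbpool.runInteraction()` like the original `_do_upsert`. The original INSERT lists no column names, so `listingId`, `detailKey` and `detailValue` below are hypothetical placeholders, as is the `do_upsert` signature. Unlike INSERT IGNORE, this statement overwrites the existing value on a duplicate key instead of skipping the row, so MySQL has no duplicate-entry warning to report.

```python
# Hypothetical column names: adjust to the real `listingsDetails` schema.
UPSERT_SQL = (
    "INSERT INTO `listingsDetails` (`listingId`, `detailKey`, `detailValue`) "
    "VALUES (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE `detailValue` = VALUES(`detailValue`);"
)


def do_upsert(txn, item, key, value):
    """Insert one detail row, updating the value if the key already exists.

    `txn` is the cursor-like object that adbapi passes to runInteraction();
    MySQLdb fills in and escapes the %s parameters itself.
    """
    txn.execute(UPSERT_SQL, (item['listingId'], key, value))
```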

Can anyone help me figure out why this warning can't be printed in UTF-8, like everything else on my Ubuntu machine?
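One possible workaround is sketched below, under the assumption that warnings reach logging through `logging.captureWarnings` (the traceback goes through logging's `_showwarning` into `warnings.formatwarning`). In Python 2 the implicit `str()` uses the interpreter's default encoding, which is `'ascii'` regardless of the Ubuntu locale, so the terminal's UTF-8 setting never comes into play. The sketch replaces `warnings.formatwarning` with a version that builds a unicode string from the warning's message instead of calling `str()` on the Warning. The name `utf8_formatwarning` is hypothetical, and decoding byte-string messages as UTF-8 is an assumption; this has not been tested against MySQLdb.

```python
import linecache
import warnings

try:
    text_type = unicode  # Python 2
except NameError:
    text_type = str      # Python 3


def utf8_formatwarning(message, category, filename, lineno, line=None):
    """Format a warning like warnings.formatwarning, but without calling
    str() on a Warning that holds unicode (the implicit ASCII encode)."""
    if isinstance(message, Warning) and message.args:
        text = message.args[0]
    else:
        text = message
    if isinstance(text, bytes):
        # Assumption: byte-string warning messages are UTF-8 encoded.
        text = text.decode('utf-8', 'replace')
    elif not isinstance(text, text_type):
        text = text_type(text)
    s = u"%s:%s: %s: %s\n" % (filename, lineno, category.__name__, text)
    if line is None:
        line = linecache.getline(filename, lineno)
    if isinstance(line, bytes):
        line = line.decode('utf-8', 'replace')
    if line:
        s += u"  %s\n" % line.strip()
    return s


# logging's _showwarning looks up warnings.formatwarning at call time,
# so replacing the module attribute affects captured warnings too.
warnings.formatwarning = utf8_formatwarning
```

Returning unicode rather than bytes leaves the final encoding to the logging handler, which in Python 2.7 writes unicode using the stream's encoding when it has one.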