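Yes, yet another question about Unicode and Python. I thought I had read everything and adopted good programming practices once I finally came to terms with Unicode, but this error has come back to haunt me: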
'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
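I'm using Scrapy with Python 2.7, so what comes in from the "outside world" is web pages that are correctly decoded and processed with XPath. This is confirmed by the empty error log from these checks, which run just before the exception: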
if not isinstance(key, unicode):
logging.error(u"Key is not unicode: %s" % key)
if not isinstance(value, unicode):
logging.error(u"value is not unicode: %s" % value)
if not isinstance(item['listingId'], int):
logging.error(u"item['listingId'] is not an int:%s" % item['listingId'])
However, when the MySQL transaction runs on the next line:
d = txn.execute("INSERT IGNORE INTO `listingsDetails` VALUE (%s, %s, %s);", (item['listingId'], key, value))
I still get this exception from time to time (on about 1% of pages). Note line 403 of pipelines.py, which is the MySQL INSERT:
2016-10-16 22:22:10 [Listings] ERROR: [Failure instance: Traceback: <type 'exceptions.UnicodeEncodeError'>: 'ascii' codec can't encode character u'\xc0' in position 24: ordinal not in range(128)
/usr/lib/python2.7/threading.py:801:__bootstrap_inner
/usr/lib/python2.7/threading.py:754:run
/usr/lib/python2.7/dist-packages/twisted/_threads/_threadworker.py:46:work
/usr/lib/python2.7/dist-packages/twisted/_threads/_team.py:190:doWork
--- <exception caught here> ---
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:246:inContext
/usr/lib/python2.7/dist-packages/twisted/python/threadpool.py:262:<lambda>
/usr/lib/python2.7/dist-packages/twisted/python/context.py:118:callWithContext
/usr/lib/python2.7/dist-packages/twisted/python/context.py:81:callWithContext
/usr/lib/python2.7/dist-packages/twisted/enterprise/adbapi.py:445:_runInteraction
/home/mrme/git/rep/scrapy_prj/firstproject/firstproject/pipelines.py:403:_do_upsert
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:228:execute
/usr/lib/python2.7/dist-packages/MySQLdb/cursors.py:127:_warning_check
/usr/lib/python2.7/logging/__init__.py:1724:_showwarning
/usr/lib/python2.7/warnings.py:50:formatwarning
I open the MySQL connection with:
dbargs = dict(
host=settings['MYSQL_HOST'],
db=settings['MYSQL_DBNAME'],
user=settings['MYSQL_USER'],
passwd=settings['MYSQL_PASSWD'],
charset='utf8',
use_unicode=True
)
dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
Following the MySQL documentation, I also tried adding this after connecting:
self.dbpool.runOperation("SET NAMES 'utf8'", )
self.dbpool.runOperation("SET CHARSET 'utf8'",)
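A note on the two runOperation calls above: as far as I understand Twisted's adbapi, each runOperation executes on whichever pooled connection its worker thread uses, so a session setting such as SET NAMES only sticks to that one connection. ConnectionPool accepts a cp_openfun callback that it invokes with every new DB-API connection it opens. A minimal sketch of such a callback follows; the name init_connection and the utf8mb4 value are assumptions (keep the value in line with the charset passed in dbargs).

```python
# Assumed character set; keep it consistent with charset=... in dbargs.
CHARSET = 'utf8mb4'


def init_connection(conn):
    """Per-connection setup hook, meant to be passed to adbapi as cp_openfun.

    The pool calls it once for every new DB-API connection, so the session
    character set is applied to all pooled connections, not just one.
    """
    cursor = conn.cursor()
    try:
        cursor.execute("SET NAMES %s" % CHARSET)
    finally:
        cursor.close()


# Usage (requires Twisted and MySQLdb; hypothetical wiring):
# dbpool = adbapi.ConnectionPool('MySQLdb', cp_openfun=init_connection, **dbargs)
```

Since the question already passes charset='utf8' to MySQLdb, the driver should set the connection character set by itself; the hook is mainly the place for any extra per-connection SET statements.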
I confirmed that my database is set up correctly:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name;
character_set_client: utf8mb4
character_set_connection: utf8mb4
character_set_database: utf8
character_set_filesystem: binary
character_set_results: utf8mb4
character_set_server: latin1
character_set_system: utf8
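So who on earth is trying to encode to ASCII?
Any 2 cents are welcome ;-)
If it helps: all other accented characters are stored fine in the database. Only u'\xc0' = À causes the problem.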
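The error message itself can be reproduced without Scrapy or MySQL. A minimal sketch (runs on Python 2.7 and 3): u'\xc0' is LATIN CAPITAL LETTER A WITH GRAVE, which has no ASCII encoding but encodes to two bytes in UTF-8, so the exception means some code path is still encoding with the ascii codec rather than utf-8.

```python
import unicodedata

text = u'\xc0'  # the character from the error message

# Encoding to ASCII fails: code point 0xC0 (192) is outside range(128).
try:
    text.encode('ascii')
except UnicodeEncodeError as exc:
    error_message = str(exc)

# Encoding to UTF-8 works and yields two bytes.
utf8_bytes = text.encode('utf-8')

print(unicodedata.name(text))  # LATIN CAPITAL LETTER A WITH GRAVE
print(error_message)
print(repr(utf8_bytes))
```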
Answer 0 (score: 0)
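If you hit this exception, you can replace the non-ASCII characters with the following code:
varName = ''.join([i if ord(i) < 128 else ' ' for i in strName])
Here, strName is the string containing non-ASCII characters.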
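Answer 0's one-liner, wrapped in a function for clarity (the function names are illustrative). Note that it discards data: À becomes a space. A commonly used, less lossy alternative, which is not part of the answer, is to decompose accented letters with unicodedata.normalize('NFKD', ...) and drop the combining marks, turning À into A. Either way this hides the symptom; the original characters never reach the database.

```python
import unicodedata


def replace_non_ascii(text, placeholder=u' '):
    """Answer 0's approach: swap every non-ASCII character for a placeholder."""
    return u''.join(ch if ord(ch) < 128 else placeholder for ch in text)


def fold_to_ascii(text):
    """Alternative: split accents off via NFKD, then drop non-ASCII leftovers."""
    decomposed = unicodedata.normalize('NFKD', text)
    return decomposed.encode('ascii', 'ignore').decode('ascii')


sample = u'\xc0 la carte'  # "À la carte"
print(replace_non_ascii(sample))
print(fold_to_ascii(sample))
```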
Answer 1 (score: 0)
Assuming dbpool is your connection variable, try the following (overkill, but see whether it works):
dbargs['charset'] = 'utf8mb4'  # set_character_set() exists on MySQLdb connections, not on ConnectionPool, so set the charset here
dbpool = adbapi.ConnectionPool('MySQLdb', **dbargs)
dbpool.runOperation('SET NAMES utf8mb4;')
dbpool.runOperation('SET CHARACTER SET utf8mb4;')
dbpool.runOperation('SET character_set_connection=utf8mb4;')
If that doesn't help, see below.
Side note
If you run:
SHOW VARIABLES WHERE Variable_name LIKE 'character\_set\_%' OR Variable_name LIKE 'collation%';
then besides the rows shown in the question, you should also get these rows:
| collation_connection | utf8mb4_unicode_ci |
| collation_database | utf8mb4_unicode_ci |
| collation_server | utf8mb4_unicode_ci |
If not, have a quick read of my answer to the question "Manipulating utf8mb4 data from MySQL with PHP" and edit your MySQL configuration accordingly.
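The checks in this side note can be scripted. A minimal sketch, assuming rows is the list of (Variable_name, Value) tuples that cursor.fetchall() returns after executing the SHOW VARIABLES query above. The function name, and the choice to skip character_set_filesystem and character_set_system (usually left at their defaults), are assumptions. Fed the output from the question, it flags character_set_database and character_set_server.

```python
# Variables that are usually left at their defaults and need not be utf8mb4.
SKIPPED = frozenset(['character_set_filesystem', 'character_set_system'])


def non_utf8mb4_settings(rows):
    """Return the (name, value) pairs that are not utf8mb4 or a utf8mb4_* collation."""
    flagged = []
    for name, value in rows:
        if name in SKIPPED:
            continue
        if name.startswith('character_set_') and value != 'utf8mb4':
            flagged.append((name, value))
        elif name.startswith('collation_') and not value.startswith('utf8mb4_'):
            flagged.append((name, value))
    return flagged


# Output reported in the question (no collation rows were shown there).
question_rows = [
    ('character_set_client', 'utf8mb4'),
    ('character_set_connection', 'utf8mb4'),
    ('character_set_database', 'utf8'),
    ('character_set_filesystem', 'binary'),
    ('character_set_results', 'utf8mb4'),
    ('character_set_server', 'latin1'),
    ('character_set_system', 'utf8'),
]
print(non_utf8mb4_settings(question_rows))
```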
Answer 2 (score: 0)
Thanks, everyone, for the great help, but I finally found the source of the exception.
Python was trying to convert the DUPLICATE warning returned by MySQL to ASCII for logging, in order to print that warning (which contains Unicode) to stdout.
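An alternative that keeps INSERT IGNORE (not what the author ended up doing) is to filter out MySQL's duplicate-entry warnings with the standard warnings module, before they are formatted and handed to logging. A sketch, assuming the warning text starts with MySQL's usual "Duplicate entry" wording; the function name is hypothetical, and in a real pipeline you would probably also pass category=MySQLdb.Warning to narrow the filter.

```python
import warnings


def silence_duplicate_entry_warnings():
    """Ignore warnings whose text starts with "Duplicate entry" (case-insensitive).

    MySQL reports rows skipped by INSERT IGNORE with this wording, and the
    message can contain the non-ASCII key value that broke the formatting.
    """
    warnings.filterwarnings('ignore', message=r'Duplicate entry')
```

Call it once, for example when the pipeline starts. It hides these warnings rather than fixing their formatting, so skipped duplicates are no longer reported anywhere.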
Replacing INSERT IGNORE with INSERT ... ON DUPLICATE KEY UPDATE makes this exception go away for good.
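For reference, a sketch of the question's INSERT rewritten as the author describes. The update clause is a deliberate no-op (listingId = listingId) so that, like INSERT IGNORE, an existing row is kept. It assumes the table has a column called listingId, which the question doesn't show, and the helper name is hypothetical. Because a duplicate now turns into an empty update instead of a skipped row, MySQL has no duplicate-entry warning to report.

```python
# Hypothetical: assumes `listingsDetails` has a column named `listingId`.
UPSERT_SQL = (
    "INSERT INTO `listingsDetails` VALUE (%s, %s, %s) "
    "ON DUPLICATE KEY UPDATE `listingId` = `listingId`;"
)


def insert_listing_detail(txn, listing_id, key, value):
    """Insert one (listingId, key, value) row, leaving an existing duplicate untouched."""
    return txn.execute(UPSERT_SQL, (listing_id, key, value))


# Usage inside the Twisted interaction, mirroring line 403 of pipelines.py:
# d = insert_listing_detail(txn, item['listingId'], key, value)
```

If you would rather overwrite the stored value on a duplicate, the update clause could set the value column from VALUES(...) instead; that needs the real column name.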
Can anyone help me figure out why this warning can't be printed as UTF-8, when everything else on my Ubuntu machine can?