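Every Python developer has struggled with Python's Unicode handling at some point. But I have now run into a situation that is driving me crazy, and I can't solve it on my own. I have already spent a full day on it, including research.
My setup is a small Django application that connects to a remote system via SOAP (using Suds), pulls some data, and looks it up in Django's database: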
import suds.client
from myapp.models import Customer
client = suds.client.Client(...)
customer = client.service.getCustomerByEmail('foo@bar.com')
# type(customer.email) is <class 'suds.sax.text.Text'>
customer_exists = Customer.objects.filter(email=customer.email)
The customer's email address contains a German umlaut (ü), which makes Django raise the following exception:
Traceback (most recent call last):
File "run_anatomy_client.py", line 19, in <module>
print client.main()
File "/Users/user/Documents/workspace/Wawi/application/myapp/client.py", line 282, in main
if not Customer.objects.filter(email=customer.email.encode('latin1')):
File "/Users/user/Documents/workspace/Wawi/application/myapp/client.py", line 76, in sync_customer
if not customer_exists:
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 113, in __nonzero__
iter(self).next()
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 107, in _result_iter
self._fill_cache()
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 772, in _fill_cache
self._result_cache.append(self._iter.next())
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/query.py", line 273, in iterator
for row in compiler.results_iter():
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 680, in results_iter
for rows in self.execute_sql(MULTI):
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/models/sql/compiler.py", line 735, in execute_sql
cursor.execute(sql, params)
File "/Users/user/Documents/workspace/Wawi/pyenv/lib/python2.7/site-packages/django/db/backends/util.py", line 43, in execute
logger.debug('(%.3f) %s; args=%s' % (duration, sql, params),
UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 28: ordinal not in range(128)
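I have already played around with encode() and decode(), changed the encoding of the source file, and changed the database layout, which currently looks like this: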
mysql> show variables like '%character%';
+--------------------------+-----------------------------------------+
| Variable_name | Value |
+--------------------------+-----------------------------------------+
| character_set_client | latin1 |
| character_set_connection | latin1 |
| character_set_database | utf8 |
| character_set_filesystem | binary |
| character_set_results | latin1 |
| character_set_server | latin1 |
| character_set_system | utf8 |
| character_sets_dir | /opt/local/share/mysql5/mysql/charsets/ |
+--------------------------+-----------------------------------------+
8 rows in set (0.00 sec)
The strange thing is that if I set a trace point and execute exactly the same line in the Django shell, it works as long as I use encode():
(Pdb) Customer.objects.filter(email=customer.email)
*** UnicodeEncodeError: 'ascii' codec can't encode character u'\xfc' in position 28: ordinal not in range(128)
(Pdb) Customer.objects.filter(email=customer.email.encode('utf-8'))
[]
I would appreciate any hint...
Answer 0 (score: 2)
suds.sax.text.Text inherits from unicode:
class Text(unicode):
    """
    An XML text object used to represent text content.
    @ivar lang: The (optional) language flag.
    @type lang: bool
    @ivar escaped: The (optional) XML special character escaped flag.
    @type escaped: bool
    """
If you want to use it in the query, you can encode it to UTF-8 first:
email = customer.email.encode("utf-8")
customer_exists = Customer.objects.filter(email=email)
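To illustrate the point about inheritance, here is a minimal sketch that does not use Suds at all. `FakeText` is a hypothetical stand-in for suds.sax.text.Text, and the email address is made up. It only shows that a subclass of the text type passes isinstance checks while not being the exact built-in type, and that encode() returns a plain byte string. The `text_type` alias is an assumption added so the snippet runs on both Python 2 (as in the question) and Python 3.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


class FakeText(text_type):
    """Hypothetical stand-in for suds.sax.text.Text; not the real class."""


email = FakeText(u"m\xfcller@example.com")

# The subclass passes isinstance checks but is not the exact built-in type.
is_text = isinstance(email, text_type)
is_exact = type(email) is text_type

# Encoding yields a plain byte string that no longer carries the subclass.
encoded = email.encode("utf-8")
```

Note that passing UTF-8 bytes to the ORM is a Python 2 workaround that matches the question's environment; on Python 3 you would normally pass text, not bytes.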
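Answer 1 (score: 0)
I spent more than 2 hours trying to figure out what was going on, and why I couldn't save a Django object after assigning values from a Suds data structure to its fields.
As @guillaumevincent mentioned, the Suds Text class inherits from unicode, and its implementation is not 100% correct, so Django fails on some operations that would work fine with the base unicode type.
So for the example in the question I would do: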
customer_exists = Customer.objects.filter(email=unicode(customer.email))
In my case, I also had to convert the value when assigning it to a model field:
django_obj.field_name = unicode(suds_obj.field_name)
Hope this saves someone some time :)
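If many fields need the same conversion, the idea can be wrapped in a small helper. This is only a sketch under assumptions: `FakeText` again stands in for suds.sax.text.Text, and `to_plain_text` and `plain_fields` are hypothetical names, not part of Suds or Django. The helper turns any subclass of the text type into the exact built-in type and leaves other values alone.

```python
# -*- coding: utf-8 -*-
text_type = type(u"")  # unicode on Python 2, str on Python 3


class FakeText(text_type):
    """Hypothetical stand-in for suds.sax.text.Text; not the real class."""


def to_plain_text(value):
    """Return text subclasses as the exact built-in text type."""
    if isinstance(value, text_type) and type(value) is not text_type:
        return text_type(value)
    return value


def plain_fields(data):
    """Apply to_plain_text to every value of a dict of field values."""
    return {key: to_plain_text(val) for key, val in data.items()}


fields = plain_fields({"email": FakeText(u"m\xfcller@example.com"), "age": 3})
```

The resulting dict could then be used for keyword arguments such as Customer.objects.filter(**fields), assuming its keys match the model's field names.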