Question

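I'm copying a string containing the word cafe (but with an accented e) from a JavaScript source file into a Python script. I need to do some processing on the data and then output some JSON, and I'm having trouble getting the encoding/decoding details right. This is probably best illustrated with an example: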

$ python
>>> import urllib2, json
>>> the_name = "Tasty Caf%C3%E9"
>>> the_name
'Tasty Caf%C3%E9'
>>> the_name_unquoted = urllib2.unquote(the_name)
>>> the_name_unquoted
'Tasty Caf\xc3\xe9'
>>> json.dumps({'bla': the_name_unquoted})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/json/__init__.py", line 231, in dumps
    return _default_encoder.encode(obj)
  File "/usr/lib/python2.7/json/encoder.py", line 201, in encode
    chunks = self.iterencode(o, _one_shot=True)
  File "/usr/lib/python2.7/json/encoder.py", line 264, in iterencode
    return _iterencode(o, 0)
UnicodeDecodeError: 'utf8' codec can't decode byte 0xc3 in position 9: invalid continuation byte

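I've spent some time trying to understand how encodings work, but apparently I'm not getting it. What exactly is the encoding/format (is there a more appropriate term here?) of the_name_unquoted above, and what is it that utf8 fails to decode correctly?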
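One way to see what the_name_unquoted actually holds is to inspect its bytes. The sketch below is Python 3 (where `urllib2` no longer exists); it assumes `urllib.parse.unquote_to_bytes` as the closest equivalent of what `urllib2.unquote` did to a Python 2 byte string. It suggests the value is not valid text in any single encoding: `0xC3` starts a two-byte UTF-8 sequence, but `0xE9` is the Latin-1 byte for é rather than a UTF-8 continuation byte. The UTF-8 encoding of é would be `%C3%A9`.

```python
from urllib.parse import unquote_to_bytes

# Percent-decoding produces raw bytes, not text: each %XX becomes one byte.
raw = unquote_to_bytes("Tasty Caf%C3%E9")
print(raw)                         # b'Tasty Caf\xc3\xe9'
print([hex(b) for b in raw[9:]])   # ['0xc3', '0xe9']

# 0xC3 is a UTF-8 lead byte announcing a 2-byte sequence, so the next byte
# must be a continuation byte (0x80-0xBF). 0xE9 is outside that range.
try:
    raw.decode("utf-8")
    reason = None
except UnicodeDecodeError as exc:
    reason = exc.reason
print(reason)                      # invalid continuation byte

# For comparison, how "é" (U+00E9) is encoded in UTF-8 vs. Latin-1:
print("\u00e9".encode("utf-8"))    # b'\xc3\xa9'
print("\u00e9".encode("latin-1"))  # b'\xe9'

# Decoding as Latin-1 never fails (every byte maps to some character),
# but here it yields mojibake rather than "Café".
latin1_text = raw.decode("latin-1")  # 'Tasty CafÃé'
```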

Answer 1

Because that character is supported by the unicode encoding. You can solve the problem by converting the string to unicode:

the_name = u'Tasty Caf%C3%E9'

Or, if you already have a (byte) string, you can convert it:

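the_name = 'Tasty Caf%C3%E9'
# unicode() uses the default (ASCII) codec; fine here because the
# percent-encoded string is still pure ASCII
the_name = unicode(the_name)
# or, naming the codec explicitly (the optional second argument is the
# error-handling scheme such as 'strict', not the string itself):
the_name = the_name.decode('utf8')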
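On Python 3, where `urllib2` and the `unicode` builtin no longer exist, the same idea (work with decoded text before calling `json.dumps`) might look like the sketch below. It assumes the intended percent-encoding is `%C3%A9`, since strictly decoding the question's `%C3%E9` as UTF-8 still fails for the reason shown in the traceback. The output path is a throwaway temporary file, used only as a stand-in for a real `.json` file.

```python
import json
import os
import tempfile
from urllib.parse import unquote

# Assumption: the intended input is %C3%A9, the UTF-8 percent-encoding
# of "é" (the question's %C3%E9 is not valid UTF-8).
the_name = "Tasty Caf%C3%A9"

# Python 3's unquote() returns text (str): it percent-decodes to bytes and
# then decodes them. errors="strict" makes bad input raise instead of being
# silently replaced (unquote's default is errors="replace").
the_name_unquoted = unquote(the_name, encoding="utf-8", errors="strict")
# -> 'Tasty Café'

# json.dumps escapes non-ASCII characters by default...
escaped = json.dumps({"bla": the_name_unquoted})
print(escaped)  # {"bla": "Tasty Caf\u00e9"}

# ...or keeps them as literal characters with ensure_ascii=False.
literal = json.dumps({"bla": the_name_unquoted}, ensure_ascii=False)
# -> '{"bla": "Tasty Café"}'

# Writing a .json text file: open it with an explicit UTF-8 encoding.
# The temporary directory is only a stand-in for a real output path.
path = os.path.join(tempfile.mkdtemp(), "out.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump({"bla": the_name_unquoted}, f, ensure_ascii=False)
```

Passing the question's original `Tasty Caf%C3%E9` to `unquote(..., errors="strict")` raises `UnicodeDecodeError`; with the default `errors="replace"` it returns a string containing U+FFFD replacement characters instead of an é.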

Going from a URL-encoded accented e to an accented e in a .json text file with Python
