Decoding JSON encoded as GB2312

Time: 2012-12-19 17:51:46

Tags: python unicode encoding gb2312

Using a GET request, I fetch JSON from the Google Geocoding API:

import urllib, urllib2

url = "http://maps.googleapis.com/maps/api/geocode/json"
params = {'address': 'ivory coast', 'sensor': 'false'}
request = urllib2.Request(url + "?" + urllib.urlencode(params))
response = urllib2.urlopen(request)
st = response.read()

The output looks like this:

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "Côte d'Ivoire",
               "short_name" : "CI",
               "types" : [ "country", "political" ]
            }
         ],
         "formatted_address" : "Côte d'Ivoire",
         "geometry" : { ... # rest snipped

As you can see, the country name has some encoding problems. I tried to guess the encoding like this:

import chardet
encoding = chardet.detect(st)
print "String is encoded in {0} (with {1}% confidence).".format(encoding['encoding'], encoding['confidence']*100)

which returns:

String is encoded in GB2312 (with 99.0% confidence).

What I want to know is how to convert this into a dictionary, with an encoding in which ô (o with circumflex) is displayed correctly.

I tried:

st = st.decode(encoding['encoding']).encode('utf-8')

But then I got:

{
   "results" : [
      {
         "address_components" : [
            {
               "long_name" : "C么te d'Ivoire",
               "short_name" : "CI",
               "types" : [ "country", "political" ]
            }
         ],
         "formatted_address" : "C么te d'Ivoire",
         "geometry" : { ... # rest snipped

2 answers:

Answer 0 (score: 3)

Google API results are always encoded in UTF-8; you can even confirm this yourself by reading the HTTP Content-Type header of the response.
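
As a rough offline sketch of the check this answer describes (written for Python 3, unlike the Python 2 code in the question): the header value `application/json; charset=UTF-8` and the helper name `charset_from_headers` are assumptions made for illustration, not copied from a real response. In Python 3 the `headers` attribute of a `urllib.request.urlopen(...)` result is an `email.message.Message` subclass, so the same `get_content_charset()` call should apply to a live response.

```python
from email.message import Message


def charset_from_headers(headers, default="utf-8"):
    """Return the charset declared in Content-Type, or `default` if absent."""
    # get_content_charset() lower-cases the value and returns None when
    # the header carries no charset parameter.
    return headers.get_content_charset() or default


# Stand-in for response.headers; the header value is an assumed example.
headers = Message()
headers["Content-Type"] = "application/json; charset=UTF-8"
declared = charset_from_headers(headers)

# Stand-in for response.read(): the UTF-8 bytes for "Côte d'Ivoire".
body = b'{"long_name": "C\xc3\xb4te d\'Ivoire"}'
text = body.decode(declared)
print(declared, text)
```

Trusting the declared charset avoids a statistical guess such as the one `chardet` made in the question.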

Answer 1 (score: 2)

Once you have decoded it (correctly), don't re-encode it; json works just fine with unicode.

>>> import json
>>> json.loads(u"[\"C\xf4te d'Ivoire\"]")
[u"C\xf4te d'Ivoire"]
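
To tie the two answers together, here is a minimal sketch in Python 3. The hard-coded byte string stands in for `response.read()`, and the trimmed payload is an assumption based on the output shown in the question. It decodes the bytes as UTF-8 before parsing, and also shows why the GB2312 guess produced `么`: the two UTF-8 bytes for `ô` (`C3 B4`) happen to form a valid GB2312 character.

```python
import json

# Assumed stand-in for the raw response body (UTF-8 bytes).
raw = b'{"results": [{"formatted_address": "C\xc3\xb4te d\'Ivoire"}]}'

# Correct: decode as UTF-8, then parse; the result is an ordinary dict.
data = json.loads(raw.decode("utf-8"))
name = data["results"][0]["formatted_address"]

# Incorrect: the same bytes read as GB2312 turn the byte pair for "ô"
# into a single Chinese character, as in the question's second output.
wrong = raw.decode("gb2312")

print(name)
print(wrong)
```

If the text later has to be written out as bytes, encoding it once at that point (for example `name.encode("utf-8")`) keeps the rest of the program working on decoded text only.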