I am calling the URL:
http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json
using urllib2 and decoding the response with the json module:
import json
import urllib2

url = "http://code.google.com/feeds/issues/p/chromium/issues/full/291?alt=json"
request = urllib2.Request(url)
response = urllib2.urlopen(request)
issue_report = json.loads(response.read())
I get the following error:
ValueError: Invalid control character at: line 1 column 1120 (char 1120)
I tried checking the response headers, but they look fine:
Content-Type: application/json; charset=UTF-8
Access-Control-Allow-Origin: *
Expires: Sun, 03 Jul 2011 17:38:38 GMT
Date: Sun, 03 Jul 2011 17:38:38 GMT
Cache-Control: private, max-age=0, must-revalidate, no-transform
Vary: Accept, X-GData-Authorization, GData-Version
GData-Version: 1.0
ETag: W/"CUEGQX47eCl7ImA9WxJaFEw."
Last-Modified: Tue, 04 Aug 2009 19:20:20 GMT
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Connection: close
I also tried adding the encoding parameter, like this:
issue_report = json.loads(response.read(), encoding='UTF-8')
I still get the same error.
Answer 0 (score: 4)
At that position the feed contains raw data from a JPEG; the JSON is malformed, so this is not your fault. Report the bug to Google.
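To illustrate what "Invalid control character" means, here is a minimal sketch that reproduces the error with a hand-made string (not the actual feed) and shows the json module's strict=False option, which accepts raw control characters inside strings. The sample data and variable names are made up for illustration. This only helps if stray control characters are the sole problem; it will not repair a feed that is corrupted in other ways, such as bytes that are not valid UTF-8.

```python
import json

# Hand-made stand-in for the broken feed: a JSON string value that
# contains a raw (unescaped) control character, here U+0001.
raw = '{"title": "bad' + '\x01' + 'byte"}'

# The default (strict) decoder rejects raw control characters in strings.
strict_error = None
try:
    json.loads(raw)
except ValueError as exc:
    strict_error = str(exc)

# With strict=False the decoder lets control characters through unchanged.
lenient = json.loads(raw, strict=False)

print(strict_error)
print(repr(lenient["title"]))
```

Applied to the question's code, that would be json.loads(response.read(), strict=False), on the assumption that the rest of the document is valid JSON.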
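Answer 1 (score: 2)
You might consider using lxml instead, since the JSON is malformed. Requesting the same URL without alt=json returns the feed as XML, and lxml's XPath support makes working with XML fairly easy: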
import lxml.etree
url = 'http://code.google.com/feeds/issues/p/chromium/issues/full/291'
doc = lxml.etree.parse(url)
ns = {'issues': 'http://schemas.google.com/projecthosting/issues/2009'}
issues = doc.xpath('//issues:*', namespaces=ns)
Manipulating the elements is fairly easy, for example stripping the namespace from the tags and converting the result to a dict:
>>> dict((x.tag[len(ns['issues'])+2:], x.text) for x in issues)
{'closedDate': '2009-08-04T19:20:20.000Z',
'id': '291',
'label': 'Area-BrowserUI',
'stars': '13',
'state': 'closed',
'status': 'Verified'}
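If lxml is not available, the same idea can be sketched with the standard library's xml.etree.ElementTree, which is a different library from the one used above. The XML below is a hand-written sample, not the real feed; its element names and values are taken from the output shown above, and the helper name issue_fields is hypothetical.

```python
import xml.etree.ElementTree as ET

ISSUES_NS = 'http://schemas.google.com/projecthosting/issues/2009'

# Hand-written sample entry standing in for the real feed (an assumption).
SAMPLE = """<entry xmlns="http://www.w3.org/2005/Atom"
       xmlns:issues="http://schemas.google.com/projecthosting/issues/2009">
  <title>hand-written sample entry</title>
  <issues:closedDate>2009-08-04T19:20:20.000Z</issues:closedDate>
  <issues:id>291</issues:id>
  <issues:label>Area-BrowserUI</issues:label>
  <issues:stars>13</issues:stars>
  <issues:state>closed</issues:state>
  <issues:status>Verified</issues:status>
</entry>"""


def issue_fields(xml_text):
    """Return {local tag name: text} for every element in the issues namespace."""
    root = ET.fromstring(xml_text)
    # ElementTree spells a namespaced tag as '{namespace-uri}localname'.
    prefix = '{%s}' % ISSUES_NS
    return dict(
        (el.tag[len(prefix):], el.text)
        for el in root.iter()
        if el.tag.startswith(prefix)
    )


fields = issue_fields(SAMPLE)
print(fields)
```

As in the lxml version, an element name that occurs more than once (several labels, say) keeps only its last value in the dict.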