import json
import urllib
import re
import binascii

def asciirepl(match):
    # Turn a "\xNN" escape into the character it encodes
    s = match.group()
    return binascii.unhexlify(s[2:])

query = 'google'
p = urllib.urlopen('http://www.google.com/dictionary/json?callback=a&q='+query+'&sl=en&tl=en&restrict=pr,de&client=te')
page = p.read()[2:-10] # Strip the JSONP wrapper, since the response is returned as a function call
# Replace the hex escapes with ASCII characters
p = re.compile(r'\\x(\w{2})')
ascii_string = p.sub(asciirepl, page)
# Now decode the cleaned JSON response
data = json.loads(ascii_string)
Running it, I get this error:
shadyabhi@archlinux /tmp $ python2 define.py
Traceback (most recent call last):
  File "define.py", line 19, in <module>
    data = json.loads(ascii_string)
  File "/usr/lib/python2.7/json/__init__.py", line 326, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python2.7/json/decoder.py", line 366, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python2.7/json/decoder.py", line 382, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 403 (char 403)
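As far as I can tell, there is nothing wrong with the JSON, since I got it straight from Google's server. All I did was replace the hex escapes. Any help would be greatly appreciated.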
Answer 0 (score: 3)
Decoding the \x escapes can produce " characters, and those need to be re-escaped because they appear inside strings that are encoded in the JSON data.
def asciirepl(match):
    ch = binascii.unhexlify(match.group()[2:])
    # Re-escape quotes and backslashes so they stay inside the JSON string
    return '\\' + ch if ch in '\\"' else ch
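A Python 3 sketch of the re-escaping idea. It assumes a hand-written sample payload modeled on the "text" field quoted in the next answer, not a real Google response, and narrows the pattern from \w{2} to hex digits; binascii.unhexlify returns bytes in Python 3, hence the decode.

```python
import binascii
import json
import re

# Hypothetical sample: a JSON object whose string value contains
# JavaScript-style \xNN escapes, which JSON itself does not allow.
raw = r'{"text":"\x3ca href\x3d\x22http://example.com\x22\x3elink\x3c/a\x3e"}'

HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def asciirepl(match):
    # Decode the two hex digits into a single character.
    ch = binascii.unhexlify(match.group(1)).decode('latin-1')
    # A decoded quote or backslash would end or corrupt the JSON string,
    # so re-escape it; every other character is inserted as-is.
    return '\\' + ch if ch in '\\"' else ch

cleaned = HEX_ESCAPE.sub(asciirepl, raw)
data = json.loads(cleaned)
print(data['text'])
```

Because the replacement is a function, re.sub inserts its return value literally, so the added backslash is not reinterpreted.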
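That still doesn't handle control characters, so you might want to convert the \x escapes into \u escapes instead; those are defined by the JSON standard and parsed by the json module. This has the side benefit of being simpler :)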
def asciirepl(match):
    # \xNN -> \u00NN, a JSON escape the json module decodes itself
    return '\\u00' + match.group()[2:]
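A Python 3 sketch of the \u conversion, assuming the same kind of hypothetical sample plus an encoded newline (\x0a) to show that control characters come through correctly. The function name hex_to_unicode_escape is illustrative only.

```python
import json
import re

# Hypothetical sample with \xNN escapes, including a control character (\x0a).
raw = r'{"text":"\x3ca href\x3d\x22http://example.com\x22\x3elink\x3c/a\x3e\x0a"}'

HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def hex_to_unicode_escape(match):
    # \xNN -> \u00NN: same code point, spelled as a JSON escape, so
    # json.loads decodes quotes, backslashes and control characters itself.
    return '\\u00' + match.group(1)

cleaned = HEX_ESCAPE.sub(hex_to_unicode_escape, raw)
data = json.loads(cleaned)
print(repr(data['text']))
```

Like the question's regex, this pattern does not check whether the backslash is itself escaped, so input containing an escaped backslash followed by x41 would be misread; it is a workaround rather than a general parser.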
Answer 1 (score: 2)
Character 403 is the first embedded quote in "text", and that is invalid JSON:
{
"type":"url",
"text":"<a href="http://www.people-communicating.com/jargon-words.html">http://www.people-communicating.com/jargon-words.html</a>",
"language":"en"
}
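To see the failure in isolation, here is a Python 3 sketch using a shortened, hypothetical version of the object above. The error offset lands right after the embedded quote, because the parser thinks the string ended there and then expects a ',' delimiter. The helper json_error_position is illustrative only.

```python
import json

def json_error_position(text):
    """Return the character offset json.loads complains about, or None."""
    try:
        json.loads(text)
    except ValueError as err:  # json.JSONDecodeError is a ValueError subclass
        return err.pos
    return None

# Shortened sample after \x22 was naively turned into a bare quote.
broken = '{"type":"url","text":"<a href="http://example.com">link</a>","language":"en"}'
pos = json_error_position(broken)
print(pos, broken[pos - 1:pos + 5])
```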
This is what the server actually returns; note that there are no embedded quotes:
{
"type":"url",
"text":"\\x3ca href\\x3d\\x22http://www.people-communicating.com/jargon-words.html\\x22\\x3ehttp://www.people-communicating.com/jargon-words.html\\x3c/a\\x3e",
"language":"en"
}
The best way to do this is to decode the JSON first, and then unescape each string as needed.
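A Python 3 sketch of decode-first, unescape-later. It assumes the response really contains doubled backslashes as displayed above, so json.loads succeeds and leaves literal \x3c sequences in the strings; if the server actually sends single-backslash \x escapes, json.loads fails before this step. The helper unescape_hex and the sample are hypothetical.

```python
import json
import re

HEX_ESCAPE = re.compile(r'\\x([0-9a-fA-F]{2})')

def unescape_hex(value):
    """Recursively replace \\xNN sequences in every string value of decoded JSON."""
    if isinstance(value, str):
        return HEX_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), value)
    if isinstance(value, list):
        return [unescape_hex(item) for item in value]
    if isinstance(value, dict):
        # Only values are unescaped; keys are left untouched.
        return {key: unescape_hex(item) for key, item in value.items()}
    return value

# Hypothetical sample in the doubled-backslash form shown above: valid JSON.
raw = r'{"type":"url","text":"\\x3ca href\\x3d\\x22http://example.com\\x22\\x3elink\\x3c/a\\x3e","language":"en"}'
data = unescape_hex(json.loads(raw))
print(data["text"])
```

Unescaping after decoding means a decoded quote can never break the JSON syntax, which is the point of doing it in this order.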
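EDIT: If this really is invalid JSON, as Karl Knechtel says in the comments, Google should be told that their API is broken. If Python's implementation is choking on valid JSON, the Python developers should be told to fix it. Whichever workaround you use, it should be easy to remove once the underlying problem is fixed.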