Code:
loaded_json = json.loads(json_set)
json_set is a string collected from a web page, and it holds JSON-formatted data. The full string (warning: LONG) is at: http://pastebin.com/wykwNEeg
The error it gives me (I get the same thing if I save the string to its own file, then readlines + json.loads that line in IDLE):
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 62233 (char 62232)
","distance":\u002d1,"lo
             ^ (gedit tells me column 62233 lies between the colon and the \)
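I guess it has something to do with Unicode. This particular escape is the Unicode code point for -, so the value should be "distance":-1
The odd thing is that if I print the string out at the point where the exception is raised (or where I guess it is raised), it appears as shown above. However, if I open a python3 IDLE session and do this, I get a different result: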
>>> mystr = '"distance":\u002d1'
>>> mystr
'"distance":-1'
>>> print(mystr)
"distance":-1
>>>
How do I load this JSON properly?
===============
This data comes from earlier code (which basically shows that the string is the result of response.decode('utf8')):
'''This bit gets the page from the website, it's called from the below code block'''
def load_arbitrary_page(self, url):
    response = self.opener.open(url)
    response_list = response.readlines()
    decode_list = []
    for line in response_list:
        decode = line.decode('utf8')
        decode_list.append(decode)
    print(BeautifulSoup(''.join(decode_list)).find("title"))
    return decode_list

html = grabber.load_arbitrary_page(url)
count+=1
for line in html:
    #Appears to show up 3 times, all in the same line
    if "<my search parameter>" in line:
        content_list.append(line)
        break
Finally, content_list is split on comments (re.split("<!-- ...) and the last part becomes the variable json_set.
Answer 0 (score: 4)
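If you look at the ECMA-404 standard for JSON, you will see that a number may have an optional leading minus sign, which is specified as U+002D, the ASCII minus sign. However, \u002D is not a minus sign. It is a character escape for a minus sign, and character escapes are only valid inside a string value. A string value must begin and end with double quotes, so this is not a string value either. The data you have therefore does not parse as a valid JSON value, and the Python JSON parser is correct to reject it.
If you try to validate that data blob with the http://jsonlint.com/ site, it also reports that the data is not valid JSON.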
Parse error on line 2172:
... "distance": \u002d1,
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['
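The same distinction can be reproduced with Python's own json module. The following is a minimal sketch, not taken from the original data: the key names and sample documents are made up for illustration, and raw strings are used so that the parser receives the backslash exactly as it would appear in the scraped page.

```python
import json

# Raw strings keep the backslash, so the JSON parser sees the six
# characters \u002d rather than a minus sign.
inside_string = r'{"label": "\u002d1"}'  # escape inside a JSON string: valid
bare_escape = r'{"distance": \u002d1}'   # escape where a number should start: invalid

parsed = json.loads(inside_string)
print(parsed)

try:
    json.loads(bare_escape)
    rejected = False
except ValueError as exc:  # json.JSONDecodeError is a subclass of ValueError
    rejected = True
    print("rejected:", exc)
```

The first document decodes to a string value containing "-1", while the second is refused at the position of the backslash, which matches the error in the question.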
The example you tried in IDLE is not a like-for-like comparison, because the string you supplied is different:
'"distance":\u002d1' != '"distance":\\u002d1'
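The string on the left is the one you gave IDLE. Python converts the \u002d escape to - while reading the string literal, before json.loads ever sees it, so if you enclose it in curly braces it is valid JSON: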
>>> json.loads('{"distance":\u002d1}')
{'distance': -1}
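One way to see the difference between the two literals is to compare their lengths and contents directly. This is only an illustrative sketch; the variable names plain, escaped and raw are made up here.

```python
plain = '"distance":\u002d1'     # Python turns \u002d into "-" while reading the literal
escaped = '"distance":\\u002d1'  # the doubled backslash keeps a real backslash in the string
raw = r'"distance":\u002d1'      # a raw string is another way to write the same text

print(len(plain), len(escaped))  # 13 18
print(plain == '"distance":-1')  # True
print(escaped == raw)            # True
```

The text scraped from the web page corresponds to the escaped form: it contains an actual backslash followed by u002d, not a minus sign.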
But if you give it the string on the right, you will find that it does not work the way you expect:
>>> json.loads('{"distance":\\u002d1}')
Traceback (most recent call last):
  File "/usr/lib/python3.2/json/decoder.py", line 367, in raw_decode
    obj, end = self.scan_once(s, idx)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/json/__init__.py", line 309, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.2/json/decoder.py", line 351, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.2/json/decoder.py", line 369, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
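If the site cannot be made to emit valid JSON, one possible workaround is to repair the text before parsing it. The sketch below is an assumption on my part, not something the data source documents: the helper name repair_escaped_minus and the sample string are hypothetical, and the pattern only rewrites \u002d where it directly follows a colon, comma or opening bracket and is followed by a digit, which is where a JSON number would start.

```python
import json
import re

# Hypothetical pattern: a literal \u002d (either case of the final d) that
# follows ':', ',' or '[' plus optional whitespace, and precedes a digit.
_BARE_MINUS = re.compile(r'(?<=[:,\[])(\s*)\\u002[dD](?=\d)')

def repair_escaped_minus(text):
    """Replace an escaped minus in number position with a real minus sign."""
    # \1 puts back any whitespace that was captured before the escape.
    return _BARE_MINUS.sub(r'\1-', text)

# Made-up sample; the raw string keeps the backslashes as the page would send them.
sample = r'{"distance":\u002d1,"other":[\u002d2, 3]}'
repaired = repair_escaped_minus(sample)
data = json.loads(repaired)
print(repaired)
print(data)
```

A match that happens to fall inside a string value is rewritten as well, but there the escape and the literal minus decode to the same character, so the parsed result should not change. It is still worth checking the output against the real data before relying on this.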