json.loads() raises an exception saying it expects a value, but a value appears to be there

Time: 2015-05-16 00:44:09

Tags: python json python-3.x unicode

Code:

loaded_json = json.loads(json_set)

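json_set is a string collected from a web page; it contains JSON-formatted data. The full string (warning: LONG) is at: http://pastebin.com/wykwNEeg

The error it gives me (I get the same one if I save the string to its own file, then use readlines and call json.loads on that line in IDLE):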

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.4/json/__init__.py", line 318, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.4/json/decoder.py", line 343, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.4/json/decoder.py", line 361, in raw_decode
    raise ValueError(errmsg("Expecting value", s, err.value)) from None
ValueError: Expecting value: line 1 column 62233 (char 62232)
","distance":\u002d1,"lo
            ^(gedit tells me column 62233 lies between the colon and the \)

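My guess is that it has something to do with Unicode: this particular escape is the Unicode code point for "-", so the value should be "distance":-1

The strange thing is that if I print the string at the point where the exception occurs (or where I guess it occurs), it shows up as above. But if I open a python3 IDLE session and do this, I get a different result: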

>>> mystr = '"distance":\u002d1'
>>> mystr
'"distance":-1'
>>> print(mystr)
"distance":-1
>>>  

How do I load this JSON correctly?

===============

The data comes from earlier code (which basically shows that the string is the result of response.decode('utf8')):

from bs4 import BeautifulSoup

def load_arbitrary_page(self, url):
    '''This bit gets the page from the website, it's called from the below code block'''
    response = self.opener.open(url)
    response_list = response.readlines()
    decode_list = []
    for line in response_list:
        # Each line arrives as bytes; decode it to str
        decode = line.decode('utf8')
        decode_list.append(decode)

    print(BeautifulSoup(''.join(decode_list)).find("title"))

    return decode_list

# Calling code (fragment; grabber, url, count and content_list are defined elsewhere)
html = grabber.load_arbitrary_page(url)
count += 1
for line in html:
    # Appears to show up 3 times, all in the same line
    if "<my search parameter>" in line:
        content_list.append(line)
        break

Finally, content_list is split on comments (re.split("<!-- ...) and the last part becomes the variable json_set

1 Answer:

Answer 0: (score: 4)

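If you look at the ECMA-404 standard for JSON, you will see that numbers may have an optional leading minus sign, which the standard specifies as U+002D, the ASCII minus sign. However, \u002D is not a minus sign. It is a character escape for a minus sign, and character escapes are only valid inside a string value. A string value must begin and end with double quotes, so this is not a string value either. The data you have therefore does not parse as a valid JSON value, and the Python JSON parser is correct to reject it.

If you try validating that data blob on the http://jsonlint.com/ website, it also reports that the data is not valid JSON.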
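The rule can be checked directly with the standard json module. The following is a minimal sketch; the keys and variable names are made up for illustration, and raw string literals (r'...') are used so that Python leaves the backslash escape untouched and hands it to the JSON parser as-is.

```python
import json

# Inside a quoted JSON string, the \u002d escape is decoded by the JSON parser.
inside_string = r'{"sign": "\u002d"}'
print(json.loads(inside_string))

# In number position, a real "-" character is required.
real_minus = r'{"distance": -1}'
print(json.loads(real_minus))

# The same escape in number position is not valid JSON, so it is rejected.
escaped_minus = r'{"distance": \u002d1}'
try:
    json.loads(escaped_minus)
except ValueError as exc:
    print("rejected:", exc)
```

Only the last call fails, which matches the behaviour described in the question.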

Parse error on line 2172:
...        "distance": \u002d1,           
-----------------------^
Expecting 'STRING', 'NUMBER', 'NULL', 'TRUE', 'FALSE', '{', '['

The example you got working in IDLE is not an equivalent comparison, because the string you gave it is different:

'"distance":\u002d1' != '"distance":\\u002d1'

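The string on the left of that comparison is the one you gave to IDLE. Python has already turned the \u002d escape into a real "-" character before the JSON parser sees it, so if you wrap it in curly braces it is valid JSON: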

>>> json.loads('{"distance":\u002d1}')
{'distance': -1}
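The difference between the two literals can also be inspected without involving the JSON parser's error path. This is a small sketch with illustrative variable names (not from the question); it only relies on how Python itself handles backslash escapes in ordinary string literals.

```python
import json

# One backslash: Python replaces \u002d with "-" while reading the literal.
interpreted = '{"distance":\u002d1}'

# Two backslashes: the string keeps a literal backslash followed by "u002d",
# which is the form the downloaded page text contains.
literal = '{"distance":\\u002d1}'

print(repr(interpreted), len(interpreted))
print(repr(literal), len(literal))

# The interpreted form is ordinary JSON with a real minus sign.
print(json.loads(interpreted))
```

The second string is five characters longer, because the six-character escape sequence is still there where the first string has a single "-".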

But if you give it the string on the right of the comparison, you will see that it does not work the way you expected:

>>> json.loads('{"distance":\\u002d1}')
Traceback (most recent call last):
  File "/usr/lib/python3.2/json/decoder.py", line 367, in raw_decode
    obj, end = self.scan_once(s, idx)
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python3.2/json/__init__.py", line 309, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.2/json/decoder.py", line 351, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.2/json/decoder.py", line 369, in raw_decode
    raise ValueError("No JSON object could be decoded")
ValueError: No JSON object could be decoded
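As for actually loading the data, the cleanest fix is to get valid JSON from the source. If that is not possible, one possible workaround is to repair the text before parsing it. The sketch below is only that, a sketch: it assumes the only defect is \u002d used as the sign of a number directly after a colon, and the helper name repair_escaped_minus and the sample data are hypothetical. A regular expression cannot tell whether it is inside a JSON string, so a string value that happens to contain a colon followed by \u002d and a digit would be altered as well.

```python
import json
import re

# Matches \u002d (or \u002D) that follows a colon and precedes a digit,
# i.e. an escaped minus sitting where a number's sign should be.
ESCAPED_MINUS = re.compile(r'(:\s*)\\u002[dD](?=\d)')

def repair_escaped_minus(text):
    """Replace an escaped minus in number position with a real "-"."""
    return ESCAPED_MINUS.sub(r'\1-', text)

# Hypothetical sample: one escape inside a string, one in number position.
raw = r'{"name": "a\u002db", "distance":\u002d1, "lo": 2}'
fixed = repair_escaped_minus(raw)
loaded = json.loads(fixed)
print(loaded)
```

The escape inside the quoted string is left alone and decoded by the JSON parser as usual; only the one in number position is rewritten before parsing.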