How do I parse a JSON file with double quotes inside strings in Python?

Posted: 2014-04-16 11:21:38

Tags: python json

I'm trying to read a JSON file in Python. Some lines contain strings with double quotes inside them:

{"Height__c": "8' 0\"", "Width__c": "2' 8\""}

Using a raw string literal produces the correct output:

json.loads(r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}""")
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

But my string comes from a file, i.e.:

s = f.readline()

where:

>>> print repr(s)
'{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'

and json raises the following exception:

json.loads(s) # s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
ValueError: Expecting ',' delimiter: line 1 column 21 (char 20)
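The reported position can be checked directly. Character 20 is the second of the two quotes after "0": the parser takes the first quote as the end of the string "8' 0", then finds a stray quote where it expected "," or "}". The following is a minimal Python 3 sketch of that check; the helper name first_error is hypothetical, and it relies on json.JSONDecodeError (Python 3.5+), which exposes the pos and msg attributes.

```python
import json

# The line as read from the file: there is no backslash before the inch marks.
line = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'

def first_error(text):
    """Return (position, message) of the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:  # available on Python 3.5+
        return exc.pos, exc.msg
    return None

pos, msg = first_error(line)
print(pos, msg)        # 20 Expecting ',' delimiter
print(line[:pos + 1])  # {"Height__c": "8' 0""
```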

Also,

>>> s = """{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)

fails, but assigning the raw literal works:

>>> s = r"""{"Height__c": "8' 0\"", "Width__c": "2' 8\""}"""
>>> json.loads(s)
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}

Do I need to write a custom decoder?

1 Answer

Answer (score: 1)

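The data file you have does not correctly escape the nested quotes; this can be hard to repair. Note that in a regular (non-raw) Python string literal, \" is an escape sequence that becomes a plain ", so the backslash never reaches the JSON parser. Only your raw literal keeps the backslash that JSON needs, and the line read from the file never had one.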
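To see the difference between the two literals, here is a small Python 3 sketch (the question itself uses Python 2, but literal handling is the same here). The variable names cooked and raw and the helper parses are illustrative only.

```python
import json

# Regular literal: Python turns the escape \" into a bare ", so the backslash is lost.
cooked = """{"Height__c": "8' 0\""}"""
# Raw literal: the backslash is kept, so the JSON parser sees the escape \".
raw = r"""{"Height__c": "8' 0\""}"""

print(cooked)  # {"Height__c": "8' 0""}
print(raw)     # {"Height__c": "8' 0\""}

def parses(text):
    """Return True if text is valid JSON."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a ValueError subclass
        return False
    return True

print(parses(cooked), parses(raw))  # False True
```

The cooked string has the same shape as the line read from the file, which is why both fail in the same way.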

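If the nested quotes follow a pattern, for example if they always follow a digit and are the last character in each string, you can use a regular expression to fix them. Given your sample data, where every value is a measurement in feet and inches, that is certainly doable: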

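import json
import re
from functools import partial

# A digit followed by "" is an unescaped inch mark: insert a backslash before the first quote.
repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')

json.loads(repair_nested(s))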
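Since the question reads the file line by line, the same regex can be applied to every line before parsing. Below is a minimal Python 3 sketch, assuming the file holds one JSON object per line. The function load_lines and the in-memory sample file are hypothetical. Already-escaped lines are left alone, because in 0\"" the backslash sits between the digit and the quotes, so the pattern does not match.

```python
import io
import json
import re

# The answer's pattern: a digit followed by "" marks an unescaped inch quote.
NESTED_INCH_QUOTE = re.compile(r'(\d)""')

def repair_nested(line):
    """Insert a backslash before the unescaped inch mark."""
    return NESTED_INCH_QUOTE.sub(r'\1\\""', line)

def load_lines(fileobj):
    """Parse one JSON object per line, repairing inch marks first (hypothetical helper)."""
    records = []
    for lineno, line in enumerate(fileobj, start=1):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            records.append(json.loads(repair_nested(line)))
        except ValueError as exc:
            # The heuristic did not cover this line; report it rather than guess.
            raise ValueError(f"line {lineno}: {exc}") from exc
    return records

sample = io.StringIO(
    '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}\n'      # broken, as in the file
    '{"Height__c": "6\' 8\\"", "Width__c": "3\' 0\\""}\n'  # already escaped
)
for record in load_lines(sample):
    print(record)
```

Raising on lines that still fail keeps unexpected data visible instead of silently dropping it.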

Demo:

>>> import json
>>> import re
>>> from functools import partial
>>> s = '{"Height__c": "8\' 0"", "Width__c": "2\' 8""}'
>>> repair_nested = partial(re.compile(r'(\d)""').sub, r'\1\\""')
>>> json.loads(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/__init__.py", line 338, in loads
    return _default_decoder.decode(s)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 365, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/json/decoder.py", line 381, in raw_decode
    obj, end = self.scan_once(s, idx)
ValueError: Expecting , delimiter: line 1 column 21 (char 20)
>>> json.loads(repair_nested(s))
{u'Width__c': u'2\' 8"', u'Height__c': u'8\' 0"'}