使用python正则表达式的json字符串的extarct particulr部分

时间:2016-06-06 14:06:49

标签: json regex python-2.7

我有以下json字符串:

"{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":" {\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]}","form":{"q":"","city":"Delhi","locality":null},"cerebro":true}"

我想从上面的字符串中提取列表部分:

{\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]}

如何使用python regex执行此操作?

1 个答案:

答案 0 :(得分:1)

您的JSON中存在问题,它在双引号中包含另一个json对象,导致json.loads失败。在传递给json.loads之前尝试对json字符串进行一些转换。

以下作品完美无缺。

>>> p = json.loads('''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}''')

然后将所请求的部分提取为

>>> p["list"]
{u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}

检查一下我可以设法纠正你提供的json。

>>> p = '''{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":" {\"id\":205782,\"name\":\"Robert Shriwas\",\"gender\":\"F\",\"practicing_since\":null,\"years\":21,\"specializations\":[\"Mentor\"]}","form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'''
>>> q = re.sub(r'(:)\s*"\s*(\{[^\}]+\})\s*"',r'\1\2', p[1:-1])
>>> q
'"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true'
>>> r = p[0] + q + p[-1]
>>> r
'{"sweep_enabled":true,"product":"XYZ","page":"XYZ Profile","list":{"id":205782,"name":"Robert Shriwas","gender":"F","practicing_since":null,"years":21,"specializations":["Mentor"]},"form":{"q":"","city":"Delhi","locality":null},"cerebro":true}'
>>> json.loads(r)
{u'product': u'XYZ', u'form': {u'q': u'', u'city': u'Delhi', u'locality': None}, u'sweep_enabled': True, u'list': {u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}, u'cerebro': True, u'page': u'XYZ Profile'}
>>> s = json.loads(r)
>>> s['list']
{u'name': u'Robert Shriwas', u'gender': u'F', u'specializations': [u'Mentor'], u'id': 205782, u'years': 21, u'practicing_since': None}
>>>