Python regex: repeated field names with different values

Asked: 2015-01-29 16:16:44

Tags: python regex

I have the string below and want to extract the text in bold. Note that the ID I need must be the one associated with the name Erik:

"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{**"id":8**,"date":"2015-01-29 07:39:21","name":"Erik"}

I am using the regex ("id":)(\d+)(,"date":)(.*)(,"name":"Erik"), but it returns id 1.

Is there a way to get only the ID associated with the name Erik?

2 answers:

Answer 0 (score: 0)

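Use the negated character class [^,]* instead of .*. The regex engine starts matching at the leftmost "id":, which belongs to the record with id 1. Because .* is greedy and matches any character, including commas, it runs across the rest of the first record until it finds ,"name":"Erik" in the second one. The non-greedy .*? does not work here either: the match still starts at the first "id":, and . still matches commas. [^,]* matches any character except a comma, zero or more times, so the date value cannot spill over into the next field or the next record. This works because the date value itself contains no comma.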

("id":)(\d+)(,"date":)([^,]*)(,"name":"Erik")


Remove the extra capturing groups to return only the id.

>>> import re
>>> s = '"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}'
>>> re.findall(r'"id":(\d+),"date":[^,]*,"name":"Erik"', s)
['8']
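
To see why neither the greedy nor the non-greedy wildcard helps, the three variants can be compared side by side. This is only a small sketch; the variable names (patterns, results) are made up for illustration, and the input is the string from the question, split across lines for readability.

```python
import re

# The string from the question, split across lines for readability.
s = ('"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":['
     '{"id":1,"date":"2015-01-28 12:06:24","name":"first"},'
     '{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}')

# Three ways to skip over the date value between "id" and "name".
patterns = {
    "greedy": r'"id":(\d+),"date":.*,"name":"Erik"',
    "lazy": r'"id":(\d+),"date":.*?,"name":"Erik"',
    "negated": r'"id":(\d+),"date":[^,]*,"name":"Erik"',
}

results = {label: re.findall(p, s) for label, p in patterns.items()}

for label, found in results.items():
    print(label, found)
# greedy ['1']
# lazy ['1']
# negated ['8']
```

Both wildcard versions start matching at the first "id": and let . run across the record boundary; only the negated class forces the date and the name to sit in the same record.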

Answer 1 (score: 0)

A simpler approach is to deserialize the JSON structure and extract the relevant information. For example,

import json

# The fragment from the question, wrapped in {...} and closed with ]} so that it is valid JSON.
data = '''{"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}]}'''
data = json.loads(data)

# Each entry under "test" is a dict with "id", "date" and "name" keys.
for row in data['test']:
    print("ID: {}".format(row['id']))
    print("Date: {}".format(row['date']))
    print("Name: {}".format(row['name']))
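
The loop above prints every row. To get only the ID that belongs to Erik, as the question asks, the rows can be filtered on the name field. A minimal sketch, assuming the full response is valid JSON (the fragment in the question lacks the outer braces and the closing bracket); the names raw, payload and erik_ids are illustrative only.

```python
import json

# Assumed complete form of the response: the question's fragment wrapped in {...}.
raw = '''{"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}]}'''

payload = json.loads(raw)

# Collect the id of every row whose name is exactly "Erik".
erik_ids = [row["id"] for row in payload["test"] if row["name"] == "Erik"]

print(erik_ids)  # [8]
```

Unlike the regex, this does not depend on the order of the keys inside each record, and the ids come back as integers rather than strings.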