I have the string below and want to extract the text shown in bold. *Note that the ID I need must be the one associated with the name Erik:
"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{**"id":8**,"date":"2015-01-29 07:39:21","name":"Erik"}
I am using the regular expression ("id":)(\d+)(,"date":)(.*)(,"name":"Erik"), but this gives me id:1.
Is there a way to get only the ID associated with the name Erik?
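Answer 0: (score: 0)
Use the negated character class [^,]* instead of .*, because .* is greedy by default and matches as many characters as it can. The non-greedy .*? will not work here either, because . also matches a comma, so a match that starts at the first "id": can still stretch across the first record until it reaches ,"name":"Erik". [^,]* matches any character except a comma, zero or more times, so the date value cannot run past its own field.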
("id":)(\d+)(,"date":)([^,]*)(,"name":"Erik")
Remove the extra capture groups so that only the id is returned.
>>> import re
>>> s = '"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}'
>>> re.findall(r'"id":(\d+),"date":[^,]*,"name":"Erik"', s)
['8']
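To make the difference between the three quantifiers concrete, here is a minimal sketch that runs each variant against the string from the question. The `patterns` and `results` names are only illustrative, and the patterns use a single capture group around the id.

```python
import re

# The string from the question, split across lines for readability.
s = (
    '"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":['
    '{"id":1,"date":"2015-01-28 12:06:24","name":"first"},'
    '{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}'
)

# Three variants that differ only in how the date field is matched.
patterns = {
    "greedy": r'"id":(\d+),"date":.*,"name":"Erik"',
    "lazy": r'"id":(\d+),"date":.*?,"name":"Erik"',
    "negated": r'"id":(\d+),"date":[^,]*,"name":"Erik"',
}

# findall returns the contents of the single capture group for each match.
results = {label: re.findall(pattern, s) for label, pattern in patterns.items()}

for label, found in results.items():
    print(label, found)
# greedy ['1']
# lazy ['1']
# negated ['8']
```

Both `.*` and `.*?` begin matching at the first "id": and are allowed to cross commas, so they capture 1. Only `[^,]*` forces the date and the name to sit in the same record.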
Answer 1: (score: 0)
A simpler approach is to deserialize the JSON structure and extract the relevant information. For example:
import json
data = '''{"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":[{"id":1,"date":"2015-01-28 12:06:24","name":"first"},{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}]}'''
data = json.loads(data)
for row in data['test']:
    print("ID: {}".format(row['id']))
    print("Date: {}".format(row['date']))
    print("Name: {}".format(row['name']))
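The loop above prints every row. To get only the id that belongs to Erik, the rows can be filtered by name. This is a rough sketch: `ids_for_name` is a hypothetical helper, and it assumes the input is complete, valid JSON (the fragment in the question lacks the outer braces and the closing bracket, which are added here as in this answer).

```python
import json

# Complete JSON document; the outer braces and closing bracket are assumed.
raw = (
    '{"startRow":0,"endRow":1,"totalRows":2,"rowsReturned":2,"test":['
    '{"id":1,"date":"2015-01-28 12:06:24","name":"first"},'
    '{"id":8,"date":"2015-01-29 07:39:21","name":"Erik"}]}'
)


def ids_for_name(payload, name):
    """Return the ids of all rows under "test" whose name equals `name`."""
    rows = json.loads(payload)["test"]
    return [row["id"] for row in rows if row["name"] == name]


print(ids_for_name(raw, "Erik"))  # [8]
```

Unlike the regex, this does not depend on the order of the keys inside each record, and it returns the id as an integer rather than a string.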