Matching lines that contain a specific string to extract a value (Python regex)

Date: 2019-05-21 00:01:52

Tags: python regex

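I'm having trouble finding the right regular expression for this task, so please excuse my beginner skills. What I want to do is get the id value only from lines that contain "available":true, not "available":false. I can already get the IDs from all lines with re.findall('"id":(\d{13})', line, re.DOTALL) (the {13} matches exactly 13 digits, because the data also contains other IDs with fewer than 13 digits that I don't need).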

{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},

So the final result must be ['1651572973431','1351572943231']

Thanks a lot for your help.

3 answers:

Answer 0 (score: 3)

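This may not be a great answer, depending on what you actually have. It looks like you have a list of strings and want the IDs from some of them. If that's the case, it will be much cleaner and easier to read if you parse the JSON instead of writing a byzantine regular expression. For example: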

import json

# lines is a list of strings:

lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]

# parse it and you can use regular python to get what you want:
[line['id'] for line in map(json.loads, lines) if line['available']]

Result

[1351572943231, 1651572973431]

If what you posted is one long string, you can wrap it in [] and parse it as an array, with the same result:

import json

line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'

lines = json.loads('[' + line + ']')
[line['id'] for line in lines if line['available']]
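
One caveat: the rows in the question each end with a trailing comma, which json.loads will reject if a line is parsed on its own or if the comma ends up right before the closing ]. A minimal sketch of one way around that, assuming the input is one object per line (the names raw_text and available_ids are only for illustration):

```python
import json

# Sample rows as posted in the question, each ending with a trailing comma.
raw_text = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''

# Strip whitespace and the trailing comma from each non-empty line, then parse it.
records = [
    json.loads(row.strip().rstrip(','))
    for row in raw_text.splitlines()
    if row.strip()
]

# Keep the ids of available records; str() gives the string form the question asks for.
available_ids = [str(rec['id']) for rec in records if rec['available']]
print(available_ids)
```

Note that this does not filter on the 13-digit length; if shorter ids can also be available, add a check such as len(str(rec['id'])) == 13.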

Answer 1 (score: 2)

This does what you need:

(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)

https://regex101.com/r/FseimH/1

Expanded

 (?<= "id": )
 \d{13} 
 (?=
      (?: ," [^"]* ": [^,]*? )*?
      ,"available":true
 )

Explained

 (?<= "id": )                        # Lookbehind assertion for id
 \d{13}                              # Consume 13 digit id
 (?=                                 # Lookahead assertion
      (?:                                 # Optional sequence
           ,                                   # comma
           " [^"]* "                           # quoted string
           :                                   # colon
           [^,]*?                              # optional non-comma's
      )*?                                 # End sequence, do 0 to many times - 
      ,"available":true                   # until we find  available = true
 )
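
A rough sketch of how the compact pattern could be applied from Python. The rows list and the way it is joined into one string are assumptions about the input shape, based on the sample in the question:

```python
import re

# Sample rows from the question, joined into one string with ",\n" between objects.
rows = [
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]
text = ',\n'.join(rows)

# Lookbehind for "id":, 13 digits, then a lookahead that walks over
# ,"key":value pairs until it reaches ,"available":true.
pattern = re.compile(
    r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)'
)

# There is no capture group, so findall returns the matched digits as strings.
matched_ids = pattern.findall(text)
print(matched_ids)
```

The lookbehind is fixed-width, which Python's re module requires. The lookahead does not run into the next object here because each repetition must start with a comma immediately followed by a quote, and the separator between objects is a comma followed by a newline.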

Answer 2 (score: 1)

Here, we can simply use "id" as the left boundary and collect the desired digits in a capture group:

"id":([0-9]+)


Then we can keep adding boundaries to it. For example, if exactly 13 digits are required, we can simply use:

\"id\":([0-9]{13})
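
On its own this pattern still matches every row, so the "available":true condition has to be checked separately. One possible sketch, assuming the data is processed one line at a time (sample_lines and wanted are illustrative names, not part of the original answer):

```python
import re

# Sample rows from the question, one JSON-like object per line.
sample_lines = [
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},',
]

id_pattern = re.compile(r'"id":([0-9]{13})')

wanted = []
for entry in sample_lines:
    # Plain substring test for the availability flag; skip rows that lack it.
    if '"available":true' not in entry:
        continue
    found = id_pattern.search(entry)
    if found:
        wanted.append(found.group(1))  # group 1 holds the 13 digits

print(wanted)
```

This keeps the regex simple at the cost of relying on the exact spelling "available":true with no whitespace around the colon.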