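re.findall: ignoring some variables between two patterns

Time: 2019-07-02 21:41:10

Tags: python

I am trying to find sentences like "DELETED -- LVHEAP = 258/64806/65937  RSS = 66621". They need to be identified by "-- LVHEAP", and once all such sentences have been found, I want to output "66621".

I used: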

text ="DELETED -- LVHEAP = 258/64806/65937  RSS = 66621"

RSS = re.findall("(?<=-- LVHEAP = )\d+\\S+\\S+(?<=RSS =)\d+",text)

Its output is empty. Can anyone help me?

2 answers:

Answer 0: (score: 1)

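I suspect you intended the \S in your original regex to match a non-whitespace character, but \\ means "match a literal \", which leaves the S as just a literal "S", because the preceding \\ has consumed the backslash. (That is how the pattern reads as a bare regex, for example when pasted into regex101. Inside an ordinary, non-raw Python string literal, "\\S" is reduced to \S before re sees it; the pattern still fails there, because \S+ cannot cross the spaces before "RSS" and the (?<=RSS =) lookbehind is checked at a position that is not preceded by "RSS =".)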
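A quick way to check both readings of the question's pattern is to run them side by side. This is only a sketch: the names raw_pattern and plain_pattern are made up for illustration, and \d is written as \\d in the non-raw string purely to avoid an invalid-escape warning (the resulting string is the same as the one in the question).

```python
import re

text = "DELETED -- LVHEAP = 258/64806/65937  RSS = 66621"

# Reading 1: the pattern as a bare regex (raw string).
# Here \\S means "a literal backslash followed by the letter S".
raw_pattern = r"(?<=-- LVHEAP = )\d+\\S+\\S+(?<=RSS =)\d+"

# Reading 2: the pattern as a normal Python string literal.
# Python collapses "\\S" to \S before the regex engine sees it.
plain_pattern = "(?<=-- LVHEAP = )\\d+\\S+\\S+(?<=RSS =)\\d+"

raw_result = re.findall(raw_pattern, text)
plain_result = re.findall(plain_pattern, text)

print(plain_pattern)  # (?<=-- LVHEAP = )\d+\S+\S+(?<=RSS =)\d+
print(raw_result)     # []
print(plain_result)   # []
```

Either way the result is empty, so fixing the backslashes alone does not rescue the pattern.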

But even if you fix that, the original regex has other problems. Here is a simpler one that matches your description of what you want to do:

-- LVHEAP = [\d/]+  RSS = (\d+)

This means:

-- LVHEAP =    a line containing "-- LVHEAP = "
[\d/]+         followed by one or more digits and '/' slashes
  RSS =        followed by "  RSS = "
(\d+)          followed by one or more digits, which are captured

See https://regex101.com/r/LNuF5K/1
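Applied with re.findall, the pattern above could be used roughly as follows. This is a minimal sketch that assumes the input has exactly the spacing shown in the question (two spaces before "RSS"); the variable names are illustrative.

```python
import re

text = "DELETED -- LVHEAP = 258/64806/65937  RSS = 66621"

# Raw string, so the backslashes reach the regex engine unchanged.
pattern = r"-- LVHEAP = [\d/]+  RSS = (\d+)"

# With exactly one capture group, findall returns only the captured text.
rss_values = re.findall(pattern, text)
print(rss_values)  # ['66621']
```

If the spacing in the real data varies, replacing the literal spaces with \s+ would be one way to loosen the match.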

A simpler, more permissive regex can also work, for example:

-- LVHEAP = [A-Z\d/= ]+ (\d+)

This covers cases where, for example, "RSS" might be some other all-uppercase label.
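To see the more permissive pattern cope with a different label, here is a small sketch. The second sample line and its "VSZ" label are hypothetical, invented only to exercise the character class.

```python
import re

pattern = r"-- LVHEAP = [A-Z\d/= ]+ (\d+)"

samples = [
    "DELETED -- LVHEAP = 258/64806/65937  RSS = 66621",
    "DELETED -- LVHEAP = 1/2/3  VSZ = 42",  # hypothetical label, not from the question
]

# The greedy character class backtracks until a space followed by digits remains,
# so the group captures the last number on the line.
results = [re.findall(pattern, s) for s in samples]
print(results)  # [['66621'], ['42']]
```

Because the class is greedy, the captured value is always the final run of digits after a space, whatever uppercase label precedes it.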

Answer 1: (score: 0)

Does this work for you?

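import re

# Hypothetical sample input; in practice `lines` would come from your own data.
lines = ["DELETED -- LVHEAP = 258/64806/65937  RSS = 66621"]

outputs = []
for line in lines:
    # Only consider lines that contain the marker.
    if "-- LVHEAP" in line:
        # Raw string, so \d reaches the regex engine unchanged.
        matches = re.findall(r"RSS = \d+", line)
        # Keep only the number after " = " and convert it to int.
        matches = [int(match.split(" = ")[1]) for match in matches]
        outputs.append(matches)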