Question

I have the following text:

02052020 02:40:02.445: Vacation Allowance: 21; nnnnnn Vacation Allowance: 22;nnn

I want to extract the following in Python:

Vacation Allowance: 21
Vacation Allowance: 22

Basically, I want to extract every occurrence of "Vacation Allowance:" together with the numeric value that follows it, up to the ;.

I am using the following regular expression:

(.*)(Vacation Allowance:)(.*);(.*)

The complete Python code is below:

import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

pattern = re.compile(r'(.*)(Vacation Allowance:)(.*);(.*)')

for (a,b,c,d) in re.findall(pattern, text):
    print(b, " ", c)

This does not return all occurrences, only the last one. The current output is:

Vacation Allowance: 22

Could you advise how I can extract all occurrences?

Answer 1

In JavaScript, this is 'text'.match(/\bVacation Allowance: \d+/g)

You need the global flag g.
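Since the question asks about Python, here is a minimal sketch of the same idea there. Python has no g flag; re.findall already returns every non-overlapping match. The variable name matches is only illustrative.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# The pattern has no capture groups, so findall returns the full text of each match.
matches = re.findall(r'\bVacation Allowance: \d+', text)
print(matches)  # ['Vacation Allowance: 21', 'Vacation Allowance: 22']
```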

Answer 2

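The problem is the regular expression itself. The (.*) groups match more of the string than you might expect: .* is greedy, meaning it consumes as much of the string as it can while still allowing the overall pattern to match. The leading (.*) therefore swallows everything up to the last "Vacation Allowance:", which is why you see only one result.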
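The greedy behaviour can be seen by inspecting the captured groups. The sketch below compares the original pattern with a lazy variant that uses .*? (the lazy pattern is an illustration added here, not part of the original question); the names greedy and lazy are hypothetical.

```python
import re

text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'

# Original pattern: the first (.*) grabs everything up to the LAST occurrence.
greedy = re.findall(r'(.*)(Vacation Allowance:)(.*);(.*)', text)
print(len(greedy))    # 1
print(greedy[0][0])   # '02/05/2020 Vacation Allowance: 21; 02/05/2020 '

# Lazy variant: .*? stops as early as possible, so both occurrences are found.
lazy = re.findall(r'(.*?)(Vacation Allowance:)(.*?);', text)
print([value.strip() for _, _, value in lazy])  # ['21', '22']
```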

I suggest matching something like Vacation Allowance:\s*\d+; instead.

import re
text = '02/05/2020 Vacation Allowance: 21; 02/05/2020 Vacation Allowance: 22; nnn'
# Raw string avoids invalid-escape warnings; \d+ requires at least one digit.
m = re.findall(r'Vacation Allowance:\s*(\d+);', text)
print(m)

Result: ['21', '22']

This regular expression returns all matches.
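To print the label together with the value, as the question requests, one option is re.finditer with two capture groups. This is a sketch run against the log-style line from the top of the question; the names pattern and lines are illustrative.

```python
import re

text = '02052020 02:40:02.445: Vacation Allowance: 21; nnnnnn Vacation Allowance: 22;nnn'

# Group 1 is the label, group 2 the digits; the trailing ';' must be present.
pattern = re.compile(r'(Vacation Allowance:)\s*(\d+);')

lines = [f'{m.group(1)} {m.group(2)}' for m in pattern.finditer(text)]
for line in lines:
    print(line)
# Vacation Allowance: 21
# Vacation Allowance: 22
```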
