Python re.split and re.findall: grouping and capturing

Time: 2018-11-14 01:16:29

Tags: python regex

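I have strings like "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3" and "00:00 Segment 1 20:0 Segment 2", and I want to use re.split() and re.findall() to find all the timestamps and segment names. However, I am having trouble writing an optional group that does not also capture. Here is what I have so far: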

str_1 = "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3"
str_2 = "00:00 Segment 1 20:0 Segment 2"

re.findall(r'\d\d?:\d\d?:\d\d?', str_1)
=>  ['00:00:00', '00:20:00', '8:00:00']

re.split(r'\d\d?:\d\d?:\d\d?', str_1)
=> ['', ' Segment 1 ', ' Segment 2 ', ' Segment 3']

The above works fine, but it cannot handle str_2. If I make the third pair of digits optional, only the optional group is returned:

re.findall(r'\d\d?:\d\d?(:\d\d?)?', str_1)
=> [':00', ':00', ':00']

re.split(r'\d\d?:\d\d?(:\d\d?)?', str_1)
=> ['', ':00', ' Segment 1 ', ':00', ' Segment 2 ', ':00', ' Segment 3']

re.findall(r'\d\d?:\d\d?(:\d\d?)?', str_2)
=> ['', '']

re.split(r'\d\d?:\d\d?(:\d\d?)?', str_2)
=> ['', None, ' Segment 1 ', None, ' Segment 2']
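
The results above follow from how the re module treats capturing groups: when a pattern contains a single group, re.findall returns that group's text rather than the whole match, and re.split inserts each group's text (or None when the group did not take part in the match) into the result. As a small sketch, re.finditer with m.group(0) still recovers the full match even though the group is capturing. The variable names full_1 and full_2 are illustrative only.

```python
import re

str_1 = "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3"
str_2 = "00:00 Segment 1 20:0 Segment 2"

# Same pattern as in the question, with a capturing optional group.
pattern = re.compile(r'\d\d?:\d\d?(:\d\d?)?')

# group(0) is always the entire match, regardless of any groups inside it.
full_1 = [m.group(0) for m in pattern.finditer(str_1)]
full_2 = [m.group(0) for m in pattern.finditer(str_2)]

print(full_1)
print(full_2)
```

This only helps with findall-style extraction; re.split would still include the captured pieces in its output.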

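However, if I make the optional group non-capturing, str_2 works, but the results for str_1 are wrong, because the trailing ":00" is no longer part of the match: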

re.findall(r'\d\d?:\d\d?(?:\d\d?)?', str_1)
=> ['00:00', '00:20', '8:00']

re.split(r'\d\d?:\d\d?(?:\d\d?)?', str_1)
=> ['', ':00 Segment 1 ', ':00 Segment 2 ', ':00 Segment 3']

re.findall(r'\d\d?:\d\d?(?:\d\d?)?', str_2)
=> ['00:00', '20:0']

re.split(r'\d\d?:\d\d?(?:\d\d?)?', str_2)
=> ['', ' Segment 1 ', ' Segment 2']

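I would like to find a regular expression that works on both str_1 and str_2: one with an optional group, but without the capturing effect. Is there any way to achieve this?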

1 Answer:

Answer 0 (score: 0)

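It looks like your pattern is missing a ":". You need two of them: one as part of the non-capturing group syntax "(?:" and one for your literal ":", like this: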

re.findall(r'\d\d?:\d\d?(?::\d\d?)?', str_1)
=> ['00:00:00', '00:20:00', '8:00:00']
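
As a quick check, the corrected pattern can be applied to both strings with re.findall and re.split together. The sketch below assumes the goal is to pair each timestamp with the segment name that follows it, and that the text always begins with a timestamp. The helper name parse_segments is hypothetical and not part of the original question or answer.

```python
import re

str_1 = "00:00:00 Segment 1 00:20:00 Segment 2 8:00:00 Segment 3"
str_2 = "00:00 Segment 1 20:0 Segment 2"

# "(?:" opens a non-capturing group; the second ":" is the literal colon.
TIMESTAMP = re.compile(r'\d\d?:\d\d?(?::\d\d?)?')

def parse_segments(text):
    """Pair each timestamp with the text that follows it (hypothetical helper)."""
    stamps = TIMESTAMP.findall(text)
    # Assumption: the text starts with a timestamp, so split() yields an
    # empty first element, which is skipped here.
    names = [part.strip() for part in TIMESTAMP.split(text)[1:]]
    return list(zip(stamps, names))

pairs_1 = parse_segments(str_1)
pairs_2 = parse_segments(str_2)
print(pairs_1)
print(pairs_2)
```

Because the group is non-capturing, re.findall returns whole matches and re.split returns only the text between timestamps, so the two lists line up one to one.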