Question

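I have a string from which I want to extract the values that sit between symbols, but the symbol (the delimiter) can also occur inside the values themselves.

Consider the following string: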

message = ': :1:1st message:2a:2nd message:x:this is where it fails status: fail :3:3rd message'

and the desired result:

['1st message','2nd message','this is where it fails status: fail','3rd message']

Current code and result:

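import re

def trans(text):
    # Append a colon so the last value also has a closing delimiter.
    text = text + ':'
    # Lazily capture whatever sits between each pair of colons.
    tag = re.findall(r':(.*?):', text)
    # Drop captures that consist only of whitespace.
    return [i for i in tag if not i.isspace()]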

trans(message)

>>['1st message', '2nd message', 'this is where it fails status', '3']

Any ideas on how to write the regex so that 'status: fail ' is also included in the result?

Answer 1

Try using a negative lookahead: r'[^\s]:(.*?):(?!\s)'

Result:

['1st message',
 '2nd message',
 'this is where it fails status: fail ',
 '3rd message']

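[^\s] requires a non-whitespace character immediately before the opening colon, so a colon preceded by a space (as in ' :3:') cannot start a match; this fixes 3rd message.
:(?!\s) only accepts a closing colon that is not followed by whitespace, so the colon in status: fail no longer ends the match; this fixes status: fail.
In other words, the two pieces I added put a constraint on each side of the substring to be matched: the opening colon cannot be preceded by whitespace, and the closing colon cannot be followed by whitespace.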
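To see the pattern in context, here is a minimal sketch that drops it into the trans function from the question. The final strip() call is my own addition, assuming the trailing space before the next tag is unwanted; without it the third item keeps its trailing space, as in the result above.

```python
import re

message = (": :1:1st message:2a:2nd message:x:"
           "this is where it fails status: fail :3:3rd message")

def trans(text):
    # Append a colon so the last message also has a closing delimiter.
    text = text + ':'
    # The opening colon must follow a non-whitespace character;
    # the closing colon must not be followed by whitespace.
    tags = re.findall(r'[^\s]:(.*?):(?!\s)', text)
    # Assumption: trim surrounding whitespace and drop empty captures.
    return [t.strip() for t in tags if t.strip()]

print(trans(message))
```

Because the lookahead (?!\s) also succeeds at the very end of the string, the colon appended by trans still closes the last message.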

Answer 2

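You can use

re.findall(r':[^\s:]+:(.+?)(?=\s*:[^\s:]+:|$)', message)

This matches a tag (one or more non-space, non-colon characters between two colons), then lazily captures any characters until the lookahead sees optional whitespace followed by another tag, or the end of the string. The tag is consumed in front of the capturing group rather than in a lookbehind such as (?<=:\S:), because Python's re module only supports fixed-width lookbehind, and a one-character lookbehind would miss the two-character tag 2a.

Output: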

['1st message', '2nd message', 'this is where it fails status: fail', '3rd message']
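A point worth checking when adapting this pattern: Python's re only allows fixed-width lookbehind, so a lookbehind such as (?<=:\S:) recognises one-character tags only, while the sample message also contains the tag 2a. The sketch below compares that form with one that consumes a tag of any length in front of a capturing group. The names TAG, single_char and any_length are illustrative, and treating a tag as a run of non-space, non-colon characters is an assumption about the data.

```python
import re

message = (": :1:1st message:2a:2nd message:x:"
           "this is where it fails status: fail :3:3rd message")

# Fixed-width lookbehind: only one-character tags such as :1: or :x: count.
single_char = re.findall(r'(?<=:\S:).+?(?=\s*:.:|$)', message)

# Capturing-group variant: a tag is one or more non-space, non-colon characters.
TAG = r':[^\s:]+:'
any_length = re.findall(TAG + r'(.+?)(?=\s*' + TAG + r'|$)', message)

print(single_char)  # the first two messages come back merged around ':2a:'
print(any_length)   # four separate messages
```

The \s* inside the lookahead is what keeps the trailing space before :3: out of the captured value.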

Answer 3

Try this regex (PCRE syntax, where \K discards the text matched so far from the reported match): :\d+:\K.*?(?=:\d+|$)
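The \K escape is PCRE syntax and, as far as I know, is not accepted by Python's built-in re module (the third-party regex package does support it). A rough Python equivalent, offered only as a sketch, captures the text after the tag in a group instead. Note that \d+ recognises numeric tags only, so 2a and x in the question's message would not count as delimiters. The names NUMERIC_TAG and extract_numeric and the sample string are made up for illustration.

```python
import re

# A capturing group stands in for PCRE's \K: the tag :<digits>: is matched
# but only the text after it is returned.
NUMERIC_TAG = re.compile(r':\d+:(.*?)(?=:\d+|$)')

def extract_numeric(text):
    # Trim the whitespace that may precede the next tag.
    return [m.strip() for m in NUMERIC_TAG.findall(text)]

# Hypothetical digits-only input (not from the question).
sample = ':1:first:2:second status: ok :3:third'
print(extract_numeric(sample))

# The question's message: the tags '2a' and 'x' are not numeric,
# so the text they introduce is skipped.
message = (": :1:1st message:2a:2nd message:x:"
           "this is where it fails status: fail :3:3rd message")
print(extract_numeric(message))
```

With the digits-only sample all three values are returned, while for the question's message only the values after :1: and :3: come back, so this pattern fits best when every tag is a number.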


Finding values between symbols with a regex, where the symbols may be part of the value
