Question

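I want to write a regular expression that captures the first inner pair of words. My code works in one case but not in the other, where it captures the last pair of words instead.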

Please see the code below.

import re

def testReplaceBetweenWords():

    head_dlmt='Head'
    tail_dlmt='Tail'

    line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
    line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

    between_pattern = "(^.*(?<={0}))(?!.*{0}).*?(?={1})(.*)$".format(head_dlmt, tail_dlmt)
    compiled_pattern = re.compile(between_pattern)

    # Case 0 (good): it captures the first inner pair of words.
    result0 = compiled_pattern.search(line0)

    print("original 0    : {0}".format(result0.group(0)))
    print("expected Head : abc_Head_def_Head")
    print("found Head    : {0}".format(result0.group(1)))
    print("expected Tail :                                Tail_ghi_Tail_jkl")
    print("found Tail    : {0}{1}".format(' ' * result0.start(2), result0.group(2)))

    print()

    # Case 1 (bad): it captures the last pair of words instead of the first.
    result1 = compiled_pattern.search(line1)

    print("original 1    : {0}".format(result1.group(0)))
    print("expected Head : abc_Head")
    print("found Head    : {0}".format(result1.group(1)))
    print("expected Tail :                Tail_ghi_Head_second_Tail_opq")
    print("found Tail    : {0}{1}".format(' ' * result1.start(2), result1.group(2)))

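The first case works well: it captures the first inner pair of words. The second case does not work: it captures the last pair of words, but I expected the first pair. How can I write a regular expression that satisfies both cases?

Thank you very much.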

Answer 1

Use the following regular expression:

between_pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(head_dlmt, tail_dlmt)

See the online Python demo and the regex demo.
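As a rough check, here is a minimal sketch that applies this pattern to the two sample lines from the question. The helper name split_first_inner is hypothetical, and the re.escape calls are an added precaution (not part of the answer's pattern) for delimiters that might contain regex metacharacters.

```python
import re

def split_first_inner(line, head_dlmt="Head", tail_dlmt="Tail"):
    """Return (head_part, tail_part) around the first inner Head...Tail pair."""
    # re.escape is an assumption added here; the answer formats the raw delimiters.
    pattern = "^((?:(?!{1}).)*{0}).*?({1}.*)$".format(
        re.escape(head_dlmt), re.escape(tail_dlmt))
    match = re.search(pattern, line)
    return (match.group(1), match.group(2)) if match else None

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

print(split_first_inner(line0))  # ('abc_Head_def_Head', 'Tail_ghi_Tail_jkl')
print(split_first_inner(line1))  # ('abc_Head', 'Tail_ghi_Head_second_Tail_opq')
```

Both results agree with the "expected Head" and "expected Tail" values printed in the question.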

Details

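The first .* pattern should be replaced with a tempered greedy token, (?:(?!{1}).)*, which matches zero or more characters that do not start the end delimiter character sequence (so the first group can only reach the last Head that comes before the first Tail).
There is no point in using lookarounds in the capturing groups, since these patterns are going to be part of the capturing groups anyway.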
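To make the difference concrete, the following small sketch compares a plain greedy .* with the tempered greedy token on the second sample line. The delimiters are written out literally for brevity, and the variable names greedy and tempered are only illustrative.

```python
import re

line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

# Plain greedy .* runs to the end of the line and backtracks to the LAST Head.
greedy = re.match(r".*Head", line1).group(0)

# The tempered token refuses to consume a character at which "Tail" starts,
# so backtracking can only land on a Head that precedes the first Tail.
tempered = re.match(r"(?:(?!Tail).)*Head", line1).group(0)

print(greedy)    # abc_Head_first_Tail_ghi_Head
print(tempered)  # abc_Head
```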

Note that you may want to compile the regex with the re.S flag to support strings that contain line breaks.
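A short sketch of why the flag matters, using a made-up input in which a line break sits between the first Head and the first Tail (the sample text is an assumption for illustration, not taken from the question):

```python
import re

pattern = "^((?:(?!Tail).)*Head).*?(Tail.*)$"
text = "abc_Head_first\n_Tail_ghi_Head_second_Tail_opq"

# Without re.S, "." does not match "\n", so the pattern cannot cross the line break.
print(re.search(pattern, text))  # None

# With re.S (alias re.DOTALL), "." also matches "\n" and the match succeeds.
match = re.search(pattern, text, re.S)
print(match.group(1))  # abc_Head
print(match.group(2))  # Tail_ghi_Head_second_Tail_opq
```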

Answer 2

Another option might be to simply match (almost) exactly what you want to match:

Use this regex and extract the first match:

(?<=Head)(?:(?!Head|Tail).)+(?=Tail)

In your case, use:

between_pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(head_dlmt, tail_dlmt)
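Note that this pattern matches only the text between the delimiters, not the head and tail parts themselves. One possible way to recover those, sketched here under the assumption that slicing by the match offsets is acceptable, is:

```python
import re

head_dlmt = 'Head'
tail_dlmt = 'Tail'
between_pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(head_dlmt, tail_dlmt)

line0 = "abc_Head_def_Head_inner_inside_Tail_ghi_Tail_jkl"
line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"

# re.search returns the first (leftmost) match only.
first0 = re.search(between_pattern, line0)
first1 = re.search(between_pattern, line1)
print(first0.group(0))  # _inner_inside_
print(first1.group(0))  # _first_

# The match offsets give the head and tail parts the question asks for.
print(line1[:first1.start()])  # abc_Head
print(line1[first1.end():])    # Tail_ghi_Head_second_Tail_opq
```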

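What's more, with this regex you can extract the second, third, ... nth match just as easily as the first, without changing the pattern at all: it's more flexible.

See here: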
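For instance, a minimal sketch of picking the nth match with re.finditer could look like this. The function nth_between and its parameters are hypothetical names, and re.escape is again an added precaution rather than part of the answer.

```python
import re
from itertools import islice

def nth_between(line, n, head="Head", tail="Tail"):
    """Return the n-th (1-based) innermost text between head and tail, or None."""
    pattern = '(?<={0})(?:(?!{0}|{1}).)+(?={1})'.format(
        re.escape(head), re.escape(tail))
    # finditer yields matches left to right; skip the first n - 1 of them.
    match = next(islice(re.finditer(pattern, line), n - 1, None), None)
    return match.group(0) if match else None

line1 = "abc_Head_first_Tail_ghi_Head_second_Tail_opq"
print(nth_between(line1, 1))  # _first_
print(nth_between(line1, 2))  # _second_
print(nth_between(line1, 3))  # None
```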

https://regex101.com/r/ds90y4/1/

How to create a regular expression that finds the first inner pair of words?
