Question

I'm currently using (['\"])(?:\\1|.*?\\1) to capture groups of quotes.

Text: "Hello", is it 'me youre looking for'?
# result: "Hello" (\1) and 'me youre looking for' (\2)

Additionally, I'd like it to ignore escaped quotes inside those groups (or globally, that would be fine too).

Text: "Hello", is it 'me you\'re looking for'?
# result: "Hello" (\1) and 'me you\'re looking for' (\2)
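To make the problem concrete, here is a minimal sketch that runs the pattern from the question against the second sample text with Python's standard re module. It assumes the pattern is written as a raw string (so the Python-level \\1 reaches the engine as \1); the name ORIGINAL is just for illustration. Because the pattern has a capture group, re.findall returns only the quote characters, so the sketch uses re.finditer to look at whole matches; the second match stops at the escaped quote.

```python
import re

# The asker's pattern as a raw string, so \1 reaches the regex engine intact.
ORIGINAL = re.compile(r"""(['"])(?:\1|.*?\1)""")

text = r""""Hello", is it 'me you\'re looking for'?"""

# With a capture group, findall() returns group 1 only (the quote characters).
print(ORIGINAL.findall(text))
# ['"', "'"]

# finditer() exposes the whole match via group(0).
matches = [m.group(0) for m in ORIGINAL.finditer(text)]
for quoted in matches:
    print(quoted)
# "Hello"
# 'me you\'    <- ends at the escaped quote
```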

I'm using Python. I know this question is somewhat similar, but I couldn't manage to apply it to my existing regex.

Thanks, regex geeks!

Answer 1

Here's a pattern:

(['"])(?:\\.|.)*?\1
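A minimal sketch of this pattern in Python's re module, assuming it is used as a raw string; QUOTED is a hypothetical name. Since the pattern has a capture group, re.finditer with group(0) is used to get the full quoted substrings.

```python
import re

# Answer 1's pattern: an opening quote, then (escaped char | any char), lazily,
# up to the first copy of the same quote that is not consumed by \\.
QUOTED = re.compile(r"""(['"])(?:\\.|.)*?\1""")

text = r""""Hello", is it 'me you\'re looking for'?"""
matches = [m.group(0) for m in QUOTED.finditer(text)]
for quoted in matches:
    print(quoted)
# "Hello"
# 'me you\'re looking for'
```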


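Everything happens in the (?:\\.|.) part:

Either match an escaped character: \\. - this handles both \" and \\
or any other (read: unescaped) character: . - you could also use [^\\] here.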
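The choice between . and [^\\] in the second branch can matter when a quote is never properly closed. The sketch below (names WITH_DOT, WITH_CLASS and the sample string are illustrative assumptions) shows that with ., backtracking can let . swallow the backslash so the escaped quote ends the match, whereas [^\\] refuses to match a lone backslash and finds nothing. Note also that [^\\] matches newlines, which . does not unless re.DOTALL is set.

```python
import re

WITH_DOT = re.compile(r"""(['"])(?:\\.|.)*?\1""")        # second branch: .
WITH_CLASS = re.compile(r"""(['"])(?:\\.|[^\\])*?\1""")  # second branch: [^\\]

# An opening quote whose only "closing" quote is escaped.
sample = r"""'it\'s"""

# After the lazy loop fails, the engine backtracks and lets . match the
# backslash, so the escaped quote is accepted as the closing one.
print(WITH_DOT.search(sample).group(0))   # 'it\'

# [^\\] cannot match a lone backslash, so there is no match at all.
print(WITH_CLASS.search(sample))          # None
```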

Since the regex engine tries the alternatives from left to right, it attempts to match an escaped character first.
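To illustrate why the order of the alternatives matters, here is a small sketch that swaps the two branches; ESCAPE_FIRST and ANY_FIRST are hypothetical names. With . first, the backslash is consumed as an ordinary character and the lazy loop then stops at the escaped quote.

```python
import re

ESCAPE_FIRST = re.compile(r"""(['"])(?:\\.|.)*?\1""")  # as in the answer
ANY_FIRST = re.compile(r"""(['"])(?:.|\\.)*?\1""")     # same branches, swapped

s = r"""'me you\'re looking for'"""
print(ESCAPE_FIRST.search(s).group(0))  # 'me you\'re looking for'
print(ANY_FIRST.search(s).group(0))     # 'me you\'
```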

By the way, in your pattern, \1|.*?\1 is redundant; you could just write .*?\1.
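A quick way to check the redundancy claim: the first branch \1 is exactly what the lazy .*?\1 tries first (zero characters), so both patterns should report the same whole matches. This sketch compares them on a few illustrative strings; ASKER, SIMPLER and whole_matches are hypothetical names.

```python
import re

ASKER = re.compile(r"""(['"])(?:\1|.*?\1)""")
SIMPLER = re.compile(r"""(['"]).*?\1""")

def whole_matches(pattern, s):
    """Return every full match (group 0) of pattern in s."""
    return [m.group(0) for m in pattern.finditer(s)]

samples = ['""', "''", '"Hello", is it \'me\'?', '"" and "x"']
for s in samples:
    assert whole_matches(ASKER, s) == whole_matches(SIMPLER, s)

print(whole_matches(SIMPLER, '"" and "x"'))  # ['""', '"x"']
```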

Answer 2

You could use the following regex:

(?<!\\)(['"])(?:\\\1|(?!\1).)*\1


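(?<!\\) Negative lookbehind which asserts that the match is not preceded by a backslash character.
(['"]) Captures an unescaped single or double quote.
(?:\\\1|(?!\1).)* Matches, zero or more times, either an escaped copy of the captured quote (\\\1, i.e. \' or \" depending on which quote was captured) or any character other than the captured quote ((?!\1).).
\1 Refers back to the first captured character, so the closing quote must be the same as the opening one.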
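The breakdown above can be written directly into the pattern using Python's re.VERBOSE flag, which ignores whitespace and allows # comments. This is a sketch; ANSWER2 and COMPACT are hypothetical names, and the verbose form is assumed to be equivalent to the one-line pattern, which the assertion checks on the sample text.

```python
import re

# Answer 2's pattern, one piece per line.
ANSWER2 = re.compile(r"""
    (?<!\\)          # not preceded by a backslash
    (['"])           # group 1: the opening quote
    (?:
        \\\1         # an escaped copy of the captured quote
      |
        (?!\1).      # or any character other than the captured quote
    )*
    \1               # the closing quote, same as the opening one
""", re.VERBOSE)

COMPACT = re.compile(r"""(?<!\\)(['"])(?:\\\1|(?!\1).)*\1""")

text = r""""Hello", is it 'me you\'re looking for'"""
found = [m.group(0) for m in ANSWER2.finditer(text)]
assert found == [m.group(0) for m in COMPACT.finditer(text)]
for quoted in found:
    print(quoted)
# "Hello"
# 'me you\'re looking for'
```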

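In Python, re.findall returns the captured groups rather than the whole match, so for re.findall you need to wrap the entire pattern in an outer group (which shifts the quote's backreference to \2) and take the first element of each result, as shown below.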

>>> import re
>>> def match(s):
...     for i in re.findall(r'''(?<!\\)((['"])(?:\\\2|(?!\2).)*\2)''', s):
...         print(i[0])
...
>>> match(r""""Hello", is it 'me you\'re looking for'""")
"Hello"
'me you\'re looking for'
>>> match(r"""Hello\", is it 'me you\'re looking for'""")
'me you\'re looking for'
>>>
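As an alternative sketch (not part of the original answer), re.finditer yields match objects, so group(0) is always the whole match no matter how many groups the pattern has. That lets you keep the original one-group pattern with \1 unchanged; quoted_strings is a hypothetical helper name.

```python
import re

# Answer 2's pattern, unchanged: one capture group, backreference \1.
PATTERN = re.compile(r"""(?<!\\)(['"])(?:\\\1|(?!\1).)*\1""")

def quoted_strings(s):
    """Return every whole quoted substring, via finditer() instead of findall()."""
    return [m.group(0) for m in PATTERN.finditer(s)]

for q in quoted_strings(r"""Hello\", is it 'me you\'re looking for'"""):
    print(q)
# 'me you\'re looking for'
```

This avoids renumbering backreferences whenever the pattern is wrapped in an extra group.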

Grouping quotes and ignoring escaped quotes
