Question

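I'm trying to build a regex that matches regexes written between two forward slashes. My main problem is that the regexes themselves can contain forward slashes, escaped by a backslash. I tried to filter those out with a negative lookbehind assertion (only match the closing slash if there is no backslash directly before it). Now, however, I get no match when the regex itself actually ends with an escaped backslash.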

Test program:

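#!/usr/bin/env python3
import re

# Raw strings keep the backslashes exactly as they appear in the output below.
teststrings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",
]

# The closing slash must not be directly preceded by a backslash.
patt = r"^/(?P<pattern>.*)(?<!\\)/$"

for t in teststrings:
    m = re.match(patt, t)
    if m is not None:
        print(t, ' => MATCH')
    else:
        print(t, ' => NO MATCH')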

Output:

/hello world/  => MATCH
/string with foreslash here \/ and here\//  => MATCH
/this one ends with backlash\\/  => NO MATCH

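How would I modify the assertion so that it only rejects the slash when a single backslash precedes the current position, but not when two do?

Or is there a better way to extract the regex? (Note: in the actual file I'm trying to parse, the lines contain more than just the regex. I can't simply search for the first and last slash on each line and take everything in between.)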
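
One way to read the requirement is that the closing slash counts as unescaped when an even number of backslashes (zero, two, four, ...) precedes it. The sketch below keeps the lookbehind approach and adds that parity check. The modified pattern and the fourth test string are assumptions made for illustration, not something stated in the question.

```python
import re

# Hypothetical variant of the question's pattern: after the body, require a
# non-backslash boundary followed by zero or more backslash PAIRS, so the
# closing slash is preceded by an even number of backslashes.
patt = re.compile(r"^/(?P<pattern>.*(?<!\\)(?:\\\\)*)/$")

# Expected outcome for each input (the last input is an added negative case).
tests = {
    r"/hello world/": True,
    r"/string with foreslash here \/ and here\//": True,
    r"/this one ends with backlash\\/": True,
    r"/closing slash is escaped\/": False,
}

results = {text: patt.match(text) is not None for text in tests}
for text, matched in results.items():
    print(text, "=> MATCH" if matched else "=> NO MATCH")
```

Like the original pattern, this still anchors on the whole line with `^` and `$`, so it does not address lines that contain other text around the regex.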

Answer 1

Try this:

pattern = re.compile(r"^/(?:\\.|[^/\\])*/")

Explanation:

^       # Start of string
/       # Match /
(?:     # Match either...
 \\.    # an escaped character
|       # or
 [^/\\] # any character except slash/backslash
)*      # any number of times.
/       # Match /
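
As a quick check, the pattern can be run against the three test strings from the question. This is only a minimal sketch: the inputs are written as raw strings here, and the `results` list is a name introduced for illustration.

```python
import re

# Pattern from the answer: opening slash, then any mix of escaped pairs
# (backslash + any character) and ordinary characters, then a closing slash.
pattern = re.compile(r"^/(?:\\.|[^/\\])*/")

# The three test strings from the question, as raw strings.
teststrings = [
    r"/hello world/",
    r"/string with foreslash here \/ and here\//",
    r"/this one ends with backlash\\/",
]

results = [pattern.match(t) is not None for t in teststrings]
for t, ok in zip(teststrings, results):
    print(t, "=> MATCH" if ok else "=> NO MATCH")
```

All three strings should match, because a backslash can only be consumed together with the character after it. A trailing `\\` is therefore read as one escaped backslash, which leaves the final slash free to close the pattern.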

For your "real-world" application (finding the first "slash-delimited string", disregarding escaped slashes), I'd use

pattern = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")

This gets you the following:

>>> pattern.match("foo /bar/ baz").group(1)
'bar'
>>> pattern.match(r"foo /bar\/bam/ baz").group(1)
'bar\\/bam'
>>> pattern.match("foo /bar/bam/ baz").group(1)
'bar'
>>> pattern.match(r"foo\/oof /bar\/bam/ baz").group(1)
'bar\\/bam'
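
When parsing a file line by line, the same pattern could be wrapped in a small helper. The sketch below is only an illustration: the names `SLASH_DELIMITED` and `extract_regex`, and the choice to return `None` when nothing is found, are assumptions rather than part of the answer.

```python
import re

# Same pattern as above, compiled once at module level.
SLASH_DELIMITED = re.compile(r"^(?:\\.|[^/\\])*/((?:\\.|[^/\\])*)/")

def extract_regex(line):
    """Return the text between the first pair of unescaped slashes, or None."""
    m = SLASH_DELIMITED.match(line)
    return m.group(1) if m else None

print(extract_regex(r"foo /bar\/bam/ baz"))            # bar\/bam
print(extract_regex(r"no delimiters here"))            # None
print(extract_regex(r"/ends with backslash\\/ tail"))  # ends with backslash\\
```

Note that `.` does not match a newline by default, so this assumes each line is passed in on its own.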

Regex: match a string between two slashes when the string itself contains escaped slashes
