Negative lookbehind regex assertion in Python

Asked: 2013-12-07 14:14:45

Tags: python regex

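I am developing an application with a search feature, and I want to match search patterns in it. A pattern can take one of the following forms:

  • search:'pattern' and search:"pattern" (quoted search)
  • search:r'pattern' and search:r"pattern" (regex search)
  • search:pattern (unquoted search)

My regular expressions are: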

quoted = re.compile(r'search:(?:\'|")([^"\']+)')
regex = re.compile(r'search:r(?:\'|")([^"\']+)')
unquoted = re.compile(r'search:(?<!r[\'"])([^ \'"]+)')

My test string is:

test_str = "search:foo search:'bar' search:\"baz\" search:r'blah' search:r\"bleh\""

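The quoted and regex patterns match correctly, but the unquoted pattern, which should match only foo, does not. It behaves as if the negative lookbehind were not there. I also tried removing the quotes ([\'"]) from the assertion, but it returns exactly the same result: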

>>> unquoted.findall(test_str)
['foo', 'r', 'r']

I don't understand what I am doing wrong here, so any help is much appreciated!

1 answer:

Answer 0 (score: 1)

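The lookbehind assertion in 'search:(?<!r[\'"])([^ \'"]+)' looks backward from the position just after search:, so the two characters it examines are always h: and it can never find r' or r".
Replace it with a negative lookahead, (?!r[\'"]), which examines the characters that follow search: instead.

But I found another problem: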
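
To make the difference concrete, here is a minimal sketch that runs both assertions against the test string from the question. The names with_lookbehind and with_lookahead are illustrative only and do not appear in the original post.

```python
import re

test_str = "search:foo search:'bar' search:\"baz\" search:r'blah' search:r\"bleh\""

# Lookbehind: evaluated right after "search:", so it inspects the two
# characters before that position, which are always "h:". It never rejects.
with_lookbehind = re.compile(r'search:(?<!r[\'"])([^ \'"]+)')

# Lookahead: evaluated at the same position, but it inspects the characters
# that follow "search:", so terms starting with r' or r" are rejected.
with_lookahead = re.compile(r'search:(?!r[\'"])([^ \'"]+)')

lookbehind_result = with_lookbehind.findall(test_str)
lookahead_result = with_lookahead.findall(test_str)

print(lookbehind_result)  # ['foo', 'r', 'r']
print(lookahead_result)   # ['foo']
```

Both assertions are zero-width and are tested at the same position. The only difference is the direction in which they look.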

import re

quoted = re.compile(r'search:(?:[\'"])([^"\']+)')
regex = re.compile(r'search:r(?:[\'"])([^"\']+)')
# Negative lookahead: reject terms that start with r' or r"
unquoted = re.compile(r'search:(?!r[\'"])([^ \'"]+)')

test_str = "search:foo search:romeo "\
           "search:'bar' search:\"baz\" "\
           "search:r'blah' search:r\"bleh\" "\
           "search:isn'it something to catch ?"

# Forms to recognize:
#   search:'pattern' and search:"pattern"   (quoted search)
#   search:r'pattern' and search:r"pattern" (regex search)
#   search:pattern                          (unquoted search)

print(quoted.findall(test_str))
print()
print(regex.findall(test_str))
print()
print(unquoted.findall(test_str))

Result:

['bar', 'baz']

['blah', 'bleh']

['foo', 'romeo', 'isn']

Didn't you want to catch isn'it in full? The unquoted pattern stops at the apostrophe and returns only isn.
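
One possible way to capture such a term whole, offered only as a sketch and not as part of the original answer: assume that an unquoted term runs to the next whitespace and may contain quote characters, provided it does not start with ', ", r' or r". The names unquoted_whole and sample are hypothetical.

```python
import re

# Assumption: an unquoted term ends at the next whitespace and may contain
# quotes, as long as it does not START with ', ", r' or r".
unquoted_whole = re.compile(r'search:(?!r?[\'"])(\S+)')

sample = "search:foo search:romeo search:'bar' search:r\"bleh\" search:isn'it something"

whole_terms = unquoted_whole.findall(sample)
print(whole_terms)  # ['foo', 'romeo', "isn'it"]
```

The lookahead (?!r?[\'"]) also rejects the plain quoted forms, so the capture group can use \S+ without picking up search:'bar'. Whether unquoted terms should be allowed to contain quotes at all depends on the application's search syntax.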