Question

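I want to search for regex matches in a larger string, starting at a given position, without using string slicing.

For background: I want to iteratively search a string for matches of various regular expressions. A natural solution in Python would be to keep track of the current position within the string and use, for example,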

re.match(regex, largeString[pos:])

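in a loop. But for really large strings (~1 MB), the slice largeString[pos:] becomes expensive because it copies the rest of the string on every iteration. I am looking for a way around that.

Side note: funnily enough, a corner of the Python documentation mentions an optional pos parameter to the match function (which is exactly what I want), but the module-level functions themselves do not accept it :-).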

Answer 1

The variants that take pos and endpos parameters exist only as methods of compiled regular expression objects. Try this:

import re

pattern = re.compile("match here")
text = "don't match here, but do match here"  # avoid shadowing the builtin input()
start = text.find(",")  # begin searching at the comma
print(pattern.search(text, start).span())

...which prints (25, 35).
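Applied to the loop described in the question, the same idea might look like the sketch below. The pattern list and the `tokenize` helper are hypothetical names chosen for illustration; the only thing taken from the answer is calling `match` on a compiled pattern with an explicit start offset.

```python
import re

# Hypothetical token patterns, compiled once so that .match(text, pos) is available.
TOKEN_PATTERNS = [
    ("NUMBER", re.compile(r"\d+")),
    ("NAME", re.compile(r"[A-Za-z_]\w*")),
    ("SPACE", re.compile(r"\s+")),
]


def tokenize(text):
    """Yield (kind, value) pairs by matching at an advancing offset."""
    pos = 0
    while pos < len(text):
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(text, pos)  # no text[pos:] copy is made
            if m:
                yield kind, m.group(0)
                pos = m.end()  # offsets refer to the original string
                break
        else:
            raise ValueError(f"no pattern matches at offset {pos}")


tokens = list(tokenize("abc 123 x9"))
print(tokens)
```

Every pattern here consumes at least one character, so the loop always advances; a pattern that can match the empty string would need an extra guard.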

Answer 2

The pos keyword is only available in the method versions. For example,

re.match("e+", "eee3", pos=1)

does not work, but

pattern = re.compile("e+")
pattern.match("eee3", pos=1)

works.
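A small check of both calls, written as a sketch: the module-level function rejects the keyword with a TypeError, while the compiled pattern accepts it. The variable names are illustrative only.

```python
import re

# The module-level function has no pos parameter, so the keyword is rejected.
try:
    re.match("e+", "eee3", pos=1)
    module_level_ok = True
except TypeError:
    module_level_ok = False

# The method on a compiled pattern accepts pos.
pattern = re.compile("e+")
m = pattern.match("eee3", pos=1)
span = m.span()

print(module_level_ok, span)
```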

Answer 3

>>> import re
>>> m = re.compile("(o+)")
>>> m.match("oooo").span()
(0, 4)
>>> m.match("oooo",2).span()
(2, 4)
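One difference from slicing is worth keeping in mind: according to the `re` documentation, passing pos is not completely equivalent to slicing, because `^` still refers to the real beginning of the string. The sketch below illustrates that, along with the optional endpos argument; the variable names are made up for the example.

```python
import re

text = "oooo"

# pos and endpos together limit the match to text[1:3] without copying.
bounded = re.compile("o+").match(text, 1, 3)

anchored = re.compile("^o+")
sliced = anchored.match(text[2:])  # the slice is a new string, so ^ matches at its start
offset = anchored.match(text, 2)   # ^ still means index 0 of text, so this fails

print(bounded.span(), sliced is not None, offset is None)
```

If the patterns being tried start with `^`, dropping the anchor and relying on `match` (which already anchors at pos) is one way to keep the behaviour of the sliced version.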

Answer 4

You can also use a positive lookbehind, like this:

import re

test_string = "abcabdabe"
position = 3

# The lookbehind requires at least `position` characters before the match start.
a = re.search("(?<=.{" + str(position) + "})ab[a-z]", test_string)

print(a.group(0))

which yields:

abd
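A caveat with this approach, shown as a sketch with a made-up input: `.` does not match a newline by default, so the lookbehind fails when the skipped prefix contains one unless re.DOTALL is set. Searching from an offset with a compiled pattern does not have that dependency.

```python
import re

text = "a\nbabd"  # a newline sits inside the first three characters
position = 3

lookbehind = "(?<=.{" + str(position) + "})ab[a-z]"

plain = re.search(lookbehind, text)              # '.' does not match '\n' by default
dotall = re.search(lookbehind, text, re.DOTALL)  # now '.' matches any character
with_pos = re.compile("ab[a-z]").search(text, position)

print(plain, dotall.group(0), with_pos.group(0))
```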

Applying a regular expression to a substring without using string slicing
