Question

I have a string, and I want to extract matches from it using a regular expression. The string is:

you and he and she and me

My regular expression (so far):

(\w+) and (\w+)

I want it to give this result:

(you, he), (he, she), (she, me)

But the current result contains only 2 matches, namely:

(you, he), (she, me)

How can I achieve this?

Answer 1

What you are asking for is overlapping matches.

Here is how to get them:

import re

s = "you and he and she and me"

# The lookahead (?=...) consumes no characters, so a match is tried at every position
print(re.findall(r'(?=\b(\w+) and (\w+)\b)', s))

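The lookahead by itself already finds the overlaps, but you need the \b I added to say that you want to match at word boundaries. Without it, you get: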

[('you', 'he'), ('ou', 'he'), ('u', 'he'), ('he', 'she'), ('e', 'she'), ('she', 'me'), ('he', 'me'), ('e', 'me')]
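
To see why the lookahead matters, here is a minimal sketch that runs the question's pattern and the lookahead version side by side. The variable names `plain` and `overlapping` are illustrative only. A normal match consumes its text, so the search resumes after "he" and that word cannot start the next pair. A lookahead consumes nothing, so every position is tried.

```python
import re

s = "you and he and she and me"

# Plain pattern: each match consumes its text, so "he" cannot be reused
# as the first word of the next pair.
plain = re.findall(r'(\w+) and (\w+)', s)

# Lookahead version: zero-width, so matching is attempted at every position;
# \b restricts the attempts to word boundaries.
overlapping = re.findall(r'(?=\b(\w+) and (\w+)\b)', s)

print(plain)        # [('you', 'he'), ('she', 'me')]
print(overlapping)  # [('you', 'he'), ('he', 'she'), ('she', 'me')]
```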

Answer 2

You can use a zero-width positive lookahead:

(?=(?:^|\s)(\w+)\s+and\s+(\w+))

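The zero-width lookahead pattern starts with (?= and ends with )
(?:^|\s) is a non-capturing group that makes sure the desired pattern is at the start of the string or preceded by whitespace
(\w+)\s+and\s+(\w+) captures the two desired words in the first and second capturing groups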

Example:

In [11]: s = 'you and he and she and me'

In [12]: re.findall(r'(?=(?:^|\s)(\w+)\s+and\s+(\w+))', s)
Out[12]: [('you', 'he'), ('he', 'she'), ('she', 'me')]
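
If you also need to know where each pair begins, the same lookahead can be used with re.finditer. This is a small sketch and not part of the original answer. The name `pairs` is illustrative. Because the lookahead itself matches an empty string, the position is read from the first capturing group.

```python
import re

s = 'you and he and she and me'
pattern = re.compile(r'(?=(?:^|\s)(\w+)\s+and\s+(\w+))')

# Each match object has an empty overall match (the lookahead is zero-width),
# so use the groups: start(1) is the index where the first word begins.
pairs = [(m.start(1), m.group(1), m.group(2)) for m in pattern.finditer(s)]

print(pairs)  # [(0, 'you', 'he'), (8, 'he', 'she'), (15, 'she', 'me')]
```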

Answer 3

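As others have pointed out, what you are looking for is overlapping matches. With the newer regex module, you can stick with your initial approach and just pass an additional flag: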

import regex as re

string = "you and he and she and me"
rx = r'\b(\w+) and (\w+)\b'

matches = re.findall(rx, string, overlapped=True)
print(matches)
# [('you', 'he'), ('he', 'she'), ('she', 'me')]

Hint: you need to anchor the pattern with word boundaries (\b), otherwise you will get unexpected results.
