Question

Is there a way in Python to match substrings in a string regardless of their order?

Suppose I have a string

Hello how are you doing you have a nice day hello there

My match substrings are 'hello' and 'you'.

Now I need a regex pattern that matches hello how are you and you doing you (you is already matched, so it shouldn't match again) have a nice day hello

I tried something like this, but it didn't work:

(hello|you)[\w\s]*?[^($1)](hello|you)

Expected output:

Hello how are you
you doing you have a nice day hello
you have a nice day hello

Basically I want to match 'hello ... you' and, vice versa, 'you ... hello'.

I don't know how to exclude the keyword that was matched first. Any ideas how to solve this?

Update:

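Basically, my problem is that I need to match a substring without a repeated keyword. In the example above, the sentence is "Hello how are you doing you have a nice day hello there" and the match strings are "hello" and "you", so I need to match a substring that starts with hello and ends with you, or starts with you and ends with hello. A match that starts with you must not end at the next you; it should continue until it ends with hello.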
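To make the rule in the update concrete, here is a plain-Python reference function (no regex) that produces the expected output above. It is only a sketch: the name expected_matches is hypothetical, and it assumes the keywords are whitespace-separated words compared case-insensitively. It can serve as a yardstick for checking candidate patterns. Note also that [^($1)] in the attempt is a character class matching one character other than (, $, 1 or ); it is not a back-reference, which Python writes as \1.

```python
def expected_matches(text, keywords=("hello", "you")):
    """Hypothetical helper: for each keyword occurrence, return the span from it
    up to the first later occurrence of a *different* keyword.
    Assumes whitespace-separated words, compared case-insensitively."""
    words = text.split()
    lowered = [w.lower() for w in words]
    kws = {k.lower() for k in keywords}
    results = []
    for i, start in enumerate(lowered):
        if start not in kws:
            continue
        # Skip repeats of the opening keyword; stop at the other keyword.
        for j in range(i + 1, len(lowered)):
            if lowered[j] in kws and lowered[j] != start:
                results.append(" ".join(words[i:j + 1]))
                break
    return results


print(expected_matches("Hello how are you doing you have a nice day hello there"))
# ['Hello how are you', 'you doing you have a nice day hello', 'you have a nice day hello']
```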

Answer 1

Use this pattern with re.findall:

(?si)(?=((?:hello|you).*?(?:hello|you)))

See the regex demo.

Python demo:

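import re
# The capturing group sits inside a lookahead, so findall returns group 1
# from every starting position, including overlapping matches.
p = re.compile(r'(?=((?:hello|you).*?(?:hello|you)))', re.IGNORECASE | re.DOTALL)
test_str = "Hello how are you doing you have a nice day hello there"
print(p.findall(test_str))
# => ['Hello how are you', 'you doing you', 'you have a nice day hello']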

Regex explanation:

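(?si) - enables the DOTALL flag (. also matches newlines) and the IGNORECASE flag
(?=((?:hello|you).*?(?:hello|you))) - a positive lookahead that consumes no characters, so re.findall can capture a substring starting at every position in the string. It searches for:
- (?:hello|you) - the literal character sequence hello or you
- .*? - any character, zero or more times, as few as possible
- (?:hello|you) - the literal character sequence hello or you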
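The role of the lookahead is easiest to see side by side. This sketch runs the same inner pattern once as a plain (consuming) match and once wrapped in the capturing lookahead; the variable names are only for illustration.

```python
import re

text = "Hello how are you doing you have a nice day hello there"

# Plain pattern: each match consumes its characters, so matches cannot overlap
# and the search resumes after the end of the previous match.
consuming = re.findall(r"(?si)(?:hello|you).*?(?:hello|you)", text)

# Same pattern inside a capturing lookahead: the overall match is empty,
# so findall tries every position and returns group 1 from each success.
lookahead = re.findall(r"(?si)(?=((?:hello|you).*?(?:hello|you)))", text)

print(consuming)  # ['Hello how are you', 'you have a nice day hello']
print(lookahead)  # ['Hello how are you', 'you doing you', 'you have a nice day hello']
```

The consuming version never starts a match at the first "you", because that "you" was used up as the end of "Hello how are you".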

To match hello and you only as whole words, add word boundaries (\b):

(?si)(?=(\b(?:hello|you)\b.*?\b(?:hello|you)\b))
         ^^             ^^   ^^             ^^
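Answer 1's output still contains 'you doing you', which the update rules out. One possible extension, not part of the original answer, is to capture the opening keyword in a second group and use a negative lookahead with a back-reference, (?!\2\b), so the closing keyword must be a different word. The helper name keyword_spans is hypothetical; the sketch relies on Python's re applying IGNORECASE to back-references, so "Hello" and "hello" count as the same keyword.

```python
import re

# Hypothetical extension of Answer 1's word-boundary pattern:
# group 2 remembers the opening keyword, and (?!\2\b) rejects a closing
# keyword that is the same word, so "you ... you" keeps scanning to "hello".
pattern = re.compile(
    r"(?si)(?=(\b(hello|you)\b.*?\b(?!\2\b)(?:hello|you)\b))"
)


def keyword_spans(text):
    # The pattern has two groups, so findall would return tuples;
    # finditer lets us take group 1 explicitly.
    return [m.group(1) for m in pattern.finditer(text)]


print(keyword_spans("Hello how are you doing you have a nice day hello there"))
# ['Hello how are you', 'you doing you have a nice day hello', 'you have a nice day hello']
print(keyword_spans("Hello young man, how are you"))
# ['Hello young man, how are you']  ("young" is skipped thanks to \b)
```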

Answer 2

Based on my understanding of your question, this might be what you want:

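import re

t = "Hello how are you doing you have a nice day hello there"
# "hello ... you" and, vice versa, "you ... hello"
pattern = ["(?=hello).*?(?<=you)", "(?=you).*?(?<=hello)"]
for p in pattern:
    pat = re.compile(p)
    # lower() makes the search case-insensitive
    for m in pat.finditer(t.lower()):
        print(m.group())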

The output is:

hello how are you
you doing you have a nice day hello
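This output lacks the third expected line, 'you have a nice day hello', because finditer returns non-overlapping matches: the second "you" lies inside the previous match, so no new match can start there. A sketch of one fix, borrowing the capturing-lookahead trick from Answer 1, wraps each of Answer 2's patterns in (?=( ... )) so every position is tried.

```python
import re

t = "Hello how are you doing you have a nice day hello there"
patterns = ["(?=hello).*?(?<=you)", "(?=you).*?(?<=hello)"]

results = []
for p in patterns:
    # Wrapping the pattern in a capturing lookahead makes each overall match
    # zero-width, so findall also tries positions inside earlier matches.
    results.extend(re.findall(f"(?=({p}))", t.lower()))

print(results)
# ['hello how are you', 'you doing you have a nice day hello', 'you have a nice day hello']
```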
