Question

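I have read many posts with similar titles, but I found nothing that works with Python, or even on this site: https://regex101.com

How do I match everything except a specific text?

My text: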

1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

1234_This is a text Word BCD    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            -

I want to match Word \w+ and then everything else up to the next 1234. So the result should be (returning the groups marked with ()):

(1234_This is a text (Word AB))(

Protocol  Address          ping
Internet  1.1.1.1            - 
Internet  1.1.1.2            25 
Internet  1.1.1.3            8 
Internet  1.1.1.4            - 

)(1234_This is a text (Word BCD))(    
Protocol  Address          ping
Internet  2.2.2.1            10 
Internet  2.2.2.2            - )

The first part is easy: matches = re.findall(r'1234_This is a text (Word \w+)', var) but I cannot get the next part to work. I tried a negative lookahead: ^(?!1234) but then it no longer matches anything...

Answer 1

Code

(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)

With the s modifier (re.DOTALL in Python), the following can be used instead:

(1234[\w ]+(Word \w+))((?:(?!1234).)*)
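
In Python, both patterns can be tried with re.findall, which returns one (group 1, group 2, group 3) tuple per match. The following is a minimal sketch, assuming a shortened copy of the sample text from the question; the names text, pattern, and pattern_dotall are illustrative only.

```python
import re

# Shortened copy of the sample text from the question (an assumption for brevity).
text = (
    "1234_This is a text Word AB\n"
    "\n"
    "Protocol  Address          ping\n"
    "Internet  1.1.1.1            -\n"
    "Internet  1.1.1.2            25\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Protocol  Address          ping\n"
    "Internet  2.2.2.1            10\n"
    "Internet  2.2.2.2            -"
)

# First variant: [\s\S] matches any character, so no flag is needed.
pattern = r'(1234[\w ]+(Word \w+))((?:(?!1234)[\s\S])*)'
# Second variant: "." only matches newlines when re.DOTALL is passed.
pattern_dotall = r'(1234[\w ]+(Word \w+))((?:(?!1234).)*)'

matches = re.findall(pattern, text)
matches_dotall = re.findall(pattern_dotall, text, re.DOTALL)

for header, word, rest in matches:
    print(repr(header), repr(word), repr(rest))
```

Both calls should yield the same tuples; the only difference is whether the "any character" part is spelled [\s\S] or as . with re.DOTALL.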

Explanation

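(1234[\w ]+(Word \w+)) captures the following into capture group 1
- 1234 matches literally
- [\w ]+ matches one or more word characters or spaces
- (Word \w+) captures the following into capture group 2
  - "Word " matches literally (note the trailing space)
  - \w+ matches one or more word characters
((?:(?!1234)[\s\S])*) captures the following into capture group 3
- (?:(?!1234)[\s\S])* matches the following any number of times (a tempered greedy token)
  - (?!1234) is a negative lookahead ensuring that what follows is not 1234
  - [\s\S] matches any character, including newlines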
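
To see why the lookahead has to sit inside the repeated group, the following sketch compares a lookahead that is checked only once with the tempered greedy token. The string sample is made up for illustration and is not from the question.

```python
import re

# Made-up input: some text, a newline, then "1234" later on.
sample = "abc\n12 1234 def"

# A single lookahead is checked only once, at the start of the match,
# so the greedy [\s\S]* still runs past the later "1234".
checked_once = re.match(r'(?!1234)[\s\S]*', sample).group()

# The tempered greedy token repeats the lookahead before every character,
# so matching stops right before the first "1234".
tempered = re.match(r'(?:(?!1234)[\s\S])*', sample).group()

print(repr(checked_once))
print(repr(tempered))
```

The first result should be the whole string, while the second stops just before "1234", which is the behaviour capture group 3 relies on.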

Answer 2

As you said:

I want to match Word \w+ and then match the rest up to the next 1234.

Do you want something like this?

import re

# Group 1: whole header line; group 3: "Word ..."; group 4: the lines that follow
pattern = r'((1234_This is a text) (Word\s\w+))((\n?.*(?!\n\n))*)'
string="""1234_This is a text Word AB

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            -

1234_This is a text Word BCD
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -"""

matches = re.finditer(pattern, string, re.M)
for find in matches:
    print("this is group_1 {}".format(find.group(1)))
    print("this is group_3 {}".format(find.group(3)))
    print("this is group_4 {}".format(find.group(4)))

Output:

this is group_1 1234_This is a text Word AB
this is group_3 Word AB
this is group_4 

Protocol  Address          ping
Internet  1.1.1.1            -
Internet  1.1.1.2            25
Internet  1.1.1.3            8
Internet  1.1.1.4            
this is group_1 1234_This is a text Word BCD
this is group_3 Word BCD
this is group_4 
Protocol  Address          ping
Internet  2.2.2.1            10
Internet  2.2.2.2            -
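
In the output above, the - at the end of the 1.1.1.4 line is missing: it appears that (?!\n\n) makes .* give back one character whenever a blank line follows. One possible variant, which is not part of the original answer, consumes the rest line by line and stops at a line that starts with 1234. It is only a sketch and assumes that every section header begins a line with 1234; sample, pattern, and sections are illustrative names.

```python
import re

# Shortened sample (an assumption); the first section ends with "-" before a blank line.
sample = (
    "1234_This is a text Word AB\n"
    "\n"
    "Internet  1.1.1.4            -\n"
    "\n"
    "1234_This is a text Word BCD\n"
    "Internet  2.2.2.2            -"
)

# Group 1: header line, group 2: "Word ...", group 3: the following lines.
# Each repetition takes a newline plus one line, unless that line starts with 1234.
pattern = r'(1234_This is a text (Word \w+))((?:\n(?!1234).*)*)'

# Strip the surrounding blank lines from each body for readability.
sections = [(m.group(2), m.group(3).strip()) for m in re.finditer(pattern, sample)]
for word, body in sections:
    print(word, "->", repr(body))
```

Because every repetition consumes a whole line, no character is given back before a blank line, and no flags are needed since . never has to cross a newline.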
