Question

The idea here is to group our application's log messages better.

So imagine the regex:

(^Case1|^Case2|Case3$)

And the strings:

teststring Case3
Case1 teststring
Case2
teststring

The expected capture results would be

Case3
Case1
Case2

(and nothing for the last one)

Since I don't want to discard messages that I don't specifically define, I'd like to capture the entire string if it doesn't match any of the cases defined in the regex.

Naively, I changed the regex to:

(^Case1|^Case2|Case3$|.*)

However, the last alternative (.*) now seems to override the others and is always the one that wins, so I always match the entire string, unless the text to match is at the start of the string.

For example, using:

(^Case1|^Case2|Case3$|.*)

on

Case2 testString

gives

Case2

but using

(^Case1|^Case2|Case3$|.*)

on

teststring Case3

gives

teststring Case3
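
For reference, the behaviour described above can be reproduced with a minimal Python sketch. Using Python's re.search is an assumption here, since the tool that actually applies the pattern is not stated; first_capture is a hypothetical helper name.

```python
import re

# Pattern from the question: three anchored cases plus a catch-all.
PATTERN = re.compile(r"(^Case1|^Case2|Case3$|.*)")

def first_capture(line):
    # re.search returns the leftmost match; at each start position the
    # alternatives are tried in order, left to right.
    return PATTERN.search(line).group(1)

for line in ["Case2 testString", "teststring Case3"]:
    print(repr(line), "->", repr(first_capture(line)))
```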

Hopefully someone will enlighten me!

Thanks in advance.

Answer 1

Edited notes - things to be aware of..

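A regex engine processes alternations from left to right, but it does so at the current character position. For example, with the expression (here$)|.* and the subject string 'first here', here$ is tried first at character position 0: the 'f' is checked against the 'h' in here$. No match..

So it moves on to the next alternative, .*, which can match the 'f'
and goes on to match to the end of the string.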

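Even though the subject string ends with '... here', here$ is never matched in this case.

However, if you use the regex .*(here$)|.* instead, the first alternative .*(here$) will match, because its .* can consume everything from the 'f' up to the final 'here'.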
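
The difference between the two expressions can be checked with a short Python sketch. Python's re module is an assumption (the answer does not name an engine), but left-to-right alternation at the current position works the same way there.

```python
import re

subject = "first here"

# Alternatives are tried left to right, starting at position 0.
# 'here$' cannot match at the 'f', so '.*' takes the whole string
# and the capture group never participates.
m1 = re.search(r"(here$)|.*", subject)
print(repr(m1.group(0)), repr(m1.group(1)))

# With a leading '.*' the first alternative can succeed: '.*' grabs
# everything, then backtracks until 'here$' fits at the end.
m2 = re.search(r".*(here$)|.*", subject)
print(repr(m2.group(0)), repr(m2.group(1)))
```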

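Technically, you want to know which case matched while also matching all of the other text.

If so, there are many ways to do it; here are two.

This one uses branch reset.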

 # ^(?|(Case[12]).*|.*(Case3)|(.+))$

 ^ 
 (?|
      ( Case [12] )                 # (1)
      .* 
   |  
      .* 
      ( Case3 )                     # (1)
   |  
      ( .+ )                        # (1)
 )
 $

This one uses individual capture groups to tell you specifically which case
matched.

 # ^(?:(Case[12]).*|.*(Case3)|(.+))$

 ^ 
 (?:
      ( Case [12] )                 # (1)
      .* 
   |  
      .* 
      ( Case3 )                     # (2)
   |  
      ( .+ )                        # (3)
 )
 $
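
As a rough usage sketch in Python (an assumption; the answer does not name a language), the individual-capture-group pattern can be applied as below. Python's built-in re module does not accept the branch-reset syntax (?|...), so only this second form is shown. The helper name classify is hypothetical.

```python
import re

# The pattern above, written with re.VERBOSE so the expanded layout
# carries over; whitespace inside the pattern is ignored.
PATTERN = re.compile(r"""
    ^
    (?:
        ( Case[12] ) .*     # group 1: Case1 or Case2 at the start
      | .* ( Case3 )        # group 2: Case3 at the end
      | ( .+ )              # group 3: anything else, the whole line
    )
    $
""", re.VERBOSE)

def classify(line):
    """Return (group_number, captured_text), or None if nothing matches."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Only one of the three groups takes part in a match,
    # so lastindex identifies which case matched.
    return m.lastindex, m.group(m.lastindex)

for line in ["teststring Case3", "Case1 teststring", "Case2", "teststring"]:
    print(repr(line), "->", classify(line))
```

Taking only the captured text from the returned pair gives the same result a branch-reset pattern would put in group 1. An empty line returns None, because the fallback ( .+ ) needs at least one character.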

Matching specific parts of a string or else matching entire string
