Matching specific parts of a string or else matching the entire string

Date: 2015-07-08 15:43:49

Tags: regex sorting expression character

The idea here is to better group the log messages from our application.

So imagine the regex:

(^Case1|^Case2|Case3$)

And the strings:

  • teststring Case3
  • Case1 teststring
  • Case2
  • teststring

The expected capture results would be:

  • Case3
  • Case1
  • Case2

(and nothing for the last one)
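A minimal sketch of that baseline, assuming Python's built-in re module (the question does not say which language or regex engine the application uses). The helper name capture_case is hypothetical. Because re.search tries every starting position in turn, Case3$ is still found at the end of "teststring Case3".

```python
import re

# The question's original pattern, with no catch-all alternative.
CASES = re.compile(r"(^Case1|^Case2|Case3$)")


def capture_case(line):
    """Return the captured case, or None when no case matches (hypothetical helper)."""
    m = CASES.search(line)  # search() tries each start position in turn
    return m.group(1) if m else None


for line in ["teststring Case3", "Case1 teststring", "Case2", "teststring"]:
    print(repr(line), "->", repr(capture_case(line)))
```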

Since I don't want to discard messages that I don't specifically define, I'd like to capture the entire string if it doesn't match any of the cases defined in the regex.

Naively, I changed the regex to:

(^Case1|^Case2|Case3$|.*)

However, the last alternative now seems to override the captures made by the other alternatives and is always the one that gets used, so I always match the entire string, unless the text to match is at the start of the string.

E.g., using:

(^Case1|^Case2|Case3$|.*)

on

  • Case2 testString

gives

  • Case2

but using

(^Case1|^Case2|Case3$|.*)

on

  • teststring Case3

gives

  • teststring Case3
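The same behavior can be reproduced in a short sketch, again assuming Python's re module; naive_capture is a hypothetical helper name. The catch-all .* can match at position 0 of any string, so the search succeeds there and never advances to the position where Case3$ would match.

```python
import re

# The "naive" pattern: the catch-all .* is the last alternative.
NAIVE = re.compile(r"(^Case1|^Case2|Case3$|.*)")


def naive_capture(line):
    """Return group 1 of the naive pattern (hypothetical helper)."""
    # At position 0, ^Case1/^Case2 succeed only if the line starts with them,
    # and Case3$ needs the line to be just "Case3" (ignoring a trailing
    # newline). Otherwise .* matches right there and consumes the whole line.
    return NAIVE.search(line).group(1)


for line in ["Case2 testString", "teststring Case3"]:
    print(repr(line), "->", repr(naive_capture(line)))
```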

Hopefully someone will enlighten me!

Thanks in advance.

1 Answer

Answer (score: 1)

Edit note: some things to be aware of.

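The regex engine tries alternatives from left to right, but it does so at the current character position. For example, in the expression (here$)|.*, here$ is tried first at character position 0. With the subject string 'first here', the 'f' is compared against the 'h' of here$. No match.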

So the engine moves on to the next alternative, .*, which can match the 'f' and continue to the end of the string.

Even though the subject string ends with '... here', here$ will not match in this case.

However, if you use the regex .*(here$)|.*, the first alternative .*(here$) matches, because its .* can match from the 'f' all the way up to the final 'here'.
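The two expressions from this note can be compared directly. This is a sketch assuming Python's re module, whose re.search likewise tries the alternatives in order at each starting position.

```python
import re

subject = "first here"

# (here$) is tried first at position 0; 'f' is not 'h', so the engine falls
# through to .*, which matches the whole string. Group 1 does not participate.
m1 = re.search(r"(here$)|.*", subject)

# Here the first alternative's .* can backtrack to just before the final
# "here", so .*(here$) succeeds at position 0 and group 1 is set.
m2 = re.search(r".*(here$)|.*", subject)

print(repr(m1.group(0)), repr(m1.group(1)))  # 'first here' None
print(repr(m2.group(0)), repr(m2.group(1)))  # 'first here' 'here'
```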

Technically, what you want is to know which case matched while also matching all of the other text.

If so, there are many ways to do it; here are two.

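This one uses a branch reset group, (?|...), in which every alternative numbers its groups from the same starting index, so whichever case matched always lands in group 1. Branch reset is supported by Perl and PCRE, but not by every engine (Python's built-in re, for example, lacks it). The expanded form is written in free-spacing (x) mode; the compact form is in the first comment line.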

 # ^(?|(Case[12]).*|.*(Case3)|(.+))$

 ^ 
 (?|
      ( Case [12] )                 # (1)
      .* 
   |  
      .* 
      ( Case3 )                     # (1)
   |  
      ( .+ )                        # (1)
 )
 $ 
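Because Python's built-in re module does not support (?|...), the following sketch (an assumption about the target environment, not part of the original answer) emulates branch reset: it uses an ordinary (?:...) group with the same three alternatives and returns whichever capture group took part in the match. The helper name branch_reset_capture is hypothetical.

```python
import re

# Same alternatives as the branch-reset pattern, but wrapped in (?:...),
# so the three groups keep their own numbers (1, 2, 3).
PATTERN = re.compile(r"^(?:(Case[12]).*|.*(Case3)|(.+))$")


def branch_reset_capture(line):
    """Return what group 1 of the branch-reset pattern would hold, or None."""
    m = PATTERN.match(line)
    if m is None:
        return None  # e.g. the empty string, since (.+) needs one character
    # Exactly one alternative succeeded, and each alternative contains one
    # mandatory group, so exactly one group is not None.
    return next(g for g in m.groups() if g is not None)


for line in ["teststring Case3", "Case1 teststring", "Case2", "teststring"]:
    print(repr(line), "->", repr(branch_reset_capture(line)))
```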

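This one uses individual capture groups, so the number of the group that participated tells you specifically which case matched.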

 # ^(?:(Case[12]).*|.*(Case3)|(.+))$

 ^ 
 (?:
      ( Case [12] )                 # (1)
      .* 
   |  
      .* 
      ( Case3 )                     # (2)
   |  
      ( .+ )                        # (3)
 )
 $
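In Python's re module (again an assumption about the environment), the free-spacing form corresponds to the re.VERBOSE flag, and Match.lastindex gives the number of the last capture group that matched. Since only one group can participate here, it identifies the case directly. The helper classify and the labels in CASE_LABELS are hypothetical.

```python
import re

PATTERN = re.compile(
    r"""
    ^
    (?:
        (Case[12]) .*     # (1) Case1 or Case2 at the start
      | .* (Case3)        # (2) Case3 at the end
      | (.+)              # (3) anything else: the whole line
    )
    $
    """,
    re.VERBOSE,
)

# Hypothetical labels for grouping log messages.
CASE_LABELS = {1: "starts with Case1/Case2", 2: "ends with Case3", 3: "other"}


def classify(line):
    """Return (group number, captured text), or None if nothing matched."""
    m = PATTERN.match(line)
    if m is None:
        return None
    # Only one group participates, so lastindex is that group's number.
    return m.lastindex, m.group(m.lastindex)


for line in ["teststring Case3", "Case1 teststring", "Case2", "teststring"]:
    number, text = classify(line)
    print(repr(line), "->", number, CASE_LABELS[number], repr(text))
```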