Preferring some alternatives in a regular expression

Time: 2013-11-04 20:12:51

Tags: .net regex

I have this regular expression

("[^"]*")|('[^']*')|([^<>]+)

applied to this input string

<telerik:RadTab Text="RGB">

I want it to match RGB. However, it does not, because the last alternative produces a longer match.

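What I would ideally like is this:

  1. If there is a double-quoted substring, match it, including the double quotes.
  2. Otherwise, if there is a single-quoted substring, match it, including the single quotes.
  3. Otherwise, if there is a string enclosed in angle brackets, match it, excluding the angle brackets.

Can this logic be done in a single regex?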

3 answers:

Answer 0 (score: 3)

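    using System;
    using System.Text.RegularExpressions;

    // Three sample inputs: double-quoted, unquoted, and single-quoted attribute values.
    var strings = new[]
        {"<telerik:RadTab Text=\"RGB\">", "<telerik:RadTab Text=RGB>", "<telerik:RadTab Text='RGB'>"};
    // Alternatives: a quote-free <...> tag (group 1), a double-quoted string (group 2), a single-quoted string (group 3).
    var r = new Regex("<([^<\"']+[^>\"']+)>|(\"[^\"]*\")|('[^']*')");
    foreach (var s1 in strings)
    {
        Console.WriteLine(s1);
        var match = r.Match(s1);
        // match.Value is the whole match: for the unquoted input it includes the angle brackets,
        // while match.Groups[1].Value holds the text without them.
        Console.WriteLine(match.Value);
        Console.WriteLine();
    }
    Console.ReadLine();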

Answer 1 (score: 2)

One solution to this problem is to use lookahead assertions:

(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)

Let's spread the regex out to get a better view:

(?=             # zero-width assertion: look ahead for ...
    ("[^"]*")   # a double-quoted string, captured as group 1
)               # end of lookahead
|               # or
(?=             # zero-width assertion: look ahead for ...
    ('[^']*')   # a single-quoted string, captured as group 2
)               # end of lookahead
|               # or
(?=             # zero-width assertion: look ahead for ...
    <([^<>]+)>  # one or more characters other than < and >, enclosed in <>, captured as group 3
)               # end of lookahead

You might be thinking "what in the world is he doing?" No problem, I'll explain further why your regex fails.

We have the following string: <telerik:RadTab Text="RGB">

<telerik:RadTab Text="RGB">
^ the regex engine starts here
since nothing in ("[^"]*")|('[^']*')|([^<>]+) matches at this position
it will look further!

<telerik:RadTab Text="RGB">
 ^ the regex engine will now take a look here
it will check whether "[^"]*" matches here; well, obviously it doesn't
now, since there is an alternation, the regex engine will
check whether '[^']*' matches; meh, same thing
it will now check whether [^<>]+ matches, and hey, it does!

So your regex engine will "eat" it like so
<telerik:RadTab Text="RGB">
 ^^^^^^^^^^^^^^^^^^^^^^^^^ and match this, by eating I mean it's advancing
Now the regex engine is at this point
<telerik:RadTab Text="RGB">
                          ^ and obviously, there is no match
The problem is, you want it to "step" back to match "RGB"
The regex engine won't go back for you :(
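
The failure traced above can be reproduced with a short C# sketch. The class and method names (`OriginalRegexDemo`, `FirstMatch`) are made up for illustration; only the pattern and the input string come from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OriginalRegexDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = "(\"[^\"]*\")|('[^']*')|([^<>]+)";

    // Returns the first match as "index:value", or "no match".
    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Index + ":" + m.Value : "no match";
    }

    public static void Main()
    {
        // The first match starts at index 1 and runs up to the closing '>',
        // so the quoted "RGB" is consumed as part of it.
        Console.WriteLine(FirstMatch("<telerik:RadTab Text=\"RGB\">"));
        // prints: 1:telerik:RadTab Text="RGB"
    }
}
```

The match is chosen because it starts furthest to the left, not because it is longer: the engine never gets a chance to try the quoted alternatives at the position of "RGB".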

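That's why we wrap the groups in zero-width assertions: a lookahead doesn't eat anything (it doesn't advance), and if you put a group inside a lookahead, you still get your captured group.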

<telerik:RadTab Text="RGB">
^ So when it comes here, it will match it with (?=<([^<>]+)>)
but it won't eat the whole matched string
Now obviously, the regex needs to continue to look for other matches
So it comes here:
<telerik:RadTab Text="RGB">
 ^ no match
<telerik:RadTab Text="RGB">
  ^ no match
.....
until
<telerik:RadTab Text="RGB">
                     ^ hey there is a match using (?=("[^"]*"))
it will then advance further
<telerik:RadTab Text="RGB">
                      ^ no match
.... until it reaches the end

Of course, if you have a string like <telerik:RadTab Text="RGB'lol'">, it will still match 'lol' inside the double-quoted value and put it in group 2.
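
As a rough illustration of how the lookahead version might be consumed from C#, the sketch below collects every capture it produces. `LookaheadDemo` and `Captures` are hypothetical names; the pattern is the one from this answer. Because every match is zero-width, the text has to be read from the groups rather than from `Match.Value`.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The lookahead pattern from this answer, unchanged.
    public const string Pattern = "(?=(\"[^\"]*\"))|(?=('[^']*'))|(?=<([^<>]+)>)";

    // Every match is zero-width, so Match.Value is always empty;
    // the useful text sits in whichever capture group succeeded.
    public static List<string> Captures(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            for (int g = 1; g <= 3; g++)
            {
                if (m.Groups[g].Success)
                    found.Add("index " + m.Index + ", group " + g + ": " + m.Groups[g].Value);
            }
        }
        return found;
    }

    public static void Main()
    {
        foreach (string line in Captures("<telerik:RadTab Text=\"RGB'lol'\">"))
            Console.WriteLine(line);
        // index 0, group 3: telerik:RadTab Text="RGB'lol'"
        // index 21, group 1: "RGB'lol'"
        // index 25, group 2: 'lol'
    }
}
```

Note that this reports all three kinds of substring rather than only the highest-priority one, so the caller would still need to pick, for example, the first group 1 capture if one exists.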

Regex rocks!!!

Answer 2 (score: 1)

Edit: consider the following regex...

(\".*?\"|\'.*?\'|(?<=\<).*?(?=\>))
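
A minimal C# sketch for trying this pattern out follows. The names `LookaroundDemo` and `FirstMatch` are hypothetical, and the pattern is written without the backslashes before the quotes and angle brackets, on the assumption that they are only redundant escapes of literal characters.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaroundDemo
{
    // Same alternatives as the pattern above, minus the redundant backslashes
    // (assumption: they only escape literal quotes and angle brackets).
    public const string Pattern = "(\".*?\"|'.*?'|(?<=<).*?(?=>))";

    // Returns the text of the first match, or "no match".
    public static string FirstMatch(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : "no match";
    }

    public static void Main()
    {
        Console.WriteLine(FirstMatch("<telerik:RadTab Text=\"RGB\">")); // telerik:RadTab Text="RGB"
        Console.WriteLine(FirstMatch("Text=\"RGB\""));                  // "RGB"
    }
}
```

With the question's input the first match is the whole tag body, because the lookbehind alternative succeeds at index 1 before the engine ever reaches the quote. The quoted value is returned on its own only when no `<` precedes it.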