I have this regular expression:
("[^"]*")|('[^']*')|([^<>]+)
and I apply it to this input string:
<telerik:RadTab Text="RGB">
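I want it to match RGB. However, it doesn't, because the last alternative yields a longer string.
Ideally, what I want is this:
Can this logic be done in a single regular expression?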
Answer 0 (score: 3)
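using System;
using System.Text.RegularExpressions;

// Three variants of the tag: double-quoted, unquoted, and single-quoted value.
var strings = new[]
{
    "<telerik:RadTab Text=\"RGB\">", "<telerik:RadTab Text=RGB>", "<telerik:RadTab Text='RGB'>"
};
// Try a whole tag without quotes first, then a double-quoted or a single-quoted string.
var r = new Regex("<([^<\"']+[^>\"']+)>|(\"[^\"]*\")|('[^']*')");
foreach (var s1 in strings)
{
    Console.WriteLine(s1);
    var match = r.Match(s1);
    Console.WriteLine(match.Value);
    Console.WriteLine();
}
Console.ReadLine(); // keep the console window open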
Answer 1 (score: 2)
One solution to this problem is to use lookahead assertions:
(?=("[^"]*"))|(?=('[^']*'))|(?=<([^<>]+)>)
Let's write the regex out piece by piece for a better view:
(?= # zero-width assertion, look ahead if there is ...
("[^"]*") # a double quoted string, group it in group number 1
) # end of lookahead
| # or
(?= # zero-width assertion, look ahead if there is ...
('[^']*') # a single quoted string, group it in group number 2
) # end of lookahead
| # or
(?= # zero-width assertion, look ahead if there is ...
<([^<>]+)> # match anything except <> between <> one or more times and group it in group number 3
) # end of lookahead
You may be thinking "what in the world is he doing?" No problem, I'll explain why your regex fails.
We have the following string, <telerik:RadTab Text="RGB">:
<telerik:RadTab Text="RGB">
^ the regex engine starts here
  since there is no match with ("[^"]*")|('[^']*')|([^<>]+)
  it will look further !

<telerik:RadTab Text="RGB">
 ^ the regex engine will now take a look here
   it will check if there is "[^"]*", well obviously there isn't
   now since there is an alternation, the regex engine will
   check if there is '[^']*', meh same thing
   it will now check if there is [^<>]+, but hey it matches !
   So your regex engine will "eat" it like so

<telerik:RadTab Text="RGB">
 ^^^^^^^^^^^^^^^^^^^^^^^^^ and match this, by eating I mean it's advancing
   Now the regex engine is at this point

<telerik:RadTab Text="RGB">
                          ^ and obviously, there is no match
   The problem is, you want it to "step" back to match "RGB"
   The regex engine won't go back for you :(
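
The walkthrough above can be checked with a small sketch. It is written in C# only because the first answer uses .NET's Regex class; the names LeftmostDemo and FirstMatch are made up for illustration, and the pattern is the one from the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LeftmostDemo
{
    // The pattern from the question, unchanged.
    public const string Pattern = "(\"[^\"]*\")|('[^']*')|([^<>]+)";

    // Returns the first (leftmost) match in the input.
    public static Match FirstMatch(string input) => Regex.Match(input, Pattern);

    public static void Main()
    {
        var m = FirstMatch("<telerik:RadTab Text=\"RGB\">");
        // The match starts at index 1, right after '<', via the third alternative.
        Console.WriteLine($"{m.Index}: {m.Value}");   // 1: telerik:RadTab Text="RGB"
        Console.WriteLine(m.Groups[3].Success);       // True
        // The engine resumes after the consumed text, so "RGB" is never tried on its own.
        Console.WriteLine(m.NextMatch().Success);     // False
    }
}
```

The point is that the third alternative wins because it matches at an earlier position, not because it is longer: the quoted alternatives are never even attempted at the position where "RGB" starts.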
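That's why we wrap the groups in zero-width assertions: a lookahead doesn't eat anything (it doesn't advance), and if you put a group inside the lookahead you still get your captured group.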
<telerik:RadTab Text="RGB">
^ So when it comes here, it will match it with (?=<([^<>]+)>)
  but it won't eat the whole matched string
  Now obviously, the regex needs to continue to look for other matches
  So it comes here:

<telerik:RadTab Text="RGB">
 ^ no match

<telerik:RadTab Text="RGB">
  ^ no match
.....
until

<telerik:RadTab Text="RGB">
                     ^ hey there is a match using (?=("[^"]*"))
                       it will then advance further

<telerik:RadTab Text="RGB">
                      ^ no match
.... until it reaches the end
Of course, if you have a string like <telerik:RadTab Text="RGB'lol'">, it will still match 'lol' inside the double-quoted value and put it in group 2.
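
To make the group behaviour concrete, here is a minimal sketch that runs the lookahead pattern over a string and lists which group captured what. It assumes .NET's Regex class (as in the first answer); LookaheadDemo and Captures are hypothetical names chosen for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // The lookahead pattern from this answer, unchanged.
    public const string Pattern = "(?=(\"[^\"]*\"))|(?=('[^']*'))|(?=<([^<>]+)>)";

    // Every match is empty (zero-width), so the useful text lives in the groups.
    public static List<string> Captures(string input)
    {
        var found = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            for (int g = 1; g <= 3; g++)
            {
                if (m.Groups[g].Success)
                    found.Add($"group {g}: {m.Groups[g].Value}");
            }
        }
        return found;
    }

    public static void Main()
    {
        foreach (var line in Captures("<telerik:RadTab Text=\"RGB\">"))
            Console.WriteLine(line);
        // group 3: telerik:RadTab Text="RGB"
        // group 1: "RGB"
    }
}
```

Note that match.Value is always empty with this pattern, so callers have to read the groups. For the RGB'lol' string the same method should additionally report 'lol' in group 2, as described above.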
Regex rocks!!!
Answer 2 (score: 1)
Edit: Consider the following regex...
(\".*?\"|\'.*?\'|(?<=\<).*?(?=\>))
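
A quick way to see what this pattern returns is to run it against the sample tags. This is only a sketch, assuming .NET's Regex class as in the first answer; LazyDemo and FirstValue are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LazyDemo
{
    // The pattern from this answer, unchanged (verbatim string, so "" is one quote).
    public const string Pattern = @"(\"".*?\""|\'.*?\'|(?<=\<).*?(?=\>))";

    // Returns the text of the first (leftmost) match.
    public static string FirstValue(string input) => Regex.Match(input, Pattern).Value;

    public static void Main()
    {
        Console.WriteLine(FirstValue("<telerik:RadTab Text=RGB>"));      // telerik:RadTab Text=RGB
        Console.WriteLine(FirstValue("<telerik:RadTab Text=\"RGB\">"));  // telerik:RadTab Text="RGB"
    }
}
```

On the quoted input, the lookbehind alternative already succeeds at index 1, before the engine reaches the opening quote, so the first match is the whole tag content and the quoted value would still have to be picked out of it separately.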