Can a regular expression get the innermost parentheses that contain a specific character, '|' in this case?
Some examples and a (C#) test method:
// Pairs of (input, expected): the contents of the innermost parentheses
// that directly contain '|', or "" if there are none.
string[] tests = {
    "x () y", "",
    "x (a) y", "",
    "x (a.b()) y", "",
    "x ((a).b() | (b).c()) y", "(a).b() | (b).c()",
    "x (a|b) y", "a|b",
    "x ((a|b) | c)", "a|b",
    "x (a|b|c) y", "a|b|c",
    "x (a|a.b()|c) y", "a|a.b()|c",
    "x (a.b()|b.c()) y", "a.b()|b.c()",
    "x (a.b()|b.c()|c) y", "a.b()|b.c()|c",
    "x (a|b.c()|c.d()) y", "a|b.c()|c.d()",
    "x (a|(b.c()|d)) y", "b.c()|d",
    "x (a|a.b(a)|c) y", "a|a.b(a)|c"
};
// re is the Regex under test.
for (int i = 0; i < tests.Length; i += 2)
{
    var match = re.Match(tests[i]);
    var result = match.Groups[1].Value;
    Assert.That(result, Is.EqualTo(tests[i + 1]));
}
Answer 0 (score: 2)
A "very simple" regex that passes all the tests:
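var re = new Regex(@"
    (?:\()                          # opening parenthesis of the candidate group
    (                               # group 1: the contents to return
        (?>                         # atomic group: no backtracking into the contents
            (?:
                (?<p>\() |          # nested '(': push onto the p stack
                (?<-p>\)) |         # nested ')': pop p (fails when p is empty)
                [^()|]+ |           # anything except parentheses and '|'
                (?(p)(?!))(?<pipe>\|)   # '|' is accepted only while p is empty
            )*
        )
    )
    (?:\))                          # closing parenthesis of the candidate group
    (?(p)(?!))                      # fail if a nested '(' is still open
    (?(pipe)|(?!))                  # fail unless at least one '|' was captured
    ", RegexOptions.IgnorePatternWhitespace);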
string result = match.Groups[1].Value;
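Note the use of RegexOptions.IgnorePatternWhitespace. The regex is based on balancing groups. On the principle that you should not use a regular expression you do not fully understand, I will omit an exact explanation of how it works. I will only say that the check (?(pipe)|(?!)) verifies that at least one | was captured by the pipe group, while (?(p)(?!)) means "there are no still-open parentheses captured by the (?<p>\() expression".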
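As a rough illustration of the two conditional checks just described, here is a minimal sketch that uses each of them in isolation on a much smaller pattern. The names BalancingGroupDemo, IsBalanced and ContainsPipe are made up for this example and are not part of the answer's regex.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalancingGroupDemo
{
    // (?<p>\() pushes a capture onto the stack named p,
    // (?<-p>\)) pops one (and fails if the stack is empty),
    // (?(p)(?!)) fails the match if anything is still on the stack.
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<p>\()|(?<-p>\)))*(?(p)(?!))$");

    // (?(pipe)|(?!)) is the opposite test: it succeeds only if the
    // group named pipe captured at least once.
    private static readonly Regex HasPipe = new Regex(
        @"^(?:(?<pipe>\|)|[^|])*(?(pipe)|(?!))$");

    public static bool IsBalanced(string s) => Balanced.IsMatch(s);

    public static bool ContainsPipe(string s) => HasPipe.IsMatch(s);
}
```

With these definitions, BalancingGroupDemo.IsBalanced("(a(b)c)") is true, IsBalanced("((a)") and IsBalanced("a)") are false, and ContainsPipe("a|b") is true while ContainsPipe("ab") is false.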
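My opinion of this regex is that it is an exercise in futile and dangerous regex-ing! (In case it is not clear, I am of the school of "Some people, when confronted with a problem, think 'I know, I'll use regular expressions.' Now they have two problems.") You should not use it. It is an unbelievable code horror.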
One more thing: this regex backtracked heavily, so I added the atomic group (?> ... ) to disable backtracking.
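To show what disabling backtracking means in practice, the following sketch compares a greedy quantifier with the same quantifier wrapped in an atomic group. AtomicGroupDemo and its two methods are hypothetical names used only for illustration; the patterns are deliberately tiny and unrelated to the regex above.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AtomicGroupDemo
{
    // Greedy a+ first takes every 'a', then gives one back so that
    // the trailing 'a' in the pattern can still match.
    public static bool Greedy(string s) => Regex.IsMatch(s, @"^a+a$");

    // Inside (?> ... ) the engine never gives back what a+ consumed,
    // so nothing is left for the trailing 'a' and the match fails.
    public static bool Atomic(string s) => Regex.IsMatch(s, @"^(?>a+)a$");
}
```

For the input "aaa", Greedy returns true and Atomic returns false. The same property is what keeps the engine from retrying every possible split of the parenthesised contents when an overall match is impossible.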
Additional tests for backtracking (the first one has unbalanced parentheses):
"((((amusemen).emoadj().cap()(, (are we |arent we|I gather)|)?)", "are we |arent we|I gather",
"((amusemen).emoadj().cap()(, (are we |arent we|I gather)|)?)", "are we |arent we|I gather",
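Answer 1 (score: 0)
The following method is probably overly complex and has room for optimization, but you can use it as a starting point in case you cannot find a better alternative. I have added some basic parenthesis-balance checking.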
The method passes all your use cases, but there are a few interesting cases you did not consider; it handles them as follows:
- "a|b" (character found outside any parentheses): ignored, so an empty string is returned
- "(a|b) c|d": returns a|b (same as above)
- "(a|b) (c|d)" (multiple matching sets): returns c|d (the last one found)

// Requires: using System; using System.Collections.Generic; using System.Linq;
string FindInnermostSet(string source, char toSearch = '|')
{
    var candidateOpeningParenthesisPosition = -1;
    var candidateClosingParenthesisPosition = -1;
    var candidateOpeningParenthesNestingLevel = -1;
    // Positions of the '(' characters that are currently open.
    var openingParenthesisPositions = new Stack<int>();

    for (int i = 0; i < source.Length; i++)
    {
        var currentChar = source[i];
        if (currentChar == '(')
        {
            openingParenthesisPositions.Push(i);
        }
        else if (currentChar == ')')
        {
            if (!openingParenthesisPositions.Any())
                throw new Exception("Syntax error: too many ')'");
            // Only the ')' that closes the candidate's own '(' ends the candidate;
            // a sibling group at the same depth must not overwrite it.
            if (openingParenthesisPositions.Peek() == candidateOpeningParenthesisPosition)
                candidateClosingParenthesisPosition = i;
            openingParenthesisPositions.Pop();
        }
        else if (currentChar == toSearch
                 && openingParenthesisPositions.Any()
                 && openingParenthesisPositions.Count() >= candidateOpeningParenthesNestingLevel)
        {
            // Found the search character at the same or a deeper nesting level:
            // the enclosing '(' becomes the new candidate.
            candidateOpeningParenthesNestingLevel = openingParenthesisPositions.Count();
            candidateOpeningParenthesisPosition = openingParenthesisPositions.Peek();
        }
    }

    if (openingParenthesisPositions.Any())
        throw new Exception("Syntax error: missing ')'");
    if (candidateOpeningParenthesisPosition == -1)
        return "";
    return source.Substring(
        candidateOpeningParenthesisPosition + 1,
        candidateClosingParenthesisPosition - candidateOpeningParenthesisPosition - 1);
}