Regex to match the innermost parentheses containing a specific character

Time: 2018-05-16 10:13:12

Tags: c# .net regex

Can a regular expression capture the contents of the innermost parentheses that contain a specific character, '|' in this case?

Some examples (each input is followed by its expected result) and a (C#) test method:

string[] tests = {
    "x () y", "",
    "x (a) y", "",
    "x (a.b()) y", "",
    "x ((a).b() | (b).c()) y", "(a).b() | (b).c()",
    "x (a|b) y", "a|b",
    "x ((a|b) | c)", "a|b",
    "x (a|b|c) y", "a|b|c",
    "x (a|a.b()|c) y", "a|a.b()|c",
    "x (a.b()|b.c()) y", "a.b()|b.c()",
    "x (a.b()|b.c()|c) y", "a.b()|b.c()|c",
    "x (a|b.c()|c.d()) y", "a|b.c()|c.d()",
    "x (a|(b.c()|d)) y", "b.c()|d",
    "x (a|a.b(a)|c) y", "a|a.b(a)|c"
};

for (int i = 0; i < tests.Length; i+=2)
{
    var match = re.Match(tests[i]);
    var result = match.Groups[1].Value;
    Assert.That(result, Is.EqualTo(tests[i + 1]));
}

2 answers:

Answer 0 (score: 2)

A "very simple" regex that passes all the tests:

var re = new Regex(@"
(?:\()
(
    (?>
        (?:  
            (?<p>\()  |  
            (?<-p>\))  |  
            [^()|]+  |  
            (?(p)(?!))(?<pipe>\|)  
        )*  
    )    
)
(?:\))
(?(p)(?!))
(?(pipe)|(?!))", RegexOptions.IgnorePatternWhitespace);

string result = match.Groups[1].Value;

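Note the use of RegexOptions.IgnorePatternWhitespace. The regex is based on balancing groups. Since you should never use a regex you don't fully understand, I'll skip the exact explanation of how it works. I'll only say that (?(pipe)|(?!)) checks that at least one | was captured, while (?(p)(?!)) means "no opening parenthesis captured by the (?<p>\() expression is still unclosed".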
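To make the two building blocks mentioned above more concrete, here is a minimal sketch of a balancing group used on its own, assuming all we want is to test whether a string's parentheses are balanced. The class and method names (BalancingGroupDemo, IsBalanced) are hypothetical and not part of the answer's regex.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only; not the regex from the answer.
public static class BalancingGroupDemo
{
    // (?<p>\()    pushes one capture onto group "p" for every '('.
    // (?<-p>\))   pops one capture from "p" for every ')'; it fails if "p" is empty.
    // (?(p)(?!))  forces a failure if any capture is still left on "p" at the end.
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<p>\()|(?<-p>\)))*(?(p)(?!))$");

    public static bool IsBalanced(string text)
    {
        return Balanced.IsMatch(text);
    }
}
```

With this sketch, IsBalanced("(a(b))") is true, while "(a(b)" (a '(' left on the stack) and "a)" (a ')' with nothing to pop) are both rejected. The answer's regex combines the same push/pop idea with a second group, pipe, that records whether a | was seen.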

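My take on this regex is that it is a futile and dangerous exercise in regexes! (In case it isn't clear, I'm of the Some people, when confronted with a problem, think "I know, I'll use regular expressions." Now they have two problems. school of thought.) You shouldn't use it. It is an unfathomable code horror.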

One more thing: this regex backtracks heavily... I added the (?> ... ) atomic group to disable backtracking.
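As a small illustration of what (?> ... ) changes, the sketch below compares an ordinary quantifier with the same quantifier wrapped in an atomic group. The patterns and the AtomicGroupDemo class are made up for this example and are unrelated to the answer's regex.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class for illustration only.
public static class AtomicGroupDemo
{
    // Plain quantifier: a+ may give back one 'a' so that the trailing "ab" can match.
    public static bool WithBacktracking(string text)
    {
        return Regex.IsMatch(text, @"a+ab");
    }

    // Atomic group: (?>a+) keeps every 'a' it consumed, so the following 'a' can never match.
    public static bool WithAtomicGroup(string text)
    {
        return Regex.IsMatch(text, @"(?>a+)ab");
    }
}
```

For the input "aaab", WithBacktracking returns true and WithAtomicGroup returns false. In the answer's regex the atomic group plays the same role: once the inner loop has consumed a run of characters, the engine does not retry shorter variants of that run.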

Additional tests for backtracking (the first one has unbalanced parentheses):

"((((amusemen).emoadj().cap()(, (are we |arent we|I gather)|)?)", "are we |arent we|I gather",
"((amusemen).emoadj().cap()(, (are we |arent we|I gather)|)?)", "are we |arent we|I gather",

Answer 1 (score: 0)

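The following method is probably overcomplicated and leaves room for optimization, but you can use it as a starting point in case you can't find a better alternative. I added some basic parenthesis balance checking.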

The method passes all the use cases, but there are a few interesting cases you didn't consider; the method handles them as follows:

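  • a|b (the character is found outside any parentheses): ignored, so an empty string is returned
  • (a|b) c|d: returns a|b (same reason as above)
  • (a|b) (c|d) (multiple matching sets): returns c|d (the last one found)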

// Needs: using System; using System.Collections.Generic; using System.Linq;
string FindInnermostSet(string source, char toSearch = '|')
{
    var candidateOpeningParenthesisPosition = -1;
    var candidateClosingParenthesisPosition = -1;
    var candidateOpeningParenthesNestingLevel = -1;
    var openingParenthesisPositions = new Stack<int>();

    for(int i=0; i<source.Length; i++)
    {
        var currentChar = source[i];
        if(currentChar == '(') {
            openingParenthesisPositions.Push(i);
        }
        else if (currentChar == ')')
        {
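            if(!openingParenthesisPositions.Any())
                throw new Exception("Syntax error: too many ')'");
            // Only the ')' that closes the candidate's own '(' ends the candidate;
            // comparing nesting depth alone would also accept a later sibling group,
            // e.g. "(a|b) (c)" would yield "a|b) (c".
            if(openingParenthesisPositions.Peek() == candidateOpeningParenthesisPosition)
                candidateClosingParenthesisPosition = i;
            openingParenthesisPositions.Pop();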
        }
        else if(currentChar == toSearch && 
                openingParenthesisPositions.Any() &&
                openingParenthesisPositions.Count() >= candidateOpeningParenthesNestingLevel)
        {
            candidateOpeningParenthesNestingLevel = openingParenthesisPositions.Count();
            candidateOpeningParenthesisPosition = openingParenthesisPositions.Peek();
        }
    }

    if(openingParenthesisPositions.Any())
        throw new Exception("Syntax error: missing ')'");

    if(candidateOpeningParenthesisPosition == -1)
        return "";

    return source.Substring(
        candidateOpeningParenthesisPosition+1,
        candidateClosingParenthesisPosition-candidateOpeningParenthesisPosition-1);
}