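Regex.Matches to get the substrings between <>: I need to add ( and ) to the result as substrings

Asked: 2018-03-27 16:07:25

Tags: c# regex

I am developing a C# library with .NET Framework 4.7.1 and Visual Studio 2017.

I have to split these three strings into the arrays listed below them: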

"<expr><op><expr>"
"(<expr><op><expr>)"
"<pre_op>(<expr>)"

string[] { "<expr>", "<op>", "<expr>" }
string[] { "(", "<expr>", "<op>", "<expr>", ")"}
string[] { "<pre_op>", "(", "<expr>", ")" }

The following statements work, but they do not handle ( and ):

string[] result1 = Regex
    .Matches("<expr><op><expr>", @"<.*?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
string[] result2 = Regex
    .Matches("(<expr><op><expr>)", @"<.*?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();

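But result1 and result2 come out equal, because the parentheses in the second input are dropped.

How can I handle ( and ) so that they are returned as substrings too?

2 answers:

Answer 0 (score: 0)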

// The pattern matches a literal "(", a lazy <...> tag, or a literal ")".
string[] result1 = Regex
    .Matches("<expr><op><expr>", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
string[] result2 = Regex
    .Matches("(<expr><op><expr>)", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
string[] result3 = Regex
    .Matches("<pre_op>(<expr>)", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
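
To avoid repeating the same call chain for every input, the pattern can be wrapped in a small helper. This is only a minimal sketch: the names TokenizerSketch and Tokenize are hypothetical and do not come from the answer, and the outer capturing group is dropped because only m.Value is read.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
public static class TokenizerSketch
{
    // Alternation: a literal "(", a lazy <...> tag, or a literal ")".
    private static readonly Regex TokenPattern = new Regex(@"\(|<.*?>|\)");

    // Returns every token in the order it appears in the input.
    public static string[] Tokenize(string input)
    {
        return TokenPattern
            .Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }
}
```

With this helper, TokenizerSketch.Tokenize("<pre_op>(<expr>)") should yield the four tokens <pre_op>, (, <expr> and ). Note that any character matching none of the three alternatives is skipped silently, so this approach does not validate the input.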

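Answer 1 (score: 0)

This is arguably a good use case for .NET balancing groups, which can ensure that all parentheses and angle brackets are balanced; see also "What are regular expression Balancing Groups?"

The first pattern does exactly that: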

(?<B>\()+[^()]+(?<-B>\))(?(B)(?!))|(?<A><)+[^<>]+(?<-A>>)+(?(A)(?!))

It matches <tags> and (everything between brackets). A second regular expression is then used to tokenize the perfectly balanced parts:

\(|<.*?>|\)

Sample code:

using System;
using System.Text.RegularExpressions;
public class Example
{
    public static void Main()
    {
        string input = @"<expr><op><expr>
(<expr><op><expr>)
<pre_op>(<expr>)
(<expr>)<pre_op>(<expr>)";

        Regex rxBalanced = new Regex(
            @"(?<B>\()+[^()]+(?<-B>\))(?(B)(?!))|(?<A><)+[^<>]+(?<-A>>)+(?(A)(?!))",
            RegexOptions.Multiline
            | RegexOptions.CultureInvariant
            | RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Compiled
        );
        Regex rxTokens = new Regex(
            @"\(|<.*?>|\)",
            RegexOptions.Multiline
            | RegexOptions.CultureInvariant
            | RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Compiled
        );

        foreach (Match match in rxBalanced.Matches(input))
        {
            foreach (Match token in rxTokens.Matches(match.Value))
            {
                Console.WriteLine(token.Value);
            }
        }
    }
}
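
For readers new to balancing groups, the push and pop mechanics that the patterns above rely on can be seen in isolation in a smaller sketch. It only checks whether the round parentheses in a whole string are balanced; the names BalanceSketch, IsBalanced and the group name Open are assumptions made for illustration and are not taken from the answer.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
public static class BalanceSketch
{
    // (?<Open>\()   pushes a capture onto the "Open" stack for each "(".
    // (?<-Open>\))  pops one capture for each ")" and fails if the stack is empty.
    // (?(Open)(?!)) fails the match if any "(" is still unclosed at the end.
    private static readonly Regex Balanced = new Regex(
        @"^(?:[^()]|(?<Open>\()|(?<-Open>\)))*(?(Open)(?!))$");

    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }
}
```

Under these assumptions, BalanceSketch.IsBalanced("(<expr>)") should be true, while "(<expr>" and "<expr>)" should both be rejected.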

An alternative pattern that lets you check both balancing groups at once might look like this:

(?<B>(\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\)))*(?(B)(?!))

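Unfortunately, getting all the needed values out of the resulting nested collections is much harder. Still, I found the problem interesting enough to write a LINQ query that does all the black magic:

using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

var regex = new Regex("(?<B>(\\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\\)))*(?(B)(?!))",
    RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.Compiled);

// Groups[1] is the unnamed "(" group; Groups[2] holds one capture per <tag>.
var x = (from Match m in regex.Matches("(<x><y><z>)<expr>(<a><b><c>)<d>")
        select new
        {
            result = m.Groups[1].Value.StartsWith("(") ?
                        (new List<string> { "(" }
                            .Concat(m.Groups[2].Captures.Count > 1 ?
                                (from Capture c in m.Groups[2].Captures select c.Value).ToList()
                                : new List<string> { m.Groups[2].Value }
                            )
                            .Concat(new List<string> { ")" })
                        )
                        // No parentheses: emit each <tag> separately rather than the
                        // whole match, so "<a><b>" yields two tokens instead of one.
                        : (from Capture c in m.Groups[2].Captures select c.Value).ToList()
        }).SelectMany(r => r.result);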
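
The captures can also be read with a plain loop instead of a query. The sketch below assumes that the three unnamed groups of the combined pattern are numbered 1 (the opening parenthesis), 2 (one capture per tag) and 3 (the closing parenthesis), since .NET numbers unnamed groups before named ones. The names FlattenSketch and Flatten are hypothetical and not from the answer.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper (not part of the original answer).
public static class FlattenSketch
{
    private static readonly Regex Combined = new Regex(
        @"(?<B>(\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\)))*(?(B)(?!))");

    public static List<string> Flatten(string input)
    {
        var tokens = new List<string>();
        foreach (Match m in Combined.Matches(input))
        {
            // Assumed numbering: 1 = "(", 2 = each <tag>, 3 = ")".
            // Within one match the groups occur in this order in the text,
            // so appending their captures group by group keeps input order.
            for (int g = 1; g <= 3; g++)
            {
                tokens.AddRange(
                    m.Groups[g].Captures.Cast<Capture>().Select(c => c.Value));
            }
        }
        return tokens;
    }
}
```

Called on "(<x><y><z>)<expr>(<a><b><c>)<d>", this should produce the tokens (, <x>, <y>, <z>, ), <expr>, (, <a>, <b>, <c>, ), <d> in that order.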