I'm developing a C# library using .NET Framework 4.7.1 and Visual Studio 2017.
I have to split these three strings:
"<expr><op><expr>"
"(<expr><op><expr>)"
"<pre_op>(<expr>)"
into
string[] { "<expr>", "<op>", "<expr>" }
string[] { "(", "<expr>", "<op>", "<expr>", ")"}
string[] { "<pre_op>", "(", "<expr>", ")" }
The following code works, but it does not handle ( or ):
string[] result1 = Regex
    .Matches("<expr><op><expr>", @"<.*?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
string[] result2 = Regex
    .Matches("(<expr><op><expr>)", @"<.*?>")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();
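However, result1 and result2 come out identical, because the pattern never matches the parentheses.
How can I handle ( and ) so that they are returned as substrings too?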
Answer 0 (score: 0)
// The pattern matches a single "(", a non-greedy <...> tag, or a single ")".
string[] result1 = Regex
    .Matches("<expr><op><expr>", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();   // "<expr>", "<op>", "<expr>"
string[] result2 = Regex
    .Matches("(<expr><op><expr>)", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();   // "(", "<expr>", "<op>", "<expr>", ")"
string[] result3 = Regex
    .Matches("<pre_op>(<expr>)", @"(\(|<.*?>|\))")
    .Cast<Match>()
    .Select(m => m.Value)
    .ToArray();   // "<pre_op>", "(", "<expr>", ")"
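Since the three calls differ only in their input, the pattern can be wrapped in a small helper. This is a minimal sketch, not part of the answer above; the `Tokenizer` class and `Tokenize` method names are made up for illustration, and the redundant outer capturing group is dropped.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class Tokenizer
{
    // Matches a single "(", a single ")", or a non-greedy <...> tag.
    private static readonly Regex TokenPattern = new Regex(@"\(|\)|<.*?>");

    // Hypothetical helper: returns the tokens in the order they appear.
    public static string[] Tokenize(string input)
    {
        return TokenPattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value)
            .ToArray();
    }

    public static void Main()
    {
        string[] inputs = { "<expr><op><expr>", "(<expr><op><expr>)", "<pre_op>(<expr>)" };
        foreach (string input in inputs)
        {
            // One line per input, tokens separated by " | ".
            Console.WriteLine(string.Join(" | ", Tokenize(input)));
        }
    }
}
```

Note that this only tokenizes: characters that match none of the three alternatives are silently skipped, and unbalanced parentheses are not detected.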
Answer 1 (score: 0)
This is arguably a good use case for .NET balancing groups, which let you make sure that all parentheses and angle brackets are balanced. See also "What are regular expression Balancing Groups?"
The first pattern does exactly that:
(?<B>\()+[^()]+(?<-B>\))(?(B)(?!))|(?<A><)+[^<>]+(?<-A>>)+(?(A)(?!))
It matches <tags> and (everything between brackets).
A second regex is then used to tokenize the perfectly balanced parts:
\(|<.*?>|\)
Sample code:
using System;
using System.Text.RegularExpressions;

public class Example
{
    public static void Main()
    {
        string input = @"<expr><op><expr>
(<expr><op><expr>)
<pre_op>(<expr>)
(<expr>)<pre_op>(<expr>)";

        // Step 1: find balanced parts, either (...) or <...>.
        Regex rxBalanced = new Regex(
            @"(?<B>\()+[^()]+(?<-B>\))(?(B)(?!))|(?<A><)+[^<>]+(?<-A>>)+(?(A)(?!))",
            RegexOptions.Multiline
            | RegexOptions.CultureInvariant
            | RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Compiled
        );

        // Step 2: split each balanced part into "(", "<tag>" and ")" tokens.
        Regex rxTokens = new Regex(
            @"\(|<.*?>|\)",
            RegexOptions.Multiline
            | RegexOptions.CultureInvariant
            | RegexOptions.IgnorePatternWhitespace
            | RegexOptions.Compiled
        );

        foreach (Match match in rxBalanced.Matches(input))
        {
            foreach (Match token in rxTokens.Matches(match.Value))
            {
                Console.WriteLine(token.Value);
            }
        }
    }
}
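To see the balancing-group constructs from the first pattern in isolation, here is a minimal sketch that only checks whether the parentheses in a string are balanced. It is an illustration rather than part of the answer's code; the `BalanceDemo` and `IsBalanced` names are assumptions, and the pattern is a simplified variant that ignores angle brackets.

```csharp
using System;
using System.Text.RegularExpressions;

public static class BalanceDemo
{
    // (?<B>\()    pushes a capture onto stack B for every "(".
    // (?<-B>\))   pops one capture from B for every ")"; it fails if B is empty.
    // (?(B)(?!))  forces a failure if anything is still left on B at the end.
    private static readonly Regex Balanced =
        new Regex(@"^(?:[^()]|(?<B>\()|(?<-B>\)))*(?(B)(?!))$");

    // Hypothetical helper: true when every "(" has a matching ")".
    public static bool IsBalanced(string input)
    {
        return Balanced.IsMatch(input);
    }

    public static void Main()
    {
        Console.WriteLine(IsBalanced("(<expr><op><expr>)"));   // True
        Console.WriteLine(IsBalanced("<pre_op>((<expr>)"));    // False: one "(" is never closed
        Console.WriteLine(IsBalanced("<expr>)"));              // False: ")" without a matching "("
    }
}
```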
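An alternative pattern that checks both balancing groups at the same time could look like this:
(?<B>(\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\)))*(?(B)(?!))
Unfortunately, getting all the required values out of the resulting nested collections is considerably harder. Still, I found the problem interesting enough to write a LINQ query that performs all the black magic: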
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
var regex = new Regex("(?<B>(\\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\\)))*(?(B)(?!))",
    RegexOptions.Multiline | RegexOptions.CultureInvariant | RegexOptions.Compiled);
var x = (from Match m in regex.Matches("(<x><y><z>)<expr>(<a><b><c>)<d>")
         // Group 2 holds one capture per <tag> in this match.
         let tags = (from Capture c in m.Groups[2].Captures select c.Value).ToList()
         select new
         {
             // Group 1 is the opening "(" if the match started with one.
             // Without parentheses, return the individual tags rather than m.Value,
             // which would glue consecutive tags such as <expr><op><expr> together.
             result = m.Groups[1].Value.StartsWith("(")
                 ? new List<string> { "(" }.Concat(tags).Concat(new List<string> { ")" })
                 : tags
         }).SelectMany(r => r.result);
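The indexes used in the query rely on how .NET numbers groups: unnamed groups are numbered first, from left to right, so in this pattern group 1 is the `(`, group 2 is a single `<tag>` (with one capture per repetition) and group 3 is the `)`, while the named groups B and A come after them. Under that reading, a shorter way to flatten a match is to walk the captures of groups 1 to 3 in order. This is a sketch, not the answer's own code; `CaptureDemo` and `Tokenize` are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CaptureDemo
{
    private static readonly Regex Pattern = new Regex(
        @"(?<B>(\())*((?<A><)+[^<>]+(?<-A>>)+(?(A)(?!)))+(?<-B>(\)))*(?(B)(?!))",
        RegexOptions.CultureInvariant);

    // Hypothetical helper: flattens every match into its "(", <tag> and ")" tokens.
    public static List<string> Tokenize(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            // Within one match, all "(" precede all tags, which precede all ")".
            .SelectMany(m => new[] { m.Groups[1], m.Groups[2], m.Groups[3] })
            // A group that did not participate has an empty Captures collection.
            .SelectMany(g => g.Captures.Cast<Capture>())
            .Select(c => c.Value)
            .ToList();
    }

    public static void Main()
    {
        // Prints the tokens of the sample input separated by spaces.
        Console.WriteLine(string.Join(" ", Tokenize("(<x><y><z>)<expr>(<a><b><c>)<d>")));
    }
}
```

Because the captures of group 1 and group 3 are enumerated rather than hard-coded, nested input such as ((<x>)) should yield one token per parenthesis.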