Using regular expressions

Date: 2018-11-01 10:47:29

Tags: c# .net regex substring capture

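What would a regular expression pattern look like that captures the substring between two delimiters, but excludes certain characters (if present) right after the first delimiter and right before the last delimiter? The input string looks like this: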

var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

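The capture should contain the substring between "{" and "}". It should exclude any whitespace, line breaks, and the "#AddInfoStart" string that follow the opening delimiter "{" (if present), and it should also exclude any whitespace, line breaks, ";" and "#" characters that precede the closing delimiter "}" (if present).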

The captured string should look like this:

Comment:String:=""This is a comment"";

AdditionalInfo:String:=""This is some additional info""

Whitespace may appear before or after the inner separators ":" and ":=", and the value after ":=" is not always marked as a string, for example:

{  Val1 : Real := 1.7  }

For arrays, the following syntax is used:

arr1 : ARRAY [1..5] OF INT := [2,5,44,555,11];
arr2 : ARRAY [1..3] OF REAL

1 answer:

Answer 0 (score: 2)

Here is my solution:

  1. Remove the content outside the braces
  2. Use a regular expression to get the values inside the braces

Code:

using System.Linq;
using System.Text.RegularExpressions;

var input = @"Not relevant {

#AddInfoStart Comment:String:=""This is a comment"";

            Val1 : Real := 1.7

AdditionalInfo:String:=""This is some additional info"" ;

# } also not relevant";

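// remove content outside the braces
// Singleline lets "." match line breaks, so the text before "{" may span several lines
input = Regex.Replace(input, @"^.*?\{", string.Empty, RegexOptions.Singleline);
// drop everything from the last "}" to the end of the input
input = Regex.Replace(input, @"\}[^}]*$", string.Empty);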

string property = @"(\w+)"; // group 1: property name
string separator = @"\s*:\s*"; // ":" with or without whitespace
string type = @"(\w+)"; // group 2: type name
string equals = @"\s*:=\s*"; // ":=" with or without whitespace
string text = @"""[^""]*"""; // quoted value; both quotes are required
string number = @"\d+(?:\.\d+)?"; // number like 123 or with a . separator such as 1.45
string value = $"({text}|{number})"; // group 3: value can be a string or a number
string pattern = $"{property}{separator}{type}{equals}{value}";

var result = Regex.Matches(input, pattern)
                  .Cast<Match>()
                  .Select(match => new
                  {
                      FullMatch = match.Groups[0].Value, // group 0 is always the full match
                      Property = match.Groups[1].Value, // name before ":"
                      Type = match.Groups[2].Value, // type between ":" and ":="
                      Value = match.Groups[3].Value // value after ":=" (strings keep their quotes)
                  })
                  .ToList();
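
The question also asks for a single capture of the trimmed text between the braces, and it mentions array declarations that the pattern above does not cover. The following is only a sketch of one way to approach both, not part of the original answer. The class `DeclarationBlock` and its methods `TryExtractBody` and `ParseDeclarations` are hypothetical names introduced here. It assumes the input contains one brace pair, that no value contains "}", that quoted values contain no embedded quotes, and that the `ARRAY` and `OF` keywords are upper case.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper for illustration; the names are not from the original answer.
public static class DeclarationBlock
{
    // "{", optional whitespace and "#AddInfoStart", then a lazy body,
    // then any trailing whitespace, ";" or "#" characters, then "}".
    // Singleline lets "." match line breaks inside the body.
    private static readonly Regex BodyPattern = new Regex(
        @"\{\s*(?:#AddInfoStart\s*)?(?<body>.*?)[\s;#]*\}",
        RegexOptions.Singleline);

    // name : type [:= value], where type is a word or "ARRAY [..] OF word"
    // and value is a quoted string, a bracketed list, or a bare token.
    private static readonly Regex DeclarationPattern = new Regex(
        @"(?<name>\w+)\s*:\s*(?<type>ARRAY\s*\[[^\]]*\]\s*OF\s+\w+|\w+)" +
        @"(?:\s*:=\s*(?<value>""[^""]*""|\[[^\]]*\]|[^;\s]+))?");

    // Returns true and the trimmed text between the braces when a brace pair is found.
    public static bool TryExtractBody(string input, out string body)
    {
        Match match = BodyPattern.Match(input);
        body = match.Success ? match.Groups["body"].Value : string.Empty;
        return match.Success;
    }

    // Splits a body into (name, type, value) triples; Value is empty when ":=" is absent.
    public static List<(string Name, string Type, string Value)> ParseDeclarations(string body)
    {
        var result = new List<(string Name, string Type, string Value)>();
        foreach (Match m in DeclarationPattern.Matches(body))
        {
            result.Add((m.Groups["name"].Value, m.Groups["type"].Value, m.Groups["value"].Value));
        }
        return result;
    }
}
```

Calling `DeclarationBlock.TryExtractBody(input, out var body)` and then `DeclarationBlock.ParseDeclarations(body)` gives one entry per declaration. The lazy `body` group together with the trailing `[\s;#]*` class is what keeps the final ";" and "#" out of the capture. Quoted values keep their surrounding quotes, as in the answer's `Value` property.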