How to split only at the outermost level

Time: 2018-08-24 10:17:15

Tags: c# regex split

I have a declaration string like this:

    *
    | { table_name | view_name | table_alias }.*
    | {
        [ { table_name | view_name | table_alias }. ]
        { column_name | $IDENTITY | $ROWGUID }
        | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
        | expression
        [ [ AS ] column_alias ]
      }
    | column_alias = expression 

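I only need the outermost items, so I split the content on the | character, but I want to skip every | that appears inside brackets.
The split should produce 4 items, as follows: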

#1 *
#2 { table_name | view_name | table_alias }.*
#3 { [ { table_name | view_name | table_alias }. ] { column_name | $IDENTITY | $ROWGUID } | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ] | expression [ [ AS ] column_alias ] }

#4 column_alias = expression
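
For comparison, the same goal can be reached without a regex by tracking nesting depth while scanning the string. The following is only a minimal sketch, not something from the thread: the names TopLevelSplitter and SplitTopLevel are hypothetical, and it assumes that every {, [ and ( in the input is properly balanced.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class TopLevelSplitter
{
    // Splits on '|' only where the pipe lies outside every {...}, [...] and (...) pair.
    // Assumption: all brackets in the input are balanced.
    public static List<string> SplitTopLevel(string input)
    {
        var items = new List<string>();
        var current = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '{' || c == '[' || c == '(') depth++;
            else if (c == '}' || c == ']' || c == ')') depth--;

            if (c == '|' && depth == 0)
            {
                // Top-level pipe: close the current item and start a new one.
                items.Add(current.ToString().Trim());
                current.Clear();
            }
            else
            {
                current.Append(c);
            }
        }

        // Whatever follows the last top-level pipe is the final item.
        items.Add(current.ToString().Trim());
        return items;
    }
}
```

Calling TopLevelSplitter.SplitTopLevel(s) on the declaration string should give the four items listed above, except that item #3 keeps its internal line breaks unless whitespace is collapsed afterwards.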

I tried something like (?m)\s*^\|\s*^(({\|\s*})({\{})?)({.+})$, but it gave me only one item instead of four. Thanks to @WiktorStribiżew and @Rui Jarimba for their help.

I then came up with (?<!\{[^\}]*)\|(?![^\{]*\}), which gives me this:

#1 *
#2 { table_name | view_name | table_alias }.*
#3

 {
                [ { table_name | view_name | table_alias }. ]
                { column_name | $IDENTITY | $ROWGUID }

#4

udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
                    | expression
                    [ [ AS ] column_alias ]
                  }

#5 column_alias = expression

Now I need to make some changes to fix (?<!\{[^\}]*)\|(?![^\{]*\}) and get rid of the extra split at #4.

OK, I found a pattern. It may not be perfect, but it works:

Regex.Split(s, @"(?<!\{(?>[^\{\}]+|\{(?<D>)|\}(?<-D>))*(?(D)(?!)))\|(?!(?>[^\{\}]+|\{(?<D>)|\}(?<-D>))*(?(D)(?!))\})")

Finally, thanks again to everyone who helped me.
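
The pattern above relies on .NET balancing groups. To make the idea easier to follow, here is a reduced sketch that keeps only the lookahead half: a | is rejected when, after skipping any balanced {...} pairs, the next character is an unmatched }. This is an illustration rather than the pattern from the question. The class name BalancedPipeSplitter is hypothetical, it tracks braces only (as the original pattern does), and it assumes the braces in the input are balanced.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class BalancedPipeSplitter
{
    // (?<D>)     pushes an entry onto stack D for every '{'
    // (?<-D>)    pops one entry for every '}' (fails if the stack is empty)
    // (?(D)(?!)) fails while any '{' is still open
    // If the balanced run is followed by '}', that brace closes a group
    // opened before the pipe, so the pipe is nested and must not split.
    private const string Pattern =
        @"\|(?!(?>[^{}]+|\{(?<D>)|\}(?<-D>))*(?(D)(?!))\})";

    public static string[] Split(string input)
    {
        return Regex.Split(input, Pattern)
                    // Collapse line breaks and indentation so each item is one line.
                    .Select(part => Regex.Replace(part, @"\s+", " ").Trim())
                    .ToArray();
    }
}
```

Because the lookahead alone decides whether a closing brace is still pending, the lookbehind is not needed as long as the braces are balanced.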

1 Answer:

Answer 0: (score: 0)

Here you go:

using System;
using System.Text.RegularExpressions;

static void Main(string[] args)
{
    // Top-level lines start at column 0 because the pattern anchors '|' to the start of a line.
    string text = @"*
| { table_name | view_name | table_alias }.*
| {
    [ { table_name | view_name | table_alias }. ]
    { column_name | $IDENTITY | $ROWGUID }
    | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
    | expression
    [ [ AS ] column_alias ]
  }
| column_alias = expression";


    string pattern = BuildPattern();
    RegexOptions options = RegexOptions.Compiled | RegexOptions.Multiline;


    // solution 1: using a MatchEvaluator(Match) delegate
    string normalizedText = Regex.Replace(text, pattern, GetNormalizedLine, options);

    // solution 2: using replacement groups
    string normalizedText2 = Regex.Replace(text, pattern, "$3$4", options);

    bool areEqual = normalizedText2.Equals(normalizedText);

    Console.Read();
}

private static string BuildPattern()
{
    // '|' is special character, needs to be escaped. 
    // Assuming there might be some whitespace after the pipe
    string pipe = @"\|\s*";

    // '{' is special character, needs to be escaped. 
    string bracket = @"\{";

    // remaining text in the line
    string otherText = @".+";

    // using parentheses () to capture groups: 3 = optional '{', 4 = rest of the line
    string pattern = $"^(({pipe})({bracket})?)({otherText})$";

    return pattern;
}

private static string GetNormalizedLine(Match match)
{
    GroupCollection groups = match.Groups;

    return $"{groups[3].Value}{groups[4].Value}";
}

The output is the following string:

*
{ table_name | view_name | table_alias }.*
{
    [ { table_name | view_name | table_alias }. ]
    { column_name | $IDENTITY | $ROWGUID }
    | udt_column_name [ { . | :: } { { property_name | field_name } | method_name ( argument [ ,...n] ) } ]
    | expression
    [ [ AS ] column_alias ]
  }
column_alias = expression

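Edit

I didn't use Regex.Split() as the OP mentioned, because I don't think it is needed just to remove the | characters. Getting an array of all the non-empty lines is simple: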

string[] lines = normalizedText.Split(new[] { Environment.NewLine }, StringSplitOptions.RemoveEmptyEntries);

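Some notes:

  • I'm assuming that the | characters to be removed are always at the start of a line, i.e. there is no whitespace before them
  • I'm assuming there might be some whitespace between the | and { characters
  • I'm using parentheses to group the matches (see Regular Expression Groups in C#)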