Question

I have this string in C#:

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO

I want to use Regex to parse it and get the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO

Besides the example above, I tested the following input, but it still does not parse correctly.

"%exc.uns: 8 hours let  @ = ABC, DEF", "exc_it = 1 day"  , " summ=graffe ", " a,b,(c,d)"

The new text will be in a single string:

string mystr = @"""%exc.uns: 8 hours let  @ = ABC, DEF"", ""exc_it = 1 day""  , "" summ=graffe "", "" a,b,(c,d)""";

Answer 1

string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
var resultStrings = new List<string>();
int? firstIndex = null;
int scopeLevel = 0;
for (int i = 0; i < str.Length; i++)
{
    if (str[i] == ',' && scopeLevel == 0)
    {
        resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault(), i - firstIndex.GetValueOrDefault()));
        firstIndex = i + 1;
    }
    else if (str[i] == '(') scopeLevel++;
    else if (str[i] == ')') scopeLevel--;
}
resultStrings.Add(str.Substring(firstIndex.GetValueOrDefault()));

Answer 2

Even faster:

([^,]*\x28[^\x29]*\x29|[^,]+)

This should do the trick. Basically, it looks for either a "function thumbprint" or anything without a comma.

adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO
                  ^                   ^  ^      ^                  ^

The carets mark where each group stops.
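To try the pattern from C#, something like the following sketch could be used. The class name ThumbprintTokenizer and the Trim() call are assumptions added for illustration; the answer itself only gives the pattern.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper; only the pattern comes from the answer above.
public static class ThumbprintTokenizer
{
    // Either a "function thumbprint" such as name(args), or any run of non-comma characters.
    private static readonly Regex Pattern = new Regex(@"([^,]*\x28[^\x29]*\x29|[^,]+)");

    public static string[] Tokenize(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.Trim()) // raw matches keep the space that follows a comma
            .ToArray();
    }
}
```

For the sample string this should give the six items listed in the question. Like the pattern itself, it assumes parentheses are not nested.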

Answer 3

Just use this regex:

[^,()]+(\([^()]*\))?

Test example:

var s= "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
Regex regex = new Regex(@"[^,()]+(\([^()]*\))?");
var matches = regex.Matches(s)
    .Cast<Match>()
    .Select(m => m.Value);

Returns:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
 NG/CL
 5 value of CL(JK)
 HO

Answer 4

If you simply must use Regex, you can split the string on the following:

,                # match a comma
(?=              # that is followed by
  (?:            # either
    [^\(\)]*     #  no parens at all
    |            # or
    (?:          #  
      [^\(\)]*   #  ...
      \(         #  (
      [^\(\)]*   #     stuff in parens
      \)         #  )
      [^\(\)]*   #  ...
    )+           #  any number of times
  )$             # until the end of the string
)

It breaks your input into the following:

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
HO
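Because the pattern is written with whitespace and # comments, it needs RegexOptions.IgnorePatternWhitespace when passed to Regex.Split. A minimal sketch, assuming a hypothetical LookaheadSplitter helper and an added Trim() to drop the space left after each comma:

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper wrapping the commented pattern from this answer.
public static class LookaheadSplitter
{
    private const string Pattern = @"
,                # match a comma
(?=              # that is followed by
  (?:            # either
    [^\(\)]*     #  no parens at all
    |            # or
    (?:
      [^\(\)]*   #  ...
      \(         #  (
      [^\(\)]*   #     stuff in parens
      \)         #  )
      [^\(\)]*   #  ...
    )+           #  any number of times
  )$             # until the end of the string
)";

    public static string[] Split(string input)
    {
        // The pattern has no capturing groups, so Split returns only the pieces between commas.
        return Regex.Split(input, Pattern, RegexOptions.IgnorePatternWhitespace)
            .Select(part => part.Trim())
            .ToArray();
    }
}
```

A comma inside parentheses is followed by a ) before any (, so the lookahead fails there and the comma is not used as a split point.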

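You could also use .NET's balancing group constructs to create a version that works with nested parentheses, but you are probably just as well off with one of the non-Regex solutions.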
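As a rough illustration of the balancing-group idea, here is one possible sketch. It matches tokens rather than splitting; the names NestedTokenizer and open are made up for this example, and it is not the pattern the answer's author had in mind.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper: matches tokens, allowing commas only inside (possibly nested) parentheses.
public static class NestedTokenizer
{
    private static readonly Regex Pattern = new Regex(@"
(?:
    [^,()]              # any character that is not a comma or a paren
  | (?<open> \( )       # opening paren: push onto the 'open' stack
  | (?<-open> \) )      # closing paren: pop from the 'open' stack
  | (?(open),|(?!))     # a comma is accepted only while the stack is non-empty
)+
(?(open)(?!))           # reject the match if a paren is still open
", RegexOptions.IgnorePatternWhitespace);

    public static string[] Tokenize(string input)
    {
        return Pattern.Matches(input)
            .Cast<Match>()
            .Select(m => m.Value.Trim())
            .ToArray();
    }
}
```

With balanced input such as "f(a,(b,c)),g, h(i)" this should produce f(a,(b,c)), g and h(i). Unbalanced input is not handled gracefully, which is one reason a plain loop with a depth counter may be easier to maintain.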

Answer 5

Another way to implement what Snowbear is doing:

    public static string[] SplitNest(this string s, char src, string nest, string trg)
    {
        int scope = 0;
        if (trg == null || nest == null) return null;
        if (trg.Length == 0 || nest.Length < 2) return null;
        if (trg.IndexOf(src) >= 0) return null;
        if (nest.IndexOf(src) >= 0) return null;

        for (int i = 0; i < s.Length; i++)
        {
            if (s[i] == src && scope == 0)
            {
                s = s.Remove(i, 1).Insert(i, trg);
            }
            else if (s[i] == nest[0]) scope++;
            else if (s[i] == nest[1]) scope--;
        }

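        // Split on the replacement delimiter; this overload exists on both .NET Framework and .NET Core.
        return s.Split(new[] { trg }, StringSplitOptions.None);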
    }

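The idea is to replace every non-nested delimiter with another delimiter, which you can then use with an ordinary string.Split(). You can also choose which type of bracket to use: (), <>, [], or even \/, ][, or `'. For your purposes you would use

string str = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";
string[] result = str.SplitNest(',',"()","~");

The function first turns your string into

adj_con(CL2,1,3,0)~adj_cont(CL1,1,3,0)~NG~ NG/CL~ 5 value of CL(JK)~ HO

and then splits on the ~, ignoring the nested commas.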

Answer 6

Assuming non-nested, matched parentheses, you can easily match the tokens you want instead of splitting the string:

MatchCollection matches = Regex.Matches(data, @"(?:[^(),]|\([^)]*\))+");

Answer 7

var s = "adj_con(CL2,1,3,0),adj_cont(CL1,1,3,0),NG, NG/CL, 5 value of CL(JK), HO";  
var result = string.Join("\n", Regex.Split(s, @"(?<=\)),|,\s"));

The pattern matches a comma that follows a ) (the lookbehind keeps the ) out of the match), or a comma followed by whitespace.

result =

adj_con(CL2,1,3,0)
adj_cont(CL1,1,3,0)
NG
NG/CL
5 value of CL(JK)
 HO

Answer 8

The TextFieldParser (MSDN) class seems to have this functionality built in:

TextFieldParser class: provides methods and properties for parsing structured text files.

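Parsing a text file with the TextFieldParser is similar to iterating over a text file, while the ReadFields method, which extracts fields of text, is similar to splitting the strings.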

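The TextFieldParser can parse two types of files: delimited or fixed-width. Some properties, such as Delimiters and HasFieldsEnclosedInQuotes, are meaningful only when working with delimited files, while the FieldWidths property is meaningful only when working with fixed-width files.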

See the article that helped me find this.
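A minimal sketch of how this might look for the quoted input from the question. The QuotedFieldReader name is hypothetical, and the code assumes the Microsoft.VisualBasic assembly is referenced. Note that TextFieldParser treats double quotes, not parentheses, as field enclosures, so it fits the second, quoted sample rather than the first.

```csharp
using System;
using System.IO;
using Microsoft.VisualBasic.FileIO;

// Hypothetical helper around TextFieldParser for a single delimited line.
public static class QuotedFieldReader
{
    public static string[] ReadFields(string line)
    {
        using (var parser = new TextFieldParser(new StringReader(line)))
        {
            parser.TextFieldType = FieldType.Delimited;
            parser.SetDelimiters(",");
            parser.HasFieldsEnclosedInQuotes = true; // commas inside "..." stay within the field
            return parser.ReadFields();              // returns null when there is no more input
        }
    }
}
```

Calling QuotedFieldReader.ReadFields(mystr) with the string from the question is expected to return one entry per quoted field.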

Answer 9

Here is a more powerful option, which parses the whole text, including nested parentheses:

string pattern = @"
\A
(?>
    (?<Token>
        (?:
            [^,()]              # Regular character
            |
            (?<Paren> \( )      # Opening paren - push to stack
            |
            (?<-Paren> \) )     # Closing paren - pop
            |
            (?(Paren),)         # If inside parentheses, match comma.
        )*?
    )
    (?(Paren)(?!))    # If we are not inside parentheses,
    (?:,|\Z)          # match a comma or the end
)*? # lazy just to avoid an extra empty match at the end,
    #  though it removes a last empty token.
\Z
";
Match match = Regex.Match(data, pattern, RegexOptions.IgnorePatternWhitespace);

You can get all the matches by iterating over match.Groups["Token"].Captures.

How to parse a comma-separated string when fields contain commas and parentheses
