Complex string splitting

Time: 2015-06-04 00:55:24

Tags: c# regex string parsing split

I have a string like the following:

[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)

You can think of it as this tree:

- [Testing.User]
- Info
        - [Testing.Info]
        - Name
                - [System.String]
                - Matt
        - Age
                - [System.Int32]
                - 21
- Description
        - [System.String]
        - This is some description

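As you can see, it is a string serialization/representation of the class Testing.User.

I want to be able to split it and get the following elements in the resulting array: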

 [0] = [Testing.User]
 [1] = Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))
 [2] = Description:([System.String]|This is some description)

I can't simply split on |, because that would result in:

 [0] = [Testing.User]
 [1] = Info:([Testing.Info]
 [2] = Name:([System.String]
 [3] = Matt)
 [4] = Age:([System.Int32]
 [5] = 21))
 [6] = Description:([System.String]
 [7] = This is some description)

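How can I get the expected result?

I'm not very good with regular expressions, but I'm aware that they are a very plausible solution here.

6 Answers:

Answer 0: (score: 7)

Using a regex lookahead

You can use a regex like this: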

(\[.*?])|(\w+:.*?)\|(?=Description:)|(Description:.*)

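The idea behind this regex is to capture the three pieces in groups 1, 2 and 3. The lookahead (?=Description:) makes the second group stop at the pipe that precedes Description:.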

Match information

MATCH 1
1.  [0-14]   `[Testing.User]`
MATCH 2
2.  [15-88]  `Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))`
MATCH 3
3.  [89-143] `Description:([System.String]|This is some description)`
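
Because the pattern is an alternation, each match fills only one of the three groups. A minimal C# sketch of collecting them into a list, assuming the exact input from the question (the class name LookaheadSplit and the method Split are made up for illustration):

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class LookaheadSplit
{
    // Pattern from this answer: three alternatives, each with its own capture group.
    private const string Pattern = @"(\[.*?])|(\w+:.*?)\|(?=Description:)|(Description:.*)";

    // Hypothetical helper: returns the captured text of each match in order.
    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
        {
            // Only one of groups 1..3 participates in any given match.
            for (int g = 1; g <= 3; g++)
            {
                if (m.Groups[g].Success)
                {
                    parts.Add(m.Groups[g].Value);
                    break;
                }
            }
        }
        return parts;
    }

    public static void Main()
    {
        string input = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
        List<string> parts = Split(input);
        for (int i = 0; i < parts.Count; i++)
            Console.WriteLine("[" + i + "] = " + parts[i]);
    }
}
```

Note that the pattern hard-codes Description: as the last field, so it only fits inputs shaped like the one in the question.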

Plain regex

On the other hand, if you don't like the regex above, you can use another one like this:

(\[.*?])\|(.*)\|(Description:.*)

Or even one that forces at least one character:

(\[.+?])\|(.+)\|(Description:.+)

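Answer 1: (score: 6)

There are already enough splitting answers, so here is another approach. If your input represents a tree structure, why not parse it into a tree? The following code was automatically translated from VB.NET, but it should work, as it did when I tested it.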

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace Treeparse
{
    class Program
    {
        static void Main(string[] args)
        {
            var input = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
            var t = StringTree.Parse(input);
            Console.WriteLine(t.ToString());
            Console.ReadKey();
        }
    }

    public class StringTree
    {
        //Branching constants
        const string BranchOff = "(";
        const string BranchBack = ")";
        const string NextTwig = "|";

        //Content of this twig
        public string Text;
        //List of Sub-Twigs
        public List<StringTree> Twigs;
        [System.Diagnostics.DebuggerStepThrough()]
        public StringTree()
        {
            Text = "";
            Twigs = new List<StringTree>();
        }

        private static void ParseRecursive(StringTree Tree, string InputStr, ref int Position)
        {
            do {
                StringTree NewTwig = new StringTree();
                do {
                    NewTwig.Text = NewTwig.Text + InputStr[Position];
                    Position += 1;
                } while (!(Position == InputStr.Length || (new String[] { BranchBack, BranchOff, NextTwig }.ToList().Contains(InputStr[Position].ToString()))));
                Tree.Twigs.Add(NewTwig);
                if (Position < InputStr.Length && InputStr[Position].ToString() == BranchOff) { Position += 1; ParseRecursive(NewTwig, InputStr, ref Position); Position += 1; }
                if (Position < InputStr.Length && InputStr[Position].ToString() == BranchBack)
                    break; // TODO: might not be correct. Was : Exit Do
                Position += 1;
            } while (!(Position >= InputStr.Length || InputStr[Position].ToString() == BranchBack));
        }

        /// <summary>
        /// Call this to parse the input into a StringTree objects using recursion
        /// </summary>
        public static StringTree Parse(string Input)
        {
            StringTree t = new StringTree();
            t.Text = "Root";
            int Start = 0;
            ParseRecursive(t, Input, ref Start);
            return t;
        }

        private void ToStringRecursive(ref StringBuilder sb, StringTree tree, int Level)
        {
            for (int i = 1; i <= Level; i++)
            {
                sb.Append("   ");
            }
            sb.AppendLine(tree.Text);
            int NextLevel = Level + 1;
            foreach (StringTree NextTree in tree.Twigs)
            {
                ToStringRecursive(ref sb, NextTree, NextLevel);
            }
        }

        public override string ToString()
        {
            var sb = new System.Text.StringBuilder();
            ToStringRecursive(ref sb, this, 0);
            return sb.ToString();
        }

    }
}

With the tree structure you can get the value of every node and its associated child values, and then use it however you like, for example to display the structure in a TreeView control.

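Answer 2: (score: 3)

Assume your groups can be described as:

  1. [Anything.Anything]
  2. Anything:ReallyAnything (letters and digits only, then a colon, then any number of characters), after the first pipe
  3. Anything:ReallyAnything (letters and digits only, then a colon, then any characters), after the last pipe

Then you have a pattern like this: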

    "(\\[\\w+\\.\\w+\\])\\|(\\w+:.+)\\|(\\w+:.+)";
    
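    • (\\[\\w+\\.\\w+\\]) This capture group will get "[Testing.User]", but it is not limited to "[Testing.User]".
    • \\|(\\w+:.+) This capture group gets the data after the first pipe and stops before the last pipe. In this case that is "Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))", but it is not limited to text starting with "Info:".
    • \\|(\\w+:.+) The same capture group as the previous one, but it captures whatever comes after the last pipe, in this case "Description:([System.String]|This is some description)", but it is not limited to text starting with "Description:".

    Now, if you were to add another pipe followed by more data (|Anything:SomeData), then Description: would become part of group 2, and group 3 would now be "Anything:SomeData".

    The code looks like this: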

    using System;
    using System.Text.RegularExpressions;
    
    public class Program
    {
        public static void Main()
        {
            String text = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
            String pattern = "(\\[\\w+\\.\\w+\\])\\|(\\w+:.+)\\|(\\w+:.+)";
    
            Match match = Regex.Match(text, pattern);
            if (match.Success)
            {
                Console.WriteLine(match.Groups[1]);
                Console.WriteLine(match.Groups[2]);
                Console.WriteLine(match.Groups[3]); 
            }
        }
    }
    

    Result:

    [Testing.User]
    Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))
    Description:([System.String]|This is some description)
    

    See a working example here... https://dotnetfiddle.net/DYcZuY

    For a working example where I add another field that follows the pattern format, see... https://dotnetfiddle.net/Mtc1CD
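
The group shift described above can be reproduced with a short sketch. This is not the content of the linked fiddles; the extra field "Extra:([System.String]|x)" and the names GreedyGroupsDemo and Groups are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GreedyGroupsDemo
{
    // Same pattern as in this answer, written as a verbatim string.
    private const string Pattern = @"(\[\w+\.\w+\])\|(\w+:.+)\|(\w+:.+)";

    // Hypothetical helper: returns groups 1..3, or an empty array if there is no match.
    public static string[] Groups(string text)
    {
        Match match = Regex.Match(text, Pattern);
        if (!match.Success)
            return new string[0];
        return new[] { match.Groups[1].Value, match.Groups[2].Value, match.Groups[3].Value };
    }

    public static void Main()
    {
        string text = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)"
                      + "|Extra:([System.String]|x)"; // made-up fourth field

        // Group 2 is greedy, so it now swallows Description: as well,
        // and group 3 holds only the last field.
        foreach (string g in Groups(text))
            Console.WriteLine(g);
    }
}
```

In other words, the pattern always yields three groups, so it suits inputs with exactly three top-level fields.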

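Answer 3: (score: 3)

To do this, you need to use balancing groups, a regex feature exclusive to the .NET regex engine. It is a counter system: when an opening parenthesis is found the counter is incremented, when a closing one is found the counter is decremented, and then you only have to test whether the counter is empty to know whether the parentheses are balanced. With a regex, this is the only way to be sure whether you are inside or outside the parentheses: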

using System;
using System.Text.RegularExpressions;

public class Example
{
   public static void Main()
   {
       string input = @"[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";

       string pattern = @"(?:[^|()]+|\((?>[^()]+|(?<Open>[(])|(?<-Open>[)]))*(?(Open)(?!))\))+";

       foreach (Match m in Regex.Matches(input, pattern)) 
           Console.WriteLine(m.Value);
   }
}

Pattern details:

(?:
    [^|()]+    # all that is not a parenthesis or a pipe
  |            # OR
               # content between parentheses (possibly nested)
    \(              # opening parenthesis
     # here is the way to obtain balanced parens
    (?> # content between parens
        [^()]+        # all that is not parenthesis 
      |               # OR
        (?<Open>[(])  # an opening parenthesis (increment the counter)
      |
        (?<-Open>[)]) # a closing parenthesis (decrement the counter)
    )*  # repeat as needed
    (?(Open)(?!)) # make the pattern fail if the counter is not zero

    \)
)+

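(?(Open)(?!)) is a conditional statement.

(?!) is a subpattern that always fails (an empty negative lookahead), which means: not followed by nothing.

This pattern matches everything that is not a pipe, together with the strings between parentheses.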
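
Since the pattern only looks at the top level of whatever string it is given, it can be applied again to the text inside an element's outer parentheses to get the next level down. A rough sketch, assuming every nested element has the form Name:(...); the names BalancedSplit, Split and TryGetInner are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BalancedSplit
{
    // Pattern from this answer (balancing groups, .NET only).
    private const string Pattern =
        @"(?:[^|()]+|\((?>[^()]+|(?<Open>[(])|(?<-Open>[)]))*(?(Open)(?!))\))+";

    // Returns the top-level elements of the input.
    public static List<string> Split(string input)
    {
        var parts = new List<string>();
        foreach (Match m in Regex.Matches(input, Pattern))
            parts.Add(m.Value);
        return parts;
    }

    // Hypothetical helper: extracts the text between the first '(' and the
    // final ')' of an element such as "Info:(...)".
    public static bool TryGetInner(string element, out string inner)
    {
        int open = element.IndexOf('(');
        if (open < 0 || !element.EndsWith(")"))
        {
            inner = "";
            return false;
        }
        inner = element.Substring(open + 1, element.Length - open - 2);
        return true;
    }

    public static void Main()
    {
        string input = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
        foreach (string part in Split(input))
        {
            Console.WriteLine(part);
            string inner;
            if (TryGetInner(part, out inner))
                foreach (string child in Split(inner))
                    Console.WriteLine("    " + child);
        }
    }
}
```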

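Answer 4: (score: 2)

Regular expressions are not the best approach for this kind of problem; you probably need to write some code to parse the data. I made a simple example that handles this simple case. The basic idea is that you only want to split when the | is not inside parentheses, so I keep track of the parenthesis count. You would need some extra handling for cases such as parentheses being part of the description text, but as I said, this is just a starting point: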

using System;
using System.Collections.Generic;
using System.Text;

class Program
{
    // Splits on '|' only when the character is not inside parentheses.
    static IEnumerable<String> splitSpecial(string input)
    {
        StringBuilder builder = new StringBuilder();
        int openParenthesisCount = 0;

        foreach (char c in input)
        {
            if (openParenthesisCount == 0 && c == '|')
            {
                // Top-level separator: emit the element collected so far.
                yield return builder.ToString();
                builder.Clear();
            }
            else
            {
                if (c == '(')
                    openParenthesisCount++;
                if (c == ')')
                    openParenthesisCount--;
                builder.Append(c);
            }
        }
        // Emit the last element, which has no trailing '|'.
        yield return builder.ToString();
    }

    static void Main(string[] args)
    {
        string input = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
        foreach (String split in splitSpecial(input))
        {
            Console.WriteLine(split);
        }
        Console.ReadLine();
    }
}

Output:

[Testing.User]
Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))
Description:([System.String]|This is some description)
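
One possible hardening of this loop, offered only as a sketch: reject input whose parentheses do not balance instead of silently producing odd pieces. It does not solve literal parentheses inside the description text; it only makes unbalanced ones visible. The names CheckedSplit and SplitTopLevel, and the choice of FormatException, are assumptions made for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

public static class CheckedSplit
{
    // Splits on '|' at nesting depth 0 and throws if parentheses are unbalanced.
    public static List<string> SplitTopLevel(string input)
    {
        var parts = new List<string>();
        var builder = new StringBuilder();
        int depth = 0;

        foreach (char c in input)
        {
            if (c == '|' && depth == 0)
            {
                // Top-level separator: close the current element.
                parts.Add(builder.ToString());
                builder.Clear();
                continue;
            }
            if (c == '(')
                depth++;
            if (c == ')')
            {
                depth--;
                if (depth < 0)
                    throw new FormatException("Unmatched ')' in input.");
            }
            builder.Append(c);
        }

        if (depth != 0)
            throw new FormatException("Unmatched '(' in input.");

        parts.Add(builder.ToString());
        return parts;
    }

    public static void Main()
    {
        string input = "[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
        foreach (string part in SplitTopLevel(input))
            Console.WriteLine(part);
    }
}
```

Returning a list rather than using yield return means the whole input is checked before any element is handed back to the caller.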

Answer 5: (score: 1)

This is not a great or robust solution, but if you know that your three top-level items are fixed, you can hard-code them into the regex.

(\[Testing\.User\])\|(Info:.*)\|(Description:.*)

This regex will create a single match containing three groups, as you expect. You can test it here: http://derekslager.com/blog/posts/2007/09/a-better-dotnet-regular-expression-tester.ashx

Edit: here is a complete C# example

using System;
using System.Text.RegularExpressions;

namespace ConsoleApplication3
{
    internal class Program
    {
        private static void Main(string[] args)
        {
            const string input = @"[Testing.User]|Info:([Testing.Info]|Name:([System.String]|Matt)|Age:([System.Int32]|21))|Description:([System.String]|This is some description)";
            const string pattern = @"(\[Testing\.User\])\|(Info:.*)\|(Description:.*)";

            var match = Regex.Match(input, pattern);
            if (match.Success)
            {
                for (int i = 1; i < match.Groups.Count; i++)
                {
                    Console.WriteLine("[" + i + "] = " + match.Groups[i]);
                }
            }

            Console.ReadLine();
        }
    }
}