How do I extract segments from a string?

Date: 2017-02-23 08:21:13

Tags: c# regex xml

I have a string that contains all of my segments. It looks like this:

var myString = "<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>";

I want to put all the segments from my string into an ArrayList like this:

{
 {"index":"0","status":"0","seg":"This is segment zero"},
 {"index":"1","status":"1","seg":"This is segment one"},
 {"index":"2","status":"0","seg":"This is segment two"}
}

How can I achieve this with a regular expression?

2 answers:

Answer 0 (score: 3)

This regex captures three groups:

 <seg_(\d+)\sstatus=(\d+)>(.*?)<\/seg_\1>
 Full match 0-44    `<seg_0 status=0>This is segment zero</seg_0>`
 Group 1.   5-6 `0` -> index
 Group 2.   14-15   `0` -> status
 Group 3.   16-36   `This is segment zero` ->segment text

Use /<seg_(\d+)\sstatus=(\d+)>(.*?)<\/seg_\1>/g to extract all matching items in the string.
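Since the question is tagged c#, here is a minimal sketch of how the pattern above might be applied in .NET. The `/.../g` notation is not C# syntax; `Regex.Matches` already returns every match, so no global flag is needed. The `Segment`, `SegmentParser`, and `Demo` names are hypothetical, introduced only for illustration, and a `List<Segment>` stands in for the ArrayList the question mentions.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical container type; the question only names the three fields.
public sealed class Segment
{
    public string Index { get; set; }
    public string Status { get; set; }
    public string Seg { get; set; }
}

public static class SegmentParser
{
    // Same pattern as in the answer. The backreference \1 requires the number
    // in the closing tag to equal the number in the opening tag.
    private static readonly Regex SegmentRegex =
        new Regex(@"<seg_(\d+)\sstatus=(\d+)>(.*?)</seg_\1>");

    public static List<Segment> Parse(string input)
    {
        var segments = new List<Segment>();
        // Regex.Matches yields all non-overlapping matches, left to right.
        foreach (Match m in SegmentRegex.Matches(input))
        {
            segments.Add(new Segment
            {
                Index = m.Groups[1].Value,   // group 1 -> index
                Status = m.Groups[2].Value,  // group 2 -> status
                Seg = m.Groups[3].Value      // group 3 -> segment text
            });
        }
        return segments;
    }
}

public static class Demo
{
    public static void Main()
    {
        string myString = "<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>";
        foreach (Segment s in SegmentParser.Parse(myString))
        {
            Console.WriteLine(s.Index + " | " + s.Status + " | " + s.Seg);
        }
    }
}
```

Note that `.` does not match a newline by default, so segment text spanning several lines would need `RegexOptions.Singleline`; that is an assumption about the data, not something the question states.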

Answer 1 (score: 1)

You can try the following regex to capture all the segments, and build the array by replacing each match with its captured groups:

input >>  <seg_0 status=0>This is segment zero</seg_0>
          <seg_1 status=1>This is segment one</seg_1>
          <seg_2 status=0>This is segment two</seg_2> 
regex >>  <seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)<\/seg_\d+> 
replace with >>  {"index":"$1","status":"$2","seg":"$3"},
output >>  {"index":"0","status":"0","seg":"This is segment zero"},
           {"index":"1","status":"1","seg":"This is segment one"},
           {"index":"2","status":"0","seg":"This is segment two"},

See the demo / explanation.

C# (possibly)

using System;
using System.Text.RegularExpressions;

public class RegEx
{
    public static void Main()
    {
        string pattern = @"<seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)<\/seg_\d+>";
        string substitution = @"{""index"":""$1"",""status"":""$2"",""seg"":""$3""},";
        string input = @"<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>";

        Regex regex = new Regex(pattern);
        // Each match is replaced by one JSON-style object followed by a comma.
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}
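The replacement above leaves a trailing comma after the last object and no enclosing brackets, so the result is a sequence of objects, not yet an array. A minimal sketch of one way to close that gap, assuming a JSON array string is what is wanted, is shown below; `SegmentJson` and `ToJsonArray` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SegmentJson
{
    // Hypothetical helper: reuses the pattern and substitution from the answer,
    // then turns the comma-terminated objects into a bracketed array string.
    public static string ToJsonArray(string input)
    {
        const string pattern = @"<seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)</seg_\d+>";
        const string substitution = @"{""index"":""$1"",""status"":""$2"",""seg"":""$3""},";

        string objects = Regex.Replace(input, pattern, substitution);

        // Drop the comma left after the last object, then add the brackets.
        return "[" + objects.TrimEnd(',') + "]";
    }

    public static void Main()
    {
        string input = "<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>";
        Console.WriteLine(ToJsonArray(input));
    }
}
```

Two limits of this sketch are worth keeping in mind. Because the text group is `[\w\s]+`, a segment containing punctuation does not match and is left in the output unchanged. The closing tag is matched with `\d+`, not a backreference, so mismatched open and close numbers are not detected.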