I have a string that contains all of my segments. It looks like this:
var myString = "<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>"
I want to put all the segments from my string into an ArrayList like this:
{
{"index":"0","status":"0","seg":"This is segment zero"},
{"index":"1","status":"1","seg":"This is segment one"},
{"index":"2","status":"0","seg":"This is segment two"}
}
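How can I do this with a regular expression?
Answer 0 (score: 3)
This regular expression extracts three groups per segment. The backreference \1 requires the closing tag to carry the same index as the opening tag: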
<seg_(\d+)\sstatus=(\d+)>(.*?)<\/seg_\1>
Full match 0-44 `<seg_0 status=0>This is segment zero</seg_0>`
Group 1. 5-6 `0` -> index
Group 2. 14-15 `0` -> status
Group 3. 16-36 `This is segment zero` -> segment text
To extract every segment from the string, apply it with the global flag: /<seg_(\d+)\sstatus=(\d+)>(.*?)<\/seg_\1>/g
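As a minimal sketch of how the global pattern above could be turned into the list the question asks for, the following JavaScript collects each match into a plain object. JavaScript is assumed here because of the `var` declaration and the `/…/g` literal; `parseSegments` is a hypothetical helper name, not part of the original answer.

```javascript
// Sample input from the question, split across lines only for readability.
const myString =
  "<seg_0 status=0>This is segment zero</seg_0>" +
  "<seg_1 status=1>This is segment one</seg_1>" +
  "<seg_2 status=0>This is segment two</seg_2>";

// parseSegments is a hypothetical helper name used for illustration.
function parseSegments(text) {
  // Group 1: index, group 2: status, group 3: segment text.
  // \1 forces the closing tag to use the same index as the opening tag.
  const pattern = /<seg_(\d+)\sstatus=(\d+)>(.*?)<\/seg_\1>/g;

  // matchAll needs the g flag and yields one match object per segment.
  return Array.from(text.matchAll(pattern), (m) => ({
    index: m[1],
    status: m[2],
    seg: m[3],
  }));
}

const segments = parseSegments(myString);
console.log(segments);
```

Note that `.` does not match line breaks by default, so a segment whose text spans several lines would need the `s` flag as well.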
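Answer 1 (score: 1)
You can try the following regex to capture every segment, then build the array entries by replacing each match with its captured groups. Note that ([\w\s]+) only matches letters, digits, underscores and whitespace, so segment text containing punctuation would not match: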
input >> <seg_0 status=0>This is segment zero</seg_0>
<seg_1 status=1>This is segment one</seg_1>
<seg_2 status=0>This is segment two</seg_2>
regex >> <seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)<\/seg_\d+>
replace with >> {"index":"$1","status":"$2","seg":"$3"},
output >> {"index":"0","status":"0","seg":"This is segment zero"},
{"index":"1","status":"1","seg":"This is segment one"},
{"index":"2","status":"0","seg":"This is segment two"},
C# (probably)
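using System;
using System.Text.RegularExpressions;

public class RegEx
{
    public static void Main()
    {
        // Groups: 1 = index, 2 = status, 3 = segment text (word characters and whitespace only).
        string pattern = @"<seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)<\/seg_\d+>";
        // $1..$3 are replaced by the captured groups; "" is an escaped quote in a verbatim string.
        string substitution = @"{""index"":""$1"",""status"":""$2"",""seg"":""$3""},";
        string input = @"<seg_0 status=0>This is segment zero</seg_0><seg_1 status=1>This is segment one</seg_1><seg_2 status=0>This is segment two</seg_2>";

        Regex regex = new Regex(pattern);
        // Regex.Replace substitutes every match, so all three segments are rewritten.
        string result = regex.Replace(input, substitution);
        Console.WriteLine(result);
    }
}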
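The replace-based approach produces a string with a trailing comma rather than an actual list. As a rough sketch of one way to finish the job, shown in JavaScript to match the `var` and `/…/g` syntax used earlier in the thread, the replaced text can be wrapped in brackets and parsed as JSON. The name `segmentsViaReplace` is hypothetical, and the sketch assumes the segment text contains no line breaks or tabs, since those are not valid inside a JSON string.

```javascript
const input =
  "<seg_0 status=0>This is segment zero</seg_0>" +
  "<seg_1 status=1>This is segment one</seg_1>" +
  "<seg_2 status=0>This is segment two</seg_2>";

// segmentsViaReplace is a hypothetical helper name used for illustration.
function segmentsViaReplace(text) {
  // Same pattern as in the answer, with the g flag so every segment is replaced.
  const pattern = /<seg_(\d+)[\s\w]+=(\d+)>([\w\s]+)<\/seg_\d+>/g;
  const replaced = text.replace(
    pattern,
    '{"index":"$1","status":"$2","seg":"$3"},'
  );

  // Drop the trailing comma and wrap in brackets to form a JSON array.
  const json = "[" + replaced.replace(/,$/, "") + "]";
  return JSON.parse(json);
}

const list = segmentsViaReplace(input);
console.log(list);
```

Collecting the matches directly is usually less fragile than building JSON by string replacement, because any text the pattern fails to match is left in the output and makes the parse fail.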