I want to parse some custom tags out of a string and also get the untagged content. For example, I have the following string:
Hello World <Red>This is some red text </Red> This is normal <Blue>This is blue text </Blue>
I have a working regex for getting the tagged content, which I'm using:
<(?<tag>\w*)>(?<text>.*)</\k<tag>>
However, this returns:
tag: Red
text: This is some red text
tag: Blue
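text: This is blue text
What I need is to also get matches for the untagged content, so I would get 4 matches: the two above, plus "Hello World" and "This is normal".
Is this possible with a regex?
For example, this is my current function: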
public static List<FormattedConsole> FormatColour(string input)
{
    List<FormattedConsole> formatted = new List<FormattedConsole>();
    Regex regex = new Regex("<(?<Tag>\\w+)>(?<Text>.*?)</\\1>", RegexOptions.IgnoreCase
        | RegexOptions.CultureInvariant
        | RegexOptions.IgnorePatternWhitespace
        | RegexOptions.Compiled
        );
    MatchCollection ms = regex.Matches(input);
    foreach (Match match in ms)
    {
        GroupCollection groups = match.Groups;
        FormattedConsole format = new FormattedConsole(groups["Text"].Value, groups["Tag"].Value);
        formatted.Add(format);
    }
    return formatted;
}
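As mentioned above, this only returns the matches between tags. I also need the text that has no tags.
(By the way, FormattedConsole is just a container holding the text and the colour.)
Answer 0 (score: 2)
If you're willing to treat the input as XML, you can try a solution like this one. We'll use LINQ to XML. Try it online: https://dotnetfiddle.net/J4zVMY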
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public class Program
{
    public static void Main()
    {
        string response = @"Hello World <Red>This is some red text </Red> This is normal <Blue>This is blue text </Blue>";
        // wrap the input in a root element so it parses as an XML document
        response = @"<?xml version='1.0' encoding='utf-8'?><root>"+response+"</root>";
        var doc = XDocument.Parse(response);
        // collect every segment in a list of Text; Skip(1) skips the <root> wrapper
        var colors = new List<Text>();
        foreach (var hashElement in doc.Descendants().Skip(1).Where(node => !node.IsEmpty))
        {
            // untagged text directly before this element, if any
            var text = GetText(hashElement.PreviousNode);
            if (text != null)
                colors.Add(new Text(text));
            colors.Add(new Text(hashElement.Value.Trim(), hashElement.Name.ToString()));
        }
        // handle trailing content
        var lastText = GetText(doc.Descendants().Last().NextNode);
        if (lastText != null)
            colors.Add(new Text(lastText));
        // print
        foreach (var color in colors)
            Console.WriteLine($"{color.Color}: {color.Content}");
    }

    // returns the trimmed text of a text node, or null for any other node
    private static string GetText(XNode node)=> (node as XText)?.Value.Trim();

    public class Text
    {
        public string Content { get; set; }
        public string Color { get; set; }
        public Text(string content, string color = "Black")
        {
            Color = color;
            Content = content;
        }
    }
}
Output:
Black: Hello World
Red: This is some red text
Black: This is normal
Blue: This is blue text
Caveat: any help is welcome. My LINQ to XML may be a bit rusty.
Answer 1 (score: 2)
You can try this:
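One possible simplification of the traversal, offered as a sketch rather than a definitive fix: iterating over the child nodes of the wrapper element visits text nodes and elements in document order, so the PreviousNode/NextNode bookkeeping is not needed. The names XmlSegments and Parse are made up for this illustration, and "Black" only mirrors the default colour of the Text class above. It assumes the input is well-formed XML once wrapped in a root element (no stray '<' or '&') and that tags are not nested.

```csharp
using System;
using System.Collections.Generic;
using System.Xml.Linq;

public static class XmlSegments
{
    // Returns (Color, Content) pairs in the order they appear in the input.
    // Hypothetical helper; "Black" is an assumed default for untagged text.
    public static List<(string Color, string Content)> Parse(string input)
    {
        // Same wrapping trick as above: a root element makes the fragment parseable.
        var root = XElement.Parse("<root>" + input + "</root>");
        var segments = new List<(string Color, string Content)>();

        // Nodes() yields text nodes and child elements in document order.
        foreach (XNode node in root.Nodes())
        {
            if (node is XElement element)
            {
                // Tagged content: the element name is the colour.
                segments.Add((element.Name.LocalName, element.Value.Trim()));
            }
            else if (node is XText text && text.Value.Trim().Length > 0)
            {
                // Untagged content between, before, or after elements.
                segments.Add(("Black", text.Value.Trim()));
            }
        }
        return segments;
    }
}
```

Calling XmlSegments.Parse on the sample string should give the same four segments as the output above, including any text that follows the last tag.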
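string sentence = "Hello World <Red>This is some red text </Red> This is normal <Blue>This is blue text </Blue>";
// Regex.Split keeps the text of capture groups, so the opening tag and the
// tagged text come back as separate entries between the untagged pieces.
string[] matchSegments = Regex.Split(sentence,@"(<\w+>)(.*?)<\/\w+>");
foreach (string value in matchSegments)
{
    // an opening tag is printed without a newline so its text follows on the same line
    if(value.Contains("<") && value.Contains(">"))
        Console.Write(value);
    else
        Console.WriteLine(value);
}
Output: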
Hello World
<Red>This is some red text
This is normal
<Blue>This is blue text
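Note that the split pattern above accepts any closing tag, so <Red>...</Blue> would also be treated as a tagged segment. If the tag name and the text are needed as a pair, one alternative is a single pattern with an alternation: either a tagged segment whose closing tag must repeat the opening name, or a run of characters containing no '<'. The following is only a sketch under those assumptions; TagSegments, Parse, and the Plain group are names invented for this example, untagged text is reported with an empty tag, nested tags are not handled, and a stray '<' that opens no valid tag is silently skipped.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class TagSegments
{
    // First alternative: <Name>text</Name>, with \k<Tag> forcing matching names.
    // Second alternative: any run of characters up to the next '<'.
    private static readonly Regex Segment = new Regex(
        @"<(?<Tag>\w+)>(?<Text>.*?)</\k<Tag>>|(?<Plain>[^<]+)",
        RegexOptions.IgnoreCase | RegexOptions.CultureInvariant | RegexOptions.Singleline);

    // Returns (Tag, Text) pairs in input order; Tag is empty for untagged text.
    public static List<(string Tag, string Text)> Parse(string input)
    {
        var result = new List<(string Tag, string Text)>();
        foreach (Match m in Segment.Matches(input))
        {
            if (m.Groups["Tag"].Success)
            {
                result.Add((m.Groups["Tag"].Value, m.Groups["Text"].Value.Trim()));
            }
            else
            {
                // Skip runs that are only whitespace.
                string plain = m.Groups["Plain"].Value.Trim();
                if (plain.Length > 0)
                    result.Add((string.Empty, plain));
            }
        }
        return result;
    }
}
```

For the sample sentence this should yield four entries in order: "Hello World" (untagged), Red, "This is normal" (untagged), and Blue, which could then be mapped onto FormattedConsole objects from the question.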