How can I replace tags based on their attributes when the closing tags are identical?
Example:
<tag id='bold'><tag id='italic'>Hello World</tag></tag>
to
<b><i>Hello World</i></b>
Can a RegEx do this, or do I need to write a custom parsing method?
Note: this is a simplified example and is not meant for HTML in a browser.
Answer 0 (score: 2)
Here is a way to do this using XElement, XPath, and LINQ:
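Dim str As String = "<tag id='bold'><tag id='italic'>Hello World</tag></tag>"
' Wrap the fragment in a temporary root element so that it parses as one document.
Dim xDoc As XDocument = XDocument.Parse("<?xml version='1.0'?><root>" & str & "</root>")
Dim query = xDoc.XPathSelectElements("//tag")
For Each element In query
    ' Only touch elements that actually carry an id attribute.
    Dim idAttr As XAttribute = element.Attribute("id")
    If idAttr IsNot Nothing Then
        If idAttr.Value = "italic" Then
            element.Name = "i"
        ElseIf idAttr.Value = "bold" Then
            element.Name = "b"
        End If
        ' Renaming the element changes both its start and end tag; the attributes are no longer needed.
        element.RemoveAttributes()
    End If
Next element
' Serialize without indentation and strip the temporary root element again.
str = xDoc.ToString(SaveOptions.DisableFormatting).Replace("<root>", String.Empty).Replace("</root>", String.Empty)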
Output:
<b><i>Hello World</i></b>
Don't forget to add these Imports:
Imports System.Xml.Linq
Imports System.Xml
Imports System.Xml.XPath
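Since the other answer in this thread is in C#, here is a rough C# sketch of the same LINQ to XML idea. The class name `TagRenamer`, the method `Rename`, and the `NameById` lookup table are made up for illustration and are not part of the answer above. Unlike the VB code, this sketch leaves elements with an unknown id completely untouched.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagRenamer
{
    // Hypothetical mapping from id attribute values to replacement element names.
    private static readonly Dictionary<string, string> NameById =
        new Dictionary<string, string> { { "bold", "b" }, { "italic", "i" } };

    public static string Rename(string fragment)
    {
        // Wrap the fragment in a temporary root so several top-level elements still parse.
        XElement root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // ToList() snapshots the matching elements before any of them is renamed.
        foreach (XElement element in root.Descendants("tag").ToList())
        {
            string id = (string)element.Attribute("id"); // null when the attribute is absent
            string newName;
            if (id != null && NameById.TryGetValue(id, out newName))
            {
                element.Name = newName;      // renames the start and end tag together
                element.RemoveAttributes();
            }
        }

        // Concatenate the child nodes, which drops the temporary root again.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Calling `TagRenamer.Rename("<tag id='bold'><tag id='italic'>Hello World</tag></tag>")` should return `<b><i>Hello World</i></b>`. Supporting another id is then a matter of adding one entry to the table.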
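Answer 1 (score: 0)
It is possible, but it is not pretty. I use the regex template below (I have forgotten the source) for some simple grammars that are not based on markup, but it works here too. It relies on .NET balancing groups to keep track of the nesting depth.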
string NestedRegexTemplate =
@"(?xs) # enable eXtended mode (comments/spaces ignored)
(?<capturedOpen>{0}) # start of tag
(?'value' # named capture
(?> # don't backtrack
(?:
((?!{0})(?!{1}).)+ # not tags
| (?'open' {0} ) # count opening bracket
| (?'close-open' {1} ) # subtract closing bracket (matches only if open count > 0)
)*
)
(?(open)(?!)) # make sure open is not > 0
)
(?<capturedClose>{1}) # end of tag
";
string test = "<tag id='bold'><tag id='italic'>Hello World</tag></tag>";
var regex = new Regex(string.Format(NestedRegexTemplate, @"<\s*tag(\s[^>]*|)>", @"<\s*/\s*tag\s*>"));
int start = 0;
var match = regex.Match(test, start);
while (match.Success)
{
    var capturedOpen = match.Groups["capturedOpen"];
    var capturedClose = match.Groups["capturedClose"];
    // Pick the replacement name from the id value; this only handles single-quoted values.
    string name = capturedOpen.Value.Contains("'bold'") ? "b"
                : capturedOpen.Value.Contains("'italic'") ? "i"
                : null;
    if (name != null)
    {
        // Replace the closing tag first so that the opening tag's index stays valid.
        test = test.Remove(capturedClose.Index, capturedClose.Length)
                   .Insert(capturedClose.Index, "</" + name + ">");
        test = test.Remove(capturedOpen.Index, capturedOpen.Length)
                   .Insert(capturedOpen.Index, "<" + name + ">");
    }
    // Continue just past the start of this tag so nested tags are found next
    // and a tag with an unknown id cannot cause an endless loop.
    start = capturedOpen.Index + 1;
    match = regex.Match(test, start);
}
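As for the "custom parsing method" half of the question: the bookkeeping that the balancing groups do can also be done by hand with a stack, which may be easier to read. The following is only a sketch under the same simplifying assumptions as the code above (single-quoted id values, only `tag` elements); `StackTagRewriter`, `Rewrite`, and `TagToken` are names invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class StackTagRewriter
{
    // Matches a closing </tag> (group "close") or an opening <tag ...> (group "attrs" holds its attributes).
    private static readonly Regex TagToken =
        new Regex(@"(?<close><\s*/\s*tag\s*>)|<\s*tag(?<attrs>\s[^>]*)?>");

    public static string Rewrite(string input)
    {
        var output = new StringBuilder();
        var open = new Stack<string>(); // replacement name per open tag; null means "leave as is"
        int last = 0;

        foreach (Match m in TagToken.Matches(input))
        {
            output.Append(input, last, m.Index - last); // copy the text between two tags
            last = m.Index + m.Length;

            if (!m.Groups["close"].Success)
            {
                // Opening tag: decide the replacement and remember it for the matching close.
                string attrs = m.Groups["attrs"].Value;
                string name = attrs.Contains("'bold'") ? "b"
                            : attrs.Contains("'italic'") ? "i"
                            : null;
                open.Push(name);
                output.Append(name != null ? "<" + name + ">" : m.Value);
            }
            else
            {
                // Closing tag: it pairs with the most recently opened tag; a stray close is copied unchanged.
                string name = open.Count > 0 ? open.Pop() : null;
                output.Append(name != null ? "</" + name + ">" : m.Value);
            }
        }

        output.Append(input, last, input.Length - last); // copy the remaining text
        return output.ToString();
    }
}
```

`StackTagRewriter.Rewrite("<tag id='bold'><tag id='italic'>Hello World</tag></tag>")` should give `<b><i>Hello World</i></b>`. This makes a single pass over the input instead of re-running the regex after every replacement, at the cost of not validating that the tags are balanced.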