.NET regex: replace opening and closing tags based on an attribute when all closing tags are the same

Date: 2015-07-09 10:08:51

Tags: .net regex vb.net

How can I replace tags based on an attribute when every tag has the same closing tag?

Example:

<tag id='bold'><tag id='italic'>Hello World</tag></tag>

should become:

<b><i>Hello World</i></b>

Can a regex do this, or do I need to write a custom parsing method?

Note: this is a simplified example; the output is not meant for an HTML browser.

2 answers:

Answer 0 (score: 2)

Here is how to do it with LINQ to XML (XDocument/XElement) and XPath:

Dim str As String = "<tag id='bold'><tag id='italic'>Hello World</tag></tag>"
' Wrap the fragment in a dummy root element so it parses as one XML document
Dim xDoc As XDocument = XDocument.Parse("<root>" & str & "</root>")
' ToList() takes a snapshot of the matches before any element is renamed
For Each element In xDoc.XPathSelectElements("//tag").ToList()
    ' element.@id returns Nothing when the id attribute is missing (no NullReferenceException)
    Select Case element.@id
        Case "bold"
            element.Name = "b"
            element.RemoveAttributes()
        Case "italic"
            element.Name = "i"
            element.RemoveAttributes()
    End Select
Next element
' Serialize only the children of the dummy root, without indentation
str = String.Concat(xDoc.Root.Nodes().Select(Function(n) n.ToString(SaveOptions.DisableFormatting)))

Output:

<b><i>Hello World</i></b>

Don't forget to add these imports:

Imports System.Linq
Imports System.Xml.Linq
Imports System.Xml
Imports System.Xml.XPath
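For C# readers, here is a minimal sketch of the same LINQ to XML idea. It is not the answerer's code: the class name XmlTagRewriter, the TagMap dictionary and the Rewrite method are hypothetical, and it uses Descendants("tag") instead of XPath. It assumes the fragment is well-formed XML once wrapped in a dummy root.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlTagRewriter
{
    // Hypothetical mapping from the id attribute value to the replacement element name.
    static readonly Dictionary<string, string> TagMap = new Dictionary<string, string>
    {
        ["bold"] = "b",
        ["italic"] = "i",
    };

    public static string Rewrite(string fragment)
    {
        // Wrap the fragment in a dummy root; keep whitespace-only text between tags.
        XElement root = XElement.Parse("<root>" + fragment + "</root>", LoadOptions.PreserveWhitespace);

        // Snapshot the <tag> elements first, then rename the ones with a known id.
        foreach (XElement element in root.Descendants("tag").ToList())
        {
            string id = (string)element.Attribute("id"); // null when the attribute is missing
            if (id != null && TagMap.TryGetValue(id, out string name))
            {
                element.Name = name;          // string converts implicitly to XName
                element.RemoveAttributes();
            }
        }

        // Serialize only the children of the dummy root, without indentation.
        return string.Concat(root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Calling XmlTagRewriter.Rewrite("<tag id='bold'><tag id='italic'>Hello World</tag></tag>") should give <b><i>Hello World</i></b>. Tags with an unknown or missing id stay as <tag>, but because LINQ to XML re-serializes them, their attribute quoting may change.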

Answer 1 (score: 0)

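It's possible, but not pretty. I use the regex template below (I've forgotten where it came from) for some simple syntaxes that are not markup-based, but it works here too. It relies on .NET balancing groups, so each opening tag is matched together with its own closing tag even when tags are nested.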

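using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class Program
{
    // {0} = opening-tag pattern, {1} = closing-tag pattern (filled in with string.Format)
    const string NestedRegexTemplate =
        @"(?xs)                          # eXtended mode (spaces/comments ignored); . also matches newlines
          (?<capturedOpen>{0})           # start of tag
          (?'value'                      # named capture: the tag's content
            (?>                          # atomic group: don't backtrack
              (?:
                ((?!{0})(?!{1}).)+       # text that is not a tag
              | (?'open' {0} )           # nested opening tag: push onto the 'open' stack
              | (?'close-open' {1} )     # nested closing tag: pop (only matches if open count > 0)
              )*
            )
            (?(open)(?!))                # fail if any nested tag is still open
          )
          (?<capturedClose>{1})          # end of tag
        ";

    static void Main()
    {
        // id attribute value -> replacement tag name
        var tagNames = new Dictionary<string, string> { ["bold"] = "b", ["italic"] = "i" };

        string test = "<tag id='bold'><tag id='italic'>Hello World</tag></tag>";
        var regex = new Regex(string.Format(NestedRegexTemplate,
            @"<\s*tag(\s[^>]*|)>", @"<\s*/\s*tag\s*>"));

        int start = 0;
        Match match = regex.Match(test, start);
        while (match.Success)
        {
            Group capturedOpen = match.Groups["capturedOpen"];
            Group capturedClose = match.Groups["capturedClose"];

            string name = null;
            foreach (var pair in tagNames)
            {
                if (capturedOpen.Value.Contains("'" + pair.Key + "'")) { name = pair.Value; break; }
            }

            if (name != null)
            {
                // Replace the closing tag first so capturedOpen.Index is still valid
                test = test.Remove(capturedClose.Index, capturedClose.Length)
                           .Insert(capturedClose.Index, "</" + name + ">");
                test = test.Remove(capturedOpen.Index, capturedOpen.Length)
                           .Insert(capturedOpen.Index, "<" + name + ">");
                start = capturedOpen.Index + name.Length + 2; // continue right after the new "<name>"
            }
            else
            {
                // Unknown id: leave the tag as-is and skip past it so the loop always advances
                start = capturedOpen.Index + capturedOpen.Length;
            }

            match = regex.Match(test, start);
        }

        Console.WriteLine(test); // <b><i>Hello World</i></b>
    }
}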
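If the input is well-formed (every closing tag closes the most recently opened tag), balancing groups are not strictly needed. The sketch below swaps in a different technique: one regex finds every opening or closing tag, and Regex.Replace with a MatchEvaluator, which visits matches left to right, keeps a stack of the closing tags to emit. The names StackTagRewriter, TagMap, Token, IdAttr and Rewrite are hypothetical, and, like the answer, it only recognizes single-quoted id values.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class StackTagRewriter
{
    // Hypothetical mapping from the id attribute value to the replacement tag name.
    static readonly Dictionary<string, string> TagMap = new Dictionary<string, string>
    {
        ["bold"] = "b",
        ["italic"] = "i",
    };

    // Either a closing </tag> (named group "close") or an opening <tag ...>.
    static readonly Regex Token = new Regex(@"(?<close><\s*/\s*tag\s*>)|<\s*tag(?:\s[^>]*)?>");

    // Extracts a single-quoted id value from an opening tag.
    static readonly Regex IdAttr = new Regex(@"\bid\s*=\s*'([^']*)'");

    public static string Rewrite(string input)
    {
        // Closing text for each currently open tag, innermost on top; null means "keep original".
        var open = new Stack<string>();

        return Token.Replace(input, m =>
        {
            if (m.Groups["close"].Success)
            {
                // Pop throws InvalidOperationException if there are more closing than opening tags.
                return open.Pop() ?? m.Value;
            }

            Match id = IdAttr.Match(m.Value);
            if (id.Success && TagMap.TryGetValue(id.Groups[1].Value, out string name))
            {
                open.Push("</" + name + ">");
                return "<" + name + ">";
            }

            // Unknown or missing id: keep this opening tag and, later, its closing tag unchanged.
            open.Push(null);
            return m.Value;
        });
    }
}
```

With this sketch, StackTagRewriter.Rewrite("<tag id='bold'><tag id='italic'>Hello World</tag></tag>") should return <b><i>Hello World</i></b> in a single pass, without rescanning the string after each replacement. The trade-off is that it trusts the nesting instead of verifying it the way the balancing-group pattern does.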