Removing HTML tags using XmlDocument in C# 4.0

Asked: 2013-05-06 04:47:24

Tags: c#-4.0 xmldocument

I have the following code, with which I am trying to remove every HTML element whose tag name is passed in.

// Verbatim string literal: embedded double quotes are escaped by doubling them.
String inputString = @"<img class=""imgRight"" title=""Zürich, Switzerland"" src=""test.png"" alt=""Switzerland"" width=""44"" height=""44""/>
<p class=""first"">Zurich</p>
<p class=""second"">Test</p>
<p class=""first"">Testing</p>
<img class=""imgRight"" title=""Zürich, Switzerland"" src=""1.png"" alt=""Switzerland"" width=""44"" height=""44""/>
<a href=""test.aspx"">Hello</a>"; // Sample HTML string

String[] htmlTags = new String[] { "a", "img", "link:ComponentLink" };

String removedTagsHtml = RemoveHTMLTags(inputString,htmlTags);//Giving error "There are multiple root elements." 

public static string RemoveHTMLTags(String inputString, String[] htmlTags)
{
    String strResult = String.Empty;
    foreach (String htmlTag in htmlTags)
    {                
        XmlDocument xDoc = new XmlDocument();
        xDoc.LoadXml(inputString);
        XmlNamespaceManager xMan = new XmlNamespaceManager(xDoc.NameTable);
        xMan.AddNamespace("xs", xDoc.DocumentElement.NamespaceURI);

        XmlNode xNode = xDoc.SelectSingleNode("xs:" + htmlTag + "", xMan);
        xDoc.RemoveAll();
        xDoc.AppendChild(xNode);
        string seeOutputHere = xDoc.OuterXml;

    }
    return strResult;
}

The function throws the error "There are multiple root elements."

1 answer:

Answer 0 (score: 0)

Even if you fix the "multiple root elements" error (see, for example, LINQ to XML - Load XML fragments from file), HTML in general is still not valid XML.
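
As a rough illustration of one way around the "multiple root elements" error, the fragment can be wrapped in a synthetic root element before it is loaded. This is only a sketch, and not necessarily the approach taken in the linked question: the class name FragmentDemo, the method RemoveTags, and the wrapper element name "root" are made up for this example. It assumes the input is a well-formed XML fragment, and it deletes each matching element together with its content.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class FragmentDemo
{
    // Hypothetical helper: deletes every element whose qualified name is listed in tagNames.
    // Assumes tag names contain no single quotes, since they are spliced into the XPath.
    public static string RemoveTags(string fragment, string[] tagNames)
    {
        var doc = new XmlDocument();
        // A synthetic wrapper gives the fragment the single root that LoadXml requires.
        doc.LoadXml("<root>" + fragment + "</root>");

        foreach (string tag in tagNames)
        {
            // name() compares the qualified name as plain text, so no namespace manager is needed.
            XmlNodeList matches = doc.SelectNodes("//*[name()='" + tag + "']");

            // Copy the matches first so the tree is not modified while it is being enumerated.
            var toRemove = new List<XmlNode>();
            foreach (XmlNode node in matches)
            {
                toRemove.Add(node);
            }

            foreach (XmlNode node in toRemove)
            {
                if (node.ParentNode != null)
                {
                    node.ParentNode.RemoveChild(node);
                }
            }
        }

        // InnerXml leaves out the synthetic wrapper element.
        return doc.DocumentElement.InnerXml;
    }
}
```

This still fails on ordinary HTML that is not well-formed XML, such as an unclosed <br>, an unquoted attribute value, or an entity like &nbsp;, which is the limitation described above.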

For HTML processing, you should look at HtmlAgilityPack.
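
A minimal sketch of what this could look like with HtmlAgilityPack, assuming the HtmlAgilityPack NuGet package is referenced. The class name HtmlCleaner and the method RemoveElements are hypothetical names chosen for this example, and the sketch assumes that "removing a tag" means deleting the element along with its content.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

public static class HtmlCleaner
{
    // Hypothetical helper: deletes every element whose name is listed in tagNames.
    public static string RemoveElements(string html, string[] tagNames)
    {
        var doc = new HtmlDocument();
        // LoadHtml tolerates fragments with several top-level elements and markup that is not valid XML.
        doc.LoadHtml(html);

        // Materialize the matches with ToList() before removing, because Descendants() is lazy.
        var matches = doc.DocumentNode
            .Descendants()
            .Where(n => tagNames.Contains(n.Name, StringComparer.OrdinalIgnoreCase))
            .ToList();

        foreach (HtmlNode node in matches)
        {
            node.Remove();
        }

        return doc.DocumentNode.OuterHtml;
    }
}
```

It would be called the same way as the function in the question, for example RemoveElements(inputString, htmlTags). Names are compared case-insensitively here so that a mixed-case entry such as "link:ComponentLink" can still match.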