I have the following code, with which I am trying to remove all of the HTML elements that are passed in.
// Sample HTML string (verbatim literal, so inner quotes are doubled)
String inputString = @"<img class=""imgRight"" title=""Zürich, Switzerland"" src=""test.png"" alt=""Switzerland"" width=""44"" height=""44""/>
<p class=""first"">Zurich</p>
<p class=""second"">Test</p>
<p class=""first"">Testing</p>
<img class=""imgRight"" title=""Zürich, Switzerland"" src=""1.png"" alt=""Switzerland"" width=""44"" height=""44""/>
<a href=""test.aspx"">Hello</a>";
String[] htmlTags = new String[] { "a", "img", "link:ComponentLink" };
String removedTagsHtml = RemoveHTMLTags(inputString,htmlTags);//Giving error "There are multiple root elements."
public static string RemoveHTMLTags(String inputString, String[] htmlTags)
{
    String strResult = String.Empty;
    foreach (String htmlTag in htmlTags)
    {
        XmlDocument xDoc = new XmlDocument();
        xDoc.LoadXml(inputString);
        XmlNamespaceManager xMan = new XmlNamespaceManager(xDoc.NameTable);
        xMan.AddNamespace("xs", xDoc.DocumentElement.NamespaceURI);
        XmlNode xNode = xDoc.SelectSingleNode("xs:" + htmlTag + "", xMan);
        xDoc.RemoveAll();
        xDoc.AppendChild(xNode);
        string seeOutputHere = xDoc.OuterXml;
    }
    return strResult;
}
The function throws the error "There are multiple root elements."
Answer 0 (score: 0)
Even if you fix the "multiple root elements" error (see, for example, LINQ to XML - Load XML fragments from file), HTML in general is still not valid XML.
For HTML processing, you should look at HtmlAgilityPack.
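
To illustrate the first point, here is a minimal sketch of the "wrap the fragment in a single root" workaround using LINQ to XML. It is not the code from the linked question, and the names FragmentCleaner, RemoveTags and the synthetic root element are hypothetical. It only works when the input happens to be well-formed XML, as the sample string is. Typical real-world HTML (unclosed <br>, unquoted attributes, entities such as &nbsp;) will still make the parser throw, and a prefixed name such as link:ComponentLink would need its namespace declared on the wrapper, which this sketch does not handle.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FragmentCleaner
{
    // Hypothetical helper: removes every element whose local name is in tagNames.
    // Assumes the fragment is well-formed XML once wrapped in a single root.
    public static string RemoveTags(string fragment, params string[] tagNames)
    {
        // A synthetic <root> wrapper avoids "There are multiple root elements."
        XElement root = XElement.Parse(
            "<root>" + fragment + "</root>",
            LoadOptions.PreserveWhitespace);

        // The Remove() extension snapshots the query first, so removing
        // elements while enumerating descendants is safe.
        root.Descendants()
            .Where(e => tagNames.Contains(e.Name.LocalName))
            .Remove();

        // Serialize the remaining children without the synthetic wrapper.
        return string.Concat(
            root.Nodes().Select(n => n.ToString(SaveOptions.DisableFormatting)));
    }
}
```

Calling FragmentCleaner.RemoveTags(inputString, "a", "img") on the sample above should leave only the three <p> elements and the whitespace between them. For anything that is not guaranteed to be well-formed, a real HTML parser such as HtmlAgilityPack remains the safer choice.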