Question

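I'm currently working with an XML document that contains an RSS feed. I want to parse it so that whenever a div tag with the class name "feedflare" is found, the code removes the entire DIV.

I can't find an example of how to do this, because the search results are polluted with "HTML editor error" pages and other unrelated material.

Would anyone here be kind enough to share a way to achieve this?

I should state that I'd rather not use HtmlAgilityPack if it can be avoided.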

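This is my process:

Load the XML, parse the elements, and select the title, description, and link. Then save all of that as HTML (adding tags programmatically to build a web page). Once all the tags have been added, I want to parse the resulting "HTML text" and remove the annoying DIV tag.

Let's assume "string HTML = textBox1.text", where textBox1 is where the resulting HTML is pasted after the main XML document has been parsed.

How can I then go through the contents of textBox1.text and remove the div tag with the class "feedflare" (see below)?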

<div class="feedflare">
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:yIl2AUoC8zA">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=yIl2AUoC8zA" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:H0mrP-F8Qgo">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?d=H0mrP-F8Qgo" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:D7DqB2pKExk" border="0"></img></a> 
<a href="http://feeds.gawker.com/~ff/kotaku/full?a=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU">
<img src="http://feeds.feedburner.com/~ff/kotaku/full?i=lB-zYAGjzDU:1zqeSgzxt90:V_sGLiPBpWU" border="0"></img></a>
</div>

Thanks in advance.

Answer 1

Using this XML library (XPathElement is an extension method it provides, not part of System.Xml.Linq), do:

XElement root = XElement.Load(file); // or .Parse(string);
XElement div = root.XPathElement("//div[@class={0}]", "feedflare");
div.Remove();
root.Save(file); // or string = root.ToString();

Answer 2

Try this:

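   // requires: using System.Linq;
   System.Xml.XmlDocument d = new System.Xml.XmlDocument();
   d.LoadXml(Your_XML_as_String);
   // Match only divs whose class attribute is exactly "feedflare", and copy the
   // matches to a list first so the document is not modified while being enumerated.
   var divs = d.SelectNodes("//div[@class='feedflare']").Cast<System.Xml.XmlNode>().ToList();
   foreach (System.Xml.XmlNode n in divs)
       n.ParentNode.RemoveChild(n); // remove from the node's own parent, not from the document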

Then use d.OuterXml to retrieve the new XML.

Answer 3

My solution in JavaScript is:

function unrichText(texto) {
  var n = texto.indexOf("\">"); //Finding end of <div class="ExternalClass...">
  var sub = texto.substring(0, n+2); //Adding first char and last two (">)
  var tmp = texto.replace(sub, ""); //Removing it
  tmp = replaceAll(tmp, "</div>", ""); //Removing last "div"
  tmp = replaceAll(tmp, "<p>", ""); //Removing other stuff
  tmp = replaceAll(tmp, "</p>", "");
  tmp = replaceAll(tmp, "&#160;", "");
  return (tmp);
}

function replaceAll(str, find, replace) {
    return str.replace(new RegExp(find, 'g'), replace);
}
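
The same string-based idea can be adapted to the "feedflare" case from the question. What follows is only a rough sketch, not a full HTML parser: removeDivsByClass is a hypothetical helper name, and it assumes reasonably regular markup (quoted class attributes, no self-closing divs, no "<div" text inside comments or scripts). It tracks nesting depth so that the matching closing tag is the one that ends the removal.

```javascript
// Hypothetical helper (not from the original answer): removes every <div>
// whose class list contains `className`, together with everything inside it.
function removeDivsByClass(html, className) {
  var tagPattern = /<div\b[^>]*>|<\/div\s*>/gi;        // any opening or closing div tag
  var classPattern = /\sclass\s*=\s*(["'])(.*?)\1/i;   // quoted class attribute
  var result = "";
  var keptFrom = 0;   // start of the text not yet copied into result
  var depth = 0;      // > 0 while inside a div that is being removed
  var match;

  while ((match = tagPattern.exec(html)) !== null) {
    var tag = match[0];
    var isClosing = tag.charAt(1) === "/";

    if (depth === 0) {
      if (isClosing) continue;                         // closing tag of a div we keep
      var cls = classPattern.exec(tag);
      var classes = cls ? cls[2].split(/\s+/) : [];
      if (classes.indexOf(className) !== -1) {
        result += html.substring(keptFrom, match.index); // keep text before the div
        depth = 1;                                     // start skipping
      }
    } else if (isClosing) {
      depth--;
      if (depth === 0) keptFrom = tagPattern.lastIndex; // resume after </div>
    } else {
      depth++;                                         // nested div inside the removed one
    }
  }
  // If a removed div is never closed, everything after it is dropped.
  return depth === 0 ? result + html.substring(keptFrom) : result;
}
```

Usage would look like: var cleaned = removeDivsByClass(html, "feedflare"); where html holds the generated page text. For input that is known to be well-formed XML, a real parser is the safer choice.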

Remove a DIV from a text file if it contains a certain class name