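There doesn't seem to be any documentation on the CodePlex page, and for some reason IntelliSense isn't showing me the available methods or anything else for HtmlAgilityPack (for example, when I type MyHtmlDocument.DocumentNode. nothing pops up to tell me what I can do next).
I need to know how to remove all <a> tags and their content from the body of an HTML document. I can't just use Node.InnerText on the body, because that would still return the content of the <a> tags.
Here is the sample HTML: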
<html>
<body>
I was born in <a name=BC>Toronto</a> and now I live in barrie
</body>
</html>
I need to get back:
I was born in and now I live in barrie
Thanks, I appreciate your help!
Thomas
Answer 0 (score: 1)
Something like this (sorry, my code is C#, but I hope it helps):
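using HtmlAgilityPack;

HtmlDocument doc = new HtmlDocument();
doc.LoadHtml("some html markup here");
// "//a" matches every <a> element; "//a[@name]" would only match anchors that have a name attribute.
HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
// SelectNodes returns null (not an empty collection) when nothing matches.
if (links != null)
{
    foreach (HtmlNode link in links)
    {
        // Removes the <a> element together with its children.
        link.Remove();
    }
}
// Then use one of the many doc.Save(...) overloads to actually get the result of the operation.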
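If the goal is the plain text from the question rather than a saved file, one option is to read InnerText from the body once the links are gone. The following is only a minimal sketch, assuming the HtmlAgilityPack NuGet package is referenced; the class name AnchorStripper and the method StripAnchors are made up for illustration and are not part of the library.

```csharp
using System;
using HtmlAgilityPack;

public static class AnchorStripper
{
    // Hypothetical helper: returns the text of <body> with every <a> element removed.
    public static string StripAnchors(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectNodes returns null when there are no matches, so guard before looping.
        HtmlNodeCollection links = doc.DocumentNode.SelectNodes("//a");
        if (links != null)
        {
            foreach (HtmlNode link in links)
            {
                link.Remove();
            }
        }

        // Fall back to the whole document if the markup has no <body> element.
        HtmlNode body = doc.DocumentNode.SelectSingleNode("//body") ?? doc.DocumentNode;
        return body.InnerText;
    }

    public static void Main()
    {
        string html = "<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>";
        Console.WriteLine(StripAnchors(html));
    }
}
```

Removing the element leaves the spaces on both sides of it in place, so the result may contain a double space where the link used to be; collapse the whitespace afterwards if that matters.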
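Answer 1 (score: 0)
This should give you the result you need. It uses a recursive method to walk down through all the HTML nodes, and you can remove more kinds of nodes by adding new If statements.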
Imports HtmlAgilityPack

Public Sub Test()
    Dim document = New HtmlDocument() With {.OptionOutputAsXml = True}
    document.LoadHtml("<html><body>I was born in <a name=BC>Toronto</a> and now I live in barrie</body></html>")
    ' Visit every top-level node of the document.
    For i As Integer = 0 To document.DocumentNode.ChildNodes.Count - 1
        RecursiveMethod(document.DocumentNode.ChildNodes(i))
    Next
    Console.Out.WriteLine(document.DocumentNode.InnerHtml.Replace(" ", " "))
End Sub

Public Sub RecursiveMethod(child As HtmlNode)
    ' Walk the children backwards so that removing a node does not shift the indexes still to be visited.
    For x As Integer = child.ChildNodes.Count - 1 To 0 Step -1
        Dim node = child.ChildNodes(x)
        If node.Name = "a" Then
            node.RemoveAll() ' removes all the child nodes of "a"
            node.Remove() ' removes the actual "a" node
        ElseIf node.HasChildNodes Then
            RecursiveMethod(node)
        End If
    Next
End Sub