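I have a DB containing some text fields pasted from MS Word, and I'm having a hard time stripping their <div>, <font> and <span> tags while, obviously, keeping their innerText.
I've tried using HAP (Html Agility Pack), but I'm not heading in the right direction: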
    Public Function StripHtml(ByVal html As String, ByVal allowHarmlessTags As Boolean) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)
        ' Select every <div>, <font> and <span> element in the document
        Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span")
        For Each node In invalidNodes
            ' keepGrandChildren = False: the node is removed together with its children (and their text)
            node.ParentNode.RemoveChild(node, False)
        Next
        Return htmlDoc.DocumentNode.WriteTo()
    End Function
This code just selects the unwanted elements and removes them... but it doesn't keep their inner text.
Thanks in advance.
Answer (score: 1):
Well... I think I found a solution:
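    Public Function StripHtml(ByVal html As String) As String
        Dim htmlDoc As New HtmlDocument()
        htmlDoc.LoadHtml(html)
        ' Select every <div>, <font>, <span> and <p> element in the document
        Dim invalidNodes As HtmlNodeCollection = htmlDoc.DocumentNode.SelectNodes("//div|//font|//span|//p")
        ' SelectNodes returns Nothing (not an empty collection) when nothing matches
        If invalidNodes IsNot Nothing Then
            For Each node In invalidNodes
                ' keepGrandChildren = True: the node's children (including its text)
                ' are moved into the parent before the node itself is removed
                node.ParentNode.RemoveChild(node, True)
            Next
        End If
        Return htmlDoc.DocumentNode.WriteContentTo()
    End Function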
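I was almost there... :P The key change is passing True as the second argument (keepGrandChildren) to RemoveChild, so each removed node's children, including its text, are moved into its parent instead of being discarded.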