Question

How can I strip all HTML tags in VB6 using the MSHTML parser?

Answer 1

This is adapted from code over at CodeGuru. Many thanks to the original author: http://www.codeguru.com/vb/vb_internet/html/article.php/c4815

If you need to download the HTML from the web, check the original source. E.g.:

Set objDocument = objMSHTML.createDocumentFromUrl("http://google.com", vbNullString)

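I don't need to download an HTML stub from the web; I already have my stub in memory, so the original source didn't apply to me. My main goal was to have a proper DOM parser strip the HTML out of user-generated content. Some will say, "Why not just strip the HTML with some RegEx?" Good luck with that!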

Add a reference to: Microsoft HTML Object Library

This is the same HTML parser that runs Internet Explorer (IE). Let the heckling begin. Well, heck...

Here is the code I used:

Dim objDocument As MSHTML.HTMLDocument
Set objDocument = New MSHTML.HTMLDocument

'NOTE: txtSource is an instance of a simple TextBox object
objDocument.body.innerHTML = "<p>Hello World!</p> <p>Hello Jason!</p> <br/>Hello Bob!"
txtSource.Text = objDocument.body.innerText

The resulting text in txtSource.Text is my user content with all the HTML removed. Clean and maintainable, and no Cthulhu approach for me.
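If you need this in more than one place, the snippet above can be wrapped in a function. This is a minimal sketch assuming the same "Microsoft HTML Object Library" reference; StripHtmlMshtml is a hypothetical name, and the Null guard is a defensive assumption rather than documented behavior.

```vb
' Requires a project reference to "Microsoft HTML Object Library".
' StripHtmlMshtml is a hypothetical helper name wrapping the snippet above.
Public Function StripHtmlMshtml(ByVal html As String) As String
    Dim doc As MSHTML.HTMLDocument
    Set doc = New MSHTML.HTMLDocument

    doc.body.innerHTML = html              ' let the IE parser build the DOM
    ' innerText returns the text only, with entities decoded.
    ' Appending vbNullString turns a possible Null into "" (defensive assumption).
    StripHtmlMshtml = doc.body.innerText & vbNullString

    Set doc = Nothing                      ' release the COM object
End Function
```

Usage, with userHtml standing in for your own (hypothetical) variable: txtSource.Text = StripHtmlMshtml(userHtml)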

Answer 2

Public Function ParseHtml(ByVal str As String) As String
    ' Copies every character that is not inside a <...> tag.
    ' Entities such as &amp; are left as-is (not decoded).
    Dim Ret As String, TagOpened As Boolean
    Dim n As Long, sChar As String
    For n = 1 To Len(str)
        sChar = Mid$(str, n, 1)
        Select Case sChar
            Case "<"
                TagOpened = True           ' start skipping tag content
            Case ">"
                TagOpened = False          ' tag ended, resume copying
            Case Else
                If Not TagOpened Then
                    Ret = Ret & sChar
                End If
        End Select
    Next n
    ParseHtml = Ret
End Function

This is a simple function I use myself. In the Immediate (debug) window:

?ParseHtml("<div>test</div>")
test

I hope this helps, and it doesn't need any external library.
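Appending to Ret one character at a time reallocates the string on every iteration, which can get slow on large inputs, and entities such as &amp; come through undecoded. Below is a sketch of the same tag-skipping idea that writes into a preallocated buffer with the Mid statement and decodes only five common entities. StripTagsFast is a hypothetical name, and mapping &nbsp; to a plain space is an assumption.

```vb
' Hypothetical variant of ParseHtml: same "skip everything between < and >" rule,
' but writes into a preallocated buffer and decodes a few common entities.
Public Function StripTagsFast(ByVal html As String) As String
    Dim buf As String, outLen As Long, i As Long
    Dim ch As String, inTag As Boolean

    buf = Space$(Len(html))                ' output can never be longer than input
    For i = 1 To Len(html)
        ch = Mid$(html, i, 1)
        If ch = "<" Then
            inTag = True
        ElseIf ch = ">" Then
            inTag = False
        ElseIf Not inTag Then
            outLen = outLen + 1
            Mid(buf, outLen, 1) = ch       ' overwrite in place, no reallocation
        End If
    Next i
    buf = Left$(buf, outLen)

    ' Decode only these entities; &amp; goes last so "&amp;lt;" becomes "&lt;", not "<".
    buf = Replace(buf, "&lt;", "<")
    buf = Replace(buf, "&gt;", ">")
    buf = Replace(buf, "&quot;", """")
    buf = Replace(buf, "&nbsp;", " ")      ' assumption: plain space, not Chr$(160)
    buf = Replace(buf, "&amp;", "&")
    StripTagsFast = buf
End Function
```

In the Immediate window, ?StripTagsFast("<p>Tom &amp; Jerry</p>") prints Tom & Jerry. Like ParseHtml, it treats any stray ">" in the text as the end of a tag and drops it.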

Answer 3

One way:

Function strip(html As String) As String
    With CreateObject("htmlfile")
        .Open
        .write html
        .Close
        strip = .body.outerText
    End With
End Function

For example:

?strip("<strong>hello <i>wor<u>ld</u>!</strong><foo> 1234")
hello world! 1234
