Scraping a local HTML file

Posted: 2018-10-09 14:13:13

Tags: excel vba dom

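I want to open a local HTML file and store it as an HTMLDocument so that I can scrape it into Excel. However, all the information I can find is about HTML pages on the web. For example, this code works fine for www.bbc.co.uk but not for a local file: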

Sub queryXMLlocal()
    ' Requires references to "Microsoft XML, v6.0" and "Microsoft HTML Object Library"
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument

    Debug.Print Application.ActiveWorkbook.Path

    ' Works with a web URL, but not with the local file
    XMLPage.Open "GET", "<filepath>\KOND.html", False
    XMLPage.send

    If XMLPage.Status <> 200 Then
        MsgBox "Problem" & vbNewLine & XMLPage.Status & " - " & XMLPage.statusText
        Exit Sub
    End If

End Sub

Or, using Internet Explorer:

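Sub GetHTMLDocument()
    ' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
    Dim IE As New SHDocVw.InternetExplorer
    Dim HTMLDoc As MSHTML.HTMLDocument

    IE.Visible = True
    IE.navigate "https://www.bbc.co.uk/"

    ' Wait while IE is loading; DoEvents keeps Excel responsive during the wait
    Do While IE.Busy Or IE.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop

    Set HTMLDoc = IE.Document
End Sub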

This works, but when I use a local file I get the error:

"The object invoked has disconnected from its clients"

Can I just use HTMLDocument.open instead? I couldn't get that to work either.

1 answer:

Answer 0 (score: 1)

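This is the function I usually use. It reads the file from disk as text and loads it into an in-memory HTMLDocument, so no browser or HTTP request is involved: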

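Public Function GetHTMLFileContent(ByVal filePath As String) As HTMLDocument
    ' Requires a reference to "Microsoft HTML Object Library" for HTMLDocument
    Dim fso As Object, hFile As Object, hString As String
    Dim html As New HTMLDocument

    ' Read the whole file into a string (OpenTextFile reads it as ANSI text by default)
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set hFile = fso.OpenTextFile(filePath)
    ' ReadAll raises an error on an empty file, so check AtEndOfStream first
    If Not hFile.AtEndOfStream Then hString = hFile.ReadAll()
    hFile.Close

    ' Load the markup into the document so it can be queried through the DOM
    html.body.innerHTML = hString
    Set GetHTMLFileContent = html
End Function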
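A minimal usage sketch of the function above follows. It assumes `KOND.html` sits in the same folder as the workbook and that the data of interest are in the page's `<td>` cells; the path, the tag name, and the macro name `ScrapeLocalFile` are all hypothetical placeholders to adapt.

```vba
Sub ScrapeLocalFile()
    ' Assumption: KOND.html is in the workbook's folder (adjust the path as needed)
    Dim doc As HTMLDocument
    Set doc = GetHTMLFileContent(ThisWorkbook.Path & "\KOND.html")

    ' Assumption: the values to scrape are in <td> cells; change the tag to suit the page
    Dim tdList As Object, i As Long
    Set tdList = doc.getElementsByTagName("td")

    ' Write each cell's text down column A of the first worksheet
    For i = 0 To tdList.Length - 1
        ThisWorkbook.Worksheets(1).Cells(i + 1, 1).Value = tdList(i).innerText
    Next i
End Sub
```

Once the document is loaded this way, the usual DOM methods (getElementsByTagName, getElementById, and so on) work just as they would on a page fetched from the web.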