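Web scraping with Excel VBA

Date: 2016-12-06 15:37:18

Tags: vba excel-vba web web-scraping excel

I am trying to scrape a table from the web, but for some reason I am not getting the whole table. It pulls only one column instead of all of them. Any help would be greatly appreciated. Thanks!

Here is my code: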

Sub HistoricalData()

    Dim xmlHttp As Object
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim row As Long, col As Long

    Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    xmlHttp.Open "GET", "http://www.cnbc.com/bonds-canada-treasurys", False
    xmlHttp.setRequestHeader "Content-Type", "text/xml"
    xmlHttp.send

    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = xmlHttp.responseText

    Dim tbl As Object
    Set tbl = html.getElementById("curr_table")

    row = 1
    col = 1

    Set TR_col = html.getElementsByTagName("TR")
    For Each TR In TR_col
        Set TD_col = TR.getElementsByTagName("TD")
        For Each TD In TD_col
            Cells(row, col) = TD.innerText
            col = col + 1
        Next
        col = 1
        row = row + 1
    Next
End Sub

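2 answers:

Answer 0 (score: 3)

The problem is that you are reading HTTP.responseText before the page has finished loading.

I could not get MSXML2.XMLHTTP.6.0 to wait for the page to finish loading before returning HTTP.responseText, so I switched to IE.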


Sub HistoricalData()
    Const URL As String = "http://www.cnbc.com/bonds-canada-treasurys"
    Const READYSTATE_COMPLETE As Integer = 4
    Dim IE As Object
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim row As Long, col As Long

    Set IE = CreateObject("InternetExplorer.Application")

    IE.Navigate URL

    ' Wait until the browser reports that the page has finished loading
    Do While (IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE)
        DoEvents
    Loop

    ' Cells is 1-based, so start writing at A1
    row = 1
    col = 1

    Set TR_col = IE.Document.getElementsByTagName("TR")

    ' One worksheet row per table row, one column per table cell
    For Each TR In TR_col
        Set TD_col = TR.getElementsByTagName("TD")

        For Each TD In TD_col
            Cells(row, col) = TD.innerText
            col = col + 1
        Next
        col = 1
        row = row + 1
    Next

    ' Close the hidden browser instance so it does not linger in the background
    IE.Quit
End Sub
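
The wait loop in this answer spins forever if the page never reaches the complete state. As a rough sketch only, the same check could be wrapped in a helper that gives up after a timeout. `WaitForPage` and its `timeoutSeconds` parameter are hypothetical names introduced here for illustration; they are not part of the original answer.

```vba
' Hypothetical helper: returns True once IE reports the page as loaded,
' or False if timeoutSeconds elapse first.
' Note: Timer counts seconds since midnight, so a wait that spans midnight
' runs until the page loads instead of timing out.
Function WaitForPage(ByVal IE As Object, ByVal timeoutSeconds As Double) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim started As Double

    started = Timer
    Do While IE.Busy Or IE.ReadyState <> READYSTATE_COMPLETE
        DoEvents
        If Timer - started > timeoutSeconds Then
            WaitForPage = False
            Exit Function
        End If
    Loop

    WaitForPage = True
End Function
```

It would be called right after `IE.Navigate URL`, for example `If Not WaitForPage(IE, 30) Then Exit Sub`, where 30 seconds is an arbitrary choice.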

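Answer 1 (score: 0)

I know this is a few years late, but here is what I think is a more elegant solution that gives you finer control over the data. Hopefully someone finds it useful.

The problem is that you are requesting the entire page rather than just the data.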

For this solution, you will need to import VBA-JSON and add a reference to Microsoft Scripting Runtime.
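
The code for this answer did not survive in this copy of the thread, so the following is only a minimal sketch of the general idea: request a JSON data endpoint directly and parse it with VBA-JSON's `JsonConverter.ParseJson`. The URL, the response shape, and the `symbol` and `yield` keys are all assumptions made for illustration and are not taken from the original answer.

```vba
Sub JsonDataSketch()
    ' Hypothetical endpoint: the real data URL is not given in this thread.
    Const DATA_URL As String = "https://example.com/bond-quotes.json"

    Dim http As Object
    Dim quotes As Object
    Dim quote As Object
    Dim row As Long

    Set http = CreateObject("MSXML2.XMLHTTP.6.0")
    http.Open "GET", DATA_URL, False
    http.send

    If http.Status <> 200 Then
        MsgBox "Request failed with HTTP status " & http.Status
        Exit Sub
    End If

    ' ParseJson returns a Collection for a JSON array
    ' and a Dictionary for a JSON object.
    Set quotes = JsonConverter.ParseJson(http.responseText)

    ' Assumed response shape: [{"symbol": "...", "yield": ...}, ...]
    row = 1
    For Each quote In quotes
        Cells(row, 1).Value = quote("symbol")
        Cells(row, 2).Value = quote("yield")
        row = row + 1
    Next quote
End Sub
```

Because each parsed record is a dictionary keyed by field name, you choose exactly which fields go into which columns instead of walking every TD on the page.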
