Scraping the web with VBA to return data when it matches a condition

Date: 2019-01-17 15:50:08

Tags: excel vba web-scraping

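I hope this question has not already been covered in another post; I searched and could not find an answer. I am fairly new to programming, and especially to web scraping. If you know of any good, complete tutorials, please point me to them. I use VBA and Python.

I got started after reading this: Scraping data from website using vba

It was very helpful, by the way. I understood the older method better, so I went with that one.

The site I want to scrape is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp

The code I have written so far: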

Sub Test()

    Dim ie As Object
    Dim form As Variant, button As Variant
    Set ie = CreateObject("InternetExplorer.Application")
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim xx As Object, x As Object


    With ie
    .Visible = True '< Show browser window
    .navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage

    Do While ie.Busy
        DoEvents
    Loop '< Wait for page to have loaded


    End With

    Set TR_col = ie.Document.getElementsByTagName("TR")

    For Each TR In TR_col
        Set xx = ie.Document.getElementsByTagName("a")
        If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
            Cells(1, 1) = "Ok"
        End If

    Next TR
End Sub

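Finally, this is what the inspector shows:

[image: screenshot of the inspector] I have also highlighted the information I am using for testing purposes.

So my approach is to search all the "tr" tags and then check whether the first column of the table (which I guess would be the first "td" tag) equals the text I will eventually put in a cell (for now I have hard-coded the text for testing). The result should be to copy the number next to the date into a cell on the worksheet. In this test I just write "Ok" to see whether the If statement works. But it does not.

I think I am not sure how to tell VBA to search all the "tr" tags, search all the "td" tags within each "tr", find the one that matches some text, and then return the third "td" tag within that "tr". Does that make sense?

I hope I have been specific enough for someone to point me in the right direction.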

1 answer:

Answer 0 (score: 0)

You do not need to load a whole browser to get the HTML; a plain HTTP request is enough.

Sub Test()

    '// References required:
    '// 1) Microsoft HTML Object Library
    '// 2) Microsoft XML, v6.0

    Dim req As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim tbl As MSHTML.HTMLTable
    Dim tblRow As MSHTML.HTMLTableRow
    Dim tblCell As MSHTML.HTMLTableCell
    Dim anch As MSHTML.HTMLAnchorElement
    Dim html$, url$, sText$, fecha$, valor$, j%

    url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"
    Set req = New MSXML2.XMLHTTP60
    req.Open "GET", url, False
    req.send
    html = req.responseText

    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

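    '// Take the first element with class "table-BCRA" (the collection is zero-based).
    Set tbl = doc.getElementsByClassName("table-BCRA")(0)
    For j = 1 To tbl.Rows.Length - 1
        With tbl.Rows(j)
            '// Skip rows without data.
            '// Assume a data row has exactly three cells.
            If .Cells.Length = 3 Then
                '// Compare the visible text of the first cell (the link text).
                sText = Trim$(.Cells(0).innerText)
                If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
                    fecha = .Cells(1).innerText
                    valor = .Cells(2).innerText
                    '// Example output: write the value to A1 of the active sheet.
                    Cells(1, 1).Value = valor
                    Exit For
                End If
            End If
        End With
    Next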

End Sub
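
Since the question mentions Python as well, here is a minimal sketch of the same row-matching step using only the standard library's html.parser. It is not the answer's method, just the same idea: walk each "tr", collect its "td" texts, and return the second and third cells when the first one matches. The names RowCollector, find_row, SAMPLE_HTML and LABEL are hypothetical, and the sample markup and its values are made up for illustration; the real page may be structured differently.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []      # finished rows, each a list of cell strings
        self._row = None    # cells of the row being parsed, or None
        self._cell = None   # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that appears inside a <td>, including nested <a> text.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def find_row(html, label):
    """Return (date, value) from the first three-cell row whose first cell
    equals label, or None when no row matches."""
    parser = RowCollector()
    parser.feed(html)
    for cells in parser.rows:
        if len(cells) == 3 and cells[0] == label:
            return cells[1], cells[2]
    return None


# Made-up sample markup (assumption); the real page's values will differ.
SAMPLE_HTML = """
<table class="table-BCRA">
  <tr><th>Variable</th><th>Fecha</th><th>Valor</th></tr>
  <tr><td colspan="3">Section header</td></tr>
  <tr>
    <td><a href="#">Base Monetaria - Promedio acumulado del mes (MM de $)</a></td>
    <td>01/01/2000</td>
    <td>123</td>
  </tr>
</table>
"""

LABEL = "Base Monetaria - Promedio acumulado del mes (MM de $)"
print(find_row(SAMPLE_HTML, LABEL))
```

To run this against the live page, the HTML string would have to be downloaded first (for example with urllib.request), which is left out here so the sketch works offline.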