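I hope this question hasn't already been asked in another post; I searched and couldn't find an answer. I'm fairly new to programming, and especially to web scraping. If you know of any good, complete tutorials, please point me to them. I use VBA and Python.
I started working on this after reading: Scraping data from website using vba
Very helpful, by the way. I understand the older approach better, so that's the one I chose.
The site I want to scrape is: http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp
The code I've written so far: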
Sub Test()
    Dim ie As Object
    Dim form As Variant, button As Variant
    Set ie = CreateObject("InternetExplorer.Application")
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim xx As Object, x As Object
    With ie
        .Visible = True '< Show browser window
        .navigate ("http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp") '> Travel to homepage
        Do While ie.Busy
            DoEvents
        Loop '< Wait for page to have loaded
    End With
    Set TR_col = ie.Document.getElementsByTagName("TR")
    For Each TR In TR_col
        Set xx = ie.Document.getElementsByTagName("a")
        If xx = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
            Cells(1, 1) = "Ok"
        End If
    Next TR
End Sub
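Finally, this is what the inspector shows:
So my approach is to go through all the "tr" tags and check whether the first column of the table (which I assume is the first "td" tag) equals the text I'm looking for. That text will eventually come from a cell; for now I've hard-coded it for testing. The result should be that the number next to the date is copied into a cell in the worksheet. For now I just write "Ok" to see whether the If statement works, but it doesn't.
I guess I'm not sure how to tell VBA to search all the "tr" tags, search all the "td" tags within each "tr", find the one that matches some text, and then return the third "td" tag in that "tr". Does that make sense?
I hope I've been specific enough for someone to point me in the right direction.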
Answer 0 (score: 0)
You don't need to load an entire browser to get the HTML; you can do without it.
Sub Test()
    '// References required (Tools > References in the VBA editor):
    '// 1) Microsoft HTML Object Library
    '// 2) Microsoft XML, v6.0
    Dim req As MSXML2.XMLHTTP60
    Dim doc As MSHTML.HTMLDocument
    Dim tbl As MSHTML.HTMLTable
    Dim html$, url$, sText$, fecha$, valor$, j%

    url = "http://www.bcra.gob.ar/PublicacionesEstadisticas/Principales_variables.asp"

    '// Fetch the page synchronously, without opening a browser.
    Set req = New MSXML2.XMLHTTP60
    req.Open "GET", url, False
    req.send
    html = req.responseText

    '// Parse the response into an in-memory HTML document.
    Set doc = New MSHTML.HTMLDocument
    doc.body.innerHTML = html

    '// Take the first element with class "table-BCRA".
    Set tbl = doc.getElementsByClassName("table-BCRA")(0)

    For j = 1 To tbl.Rows.Length - 1
        With tbl.Rows(j)
            '// Skip rows without data.
            '// Assume correct data has three cells.
            If .Cells.Length = 3 Then
                '// .Cells(0) is a table cell, not an anchor, so read its text directly.
                sText = Trim$(.Cells(0).innerText)
                If sText = "Base Monetaria - Promedio acumulado del mes (MM de $)" Then
                    fecha = .Cells(1).innerText
                    valor = .Cells(2).innerText
                    Exit For '// Stop at the first matching row.
                End If
            End If
        End With
    Next
End Sub
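Since the question mentions Python as well, here is a minimal sketch of the same row-matching idea (walk each "tr", compare the first "td" with the label, return the other two cells) using only the standard library's html.parser. The names RowCollector, find_value, SAMPLE_HTML and LABEL are made up for illustration, and the sample markup and its figures are an assumption about the table's shape, not a copy of the real page.

```python
from html.parser import HTMLParser


class RowCollector(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being parsed, or None
        self._cell = None  # text fragments of the cell being parsed, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td" and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag == "td" and self._cell is not None:
            # Collapse whitespace so the label compares cleanly.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def find_value(html, label):
    """Return (date, value) from the first 3-cell row whose first cell equals label."""
    parser = RowCollector()
    parser.feed(html)
    parser.close()
    for cells in parser.rows:
        if len(cells) == 3 and cells[0] == label:
            return cells[1], cells[2]
    return None


# Hypothetical sample markup; the real page's structure and figures may differ.
SAMPLE_HTML = """
<table class="table-BCRA">
  <tr><td colspan="3">Header row</td></tr>
  <tr>
    <td><a href="#">Base Monetaria - Promedio acumulado del mes (MM de $)</a></td>
    <td>01/01/2000</td>
    <td>123.456</td>
  </tr>
</table>
"""

LABEL = "Base Monetaria - Promedio acumulado del mes (MM de $)"
print(find_value(SAMPLE_HTML, LABEL))
```

To try this against the live page, the HTML string would have to be downloaded first (for example with urllib.request.urlopen); the page's encoding and current layout are not something this sketch verifies.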