为了研究目的,我一直试图抓住以下页面:http://www.brazil4export.com/en/pesquisa/resultado/?page=1&
我想从中获取信息的HTML是:
<div class="panel panel-default">
<div class="panel-heading" data-activity="22196 - Manufacturer" data-products='["Products", "Information"]' data-range="Value" data-contact="Person" data-site="www.website.com.br" data-emails="name@example.com" data-phones="Phone" data-address="Street / City" data-countries='["Country1", "Country2"]' data-name="ACME Corp.">
<h3 class="panel-title">
<button class="btn btn-link" data-toggle="modal" data-target="#company-modal">
ACME Corp.
</button>
</h3>
<button class="btn btn-primary btn-lg pull-right" data-toggle="modal" data-target="#company-modal">
<i class="icon-plus"></i>
</button>
</div>
</div>
对于页面上的每个结果,都有<div class="panel panel-default">
,就像上面一样,我希望从每个结果中获取data-name
和data-site
信息。到目前为止,这是我尝试过的:
Sub useClassnames()
Dim element As IHTMLElement
Dim elements As IHTMLElementCollection
Dim ie As InternetExplorer
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate "http://www.brazil4export.com/en/pesquisa/resultado/?page=1&"
'Wait until IE has loaded the web page
Do While ie.READYSTATE <> READYSTATE_COMPLETE
Application.StatusBar = "Loading Web page …"
DoEvents
Loop
Set html = ie.document
Set elements = html.getElementsByClassName("panel panel-default")
Dim erow As Long
For Each element In elements
If element.className = "data-name" Then
erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 2) = html.getElementsByClassName("data-name").innerText
End If
If element.className = "data-site" Then
erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
Cells(erow, 3) = html.getElementsByClassName("data-site").innerText
End If
Next element
End Sub
它不起作用,但也没有向我显示任何错误。
答案 0 :(得分:1)
运行此选项,您将获得所有结果:
Sub WebData()
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim source As Object
With http
.Open "GET", "http://www.brazil4export.com/en/pesquisa/resultado/?page=1&", False
.send
html.body.innerHTML = .responseText
End With
For Each source In html.getElementsByClassName("panel-heading")
x = x + 1: Cells(x, 1) = source.getAttribute("data-Name")
Cells(x, 2) = source.getAttribute("data-site")
Next source
End Sub
确保将“Microsoft Html Object Library”和“Microsoft xml”添加到参考库中。见结果图: