使用VBA从网站提取div类信息

时间:2017-09-04 11:46:30

标签: html vba excel-vba web-scraping excel

为了研究目的,我一直试图抓住以下页面:http://www.brazil4export.com/en/pesquisa/resultado/?page=1&

我想从中获取信息的HTML是:

<div class="panel panel-default">
  <div class="panel-heading" data-activity="22196 - Manufacturer" data-products='["Products", "Information"]' data-range="Value" data-contact="Person" data-site="www.website.com.br" data-emails="name@example.com" data-phones="Phone" data-address="Street / City" data-countries='["Country1", "Country2"]' data-name="ACME Corp.">
    <h3 class="panel-title">
      <button class="btn btn-link" data-toggle="modal" data-target="#company-modal">
        ACME Corp.
      </button>
    </h3>
    <button class="btn btn-primary btn-lg pull-right" data-toggle="modal" data-target="#company-modal">
      <i class="icon-plus"></i>
    </button>
  </div>
</div>

对于页面上的每个结果,都有<div class="panel panel-default">,就像上面一样,我希望从每个结果中获取data-namedata-site信息。到目前为止,这是我尝试过的:

Sub useClassnames()
  Dim element As IHTMLElement
  Dim elements As IHTMLElementCollection
  Dim ie As InternetExplorer
  Dim html As HTMLDocument

  'open Internet Explorer in memory, and go to website
  Set ie = New InternetExplorer

  ie.Visible = True
  ie.navigate "http://www.brazil4export.com/en/pesquisa/resultado/?page=1&"
  'Wait until IE has loaded the web page

  Do While ie.READYSTATE <> READYSTATE_COMPLETE
    Application.StatusBar = "Loading Web page …"
    DoEvents
  Loop

  Set html = ie.document
  Set elements = html.getElementsByClassName("panel panel-default")

  Dim erow As Long

  For Each element In elements
    If element.className = "data-name" Then
      erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
      Cells(erow, 2) = html.getElementsByClassName("data-name").innerText
    End If

    If element.className = "data-site" Then
      erow = Sheet1.Cells(Rows.count, 1).End(xlUp).Offset(1, 0).Row
      Cells(erow, 3) = html.getElementsByClassName("data-site").innerText
    End If
  Next element

End Sub

它不起作用,但也没有向我显示任何错误。

1 个答案:

答案 0 :(得分:1)

运行此选项,您将获得所有结果:

Sub WebData()
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim source As Object

    With http
        .Open "GET", "http://www.brazil4export.com/en/pesquisa/resultado/?page=1&", False
        .send
        html.body.innerHTML = .responseText
    End With
    For Each source In html.getElementsByClassName("panel-heading")
        x = x + 1: Cells(x, 1) = source.getAttribute("data-Name")
        Cells(x, 2) = source.getAttribute("data-site")
    Next source
End Sub

确保将“Microsoft Html Object Library”和“Microsoft xml”添加到参考库中。见结果图:

enter image description here