Scraping HTML with Excel VBA

Time: 2015-12-04 23:02:40

Tags: html excel vba excel-vba web-scraping

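I've been trying to scrape and parse some financial data from a website so that I can add it to an Excel spreadsheet using VBA. I've found several possible solutions, but I can't seem to adapt them to my needs. My problem is that I only need one value from one table (the average target price), and I can't figure out what I'm doing wrong. I'll also be using similar VBA to check a few hundred companies at a time, so if there is a more efficient way to code what I have, please let me know.

Here is what I have so far: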

Sub ImportAnalystEst()

Dim oHtml       As HTMLDocument
Dim oElement    As IHTMLElement

Set oHtml = New HTMLDocument

With CreateObject("WINHTTP.WinHTTPRequest.5.1")
    .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
    .send
    oHtml.body.innerHTML = .responseText
End With

Dim wsTarget As Worksheet
Dim i As Integer
i = 1
Set wsTarget = ActiveWorkbook.Worksheets("Sheet1")

For Each oElement In oHtml.getElementsByClassName("snapshot")
  wsTarget.Range("A" & i) = Split(oElement.Children(0).innerText, "<TD>")
  i = i + 1
Next

End Sub

Here is the HTML I'm trying to extract from. Can someone show an example of how I could extract the average target price of 146.52?

<div class="analystEstimates">

<div class="block">
    <h2>Snapshot</h2>
</div>
<table class="snapshot">
    <tbody>
        <tr>
            <td class="first">Average Recommendation:</td>
            <td class="recommendation">
                Overweight
            </td>
            <td class="first column2">Average Target Price:</td>
            <td>146.52</td>
        </tr>
        <tr>
            <td class="first">Number of Ratings:</td>
            <td>

3 Answers:

Answer 0 (score: 1)

I was able to solve my problem as follows:

Sub ImportAnalystEst()
    ' Early binding: requires a reference to the Microsoft HTML Object Library
    Dim oHtml       As HTMLDocument
    Dim oElement    As IHTMLElement

    Set oHtml = New HTMLDocument

    ' Download the page and load the response into the in-memory document
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
        .send
        oHtml.body.innerHTML = .responseText
    End With

    Dim wsTarget As Worksheet
    Dim i As Integer
    i = 1
    Set wsTarget = ActiveWorkbook.Worksheets("Sheet1")

    For Each oElement In oHtml.getElementsByClassName("snapshot")
        ' Split the first row's markup on "TD"; element 7 is the target price, still wrapped in tag fragments
        wsTarget.Range("A" & i) = Split(oHtml.getElementsByClassName("snapshot").Item(0).FirstChild.FirstChild.innerHTML, "TD")(7)
        ' Strip the leftover ">" and "</" fragments
        wsTarget.Range("A" & i) = Replace(wsTarget.Range("A" & i), ">", "")
        wsTarget.Range("A" & i) = Replace(wsTarget.Range("A" & i), "</", "")
        i = i + 1
    Next
End Sub
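
The Split(..., "TD")(7) index above depends on exactly how many cells precede the value, so it may break if the row layout changes. As a rough sketch only, a string search anchored on the label text could look like the following. CellAfterLabel is a hypothetical helper name, and the approach assumes the label appears literally in the response and is followed by an ordinary td cell, as in the HTML excerpt from the question.

```vba
' Hypothetical helper (not part of the original answer): returns the text of the
' first <td> cell that follows labelText in html, or "" if nothing is found.
Public Function CellAfterLabel(ByVal html As String, ByVal labelText As String) As String
    Dim labelPos As Long, openTag As Long, startPos As Long, endPos As Long

    ' Locate the label, e.g. "Average Target Price:"
    labelPos = InStr(1, html, labelText, vbTextCompare)
    If labelPos = 0 Then Exit Function

    ' Next opening <td ...> after the label (the label's own closing "</td>" does not match "<td")
    openTag = InStr(labelPos, html, "<td", vbTextCompare)
    If openTag = 0 Then Exit Function

    ' The cell content starts right after the ">" that ends the opening tag
    startPos = InStr(openTag, html, ">")
    If startPos = 0 Then Exit Function
    startPos = startPos + 1

    ' ...and ends at the next closing </td
    endPos = InStr(startPos, html, "</td", vbTextCompare)
    If endPos = 0 Then Exit Function

    ' Trim$ removes leading/trailing spaces only, not line breaks
    CellAfterLabel = Trim$(Mid$(html, startPos, endPos - startPos))
End Function
```

Inside the With block it could be called as wsTarget.Range("A1") = CellAfterLabel(.responseText, "Average Target Price:"), which avoids building the HTMLDocument at all.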

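Answer 1 (score: 1)

It is easier to use a combination of CSS selectors that targets the value by its position in the table. The CSS selector is .snapshot .first.column2 + td, which uses the "." class selector, the " " descendant combinator, and the "+" adjacent sibling combinator. It matches the td that immediately follows the cell with classes first and column2 (the "Average Target Price:" label) inside the element with class snapshot.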

Option Explicit
Public Sub ImportAnalystEst()
    ' Early binding: requires a reference to the Microsoft HTML Object Library
    Dim oHtml       As HTMLDocument

    Set oHtml = New HTMLDocument

    ' Download the page and load the response into the in-memory document
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.marketwatch.com/investing/stock/aapl/analystestimates", False
        .send
        oHtml.body.innerHTML = .responseText
    End With
    ' Print the text of the cell right after the "Average Target Price:" label
    Debug.Print oHtml.querySelector(".snapshot .first.column2 + td").innerText
End Sub
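
The question also mentions running this for a few hundred companies. One possible way to do that, offered only as a sketch, is to wrap the selector lookup in a function and call it once per ticker. The names BuildEstimatesUrl, GetAverageTargetPrice and ImportAllAnalystEst are hypothetical, and the code assumes that tickers are listed in column A of Sheet1, that results go in column B, and that the URL pattern from the question holds for other tickers.

```vba
' Hypothetical helper: builds the page URL for a ticker (assumes the pattern used in the question).
Public Function BuildEstimatesUrl(ByVal ticker As String) As String
    BuildEstimatesUrl = "http://www.marketwatch.com/investing/stock/" & _
                        LCase$(Trim$(ticker)) & "/analystestimates"
End Function

' Hypothetical helper: returns the average target price text for one ticker,
' or "" if the request fails or the selector matches nothing.
Public Function GetAverageTargetPrice(ByVal ticker As String) As String
    Dim oHtml As HTMLDocument   ' requires a reference to the Microsoft HTML Object Library
    Dim oCell As Object

    Set oHtml = New HTMLDocument

    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", BuildEstimatesUrl(ticker), False
        .send
        If .Status <> 200 Then Exit Function
        oHtml.body.innerHTML = .responseText
    End With

    ' querySelector returns Nothing when no element matches, so check before reading it
    Set oCell = oHtml.querySelector(".snapshot .first.column2 + td")
    If Not oCell Is Nothing Then GetAverageTargetPrice = Trim$(oCell.innerText)
End Function

' Assumed layout: tickers in column A of Sheet1, results written to column B.
Public Sub ImportAllAnalystEst()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveWorkbook.Worksheets("Sheet1")
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 1 To lastRow
        If Len(ws.Cells(r, "A").Value) > 0 Then
            ws.Cells(r, "B").Value = GetAverageTargetPrice(CStr(ws.Cells(r, "A").Value))
        End If
    Next r
End Sub
```

Each ticker still costs one synchronous HTTP request, so a run over several hundred rows will take a while. Returning an empty string instead of raising an error keeps one bad ticker from stopping the whole loop.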

Answer 2 (score: 0)

This will do what you want.

{{1}}