使用自定义属性标记抓取库存数据的网站

时间:2018-03-29 11:54:49

标签: html excel-vba web-scraping vba excel

我正在尝试制作一个宏,用于根据股票的ISIN编号从invest.com中搜集股票信息。

到目前为止,我有这个:

Sub Get_Stock_Data()

  Dim Page As New XMLHTTP60
  Dim Doc As New HTMLDocument
  Dim inputbox As IHTMLElement

  Dim Table As IHTMLElement
  Dim Row As IHTMLElement
  Dim cel As IHTMLElement

  Page.Open "get", "https://www.investing.com/", False
  Page.send

  Doc.body.innerHTML = Page.responseText

  Set inputbox = Doc.getElementById("searchTextTop")
  inputbox.Value = "US0378331005"

  Set Table = Doc.getElementsByTagName("table")(1)

  For Each cel In Table.getElementsByTagName("td")
      Debug.Print cel.tagName, cel.className, cel.getAttribute("link")
  Next
End Sub

ISIN号码正在输入网页的主搜索框,其中包含以下内容:

<form onsubmit="" id="combineSearchFormTop" action="/" method="post">    
   <div class="inlineblock" id="searchBoxTop">       
     <input type="text" autocomplete="off" value="EUR/USD or AAPL" 
     default="EUR/USD or AAPL" class="searchText arial_12 lightgrayFont" 
     id="searchTextTop" name="quotes_search_text" prevvalue="">
   </div>
   <label for="searchTextTop" class="searchGlassIcon">&nbsp;</label>
   <i class="cssSpinner"></i>
</form>

有问题的表格是在ISIN号码输入搜索框后生成的自动填充表格。它包含进入库存主页面所需的必要查询字符串。以下是包含所需信息的HTML细分。

<table>
  <tbody>
    <tr data-pair-id="6408" class="row hoverSearch" id="searchRowIdtop_0"> 
      <td class="first flag"><i class="ceFlags USA"></i></td> 
      <td class="second symbolName dirLtr" pairid="6408" id="symbol_AAPL" 
      link="/equities/apple-computer-inc">AAPL</td>                      
      <td class="third" title="Apple Inc">Apple Inc</td>
      <td class="fourth typeExchange" pairid="6408" id="type_6408" 
      link="/equities/apple-computer-inc">Equity - NASDAQ</td>
    </tr>
  </tbody> 
</table>

基本上我想从第二个<td>标签的“link”属性中获取字符串。但是,当我在Excel中运行代码时,立即窗口为“链接”属性返回“null”。

提前致谢。

1 个答案:

答案 0 :(得分:0)

看看下面的例子。 JSON.bas模块导入VBA项目以进行JSON处理。

Option Explicit

Sub Test()

    Dim sJSONString As String
    Dim vJSON
    Dim sState As String
    Dim aData()
    Dim aHeader()

    With CreateObject("MSXML2.XMLHTTP")
        .Open "POST", "https://www.investing.com/search/service/search", False
        .SetRequestHeader "Accept", "application/json"
        .SetRequestHeader "X-Requested-With", "XMLHttpRequest"
        .SetRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; Win64; x64)"
        .SetRequestHeader "Content-Type", "application/x-www-form-urlencoded"
        .send "search_text=US0378331005"
        sJSONString = .responseText
    End With
    JSON.Parse sJSONString, vJSON, sState
    vJSON = vJSON("All")
    JSON.ToArray vJSON, aData, aHeader
    With Sheets(1)
        .Cells.Delete
        .Cells.WrapText = False
        OutputArray .Cells(1, 1), aHeader
        Output2DArray .Cells(2, 1), aData
        .Columns.AutoFit
    End With

End Sub

Sub OutputArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize(1, UBound(aCells) - LBound(aCells) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

Sub Output2DArray(oDstRng As Range, aCells As Variant)

    With oDstRng
        .Parent.Select
        With .Resize( _
                UBound(aCells, 1) - LBound(aCells, 1) + 1, _
                UBound(aCells, 2) - LBound(aCells, 2) + 1)
            .NumberFormat = "@"
            .Value = aCells
        End With
    End With

End Sub

search_text参数设置为US0378331005的输出对我来说如下:

output

顺便说一句,类似的方法适用于以下答案:12345678910111213