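Excel: Querying an attribute inside an HTML heading

Date: 2016-01-13 17:41:42

Tags: html excel vba excel-vba web-scraping

I want to use Excel VBA to extract an attribute value from inside a heading element on a web page. The data I want to get from the web page has the following structure: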



<div class="index-detail">
  <h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentIdentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contentType="web-page" contentTitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
  <span class="return-value">917.08 </span>
  <span class="daily-change  down ">-0.1% ▼ </span>
</div>

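Using getElementsByClassName and getElementsByTagName I have already extracted the heading <h5>, but when I print the heading's innerText I get DJSI Chile, whereas I want the value of the contentTitle attribute: Dow Jones Sustainability™ Chile Index (CLP).

How can I do that?

Update

The code I am using is as follows: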

Sub myConSP()
    
    ' Declare variables
    Dim oHtmlSP As HTMLDocument
    Dim tSPIndex As HTMLDivElement
    Dim tSPIdx As HTMLDivElement

    ' Load page inside HTMLDocument
    Set oHtmlSP = New HTMLDocument
    With CreateObject("WINHTTP.WinHTTPRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        oHtmlSP.body.innerHTML = .responseText
    End With

    ' Get indices
    Set tSPIndex = oHtmlSP.getElementById("all-indices-slider")

    Set objTitleTag = tSPIndex.getElementsByClassName("index-detail")(0).getElementsByTagName("h5")(0)
    MsgBox objTitleTag.getAttribute("contentTitle").innerText

End Sub

2 Answers:

Answer 0: (score: 1)

The attribute is attached to the <a>, not the <h5> (sorry, that was my mistake in the comment above):

Sub TT()

    ' Early binding: requires a reference to "Microsoft HTML Object Library"
    Dim html As String, d As New HTMLDocument, el

    html = "<div class='index-detail'>" & _
    "<h5><a href='/indices/equity/dow-jones-sustainability-chile-index-clp' " & _
    "title='DJSI Chile' contentIdentifier='2e9cb165-0cbf-4070-a5ef-dc20bf6219ba' " & _
    "contentType = 'web-page' " & _
    "contentTitle='Dow Jones Sustainability™ Chile Index (CLP)'>DJSI Chile</a></h5> " & _
    "<span class='return-value'>917.08 </span> " & _
    "<span class='daily-change  down '>-0.1% ? </span></div>"

    d.body.innerHTML = html

    Set el = d.getElementsByClassName("index-detail")(0).getElementsByTagName("a")(0)

    Debug.Print el.getAttribute("contentTitle")
      ' >>> Dow Jones Sustainability™ Chile Index (CLP)

End Sub
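
Applied to the macro in the question, the same change might look like the sketch below. It assumes the page still contains an element with the id all-indices-slider and the index-detail structure shown in the question, which may no longer be the case; the procedure name myConSPFixed and the variable objLink are made up for illustration.

```vba
Sub myConSPFixed()

    ' Early binding: requires a reference to "Microsoft HTML Object Library"
    Dim oHtmlSP As HTMLDocument
    Dim tSPIndex As HTMLDivElement
    Dim objLink As Object

    ' Load the page into an HTMLDocument (URL taken from the question)
    Set oHtmlSP = New HTMLDocument
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        oHtmlSP.body.innerHTML = .responseText
    End With

    ' Assumption: this container still exists on the page
    Set tSPIndex = oHtmlSP.getElementById("all-indices-slider")

    ' Select the <a> inside the first index-detail block, not the <h5>
    Set objLink = tSPIndex.getElementsByClassName("index-detail")(0).getElementsByTagName("a")(0)

    ' getAttribute already returns the attribute value, so no .innerText is needed
    Debug.Print objLink.getAttribute("contentTitle")

End Sub
```

The two differences from the original macro are the tag name passed to getElementsByTagName and the removal of .innerText after getAttribute.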

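Answer 1: (score: 0)

CSS selector:

You can use the CSS selector a[contentTitle] to get elements that have an a tag and a contentTitle attribute. You then read the contentTitle attribute from the matched element.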


CSS query:

[Image: the CSS query selecting the appropriate element]


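VBA:

You apply the selector with querySelector to get a single node (the first match); querySelectorAll returns all matching nodes. You can then use getAttribute to access the required information.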

oHtmlSP.querySelector("a[contentTitle]").getAttribute("contentTitle")
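
If every matching link is wanted rather than only the first, a minimal sketch with querySelectorAll could look like this. The markup is a made-up sample modelled on the snippet in the question, and the procedure name ListContentTitles is hypothetical; against the real page you would fill the document from the response text instead.

```vba
Sub ListContentTitles()

    ' Early binding: requires a reference to "Microsoft HTML Object Library"
    Dim d As New HTMLDocument
    Dim nodes As Object
    Dim i As Long

    ' Hypothetical sample markup with two links carrying a contentTitle attribute
    d.body.innerHTML = _
        "<div class='index-detail'><h5><a href='#' contentTitle='First index'>A</a></h5></div>" & _
        "<div class='index-detail'><h5><a href='#' contentTitle='Second index'>B</a></h5></div>"

    ' All <a> elements that have a contentTitle attribute
    Set nodes = d.querySelectorAll("a[contentTitle]")

    ' Walk the node list by index and print each attribute value
    For i = 0 To nodes.Length - 1
        Debug.Print nodes.Item(i).getAttribute("contentTitle")
    Next i

End Sub
```

The loop uses an index with Length and Item rather than For Each, since index-based access is the more commonly reported way to walk this kind of node list from VBA.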