I want to use Excel VBA to extract an attribute value from a heading on a web page. The data I want to get from the page has the following structure:
<div class="index-detail">
<h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentIdentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contentType="web-page" contentTitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
<span class="return-value">917.08 </span>
<span class="daily-change down ">-0.1% ▼ </span>
</div>
Using getElementsByClassName and getElementsByTagName, I have extracted the <h5> heading. However, when I print its innerText I get DJSI Chile, whereas I want the value of the contentTitle attribute, Dow Jones Sustainability™ Chile Index (CLP).
How can I do this?
Update
The code I am using is as follows:
Sub myConSP()
' Declare variables
Dim oHtmlSP As HTMLDocument
Dim tSPIndex As HTMLDivElement
Dim tSPIdx As HTMLDivElement
' Load page inside HTMLDocument
Set oHtmlSP = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", "http://www.espanol.spindices.com", False
.send
oHtmlSP.body.innerHTML = .responseText
End With
' Get indices
Set tSPIndex = oHtmlSP.getElementById("all-indices-slider")
Set objTitleTag = tSPIndex.getElementsByClassName("index-detail")(0).getElementsByTagName("h5")(0)
MsgBox objTitleTag.getAttribute("contentTitle").innerText
End Sub
Answer 0 (score: 1)
The attribute is attached to the <a>, not to the <h5> (sorry, that was my mistake in the comment above):
Sub TT()
' Requires a reference to Microsoft HTML Object Library (Tools > References)
Dim html As String, d As New HTMLDocument, el
html = "<div class='index-detail'>" & _
"<h5><a href='/indices/equity/dow-jones-sustainability-chile-index-clp' " & _
"title='DJSI Chile' contentIdentifier='2e9cb165-0cbf-4070-a5ef-dc20bf6219ba' " & _
"contentType = 'web-page' " & _
"contentTitle='Dow Jones Sustainability™ Chile Index (CLP)'>DJSI Chile</a></h5> " & _
"<span class='return-value'>917.08 </span> " & _
"<span class='daily-change down '>-0.1% ? </span></div>"
d.body.innerHTML = html
Set el = d.getElementsByClassName("index-detail")(0).getElementsByTagName("a")(0)
Debug.Print el.getAttribute("contentTitle")
' >>> Dow Jones Sustainability™ Chile Index (CLP)
End Sub
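Applied to the question's original code, the same fix means reading contentTitle from the <a> rather than the <h5>. Note also that getAttribute already returns the attribute's value as a string, so chaining .innerText onto it (as in the question's MsgBox line) is itself an error. The following is a minimal sketch, not a tested implementation: it assumes the page at http://www.espanol.spindices.com still contains a container with id "all-indices-slider" holding div.index-detail blocks, as in the question (the site may have changed since), and the Sub name ListContentTitles is hypothetical. It loops over every index block rather than only the first.

```vba
' Sketch only: assumes the page still has an element with id "all-indices-slider"
' containing div.index-detail blocks, as described in the question.
' Requires a reference to Microsoft HTML Object Library (Tools > References).
Sub ListContentTitles()  ' hypothetical name
    Dim oHtmlSP As HTMLDocument
    Dim container As Object
    Dim details As Object
    Dim links As Object
    Dim i As Long

    ' Download the page and load it into an HTMLDocument, as in the question
    Set oHtmlSP = New HTMLDocument
    With CreateObject("WinHttp.WinHttpRequest.5.1")
        .Open "GET", "http://www.espanol.spindices.com", False
        .send
        oHtmlSP.body.innerHTML = .responseText
    End With

    ' getElementById returns Nothing if the id is not present
    Set container = oHtmlSP.getElementById("all-indices-slider")
    If container Is Nothing Then
        Debug.Print "Container 'all-indices-slider' not found"
        Exit Sub
    End If

    Set details = container.getElementsByClassName("index-detail")
    For i = 0 To details.Length - 1
        ' contentTitle lives on the <a> inside the <h5>, not on the <h5> itself
        Set links = details(i).getElementsByTagName("a")
        If links.Length > 0 Then
            ' getAttribute returns the value directly; no .innerText needed
            Debug.Print links(0).innerText & " -> " & links(0).getAttribute("contentTitle")
        End If
    Next i
End Sub
```

Checking links.Length before indexing avoids relying on how an empty collection behaves when indexed.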
Answer 1 (score: 0)