Web scraping from div, class and span elements

Date: 2015-12-17 22:19:17

Tags: html excel vba excel-vba web-scraping

I want to extract data from the S&P Dow Jones Indices web site. The relevant data is embedded in this code:



<div class="indices-detail-container">
  <div id="all-indices-slider" class="slides" style="float: none; position: absolute; top: 0px; left: -5px; margin: 0px; width: 6318px; height: 113px;">

   <div class="index-detail">
     <h5><a href="/indices/equity/dow-jones-sustainability-chile-index-clp" title="DJSI Chile" contentidentifier="2e9cb165-0cbf-4070-a5ef-dc20bf6219ba" contenttype="web-page" contenttitle="Dow Jones Sustainability™ Chile Index (CLP)">DJSI Chile</a></h5>
     <span class="return-value">943.76 </span>
     <span class="daily-change  up ">0.07% ▲</span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-bvl-peru-general-index-pen" title="S&amp;P/BVL Peru General Index (PEN)" contentidentifier="cec2fa99-13f9-4bf5-9770-4832d86dc017" contenttype="web-page" contenttitle="S&amp;P/BVL Peru General Index (PEN)">S&amp;P/BVL Peru General Index ...</a></h5>
     <span class="return-value">9,922.82 </span>
     <span class="daily-change  down ">-0.04% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-bvl-peru-select-index" title="S&amp;P/BVL Peru Select Index" contentidentifier="162ea564-b038-493c-a3bc-5f56bda60bb4" contenttype="web-page" contenttitle="S&amp;P/BVL Peru Select Index">S&amp;P/BVL Peru Select Index</a></h5>
     <span class="return-value">188.02 </span>
     <span class="daily-change  up "> 0.18% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/equity/sp-bvl-lima-25-index-pen" title="S&amp;P/BVL LIMA 25 Index (PEN)" contentidentifier="12f6a899-f5f6-4c6f-9a82-9db3da8d2821" contenttype="web-page" contenttitle="S&amp;P/BVL LIMA 25 Index (PEN)">S&amp;P/BVL LIMA 25 Index (PEN)</a></h5>
     <span class="return-value">13,153.1 </span>
     <span class="daily-change  down "> -0.3% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-bvl-mining-index-pen" title="S&amp;P/BVL Mining Index (PEN)" contentidentifier="2bef26d1-5720-457f-838a-761a176b06a6" contenttype="web-page" contenttitle="S&amp;P/BVL Mining Index (PEN)">S&amp;P/BVL Mining Index (PEN)</a></h5>
     <span class="return-value">117.81 </span>
     <span class="daily-change  up "> 1.15% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-lac-40-us" title="S&amp;P Latin America 40" contentidentifier="41ac7d89-a7d8-49d7-8d15-ff9bbc22a17a" contenttype="web-page" contenttitle="S&amp;P Latin America 40">S&amp;P Latin America 40</a></h5>
     <span class="return-value">2,213.49 </span>
     <span class="daily-change  down "> -0.49% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-cetes-index" title="S&amp;P/Valmer Mexico Government CETES Index" contentidentifier="d1973dbe-ce5e-4757-b5d5-face93abbb7c" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government CETES Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">201.36 </span>
     <span class="daily-change  up "> 0.01% ▲ </span>
   </div>

   <div class="index-detail last no-bottom-border">
     <h5><a href="/indices/equity/sp-mila-andean-40-index" title="S&amp;P MILA Andean 40" contentidentifier="b5374c9e-85b3-44c1-a37e-dd1f8d3abb1b" contenttype="web-page" contenttitle="S&amp;P MILA Andean 40">S&amp;P MILA Andean 40</a></h5>
     <span class="return-value">439.28 </span>
     <span class="daily-change  up "> 0.41% ▲ </span>
   </div>
  </div>

  <div class="index-slide" style="margin-right: 5px;">

   <div class="index-detail">
     <h5><a href="/indices/commodities/dow-jones-commodity-index" title="DJCI" contentidentifier="338b4dbf-d7eb-470b-9b17-8c713c4612ab" contenttype="web-page" contenttitle="Dow Jones Commodity Index">DJCI</a></h5>
     <span class="return-value">234.06 </span>
     <span class="daily-change  down "> -1.05% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-500" title="S&amp;P 500" contentidentifier="725e00f8-85c7-4fef-87f6-1c11be7f6517" contenttype="web-page" contenttitle="S&amp;P 500®">S&amp;P 500</a></h5>
     <span class="return-value">2,051.35 </span>
     <span class="daily-change  down "> -1.05% ▼ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-composite" title="S&amp;P MILA Pacific Alliance Composite" contentidentifier="3baf0ead-3784-4daf-9333-2f32470ddb4e" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Composite">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">349.36 </span>
     <span class="daily-change  up "> 1.54% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/commodities/sp-gsci" title="S&amp;P GSCI" contentidentifier="dd11d7c8-0c9b-492c-8242-1017e4d41c29" contenttype="web-page" contenttitle="S&amp;P GSCI">S&amp;P GSCI</a></h5>
     <span class="return-value">2,121.09 </span>
     <span class="daily-change  down ">-1.03% ▼ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-bmi-us-dollar" title="S&amp;P Latin America BMI" contentidentifier="c9ba7da8-4dcb-4a7d-9a81-ae8497a9f1db" contenttype="web-page" contenttitle="S&amp;P Latin America BMI">S&amp;P Latin America BMI</a></h5>
     <span class="return-value">189.48 </span>
     <span class="daily-change  up "> 1.08% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-ifci-latin-america-price-index-in-us-dollar" title="S&amp;P/IFCI Latin America" contentidentifier="b22825f7-d873-4e96-818d-28036f7dba27" contenttype="web-page" contenttitle="S&amp;P/IFCI Latin America">S&amp;P/IFCI Latin America</a></h5>
     <span class="return-value">1,228.91 </span>
     <span class="daily-change  up "> 0.35% ▲ </span>
   </div>

   <div class="index-detail no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-infrastructure-index" title="S&amp;P Latin America Infrastructure" contentidentifier="b3751332-cf1c-4fb2-8e46-733932ed6989" contenttype="web-page" contenttitle="S&amp;P Latin America Infrastructure Index">S&amp;P Latin America ...</a></h5>
     <span class="return-value">1,055.92 </span>
     <span class="daily-change  up "> 2.79% ▲ </span>
   </div>

   <div class="index-detail last no-bottom-border">
     <h5><a href="/indices/equity/sp-latin-america-adr-index" title="S&amp;P Latin America ADR" contentidentifier="91c85053-fa63-448b-9b5f-7f34f0afa964" contenttype="web-page" contenttitle="S&amp;P Latin America ADR Index">S&amp;P Latin America ADR</a></h5>
     <span class="return-value">205.43 </span>
     <span class="daily-change  up "> 2.17% ▲ </span>
   </div>
  </div>

  <div class="index-slide" style="margin-right: 5px;">

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-select" title="S&amp;P MILA Pacific Alliance Select" contentidentifier="5da0480d-e00f-4dd6-a99f-8c9d01dfe859" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Select">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">3,842.7 </span>
     <span class="daily-change  up "> 1.63% ▲ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/equity/sp-mila-pacific-alliance-completion-index" title="S&amp;P MILA Pacific Alliance Completion" contentidentifier="cb45f262-959e-4eab-b9e2-abddd4efc6e6" contenttype="web-page" contenttitle="S&amp;P MILA Pacific Alliance Completion">S&amp;P MILA Pacific Alliance ...</a></h5>
     <span class="return-value">477.39 </span>
     <span class="daily-change  up "> 1.45% ▲ </span>
   </div>

   <div class="index-detail">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-international-1-year-ums-index" title="S&amp;P/Valmer Mexico Government International 1+ Year UMS Index" contentidentifier="6b29ea9c-3a43-4c09-94b5-a5fe93fac9b4" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government International 1+ Year UMS Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">327.07 </span>
     <span class="daily-change  up "> 0.12% ▲ </span>
   </div>

   <div class="index-detail last">
     <h5><a href="/indices/fixed-income/sp-valmer-mexico-government-1-5-year-mbonos-index" title="S&amp;P/Valmer Mexico Government 1-5 Year MBONOS Index" contentidentifier="16d4060c-3a31-4efa-8c57-239c679bb779" contenttype="web-page" contenttitle="S&amp;P/Valmer Mexico Government 1-5 Year MBONOS Index">S&amp;P/Valmer Mexico ...</a></h5>
     <span class="return-value">244.56 </span>
     <span class="daily-change  up "> 0.05% ▲ </span>
   </div>
  </div>
</div>

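There is one large section enclosing the index data, defined with class indices-detail-container. Within it there are three sub-sections: the first is identified by the id all-indices-slider, and the last two are defined with class index-slide. The data I want to extract is inside these three sub-sections, contained in: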

<div class="index-detail">
    ...
</div>

Specifically, I want the content title and the return-value within each index-detail class. For example, for the first item I would like:

  

Title = "Dow Jones Sustainability™ Chile Index" or "DJSI Chile"

Value = 943.76

I thought I could use the contentidentifier attribute inside the <h5> heading element, but I don't know how to reference it to tell the indices apart.

So far I have:

Sub Dow_HistoricalData()

    Dim xmlHttp As Object
    Dim TR_col As Object, TR As Object
    Dim TD_col As Object, TD As Object
    Dim row As Long, col As Long

    Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    xmlHttp.Open "GET", "http://www.espanol.spindices.com/", False
    xmlHttp.setRequestHeader "Content-Type", "text/xml"
    xmlHttp.send

    Dim html As Object
    Set html = CreateObject("htmlfile")
    html.body.innerHTML = xmlHttp.responseText

    Dim tbl As Object
    Set tbl = html.getElementById("all-indices-slider")

End Sub

1 Answer:

Answer 0 (score: 0)

This is very easy using CSS selectors.

You have already explained well what you are after:

I want the content title and the return-value in the index-detail class

return-value is a class, so you can use:

.index-detail .return-value

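"." stands for a class name, and the space between the two parts is a descendant combinator meaning "within the preceding". That is, get all elements with class return-value that are contained within elements with class index-detail.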

For the HTML shown, you can abbreviate this to .return-value

contenttitle is an attribute, which needs a slightly different syntax to select:

.index-detail [contenttitle]

For the HTML shown, this can likewise be abbreviated to: [contenttitle]


VBA:

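So, how does this translate to VBA? The HTML document object has a querySelectorAll() method. You created an instance of that document with your html variable and populated it with:
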
html.body.innerHTML = xmlHttp.responseText

Assuming this returns the HTML you need, you can then simply use:

Dim contentTitles As Object, returns As Object
Set contentTitles = html.querySelectorAll("[contenttitle]")
Set returns = html.querySelectorAll(".return-value")

Dim currentNode As Long
For currentNode = 0 To contentTitles.Length - 1
    Debug.Print contentTitles(currentNode).innerText
    'Debug.Print contentTitles.item(currentNode).innerText '<==Or potentially this syntax
    Debug.Print returns(currentNode).innerText
    'Debug.Print returns.item(currentNode).innerText '<==Or potentially this syntax
Next currentNode
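
Putting the request and the two selectors together, a minimal sketch might look like the following. It is not the asker's final code: the procedure name ListIndexReturns is made up for illustration, it assumes a reference to the Microsoft HTML Object Library has been added (Tools > References) so the document can be early bound, and it assumes the site still serves the markup shown in the question. It reads the full name from the contenttitle attribute with getAttribute, since the link text is truncated for some entries.

```vba
Option Explicit

' Sketch only. Assumes a reference to "Microsoft HTML Object Library"
' and that the page still returns the HTML shown in the question.
Sub ListIndexReturns()
    Dim xmlHttp As Object
    Dim html As MSHTML.HTMLDocument
    Dim titles As Object, returns As Object
    Dim i As Long

    ' Fetch the page synchronously
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    xmlHttp.Open "GET", "http://www.espanol.spindices.com/", False
    xmlHttp.send

    ' Load the response into an HTML document so it can be queried
    Set html = New MSHTML.HTMLDocument
    html.body.innerHTML = xmlHttp.responseText

    Set titles = html.querySelectorAll(".index-detail [contenttitle]")
    Set returns = html.querySelectorAll(".index-detail .return-value")

    ' Assumption: every index-detail block has one title and one value,
    ' so the two lists line up by position.
    If titles.Length <> returns.Length Then
        Debug.Print "Mismatch: "; titles.Length; " titles, "; returns.Length; " values"
        Exit Sub
    End If

    For i = 0 To titles.Length - 1
        ' Column A: full title taken from the attribute, not the link text
        ActiveSheet.Cells(i + 1, 1).Value = titles.Item(i).getAttribute("contenttitle")
        ' Column B: numeric value; remove thousands separators before Val
        ActiveSheet.Cells(i + 1, 2).Value = Val(Replace(returns.Item(i).innerText, ",", ""))
    Next i
End Sub
```

Val is used here because it always treats "." as the decimal separator, regardless of the regional settings of the machine.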

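Notes:

The returned objects are static nodeLists, that is, collections of the matched elements. You loop over the length of these lists (indices 0 to 19 for the HTML shown) and access the text through the .innerText property.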
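
On the question's idea of using contentidentifier to tell the indices apart: an attribute selector can also match on a value, in the form [attribute='value']. The helper below is a hypothetical sketch along those lines (the name TitleForIdentifier is not from the original post). It assumes doc is an MSHTML.HTMLDocument already populated with the page markup, and again needs the Microsoft HTML Object Library reference.

```vba
' Hypothetical helper: returns the contenttitle of the link whose
' contentidentifier attribute equals the given value, or "" if none matches.
Function TitleForIdentifier(ByVal doc As MSHTML.HTMLDocument, _
                            ByVal identifier As String) As String
    Dim node As Object

    ' [attr='value'] matches only elements whose attribute equals the value
    Set node = doc.querySelector("[contentidentifier='" & identifier & "']")

    If Not node Is Nothing Then
        TitleForIdentifier = node.getAttribute("contenttitle")
    End If
End Function
```

For example, with the HTML shown in the question, TitleForIdentifier(html, "2e9cb165-0cbf-4070-a5ef-dc20bf6219ba") would be expected to return the title of the DJSI Chile entry.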