Excel VBA: extracting href values

Date: 2018-09-23 15:29:10

Tags: html excel vba web-scraping href

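I have a macro that tries to extract all href values from a page, but it seems to get only the first one. I would be grateful if someone could help.

The URL I am using is https://www.facebook.com/marketplace/vancouver/entertainment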

Screenshot of HTML

<div class="_3-98" data-testid="marketplace_home_feed">
  <div>
    <div>
      <div class="_65db">
          <a class="_1oem" href="/marketplace/item/920841554781924" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/580124349088759" data-testid="marketplace_feed_item">
          <a class="_1oem" href="/marketplace/item/1060730340772072" data-testid="marketplace_feed_item">

Sub Macro1()
marker = 0
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
    On Error Resume Next    ' sometimes more web pages are counted than are open
    my_url = objShell.Windows(x).document.Location
    my_title = objShell.Windows(x).document.Title

    If my_title Like "Facebook" & "*" Then 'compare to find if the desired web page is already open
        Set ie = objShell.Windows(x)
        marker = 1
        Exit For
    Else
    End If
Next

Set my_data = ie.document.getElementsByClassName("_3-98")
Dim link
i = 1
For Each elem In my_data
    Set link = elem.getElementsByTagName("a")(0)
    i = i + 1

     'copy the data to the excel sheet
    ActiveSheet.Cells(i, 4).Value = link.href

Next

End Sub

2 Answers:

Answer 0 (score: 2)

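You can use a CSS selector combination to get the elements. If you provide the actual HTML rather than an image, it will be easier to test and determine the best combination. The selector is applied via the querySelectorAll method, which returns a nodeList of all matching elements. You then loop over the nodeList, accessing items by index from 0 to .Length-1.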

VBA:

Dim aNodeList As Object, i As Long
Set aNodeList = ie.document.querySelectorAll("._1oem[href]")
For i = 0 To aNodeList.Length - 1
    ActiveSheet.Cells(i + 2, 4) = aNodeList.item(i)
Next

The CSS selector combination is ._1oem[href], which selects elements of class _1oem that have an href attribute. "." is the class selector and [] is the attribute selector. This is a fast and robust method.

The above assumes there are no parent frame or iframe tags to negotiate.

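An alternative selector that matches on the two attributes (href and data-testid) instead of the class would be:

[data-testid='marketplace_feed_item'][href]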


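Putting the pieces of this answer together, a complete macro might look like the sketch below. It is not the answerer's original code: the procedure name ListMarketplaceLinks is hypothetical, and it assumes the page is already open in an Internet Explorer window whose title starts with "Facebook", as in the question.

```vba
Option Explicit

' Sketch only: ListMarketplaceLinks is a hypothetical name.
' Assumes the target page is already open in Internet Explorer.
Public Sub ListMarketplaceLinks()
    Dim objShell As Object, wnd As Object, ie As Object
    Dim aNodeList As Object, i As Long, pageTitle As String

    Set objShell = CreateObject("Shell.Application")

    ' Find the first shell window whose document title starts with "Facebook"
    For Each wnd In objShell.Windows
        pageTitle = vbNullString
        On Error Resume Next      ' some shell windows have no HTML document
        pageTitle = wnd.document.Title
        On Error GoTo 0
        If pageTitle Like "Facebook*" Then
            Set ie = wnd
            Exit For
        End If
    Next wnd

    If ie Is Nothing Then
        MsgBox "No open window with a title starting with ""Facebook"" was found."
        Exit Sub
    End If

    ' Collect every element of class _1oem that carries an href attribute
    Set aNodeList = ie.document.querySelectorAll("._1oem[href]")

    ' Write one href per row into column D, starting at row 2
    For i = 0 To aNodeList.Length - 1
        ActiveSheet.Cells(i + 2, 4).Value = aNodeList.item(i).href
    Next i
End Sub
```

Checking that ie was actually found before using it avoids the run-time error the question's macro would raise when no matching window is open.
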
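Answer 1 (score: 1)

You are only requesting the first anchor element within each element that has the class _3-98. Instead, iterate through the collection of anchor elements inside the parent element.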

...

Dim j As Long
Set my_data = ie.document.getElementsByClassName("_65db")

For Each elem In my_data

    ' HTML element collections expose .Length rather than .Count
    For i = 0 To elem.getElementsByTagName("a").Length - 1

        j = j + 1
        ActiveSheet.Cells(j, 4).Value = elem.getElementsByTagName("a")(i).href

    Next i

Next elem

...
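
The inner loop in this answer looks up the anchor collection again on every pass. A variant that stores the collection once per parent element might look like the following sketch. The procedure name WriteAnchorHrefs and its ie parameter are hypothetical; the caller is assumed to pass an Internet Explorer object that already has the page loaded.

```vba
Option Explicit

' Sketch only: WriteAnchorHrefs is a hypothetical helper.
' ie is assumed to be an InternetExplorer object with the page loaded.
Public Sub WriteAnchorHrefs(ByVal ie As Object)
    Dim my_data As Object, elem As Object, anchors As Object
    Dim i As Long, j As Long

    Set my_data = ie.document.getElementsByClassName("_65db")

    For Each elem In my_data
        ' Fetch the anchors of this parent element once
        Set anchors = elem.getElementsByTagName("a")

        For i = 0 To anchors.Length - 1
            j = j + 1
            ' One href per row in column D, starting at row 1
            ActiveSheet.Cells(j, 4).Value = anchors(i).href
        Next i
    Next elem
End Sub
```
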