VBA Scrape: getting the href from each HTML element

Time: 2016-07-03 18:58:12

Tags: vba web-scraping attributes href

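The code below successfully iterates over every element in the DOM and writes each element's properties (tagName, ID, className, etc.) to an Excel worksheet.

My question is:

How do I scrape the tag attributes (title, href, etc.) of each element? Specifically, for "A" tags, how do I scrape the "href" attribute?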

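Enum READYSTATE
    READYSTATE_UNINITIALIZED = 0
    READYSTATE_LOADING = 1
    READYSTATE_LOADED = 2
    READYSTATE_INTERACTIVE = 3
    READYSTATE_COMPLETE = 4
End Enum

' Requires references to "Microsoft Internet Controls" and "Microsoft HTML Object Library".
' The procedure name is illustrative; the statements must live inside a Sub to run.
Sub ScrapeElements()
    Dim ie As InternetExplorer
    Dim html As HTMLDocument
    Dim element As Object
    Dim RowNumber As Long   ' Long avoids an overflow once the row count passes 32,767

    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "www.somesite.com"

    ' Wait until the browser has finished loading the page
    Do While ie.Busy Or ie.READYSTATE <> READYSTATE_COMPLETE
        Application.StatusBar = "Connecting..."
        DoEvents
    Loop
    Application.StatusBar = False   ' hand the status bar back to Excel

    Set html = ie.document

    ' Write one worksheet row per DOM element
    RowNumber = 1
    For Each element In html.all
        Cells(RowNumber, "A").Value = element.tagName
        Cells(RowNumber, "B").Value = element.ID
        Cells(RowNumber, "C").Value = element.className
        Cells(RowNumber, "D").Value = element.innerHTML
        RowNumber = RowNumber + 1
    Next element

    ie.Quit   ' close the hidden browser instance
End Sub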

Any help would be greatly appreciated.

1 Answer:

Answer 0: (score: 2)

Add this line just before RowNumber = RowNumber + 1:

If (element.tagName = "A") Then Cells(RowNumber, "E").Value = element.getAttribute("href")
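
If only the links are of interest, one possible variation is to loop over the anchor elements directly instead of testing every element in html.all. The following is a minimal sketch, not part of the original answer: the procedure name WriteLinkAttributes, the use of columns E and F, and the choice to write to the active sheet are all assumptions made for illustration. It expects the HTMLDocument obtained from ie.document in the question.

```vba
' Hypothetical helper: writes the href and title of every <a> element
' in the given document to columns E and F of the active sheet.
Sub WriteLinkAttributes(ByVal html As Object)
    Dim link As Object
    Dim LinkRow As Long

    LinkRow = 1
    ' getElementsByTagName returns only the anchor elements,
    ' so no tagName check is needed inside the loop
    For Each link In html.getElementsByTagName("a")
        Cells(LinkRow, "E").Value = link.getAttribute("href")
        Cells(LinkRow, "F").Value = link.getAttribute("title")
        LinkRow = LinkRow + 1
    Next link
End Sub
```

It could be called as WriteLinkAttributes html right after Set html = ie.document. Note that the rows written this way no longer line up with the per-element rows in columns A to D, so the one-line fix above is the better fit if everything should stay in one table.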