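Ignoring elements inside certain tags when getting elements by id with VBA

Date: 2014-10-29 07:23:31

Tags: html excel vba excel-vba

I have a VBA module that extracts all the links from a page. However, I want to ignore every link that sits inside certain tags, for example <header> and <footer> (and all of their child tags). Can anyone tell me how to do this?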

Sub Fetch_click()

    Dim LinkArr As Variant

    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.Navigate Cells(1, 1).Text
    While IE.Busy
        DoEvents
    Wend

    Dim i As Integer
    i = 3

    Set LinkArr = IE.Document.getElementsByTagName("a")
    For Each LinkObj In LinkArr
        Cells(i, 1).Value = LinkObj.href
        i = i + 1
    Next
End Sub

Thanks

1 answer:

Answer 0: (score: 2)

I prefer to use the objects from the Microsoft HTML Object Library and the Microsoft Internet Controls library (add references to both!), e.g.

Sub StartTest()
Dim Browser As SHDocVw.InternetExplorer
Dim HTMLDoc As MSHTML.HTMLDocument

    ' start browser
    Set Browser = New SHDocVw.InternetExplorer
    Browser.Visible = True
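    Browser.navigate "www.dauda.at"
    ' wait until the page has finished loading before touching the document
    Do While Browser.Busy Or Browser.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    Set HTMLDoc = Browser.document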

Dim ECol As MSHTML.IHTMLElementCollection
Dim IFld As MSHTML.IHTMLElement

    ' search all <a> tags
    Set ECol = HTMLDoc.getElementsByTagName("a")
    For Each IFld In ECol

        ' etc ...

    Next IFld

    ' clean up
    Set IFld = Nothing
    Set ECol = Nothing
    Set HTMLDoc = Nothing
    Browser.Quit
    Set Browser = Nothing
End Sub

Checking where an <a> tag sits is as simple as reading IFld.ParentNode.nodeName, which gives you the tag of the enclosing parent.

If it is not clear how deeply your <a> is nested, you can use a recursive function that checks the next higher parent, all the way up to the document root ("#document") or the containing "HTML", e.g.

Function BadParentRec(TestFld As MSHTML.IHTMLElement) As Boolean
Dim MyTag As String, MyTempResult As Boolean

    BadParentRec = False
    MyTag = TestFld.ParentNode.nodeName
    ' Debug.Print MyTag

    If MyTag = "#document" Then
        MyTempResult = False        ' reached the document root: no bad parent found
    ElseIf MyTag = "XXX" Then       ' your own criteria for bad tags go here
        MyTempResult = True         ' send "bad" back up the recursion chain
    Else
        MyTempResult = BadParentRec(TestFld.parentElement)  ' check the next level up
    End If

    BadParentRec = MyTempResult
End Function

...so inside the For Each loop you would call BadParentRec(IFld) for each link and skip the link whenever it returns True.
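The post breaks off at this point, so the following is only a sketch of how the pieces might fit together, not the answerer's own code. It assumes the BadParentRec function from the answer is in the same module and that both library references are set. The names IsExcludedTag and PrintAllowedLinks are hypothetical, and the "HEADER"/"FOOTER" criteria come from the question, on the assumption that nodeName reports HTML tag names in upper case.

```vba
' Hypothetical helper (not in the original answer): True for tags whose
' descendants should be skipped. Could replace the MyTag = "XXX" test,
' i.e. ElseIf IsExcludedTag(MyTag) Then
Function IsExcludedTag(TagName As String) As Boolean
    Select Case UCase$(TagName)
        Case "HEADER", "FOOTER"
            IsExcludedTag = True
        Case Else
            IsExcludedTag = False
    End Select
End Function

' Hypothetical wrapper around the For Each loop from StartTest.
' ECol is the collection returned by getElementsByTagName("a").
Sub PrintAllowedLinks(ECol As MSHTML.IHTMLElementCollection)
    Dim IFld As MSHTML.IHTMLElement

    For Each IFld In ECol
        ' keep only links that have no excluded ancestor
        If Not BadParentRec(IFld) Then
            Debug.Print IFld.outerHTML
        End If
    Next IFld
End Sub
```

Debug.Print is used here only to keep the sketch short. To reproduce the question's output, that line could instead write the link to a worksheet cell and advance a row counter.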