VBA:使用Control F并从IE网页获取相关数据

时间:2016-06-16 17:09:05

标签: vba excel-vba internet-explorer dom web-scraping

目标:创建一个VBA宏,通过IE打开网页,遍历整个页面,使用 Ctrl + F 函数查找MULTIPLE关键字,如果这些关键字找到,找到这些关键字所在的行,并在该关键字行位置的上方和下方抓取一定数量的行,并将它们发布到Excel工作表以便通过电子邮件发送。

我有代码进入网页,并使用 Ctrl + F 来查找关键字。这部分工作正常。我不知道如何遍历整个网页并为多个关键字执行此操作。我也无法找到每个关键字的行位置' hit'并将其发布到excel(不熟悉VBA)。

Sub Find()

    'create a variable to refer to an IE application, and
    'start up a new copy of IE
    Dim ieApp As New SHDocVw.InternetExplorer
    Dim objectIE As Object

    'make sure you can see
    ieApp.Visible = True

    'go to the website of interest
    ieApp.Navigate "URL HERE"

    'wait for page to finish loading
    Do While ieApp.Busy
    Loop

    'Declare Keywords
    Dim keyword1 As String
    Dim found As Boolean

    keyword1 = "keyword"
    For i = 1 To ie.document.all.Length
        Application.Wait (Now + TimeValue("0:00:02"))
        SendKeys "^f"
        Application.Wait (Now + TimeValue("0:00:01"))
        SendKeys (keyword1)
        Application.Wait (Now + TimeValue("0:00:01"))
        SendKeys ("{ENTER}")
    Next i

End Sub

1 个答案:

答案 0 :(得分:2)

下面是一个示例,它在网页文档文本节点中实现关键字查找(如果找到) - 将范围扩展到整个表格单元格,然后将所有匹配项输出到工作表:

Sub Find()

    sKeyword = "language"
    sUrl = "http://stackoverflow.com/tags"

    Set oList = CreateObject("Scripting.Dictionary")
    With CreateObject("InternetExplorer.Application")

        .Visible = True
        ' Navigating to url
        .Navigate sUrl
        ' Wait for IE ready
        Do While .ReadyState <> 4 Or .Busy
            DoEvents
        Loop
        ' Wait for document complete
        Do While .Document.ReadyState <> "complete"
            DoEvents
        Loop

        '    ' Look up in the specified node - optional
        '    ' Wait for target node created
        '    Do While TypeName(.Document.getElementById("Content")) = "Null" ' replace Content with your Id
        '        DoEvents
        '    Loop
        '    ' Get target node
        '    Set oRoot = .Document.getElementById("Content")

        ' Look up in the entire document
        Set oRoot = .Document.getElementsByTagName("html")(0)
        Set oWalker = .Document.createTreeWalker(oRoot, 4, Null, False) ' NodeFilter.SHOW_TEXT = 4
        Set oNode = oWalker.currentNode
        Do
            Select Case True
                Case IsNull(oNode.NodeValue)
                Case oNode.NodeValue = ""
                Case InStr(oNode.NodeValue, sKeyword) = 0
                Case Else
                    ' Text node contains keyword
                    Debug.Print oNode.NodeValue
                    Do
                        ' Expand the range until thenode of the necessary type is enclosed
                        Set oNode = oNode.ParentNode
                        Debug.Print TypeName(oNode)
                        Select Case TypeName(oNode)
                            '   ' Non-table structures
                            '   Case "HTMLHtmlElement", "HTMLBody", "HTMLDivElement", "HTMLParagraphElement", "HTMLHeadingElement"
                            ' For tables
                            Case "HTMLHtmlElement", "HTMLBody", "HTMLTableRow", "HTMLTableCell"
                                Exit Do
                        End Select
                    Loop
                    ' Add to list
                    sText = oNode.innerText
                    Debug.Print sText
                    oList(oList.Count) = sText
            End Select
            ' Get next node
            oWalker.NextNode
            Set oPrev = oNode
            Set oNode = oWalker.currentNode
        Loop Until oNode Is oPrev
        .Quit

    End With

    ' Results output
    aList = oList.Items()
    Cells(1, 1).Resize(UBound(aList) + 1, 1).Value = aList

End Sub

例如,对于源页面如下

src

输出

output

对于多个关键字搜索,请详细说明规则:执行所有关键字,或者至少有一个关键字应该在一个句子中找到?