带有Excel Internet Explorer的VBA不使用getelemtsbytagname新打开的选项卡

时间:2018-07-27 18:54:55

标签: vba excel-vba internet-explorer dom web-scraping

我已经在Microsoft Excel中创建了VBA代码。 我想要的代码是:

  1. 转到站点。
  2. 单击网站上的链接。
  3. 转到新打开的标签,然后在新标签上单击下载。
  4. 然后返回到根目录/首页,然后单击下一页,然后单击下载。
  5. 重复操作,直到单击并下载所有根页面链接。

我有下面的代码,这种工作方式。它使用Internet Explorer,因为该网站无法与google chrome一起正常运行,但我会尝试一下,如果效果更好。

代码确实导航,没有打开选项卡...该代码不执行的操作是开始使用新页面/选项卡中的数据,所有元素均来自根页面。我尝试通过item(1)选择一个新的IE实例。我不确定要使instancehyperlinks引用新创建的选项卡需要做什么。仅供参考,由于根网站的编写方式,数据显示在新标签中,我无法控制任何HTML。

代码如下:

Sub getalllinks()

    Dim ie As Object

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    url_name = "123.123.123.123"
    ie.navigate url_name
    Do
        DoEvents
    Loop Until ie.readystate = 4 ' wait until fully loaded
    Set allhyperlinks = ie.document.getelementsbytagname("A")
    For Each hyper_link In allhyperlinks
        If hyper_link.Title = "View Subject" Then
            hyper_link.Click
            Do
                DoEvents
            Loop Until ie.readystate = 4 ' wait until fully loaded
            Set instancehyperlinks = ie.document.getelementsbytagname("A")
            For Each hyper_linkPage In instancehyperlinks
                If hyper_linkPage.Title = "Download" Then
                    hyper_linkPage.Click
                End If
            Next
        End If
    Next

End Sub

2 个答案:

答案 0 :(得分:0)

好吧,我之前遇到过这个问题并已解决,而不必使用任何硒等第三方工具,在得到答案之前,我先给你提个建议:用VBA自动化IE是一个严重的PITA ,如果可能的话,我会考虑其他途径。

免责声明:我从其他来源中发现并修改了很多此类代码,由于许多原因,我现在无法追踪该来源,如果找到它们,我将在以后添加它们。

好吧,首先,您需要找到窗口,创建一个新模块,并将其命名为“ modWindowsAPI”,并将其添加到其中,这将使您的脚本能够钩接到必要的Windows API中,不仅可以找到窗口,还可以将其下载为好吧:

Option Explicit

Public Declare Sub Sleep Lib "kernel32" _
    (ByVal dwMilliseconds As Long)

Public Declare Function FindWindow Lib "user32" Alias "FindWindowA" _
    (ByVal lpClassName As String, ByVal lpWindowName As String) As Long

Public Declare Function FindWindowEx Lib "user32" Alias "FindWindowExA" _
    (ByVal hWnd1 As Long, ByVal hWnd2 As Long, ByVal lpsz1 As String, ByVal lpsz2 As String) As Long

Public Declare Function SetForegroundWindow Lib "user32" _
    (ByVal hWnd As Long) As Long

Public Declare Function SendMessage Lib "user32" Alias "SendMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

Public Declare Function SendMessageByString Lib "user32" Alias "SendMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, ByVal lParam As String) As Long

Public Declare Function PostMessage Lib "user32" Alias "PostMessageA" _
    (ByVal hWnd As Long, ByVal wMsg As Long, ByVal wParam As Long, lParam As Any) As Long

Public Declare Sub keybd_event Lib "user32" _
    (ByVal bVk As Byte, ByVal bScan As Byte, ByVal dwFlags As Long, ByVal dwExtraInfo As Long)

Public Const BM_CLICK = &HF5
Public Const WM_SETTEXT = &HC
Public Const WM_GETTEXT = &HD
Public Const WM_GETTEXTLENGTH = &HE

Public Const VK_KEYDOWN = &H0
Public Const VK_KEYUP = &H2
Public Const VK_CONTROL = &H11

从这里开始,有两种方法可以锁定窗口(据我所知,可能还有更多),可以通过URL或窗口标题来找到它。

按URL查找窗口: 创建函数GetOpenIEByTitle:

Function GetOpenIEByTitle(i_Title As String, Optional ByVal i_ExactMatch As Boolean = True) As SHDocVw.InternetExplorer
    Dim windowMatched As Boolean
    Dim e_title As String
    windowMatched = False
    Dim windowTimeout As Integer
    windowTimeout = 0

    Do Until windowMatched = True Or windowTimeout = 10
        If i_ExactMatch = False Then i_Title = "*" & i_Title & "*"
        'ignore errors when accessing the document property
        On Error Resume Next
        'loop over all Shell-Windows
        For Each GetOpenIEByTitle In objShellWindows
            'if the document is of type HTMLDocument, it is an IE window
            If TypeName(GetOpenIEByTitle.Document) = "HTMLDocument" Then
            'check the title
                If GetOpenIEByTitle.Document.Title Like i_Title Then
                    'leave and set boolean as true, we found the right window
                    windowMatched = True
                    Sleep 600
                    Exit Function
                End If
            End If
        Next
    windowTimeout = windowTimeout + 1
    Loop
End Function

按URL查找窗口: 创建名为GetOpenIEByURL的函数

Function GetOpenIEByURL(ByVal i_URL As String) As SHDocVw.InternetExplorer
    Dim urlMatched As Boolean
    urlMatched = False
    Dim urlTimeout As Integer
    urlTimeout = 0
    Do Until urlMatched = True Or urlTimeout = 30

        Dim objShellWindows As New SHDocVw.ShellWindows

        'ignore errors when accessing the document property
        On Error Resume Next
        'loop over all Shell-Windows
        For Each GetOpenIEByURL In objShellWindows
            'if the document is of type HTMLDocument, it is an IE window
            If TypeName(GetOpenIEByURL.Document) = "HTMLDocument" Then
            'check the URL
                If GetOpenIEByURL.Document.URL = i_URL Then
                    'leave, we found the right window
                    urlMatched = True
                    Exit Function
                End If
            End If
        Next

    urlTimeout = urlTimeout + 1
    Loop
End Function

总结 您处在正确的位置,需要多个IE对象,每个活动窗口都需要一个自己的对象,如果关闭它并转到下一个对象,则可以重用同一对象。

调用上述方法之一,如下所示:

Set ieAppChild = GetOpenIEByTitle("Some Title", False)
Set ieAppChild = GetOpenIEByURL("https://127.0.0.1")

编辑:忘记提及当您准备关闭IE窗口以移至下一个窗口时,请不要忘记调用ieAppChild.Quit,并且不必在重用之前将ie子对象设置为no,但是,这不是最佳实践。

最后是找到下载窗口并单击保存的功能:

Function SaveAs()
    Dim hWnd As Long

    Dim timeout As Date

    'Debug.Print "File_Download_Click_Save"

    'Find the File Download window, waiting a maximum of 30 seconds for it to appear

    timeout = Now + TimeValue("00:00:30")

    Do
        hWnd = FindWindow("#32770", "File Download")
        DoEvents
        Sleep 200
    Loop Until hWnd Or Now > timeout

    'Debug.Print "   File Download window "; Hex(hWnd)
    If hWnd Then
        'Find the child Save button
        hWnd = FindWindowEx(hWnd, 0, "Button", "&Save")
        'Debug.Print "   Save button "; Hex(hWnd)
    End If


    If hWnd Then

        'Click the Save button
        SetForegroundWindow (hWnd)
        Sleep 600  'this sleep is required and 600 miiliseconds seems to be the minimum that works
        SendMessage hWnd, BM_CLICK, 0, 0
    End If
End Function

答案 1 :(得分:0)

感谢user1090660 我的解决方法是将其制作成两步脚本,首先,我将第一页中的所有链接收集到Excel工作表中 关闭互联网浏览器 然后为Excel工作表中的每个链接重新打开Internet Explorer 然后,我进入了正确的页面并可以挖掘数据。 我的方式远不及您的方式优雅或可扩展,但它确实有效, 感谢您的回答。 马丁