VBA [EXCEL 2013]: Automation: open links in the same browser window/tab

Time: 2015-12-09 01:29:26

Tags: excel vba parsing web-scraping internet-explorer-11

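I want to use VBA (Excel) to parse a long list of local web pages (.HTM files) and extract some data into Excel. The program needs to scrape more than 9000 web pages. Here is an example: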

> C:\Users\User_ID\Webpages\BS_1000.HTM.htm
C:\Users\User_ID\Webpages\BS_1001.HTM.htm
C:\Users\User_ID\Webpages\BS_1002.HTM.htm
C:\Users\User_ID\Webpages\BS_1003.HTM.htm
C:\Users\User_ID\Webpages\BS_1006.HTM.htm
C:\Users\User_ID\Webpages\BS_1007.HTM.htm
C:\Users\User_ID\Webpages\BS_1011.HTM.htm
C:\Users\User_ID\Webpages\BS_1012.HTM.htm
C:\Users\User_ID\Webpages\BS_1015.HTM.htm
C:\Users\User_ID\Webpages\BS_1016.HTM.htm
[... and the list goes on ...]

Here is the VBA:

<!-- language: lang-vb -->

    For startNumber = 1 To TotalProfiles
        Dim ie As InternetExplorerMedium
        Set ie = New InternetExplorerMedium
        ie.Visible = True

        Application.StatusBar = "Loading profile " & ProfileNumber & " from a total of " & TotalProfiles & " profiles"
        Set currentProfile = Worksheets("List_of_Files").Range("B" & CurrentRowPosition)
        ie.navigate currentProfile

        Application.StatusBar = "Loading profile: " & ProfileNumber & "; file location: " & currentProfile
        Do While ie.READYSTATE <> READYSTATE_COMPLETE
            DoEvents
        Loop

        Application.StatusBar = "Storing " & currentProfile & " information into HTMLElement"
        Set html = ie.document
        Set ie = Nothing
        [some code here...]

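The problem is that my current code opens each page in a new IE window (without closing the previous one). With more than 9000 web pages to scrape, this quickly becomes a very big problem.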

I am using Internet Explorer 11 with Microsoft Office 2013 (on Windows 7 Enterprise SP1).

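What I want is for IE to open each web page in the same tab, essentially just reloading the tab already in use once parsing is finished and the next page is due, or at least to close the window after parsing is finished and open the next web page in a "new" window. Sadly, I have not found a solution so far. Any help would be appreciated.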

1 Answer:

Answer 0 (score: 2)

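The reason it opens a new window every time is that you tell it to, with this line at the beginning of your loop: `Set ie = New InternetExplorerMedium`

There are two ways to fix it.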

  1. Start IE before the loop, then quit IE after the loop has finished, like this:

    Dim ie As InternetExplorerMedium
    Set ie = New InternetExplorerMedium   ' one IE instance, reused for every page
    ie.Visible = True
    
    For startNumber = 1 To TotalProfiles
    
         Set currentProfile = Worksheets("List_of_Files").Range("B" & CurrentRowPosition)
    
         Application.StatusBar = "Loading profile: " & ProfileNumber & "; file location: " & currentProfile
         ie.navigate currentProfile
    
         ' Wait until the page has finished loading before reading it
         Do While ie.READYSTATE <> READYSTATE_COMPLETE
              DoEvents
         Loop
    
         Application.StatusBar = "Storing " & currentProfile & " information into HTMLElement"
         Set html = ie.document
    
         [some code here...]
    
    Next
    
    ' Close the single IE window once all pages are processed
    Set html = Nothing
    ie.Quit
    Set ie = Nothing
    
  2. Quit the IE instance each time before the loop ends (probably not as efficient as the first way), like this:

      For startNumber = 1 To TotalProfiles
      
           Dim ie As InternetExplorerMedium
           Set ie = New InternetExplorerMedium
           ie.Visible = True
      
           Application.StatusBar = "Loading profile " & ProfileNumber & " from a total of " & TotalProfiles & " profiles"
           Set currentProfile = Worksheets("List_of_Files").Range("B" & CurrentRowPosition)
           ie.navigate currentProfile
      
           Application.StatusBar = "Loading profile: " & ProfileNumber & "; file location: " & currentProfile
           Do While ie.READYSTATE <> READYSTATE_COMPLETE
                DoEvents
           Loop
      
           Application.StatusBar = "Storing " & currentProfile & " information into HTMLElement"
           Set html = ie.document
      
           [some code here...]
      
           ' Quit closes this iteration's window; Set ie = Nothing alone would leave it open
           Set html = Nothing
           ie.Quit
           Set ie = Nothing
      
      Next
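
A possible refinement, offered only as a sketch and not part of the original answer: with more than 9000 files, a single page that never reaches `READYSTATE_COMPLETE` would keep either loop spinning forever. The helper below polls the browser with a time limit instead. The names `WaitForPage`, `timeoutSeconds`, and `READYSTATE_DONE` are made up for this illustration, and the 30-second default is an arbitrary assumption.

```vba
Option Explicit

' Numeric value of READYSTATE_COMPLETE, repeated under a hypothetical name so
' the sketch also compiles without a reference to Microsoft Internet Controls.
Private Const READYSTATE_DONE As Long = 4

' Hypothetical helper: returns True once the browser reports the page as
' loaded, or False if timeoutSeconds pass first.
Public Function WaitForPage(ByVal browser As Object, _
                            Optional ByVal timeoutSeconds As Double = 30) As Boolean
    Dim startTime As Double
    Dim elapsed As Double

    startTime = Timer   ' seconds since midnight

    Do While browser.Busy Or browser.readyState <> READYSTATE_DONE
        DoEvents

        elapsed = Timer - startTime
        ' Timer restarts at midnight, so correct a negative difference.
        If elapsed < 0 Then elapsed = elapsed + 86400

        ' Give up; WaitForPage keeps its default value of False.
        If elapsed > timeoutSeconds Then Exit Function
    Loop

    WaitForPage = True
End Function
```

Inside either loop, the `Do While ... Loop` wait could then be replaced by a check such as `If Not WaitForPage(ie) Then`, followed by whatever skip or logging logic suits the run.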