Skipping slow sites when looping through a list of URLs with watir-webdriver

Time: 2012-08-02 04:15:45

Tags: ruby google-chrome selenium watir watir-webdriver

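I'm trying to use watir-webdriver to loop through a list of websites in Chrome, but I keep hitting errors on certain sites. Most recently the problem site was http://adage.com. The loop runs fine until it reaches http://adage.com, then it hangs until the following error appears: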

/Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:146:in `rescue in rbuf_fill': Timeout::Error (Timeout::Error)
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:140:in `rbuf_fill'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:122:in `readuntil'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/protocol.rb:132:in `readline'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:2562:in `read_status_line'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:2551:in `read_new'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1319:in `block in transport_request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1316:in `catch'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1316:in `transport_request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1293:in `request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1286:in `block in request'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:745:in `start'
from /Users/default/.rvm/rubies/ruby-1.9.3-p125/lib/ruby/1.9.1/net/http.rb:1284:in `request'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/default.rb:82:in `response_for'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/default.rb:38:in `request'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/http/common.rb:40:in `call'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:598:in `raw_execute'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:576:in `execute'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/remote/bridge.rb:536:in `getActiveElement'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/selenium-webdriver-2.25.0/lib/selenium/webdriver/common/target_locator.rb:60:in `active_element'
from /Users/default/.rvm/gems/ruby-1.9.3-p125/gems/watir-webdriver-0.6.1/lib/watir-webdriver/browser.rb:136:in `send_keys'
from /Users/default/Dropbox/beta_scripts/loop_test.rb:16:in `rescue in <main>'
from /Users/default/Dropbox/beta_scripts/loop_test.rb:11:in `<main>'

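I don't know how to avoid this. I've tried setting a timeout, and even sending the ESC key in the rescue block to stop Chrome from loading the page, without any success. Ultimately I want to reliably load an array of 500+ websites in sequence, but that seems impossible while any one of them can hang. Is there a way to abort a slow page load and move on to the next element of the array?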

Here is a shortened version of my code that isolates the problem:

#!/usr/bin/env ruby

require 'watir-webdriver'

b = Watir::Browser.new :chrome

sites = ["twitter.com", "cars.com", "autotrader.com", "rolex.com", "newyorker.com", "adage.com", "theatlantic.com", "pcmag.com"]

sites.each do |uri|
  begin
    Timeout::timeout(10) do
      b.goto uri
    end
  rescue Timeout::Error => e_time
    sleep 5
    b.send_keys :escape
    p "#{uri} is taking forever to load (#{e_time})"
  rescue Exception => e_exception
    p e_exception
  end
end

b.close

1 Answer:

Answer 0 (score: 0)

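I understand your frustration; I ran into the same problem when working with Selenium WebDriver. What you need here is to make sure your script is robust enough to get through all 500+ of your sites.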

    sites.each do |uri|
      # Try up to 30 times; sleep 1 second after each failed attempt.
      30.times { break if (b.goto(uri) rescue false); sleep 1 }
    end

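The code above tries each site up to 30 times, sleeping one second after each failed attempt, and then moves on to the next site. Note that this caps the number of attempts rather than the total time: each `goto` can still block until the driver's own timeout fires.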
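As a rough sketch of the "skip and move on" pattern, the loop can be wrapped in a small helper that gives each site a fixed time budget and records the ones that were skipped. The helper name `visit_each` and the `fake_goto` stand-in are hypothetical and not part of watir-webdriver; in a real run the block would call `b.goto uri`, and a browser that has genuinely hung may still need to be closed and restarted before the next site will load.

```ruby
require 'timeout'

# Hypothetical helper (not part of watir-webdriver): yields each URI to the
# block with a time budget of `limit` seconds. Returns [loaded, skipped].
def visit_each(sites, limit)
  loaded  = []
  skipped = []
  sites.each do |uri|
    begin
      Timeout.timeout(limit) { yield uri }
      loaded << uri
    rescue Timeout::Error, StandardError => e
      # Record the failure and continue with the next site.
      skipped << uri
      puts "#{uri} skipped (#{e.class})"
    end
  end
  [loaded, skipped]
end

# Stand-in for b.goto so the sketch runs without a browser:
# "slow.example" simulates a page that hangs longer than the budget.
fake_goto = lambda { |uri| sleep(uri == "slow.example" ? 2 : 0) }

loaded, skipped = visit_each(["a.example", "slow.example", "b.example"], 0.5, &fake_goto)
```

Rescuing `StandardError` rather than `Exception` keeps Ctrl-C and other interrupts working while the loop runs.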