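I'm using the capybara-webkit gem to scrape data from certain pages in my Rails application. I've noticed that, seemingly at random, the application crashes with the following error: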
Capybara::Webkit::ConnectionError: /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/bin/webkit_server failed to start.
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:56:in `parse_port'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:42:in `discover_port'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:26:in `start'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:67:in `start_server'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:17:in `initialize'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `new'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `initialize'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `new'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `block in <top (required)>'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:85:in `driver'
from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:233:in `visit'
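It happens even after a connection has been established and sites have been visited successfully several times. Here is the code snippet I'm currently using: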
if site.url.present?
  begin
    # Visit the URL
    session = Capybara::Session.new(:webkit)
    session.visit(site.url) # here is where the error occurs...
    document = Nokogiri::HTML.parse(session.body)

    # Load configuration options for Development Group
    roster_table_selector = site.development_group.table_selector
    header_row_selector = site.development_group.table_header_selector
    row_selector = site.development_group.table_row_selector
    row_offset = site.development_group.table_row_selector_offset
    header_format_type = site.config_header_format_type

    # Get the Table and Header Row for processing
    roster_table = document.css(roster_table_selector)
    header_row = roster_table.css(header_row_selector)
    header_hash = retrieve_headers(header_row, header_format_type)
    my_object = process_rows(roster_table, header_hash, site, row_selector, row_offset)
  rescue ::Capybara::Webkit::ConnectionError => e
    raise e
  rescue OpenURI::HTTPError => e
    if e.message == '404 Not Found'
      raise "404 Page not found..."
    else
      raise e
    end
  end
end
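I even thought that maybe I don't need to know why this happens, as long as I can recover from it. So I tried a `retry` on the error in the rescue block, but it looks like the server has simply shut down, so I get the same result on the retry. Does anyone know a way to check whether the server is down, restart it, and then retry? Thanks for your help!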
Answer 0 (score: 0)
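After further investigation, it appears I was creating a new Capybara::Session on every iteration of the loop. I moved it outside the loop and added Capybara.reset_sessions! at the end of each iteration. I'm not sure whether that is what actually helped, but the problem appears to be resolved. I'll monitor it for the next hour or so. Below is a sample of my ActiveJob code: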
class ScrapeJob < ActiveJob::Base
  queue_as :default
  include Capybara::DSL

  def perform(*args)
    session = Capybara::Session.new(:webkit)
    Site.where(config_enabled: 1).order(:code).each do |site|
      process_roster(site, session)
      Capybara.reset_sessions!
    end
  end

  def process_roster(site, session)
    if site.roster_url.present?
      begin
        # Visit the Roster URL
        session.visit(site.roster_url)
        document = Nokogiri::HTML.parse(session.body)
        # processing code...
        # pass the session that was created as the final parameter..
        my_object = process_rows( ..., session)
      rescue ::Capybara::Webkit::ConnectionError => e
        raise e
      rescue OpenURI::HTTPError => e
        if e.message == '404 Not Found'
          raise "404 Page not found..."
        else
          raise e
        end
      end
    end
  end
end
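
As for the retry the question asked about, here is a rough sketch of one way it could be structured. It is not something I have verified against capybara-webkit. The helper name `with_fresh_session` and its `factory`, `error_class`, `attempts` and `backoff` parameters are all made up for illustration. It relies on what the stack trace above suggests: the webkit_server process is started when a session's driver is first built, so discarding the failed session and building a new one should trigger a fresh start attempt. It does not address why the server failed to start in the first place.

```ruby
# Hypothetical helper; not part of Capybara or capybara-webkit.
# Builds a session via `factory` and yields it. If the block raises
# `error_class`, the session is thrown away and a brand-new one is
# built before the block is tried again, up to `attempts` times.
def with_fresh_session(factory, error_class:, attempts: 3, backoff: 0.5)
  tries = 0
  begin
    tries += 1
    session = factory.call   # a new session on every attempt
    yield session
  rescue error_class
    raise if tries >= attempts  # out of attempts: re-raise the last error
    sleep(backoff * tries)      # short, growing pause before retrying
    retry
  end
end

# Possible usage inside the job (assumes capybara-webkit is loaded):
#
#   with_fresh_session(-> { Capybara::Session.new(:webkit) },
#                      error_class: Capybara::Webkit::ConnectionError) do |session|
#     session.visit(site.roster_url)
#     Nokogiri::HTML.parse(session.body)
#   end
```

Passing the session builder in as a lambda keeps the helper independent of the driver, so the retry logic can be exercised with a stub instead of a real webkit_server.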