Why do I sporadically get disconnected when using the capybara-webkit gem in Rails 4?

Asked: 2016-09-26 13:20:29

Tags: ruby-on-rails ruby-on-rails-4 webkit capybara capybara-webkit

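I'm using the capybara-webkit gem to scrape data from certain pages in my Rails application. I've noticed that, seemingly at random, the application crashes with the following error: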

Capybara::Webkit::ConnectionError: /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/bin/webkit_server failed to start.
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:56:in `parse_port'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:42:in `discover_port'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/server.rb:26:in `start'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:67:in `start_server'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/connection.rb:17:in `initialize'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `new'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit/driver.rb:16:in `initialize'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `new'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-webkit-1.11.1/lib/capybara/webkit.rb:15:in `block in <top (required)>'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:85:in `driver'
    from /home/daveomcd/.rvm/gems/ruby-2.3.1/gems/capybara-2.7.1/lib/capybara/session.rb:233:in `visit'

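It happens even after the session has already connected and visited sites several times. Here is a snippet of the code I'm currently using...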

if site.url.present?

  begin
    # Visit the URL
    session = Capybara::Session.new(:webkit)
    session.visit(site.url)  # here is where the error occurs...
    document = Nokogiri::HTML.parse(session.body)

    # Load configuration options for Development Group
    roster_table_selector = site.development_group.table_selector
    header_row_selector   = site.development_group.table_header_selector
    row_selector          = site.development_group.table_row_selector
    row_offset            = site.development_group.table_row_selector_offset
    header_format_type    = site.config_header_format_type

    # Get the Table and Header Row for processing
    roster_table        = document.css(roster_table_selector)
    header_row          = roster_table.css(header_row_selector)
    header_hash         = retrieve_headers(header_row, header_format_type)

    my_object = process_rows(roster_table, header_hash, site, row_selector, row_offset)

  rescue ::Capybara::Webkit::ConnectionError => e
    raise e

  rescue OpenURI::HTTPError => e
    if e.message == '404 Not Found'
      raise "404 Page not found..."
    else
      raise e
    end
  end
end

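I even considered that I may never find out why this happens and that I should simply recover when it does. So I tried a "retry" in the rescue block, but it looks like the server has simply shut down, so I get the same result on the retry. Does anyone know a way to check whether the server is down, restart it, and then retry? Thanks for your help!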

1 answer:

Answer 0 (score: 0)

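After further investigation, it turned out I was creating a new Capybara::Session on every iteration of the loop. I moved it outside the loop and added Capybara.reset_sessions! at the end of the loop body. I'm not sure whether that helps with anything, but the problem seems to be resolved. I'll monitor it for the next hour or so. Below is a sample of my ActiveJob code...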

class ScrapeJob < ActiveJob::Base
  queue_as :default
  include Capybara::DSL

  def perform(*args)

    session = Capybara::Session.new(:webkit)

    Site.where(config_enabled: 1).order(:code).each do |site|
      process_roster(site, session)
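      # Note: Capybara.reset_sessions! resets only the sessions in Capybara's
      # own pool; the explicitly created session would need session.reset!.
      Capybara.reset_sessions!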
    end

  end

  def process_roster(site, session)

    if site.roster_url.present?

      begin
        # Visit the Roster URL 
        session.visit(site.roster_url)
        document = Nokogiri::HTML.parse(session.body)

        # processing code...

        # pass the session that was created as the final parameter..
        my_object = process_rows( ..., session)

      rescue ::Capybara::Webkit::ConnectionError => e
        raise e

      rescue OpenURI::HTTPError => e
        if e.message == '404 Not Found'
          raise "404 Page not found..."
        else
          raise e
        end
      end
    end
  end
end
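
On the retry part of the question: judging from the stack trace, the webkit_server process is started when a session's driver is created, so a plain "retry" that reuses the same dead session would likely fail the same way. A minimal sketch of one possible approach is to build a brand-new session before each retry. The helper name with_fresh_session_on_failure and its parameters are hypothetical and not part of Capybara or capybara-webkit; I have not verified this against the original failure.

```ruby
# Hypothetical helper (not part of Capybara): yields a session built by
# `build_session`; if one of the `retry_on` errors is raised, it builds a
# new session and runs the block again, up to `attempts` times in total.
def with_fresh_session_on_failure(build_session, retry_on:, attempts: 3)
  tries = 0
  session = build_session.call
  begin
    tries += 1
    yield session
  rescue *retry_on
    raise if tries >= attempts      # give up and re-raise the last error
    session = build_session.call    # drop the old session, start a new one
    retry
  end
end

# Usage inside the job, assuming capybara-webkit is loaded:
#
#   with_fresh_session_on_failure(
#     -> { Capybara::Session.new(:webkit) },
#     retry_on: [Capybara::Webkit::ConnectionError]
#   ) do |session|
#     session.visit(site.roster_url)
#     Nokogiri::HTML.parse(session.body)
#   end
```

The session factory is passed in as a lambda so that the same helper works with any driver, and so that the number of new sessions (and therefore new server processes) stays bounded by attempts.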