Ruby Mechanize stops working while in an each do loop

Asked: 2014-08-19 08:50:38

Tags: ruby mechanize mechanize-ruby

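I'm using a Mechanize Ruby script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.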

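Once I get to about 300 records, my script falls into the rescue branch on every attempt and eventually stops working. I thought this was because I had not set max_history properly, but that doesn't seem to make a difference.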

Here is the error message I start getting:

getaddrinfo: nodename nor servname provided, or not known

Any ideas on what I'm doing wrong here?

require 'mechanize' 
result_counter = 0
# Count the rows without leaving a file handle open
total_rows = File.foreach(ARGV[0]).count

mechanize = Mechanize.new { |agent|
  agent.open_timeout   = 10
  agent.read_timeout   = 10
  agent.max_history = 0
}

File.foreach(ARGV[0]) do |line|
  item = line.split("\t").map {|item| item.strip}
  website = item[16]
  name = item[11]

  if website
    begin
      tries ||= 3
      page = mechanize.get(website)

      primary1 = page.link_with(text: 'text')
      secondary1 = page.link_with(text: 'other_text')
      contains_primary = true
      contains_secondary = true

      unless contains_primary || contains_secondary
        1.times do |count|
          result_counter+=1
          STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
        end
      end

      for i in [primary1]
        if i
          page_to_visit = i.click
          page_found = page_to_visit.uri
          1.times do |count|
            result_counter+=1
            STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
          end
          break
        end
      end
    rescue Timeout::Error
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
    rescue => e
      STDERR.puts e.message
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
    end
  end
end

1 Answer:

Answer 0 (score: 1)

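You are getting this error because you are not closing connections after you use them. Mechanize uses persistent (keep-alive) connections by default, so over hundreds of different hosts the open sockets likely pile up until new lookups fail with the getaddrinfo error.

This should fix your problem: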

mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history  = 0
  agent.keep_alive   = false
}
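
As a side note, the `tries ||= 3` in the question is assigned but never used, so a failed request is never actually retried. Below is a minimal sketch of a bounded retry, assuming that was the intent. `fetch_with_retries` is a hypothetical helper, not part of Mechanize, and it only assumes the agent responds to `get`. Retrying does not replace `keep_alive = false`; it only helps with failures that are transient.

```ruby
# Hypothetical helper (not part of Mechanize): fetch `url` through `agent`,
# retrying until `tries` attempts have been made, then return nil.
# `agent` only needs to respond to #get, so a Mechanize instance should work.
def fetch_with_retries(agent, url, tries: 3)
  attempts = 0
  begin
    attempts += 1
    agent.get(url)
  rescue StandardError => e
    # Timeout::Error and SocketError are both StandardError subclasses.
    retry if attempts < tries
    warn "#{url}: #{e.message} (gave up after #{attempts} attempts)"
    nil
  end
end

# Usage sketch, assuming the mechanize gem and the agent settings above:
#   agent = Mechanize.new { |a| a.keep_alive = false }
#   page  = fetch_with_retries(agent, website)
#   link  = page && page.link_with(text: 'text')
```

Returning nil on final failure lets the calling loop log the row and move on instead of aborting the whole run.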