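I am using a Ruby Mechanize script to loop through about 1,000 records in a tab-delimited file. Everything works as expected until I reach about 300 records.
Once I get to about 300 records, my script falls into the rescue branch on every attempt and eventually stops working. I thought this was because I had not set max_history correctly,
but that does not seem to make a difference.
Here is the error message I start getting: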
getaddrinfo: nodename nor servname provided, or not known
Any ideas about what I am doing wrong here?
require 'mechanize'

result_counter = 0
used_file = File.open(ARGV[0])
total_rows = used_file.readlines.size

mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
}

File.open(ARGV[0]).each do |line|
  item = line.split("\t").map {|item| item.strip}
  website = item[16]
  name = item[11]
  if website
    begin
      tries ||= 3
      page = mechanize.get(website)
      primary1 = page.link_with(text: 'text')
      secondary1 = page.link_with(text: 'other_text')
      contains_primary = true
      contains_secondary = true
      unless contains_primary || contains_secondary
        1.times do |count|
          result_counter+=1
          STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - No"
        end
      end
      for i in [primary1]
        if i
          page_to_visit = i.click
          page_found = page_to_visit.uri
          1.times do |count|
            result_counter+=1
            STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name}"
          end
          break
        end
      end
    rescue Timeout::Error
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Timeout"
    rescue => e
      STDERR.puts e.message
      STDERR.puts "Generate (#{result_counter}/#{total_rows}) #{name} - Rescue"
    end
  end
end
Answer 0 (score: 1)
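You are getting this error because you are not closing connections after you use them.
Setting keep_alive to false, so that connections are not held open for reuse, should fix your problem: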
mechanize = Mechanize.new { |agent|
  agent.open_timeout = 10
  agent.read_timeout = 10
  agent.max_history = 0
  agent.keep_alive = false
}
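
Separately from the keep_alive setting, the loop in the question opens the input file twice without closing it and sets `tries` without ever retrying. The following is only a rough sketch of how those two points might be tidied up, not a confirmed fix for the getaddrinfo error. The helper names `with_retries` and `each_site` are hypothetical, and the fetch is passed in as a block so the sketch runs without Mechanize; in the real script the block would presumably wrap `mechanize.get(website)`.

```ruby
# Hypothetical helper: run the block, retrying when it raises a
# StandardError, up to `attempts` calls in total. The last error is
# re-raised if every attempt fails.
def with_retries(attempts: 3)
  tries = 0
  begin
    tries += 1
    yield tries
  rescue StandardError
    retry if tries < attempts
    raise
  end
end

# Hypothetical helper: yield [name, website] for every row that has a
# non-empty website column. The column indexes 11 and 16 are taken from
# the question. File.foreach closes the file when iteration finishes.
def each_site(path, name_col: 11, site_col: 16)
  File.foreach(path) do |line|
    fields = line.chomp.split("\t", -1).map(&:strip)
    website = fields[site_col]
    next if website.nil? || website.empty?

    yield fields[name_col], website
  end
end

# Usage sketch (needs the mechanize gem and network access, so it is
# left as a comment):
#
#   each_site(ARGV[0]) do |name, website|
#     begin
#       page = with_retries(attempts: 3) { mechanize.get(website) }
#       STDERR.puts "Fetched #{name}: #{page.uri}"
#     rescue => e
#       STDERR.puts "#{name} - #{e.message}"
#     end
#   end
```

Taking the fetch as a block keeps the retry logic independent of Mechanize, so it can be exercised with a stub before being pointed at real sites.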