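After the call to searchEmails(page), the rest of the code in the harvest method (puts "hey") is never executed. I'm probably missing something simple in Ruby, since I'm only just getting back to it.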
def searchEmails(page_to_search)
  begin
    html = @agent.get(url).search('html').to_s
    mail = html.scan(/['.'\w|-]*@+[a-z]+[.]+\w{2,}/).map.to_a
    base = page_to_search.uri.to_s.split("//", 2).last.split("/", 2).first
    mail.each{|e| @file.puts e+";"+base unless e.include? "example.com" or e.include? "email.com" or e.include? "domain.com" or e.include? "company.com" or e.length < 9 or e[0] == "@"}
  end
end

def harvest(url)
  begin
    page = @agent.get(url)
    searchEmails(page)
    puts "hey"
  rescue Exception
  end
end

url="www.example.com"
harvest(url)
Answer 0 (score: 3)
@agent.get(url) will fail if the URL is broken or the network is down.
The problem in your code boils down to this:
def do_something
  begin
    raise
    puts "I will never get here!"
  rescue
  end
end
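To make the effect visible, here is a minimal runnable sketch of the same pattern. The method names swallow_error and report_error are hypothetical and exist only for this illustration; the second one shows what the empty rescue was hiding by capturing the exception object.

```ruby
# Hypothetical helper: same shape as do_something, with an empty rescue.
def swallow_error
  begin
    raise "something went wrong"
    :reached # never evaluated, because raise jumps straight to rescue
  rescue
    # nothing here, so the failure is silent
  end
end

# Hypothetical helper: same flow, but the rescue keeps the exception object.
def report_error
  begin
    raise "something went wrong"
  rescue => e
    "#{e.class}: #{e.message}"
  end
end

p swallow_error # prints nil
p report_error  # prints "RuntimeError: something went wrong"
```

The first call returns nil without any hint that something failed, which is the behaviour seen in harvest.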
Since you can't get rid of the raise, you need to do something inside the rescue (most likely log the error):
begin
  @agent.get(url)
  # ...
rescue Timeout::Error, Errno::EINVAL, Errno::ECONNRESET, EOFError,
       Net::HTTPBadResponse, Net::HTTPHeaderSyntaxError,
       Net::ProtocolError => e
  # log stands for whatever logging helper you use
  log(e.message, e.backtrace)
end
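As a self-contained sketch of that idea, the example below logs the failure with Ruby's standard Logger. It rests on several assumptions that are not in the question: fetch_page is a hypothetical stand-in for @agent.get(url) that always times out (so the example runs without Mechanize or a network), LOGGER and LOG_OUTPUT are illustrative names, and the rescue list is shortened to a few common network-related classes.

```ruby
require "logger"
require "stringio"
require "timeout"

# Assumption: a plain Logger replaces the undefined log helper.
# It writes to a StringIO here so the output can be inspected.
LOG_OUTPUT = StringIO.new
LOGGER = Logger.new(LOG_OUTPUT)

# Hypothetical stand-in for @agent.get(url); it always fails.
def fetch_page(url)
  raise Timeout::Error, "timed out fetching #{url}"
end

def harvest(url)
  page = fetch_page(url)
  puts "hey" # skipped whenever fetch_page raises
  page
rescue Timeout::Error, EOFError, SystemCallError => e
  # Record what failed and where instead of discarding the exception.
  LOGGER.error("#{e.class}: #{e.message}")
  LOGGER.debug(e.backtrace.join("\n")) if e.backtrace
  nil
end

harvest("http://www.example.com")
puts LOG_OUTPUT.string
```

Rescuing a specific list instead of Exception has a useful side effect: an error that is not on the list, such as a NameError from a misspelled or undefined variable, is no longer swallowed and shows up with a full backtrace.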