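This code grabs a random image from Google Images. However, I get an error when the web crawler searches for a term that Google returns no results for. I also get an error when Google hands the crawler an image that no longer exists. How can I write this code so that, if it hits an error, it reruns and tries to fetch another image?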
require 'open-uri'
require 'nokogiri'
url = "https://www.google.com/search?hl=en&q=" + rand(0-999999).to_s + "&ion=1&bav=on.2,or.r_gc.r_pw.r_cp.r_qf.&bvm=bv.42553238,d.dmg&biw=1354&bih=622&um=1&ie=UTF-8&tbm=isch&source=og&sa=N&tab=wi&ei=sNEfUf-fHvLx0wG7uoG4DQ"
googim = Nokogiri::HTML(open(url))
googimstr = googim.to_s
durl = googim.to_s.split('imgurl=')[1].split('&')[0]
name = durl.reverse.split("/")[0].reverse
open("./data/images/#{name}", 'wb') do |file|
  file << open(durl).read
end
Here are the two kinds of errors I get.
First error:
/usr/lib/ruby/2.0.0/open-uri.rb:353:in `open_http': 400 Bad Request (OpenURI::HTTPError)
from /usr/lib/ruby/2.0.0/open-uri.rb:708:in `buffer_open'
from /usr/lib/ruby/2.0.0/open-uri.rb:210:in `block in open_loop'
from /usr/lib/ruby/2.0.0/open-uri.rb:208:in `catch'
from /usr/lib/ruby/2.0.0/open-uri.rb:208:in `open_loop'
from /usr/lib/ruby/2.0.0/open-uri.rb:149:in `open_uri'
from /usr/lib/ruby/2.0.0/open-uri.rb:688:in `open'
from /usr/lib/ruby/2.0.0/open-uri.rb:34:in `open'
from wc.rb:11:in `block in <main>'
from /usr/lib/ruby/2.0.0/open-uri.rb:36:in `open'
from /usr/lib/ruby/2.0.0/open-uri.rb:36:in `open'
from wc.rb:10:in `<main>'
Second error:
wc.rb:6:in `split': invalid byte sequence in UTF-8 (ArgumentError)
from wc.rb:6:in `<main>'
Answer 0 (score: 2)
You can wrap the relevant part of your code in a begin/end block and rescue the exception. For example:
begin
  open("./data/images/#{name}", 'wb') do |file|
    file << open(durl).read
  end
rescue => e
  puts "some failure: #{e}"
end
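The snippet above only reports the failure. To actually rerun and try another image, as the question asks, Ruby's `retry` keyword can restart the `begin` block. The following is a minimal sketch, not a definitive implementation: `with_retries` and its `max_attempts` parameter are hypothetical names, and the list of rescued classes is an assumption based on the two errors in the question plus the `NoMethodError` that `split('imgurl=')[1]` would produce when there are no results.

```ruby
require 'open-uri'

# Hypothetical helper: run the block up to max_attempts times,
# starting over whenever one of the expected errors is raised.
def with_retries(max_attempts = 5)
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue OpenURI::HTTPError, ArgumentError, NoMethodError => e
    warn "attempt #{attempts} failed: #{e.class}: #{e.message}"
    retry if attempts < max_attempts # re-run the begin block
    raise                            # out of attempts: re-raise the last error
  end
end
```

If the whole script body, from building `url` to writing the file, goes inside `with_retries { ... }`, each retry picks a new random search term. Capping the number of attempts avoids looping forever when the failure is permanent, for example when the network is down.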
Here is a link to the Exceptions, Catch, and Throw chapter of the Pickaxe (Programming Ruby): http://www.ruby-doc.org/docs/ProgrammingRuby/html/tut_exceptions.html
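The second error in the question (`invalid byte sequence in UTF-8` from `split`) can also be avoided instead of rescued, by removing invalid bytes before parsing the string. A rough sketch, assuming Ruby 2.1 or later for `String#scrub` (the question's traceback shows 2.0.0, where this method is not available) and assuming the page still contains an `imgurl=` parameter as in the question; `first_image_url` is a hypothetical helper name:

```ruby
# Hypothetical helper: extract the first imgurl= value from a results page.
# Returns nil when the page contains no imgurl= parameter (no results).
def first_image_url(html)
  # Treat the bytes as UTF-8 and drop any that are invalid (Ruby 2.1+).
  clean = html.dup.force_encoding('UTF-8').scrub('')
  match = clean.match(/imgurl=([^&"]+)/)
  match && match[1]
end
```

Returning `nil` for the no-results case lets the caller decide to try another search term, rather than failing with a `NoMethodError` on `nil`.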