我想从网址下载图片,例如:http://trinity.e-stile.ru/并将图片保存到" C:\ pickaxe \ pictures"等目录中。使用Nokogiri非常重要。
我在本网站上阅读了类似的问题,但我没有找到它是如何工作的,我也不了解算法。
我编写了代码,用于解析网址并将部分网页源代码放入" img"标记到链接对象:
require 'nokogiri'
require 'open-uri'
PAGE_URL="http://trinity.e-stile.ru/"
page=Nokogiri::HTML(open(PAGE_URL)) #parsing into object
links=page.css("img") #object with html code with img tag
puts links.length # it is 24 images on this url
puts
links.each{|i| puts i } #it looks like: <img border="0" alt="" src="/images/kroliku.jpg">
puts
puts
links.each{|link| puts link['src'] } #/images/kroliku.jpg
抓取HTML代码后用什么方法保存图片?
如何将图像放入磁盘上的目录?
我更改了代码,但它有一个错误:
/home/action/.parts/packages/ruby2.1/2.1.1/lib/ruby/2.1.0/net/http.rb:879:in `initialize': getaddrinfo: Name or service not known (SocketError)
这是现在的代码:
require 'nokogiri'
require 'open-uri'
require 'net/http'
LOCATION = 'pics'
if !File.exist? LOCATION # create folder if it is not exist
require 'fileutils'
FileUtils.mkpath LOCATION
end
#PAGE_URL = "http://ruby.bastardsbook.com/files/hello-webpage.html"
#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"
page=Nokogiri::HTML(open(PAGE_URL))
links=page.css("img")
links.each{|link|
Net::HTTP.start(PAGE_URL) do |http|
localname = link.gsub /.*\//, '' # left the filename only
resp = http.get link['src']
open("#{LOCATION}/#{localname}", "wb") do |file|
file.write resp.body
end
end
}
答案 0 :(得分:1)
你差不多完成了。唯一剩下的就是存储文件。我们来做吧。
LOCATION = 'C:\pickaxe\pictures'
if !File.exist? LOCATION # create folder if it is not exist
require 'fileutils'
FileUtils.mkpath LOCATION
end
require 'net/http'
.... # your code with nokogiri etc.
links.each{|link|
Net::HTTP.start(PAGE_URL) do |http|
localname = link.gsub /.*\//, '' # left the filename only
resp = http.get link['src']
open("#{LOCATION}/#{localname}", "wb") do |file|
file.write resp.body
end
end
end
就是这样。
答案 1 :(得分:0)
正确的版本:
require 'nokogiri'
require 'open-uri'
LOCATION = 'pics'
if !File.exist? LOCATION # create folder if it is not exist
require 'fileutils'
FileUtils.mkpath LOCATION
end
#PAGE_URL="http://trinity.e-stile.ru/"
PAGE_URL="http://www.youtube.com/"
page=Nokogiri::HTML(open(PAGE_URL))
links=page.css("img")
links.each{|link|
uri = URI.join(PAGE_URL, link['src'] ).to_s # make absolute uri
localname=File.basename(link['src'])
File.open("#{LOCATION}/#{localname}",'wb') { |f| f.write(open(uri).read) }
}