I have a URL that contains many zip files, and I need to download local copies of them. This is what I have so far:
require 'open-uri'
require 'pry'

def download_xml(url, dest)
  open(url) do |u|
    File.open(dest, 'wb') { |f| f.write(u.read) }
  end
end

urls = ["http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"]
urls.each { |url| download_xml(url, url.split('/').last) }
However, I can't seem to access the zip files at that location or loop over them. How can I iterate over each zip file at the end of that URL, so that I can collect them in the array and download each one with that method?
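For reference, once the link targets have been scraped from the index page (however you extract them), picking out the zip files and building absolute URLs is plain string work. A minimal sketch, assuming the hrefs are already collected into an array (the `hrefs` sample values here are hypothetical):

```ruby
require 'uri'

BASE = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"

# Hypothetical input: link targets scraped from the index page.
hrefs = ["readme.txt", "a.zip", "b.zip"]

# Keep only the zip files and resolve them against the base URL.
zip_urls = hrefs.select { |h| h.end_with?(".zip") }
                .map { |h| URI.join(BASE, h).to_s }
```

The resulting `zip_urls` array can then be fed straight into a download loop like the one above.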
Answer 0 (score: 1)
I use the Nokogiri gem to parse the HTML, so first install Nokogiri:
sudo apt-get install build-essential patch
sudo apt-get install ruby-dev zlib1g-dev liblzma-dev
sudo gem install nokogiri
A solution specific to your problem:
noko.rb
require 'rubygems'
require 'nokogiri'
require 'open-uri'

base = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"
page = Nokogiri::HTML(open(base))  # Open the web address with Nokogiri
puts page.class                    # => Nokogiri::HTML::Document

page.css('a').each do |file_link|  # For each <a> HTML tag / link
  next unless file_link.text.end_with?(".zip")  # Skip anything that is not a zip file

  link = base + file_link.text     # Build the zip file's URL
  puts link
  open(file_link.text, 'wb') do |file|
    file << open(link).read        # Save the zip file to the current directory
  end
  puts file_link.text + " has been downloaded."
end
I have explained the code with comments.
Ultimately, there is no alternative to parsing the HTML page, generating the download links one by one, and downloading each file at the end.
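One caveat for newer Rubies: since Ruby 3.0 the bare `Kernel#open` no longer accepts URLs, so open-uri's entry point is `URI.open` instead. A minimal sketch of the download helper under that newer API (the `zip_url` helper is added here just for illustration):

```ruby
require 'open-uri'
require 'uri'

BASE = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"

# Build the absolute URL for a zip file name (pure string work, no network).
def zip_url(base, name)
  URI.join(base, name).to_s
end

# Download one file. URI.open replaces the bare Kernel#open,
# which no longer accepts URLs on Ruby 3.x.
def download(url, dest)
  URI.open(url) do |remote|
    File.open(dest, 'wb') { |f| f.write(remote.read) }
  end
end
```

Everything else in the answer carries over unchanged; only the `open(link)` calls need to become `URI.open(link)`.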