Downloading files from a URL with Ruby

Date: 2016-09-23 03:03:33

Tags: ruby download uri net-http

I have a URL that contains many zip files, and I need to download local copies of them. This is what I have so far:

require 'open-uri'
require 'pry'

def download_xml(url, dest)
  open(url) do |u|
    File.open(dest, 'wb') { |f| f.write(u.read) }
  end
end

urls = ["http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/"]

urls.each { |url| download_xml(url, url.split('/').last) }

However, I can't seem to access the zip files at that location or loop over them. How can I iterate over each zip file at the end of that URL, so that I can collect them in that array and download each one with the method above?

1 Answer:

Answer 0 (score: 1)

I use the Nokogiri gem to parse the HTML, so first install Nokogiri:

sudo apt-get install build-essential patch
sudo apt-get install ruby-dev zlib1g-dev liblzma-dev
sudo gem install nokogiri
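Alternatively, on projects that use Bundler, the dependency can be declared in a Gemfile instead of installed system-wide (a minimal sketch; the version constraint shown is only an example):

```ruby
# Gemfile
source 'https://rubygems.org'

gem 'nokogiri'
```

Then run `bundle install` in the project directory.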

A solution specific to your problem:

noko.rb

require 'rubygems'
require 'nokogiri'
require 'open-uri'

page = Nokogiri::HTML(open("http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/")) # Open web address with Nokogiri
puts page.class   # => Nokogiri::HTML::Document

page.css('a').each do |file_link| # For each <a> HTML tag / link
  next unless file_link.text.end_with?(".zip") # Skip anything that isn't a zip file

  link = "http://feed.omgili.com/5Rh5AMTrc4Pv/mainstream/posts/" + file_link.text # Build the zip file's link
  puts link
  File.open(file_link.text, 'wb') do |file|
    file << open(link).read # Save the zip file to the current directory
  end
  puts file_link.text + " has been downloaded."
end

I have explained the code with comments.

Ultimately, there is no way around it: you have to parse the HTML, generate the download links one by one, and download each file at the end.