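I have a web page with many links. Each link, when clicked, lets the user download a file. I only want the links whose href contains a specific term. I don't know how to iterate over all of those links and save every file. So far I have been using mechanize to write the code.
The code is: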
agent.page.links_with(:href => /DownloadFile/).each do |link|
  # How do I save the file from the link here?
end
Updated code:
agent.page.links_with(:href => /DownloadFile/).each do |link|
  File.open("download.txt", "w") do |f|
    uri = URI(link)
    f << Net::HTTP.get(uri)
  end
end
Updated code again, but nothing gets downloaded:
require 'nokogiri'
require 'open-uri'
require 'mechanize'
agent = Mechanize.new
agent.get("http://mylink")
form = agent.page.forms.first
form.j_username = "usr"
form.j_password = "pwd"
form.submit
# Pick the project you want to download and open it
agent.page.link_with(:text => "AneupGastricCaFSeq").click
agent.page.links_with(:href => /ViewSample/).map {|link|
  link.click
  agent.page.links_with(:href => /DownloadFile/).each do |link|
    link=t+link.uri.to_s
    uri = URI(link)
    File.open("downloaded_file", "w+") do |f|
      f << Net::HTTP.get(uri)
    end
  end
}
Answer 0 (score: 1)
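You need to download the file with the agent. You logged in to the site through the agent, so the agent holds your session cookie. If you use Net::HTTP instead, the request is sent without that session cookie.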
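To illustrate the point about the session cookie, here is a minimal sketch using only the standard library. A plain Net::HTTP request carries a cookie only if you attach one yourself. The helper name build_get, the example.com URL, and the JSESSIONID value are all hypothetical; no request is actually sent.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper: builds (but does not send) a GET request,
# attaching a Cookie header only when one is supplied.
def build_get(url, cookie = nil)
  request = Net::HTTP::Get.new(URI(url))
  request['Cookie'] = cookie if cookie
  request
end

# What Net::HTTP.get effectively sends: no session information at all.
anonymous = build_get('http://example.com/DownloadFile?id=1')

# What a logged-in client sends. The cookie name and value are made up.
with_session = build_get('http://example.com/DownloadFile?id=1', 'JSESSIONID=abc123')

puts anonymous['Cookie'].inspect
puts with_session['Cookie'].inspect
```

Copying the cookie over by hand like this is possible, but letting the agent make the request avoids the bookkeeping.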
You need to replace this snippet:
uri = URI(link)
File.open("downloaded_file", "w+") do |f|
  f << Net::HTTP.get(uri)
end
with this:
agent.get(uri).save!('downloaded_file')
You also need to set a pluggable_parser so that the file is saved directly instead of being held in memory:
agent.pluggable_parser.default = Mechanize::Download
The complete code is:
require 'nokogiri'
require 'open-uri'
require 'mechanize'
dir = 'your/path/to/save'
agent = Mechanize.new
# Save responses that have no registered parser straight to disk
agent.pluggable_parser.default = Mechanize::Download
agent.get("http://mylink")
form = agent.page.forms.first
form.j_username = "usr"
form.j_password = "pwd"
form.submit
# Pick the project you want to download and open it
agent.page.link_with(:text => "AneupGastricCaFSeq").click
agent.page.links_with(:href => /ViewSample/).map {|link|
  link.click
  agent.page.links_with(:href => /DownloadFile/).each do |link|
    # t comes from the question's code and is not defined in this snippet
    link=t+link.uri.to_s
    uri = URI(link)
    # your custom function to generate a distinct filename
    filename = filename_from_link(link)
    # Download through the agent so the session cookie is sent
    agent.get(uri).save!(File.join(dir, filename))
  end
}
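The code above leaves filename_from_link up to you. As one possible sketch (an assumption, not part of Mechanize or of the original answer), the helper below derives a name from the last path segment of the URL and appends a short digest of the full URL, so that two links differing only in their query string still get distinct files. The example.com URLs are placeholders.

```ruby
require 'uri'
require 'digest'

# Hypothetical helper: builds a distinct, filesystem-safe filename from a URL.
# Accepts a String or anything whose to_s is a URL.
def filename_from_link(link)
  uri = URI(link.to_s)
  base = File.basename(uri.path)
  base = 'download' if base.empty? || base == '/'
  base = base.gsub(/[^\w.\-]/, '_')            # replace unsafe characters
  ext = File.extname(base)
  stem = File.basename(base, ext)
  digest = Digest::MD5.hexdigest(uri.to_s)[0, 8] # short tag unique per URL
  "#{stem}-#{digest}#{ext}"
end

puts filename_from_link('http://example.com/DownloadFile?id=1')
puts filename_from_link('http://example.com/DownloadFile?id=2')
puts filename_from_link('http://example.com/files/report.pdf?id=1')
```

The digest is there only to keep names distinct; if the site exposes a meaningful identifier in the query string, using that instead would give more readable names.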
Answer 1 (score: 0)
You need to read the file with a GET request using Net::HTTP, and then save it locally.
An example program looks like this:
require 'net/http'
link = 'http://stackoverflow.com'
File.open("download.txt", "w") do |f|
  uri = URI(link)
  f << Net::HTTP.get(uri)
end
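If the downloaded files are not plain text (an assumption, since the question does not say what they are), opening the output file in binary mode avoids newline or encoding translation on platforms that perform it. A minimal sketch, with a hypothetical helper save_body and a made-up byte string standing in for a response body:

```ruby
require 'tmpdir'

# Hypothetical helper: writes a response body to disk byte for byte.
def save_body(path, body)
  File.open(path, 'wb') { |f| f.write(body) }
end

# Stand-in for the string returned by Net::HTTP.get; not a real download.
body = "PK\x03\x04\r\n\x00binary".b

Dir.mktmpdir do |dir|
  path = File.join(dir, 'download.bin')
  save_body(path, body)
  # Check that the bytes on disk match the bytes we were given
  puts File.binread(path) == body
end
```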