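I have a series of links that I want to scrape with Mechanize. Everything in the script below works, but I would like to load the link names into an array and have Mechanize process each one in turn. I have commented the script, so it should be self-explanatory.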
require 'nokogiri'
require 'open-uri'
require 'mechanize'
agent = Mechanize.new
#Get the baseline page
agent.get("http://mylink:8080/lablink")
#Get the string for the baseline page to use for later
t="http://mylink:8080"
#Fill out the authentication form
form = agent.page.forms.first
form.j_username = "usr"
form.j_password = "pwd"
form.submit
#Select the project link- level 1
#Create a new array with the text of the projects you are interested in
#Then loop through each project to do what is below:
agent.page.link_with(:text => "TinM_DK").click #I want to have the :text look for an array here
#Select the links that have ViewSample in them- level 2
agent.page.links_with(:href => /ViewSample/).each do |sample_link|
  sample_link.click
  #Select the DownloadFile links whose text matches 1.fq or 2.fq- level 3
  agent.page.links_with(:text => /[1-2]\.fq/).each do |link|
    #Recreate the full URL
    url = t + link.uri.to_s
    #Make string into a qualified URL
    uri = URI(url)
    puts uri
    #Save the correct file with fq.gz
    #To download into a particular folder, cd into that folder and then paste the code into irb
    agent.get(url).save
  end
end
Answer 0 (score: 0)
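I am quite new to Ruby and to coding in general. I think I have found a way to save the URLs from the initial crawl into an array. I can upload the code if that would be useful.
I am just looking for a way to call the URLs in that array so that the next script can parse them.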
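For what it is worth, here is a rough sketch of how the array idea could look. It is not tested against the real site. The names `PROJECTS`, `absolute_urls` and `crawl_projects` are made up for illustration, "Another_Project" is a placeholder, and I am assuming that the start URL shows the project list again once the agent is logged in.

```ruby
require 'uri'

BASE_URL = 'http://mylink:8080' # base URL from the question

# Project names to visit. Only "TinM_DK" comes from the question;
# "Another_Project" is a hypothetical placeholder.
PROJECTS = %w[TinM_DK Another_Project].freeze

# Turn (possibly relative) hrefs into absolute URL strings,
# dropping nil entries and duplicates.
def absolute_urls(hrefs, base = BASE_URL)
  hrefs.compact.map { |href| URI.join(base, href).to_s }.uniq
end

# Walk every project in the array and return the download URLs found.
# `agent` is expected to be a Mechanize agent that has already logged in.
def crawl_projects(agent, start_url, projects = PROJECTS)
  urls = []
  projects.each do |name|
    agent.get(start_url)                     # assumed to list the projects
    project_link = agent.page.link_with(text: name)
    next unless project_link                 # skip names missing from the page
    project_link.click
    agent.page.links_with(href: /ViewSample/).each do |sample_link|
      sample_page = sample_link.click
      hrefs = sample_page.links_with(text: /[1-2]\.fq/).map(&:href)
      urls.concat(absolute_urls(hrefs))
    end
  end
  urls.uniq
end

# Possible usage, after logging in as in the question:
#   urls = crawl_projects(agent, "http://mylink:8080/lablink")
#   urls.each { |url| agent.get(url).save }
```

Keeping the crawl (collecting URLs) separate from the download step means the array can be inspected, or written to a file, before anything is fetched.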