How to pass an array of links to mechanize

Asked: 2016-01-01 13:39:58

Tags: ruby mechanize

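I have a series of links that I want to scrape with mechanize. Everything in the script below works, but I would like to load the link names into an array and then let mechanize do its thing for each one. I have commented the script, so it should be self-explanatory.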

require 'nokogiri'
require 'open-uri'
require 'mechanize'

agent      = Mechanize.new
#Get the baseline page
agent.get("http://mylink:8080/lablink")
#Get the string for the baseline page to use for later
t="http://mylink:8080"
#Fill out the authentication form
form = agent.page.forms.first
form.j_username = "usr"
form.j_password = "pwd"
form.submit
# Select the project link - level 1
# Create a new array with the text of the projects you are interested in
# Then loop through each project to do what is below:
agent.page.link_with(:text => "TinM_DK").click # I want to have the :text look for an array here
# Select the links that have ViewSample in them - level 2
agent.page.links_with(:href => /ViewSample/).each do |link|
  link.click
  # Select the download links whose text matches 1.fq or 2.fq
  agent.page.links_with(:text => /[1-2]\.fq/).each do |file_link|
    # Recreate the full URL
    url = t + file_link.uri.to_s
    # Make string into a qualified URL
    uri = URI(url)
    puts uri
    # Save the correct file with fq.gz
    # To download into a particular folder, cd into that folder and then paste the code into irb
    agent.get(url).save
  end
end
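
One possible way to drive the "level 1" step from an array is sketched below. The helper name `each_project_page` is hypothetical and not part of mechanize. It only assumes that `agent` is an already logged-in Mechanize agent whose current page lists the projects, and it relies on `link_with` and `click` as used in the script above.

```ruby
# Hypothetical helper (not part of mechanize): for each project name, find the
# link with exactly that text on the project listing page, click it, and yield
# the project name together with the page that the click returns.
def each_project_page(agent, project_names)
  # Remember the listing page, because agent.page changes after every click.
  listing = agent.page
  project_names.each do |name|
    link = listing.link_with(:text => name)
    if link.nil?
      warn "No link with text #{name.inspect} on #{listing.uri}"
      next
    end
    yield name, link.click
  end
end
```

Called as `each_project_page(agent, ["TinM_DK", "AnotherProject"]) { |name, page| ... }` (the second project name is made up), the block body would hold the ViewSample and download loops, using `page.links_with(...)` in place of `agent.page.links_with(...)`. Looking links up on the saved listing page avoids depending on whatever page the agent visited last.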

1 Answer:

Answer 0 (score: 0)

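I'm very new to Ruby and to coding. I think I have found a way to save the URLs from the initial crawl as an array. If that would be useful, I can upload the code.

I'm now just looking into how to call the URLs in that array so the next script can parse them.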
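
A minimal sketch of that idea, not the answerer's actual code: collect the hrefs from the first crawl into an array of absolute URLs, optionally write them to a text file, and let a second script read them back and download each one. All helper names (`absolute_urls`, `save_urls`, `load_urls`, `download_all`) and the file layout (one URL per line) are assumptions made for illustration.

```ruby
require 'uri'

# Hypothetical helper: turn hrefs gathered in the first crawl into an array of
# unique absolute URL strings, resolved against base_url.
def absolute_urls(base_url, hrefs)
  hrefs.map { |href| URI.join(base_url, href.to_s).to_s }.uniq
end

# Hypothetical helper: write the array to a plain text file, one URL per line.
def save_urls(path, urls)
  File.write(path, urls.join("\n") + "\n")
end

# Hypothetical helper: read the array back in the next script, skipping blanks.
def load_urls(path)
  File.readlines(path, chomp: true).reject(&:empty?)
end

# Hypothetical helper: fetch and save every URL with an already logged-in
# agent, using the same agent.get(url).save call as the question's script.
def download_all(agent, urls)
  urls.each { |url| agent.get(url).save }
end
```

In the first script the hrefs could come from something like `agent.page.links_with(:text => /[1-2]\.fq/).map(&:href)`, and the second script would call `download_all(agent, load_urls("urls.txt"))` after logging in again; the file name `urls.txt` is made up. If both steps live in one script, the array can be passed to `download_all` directly and the file is unnecessary.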