Using mechanize with a regular expression to select which links to follow based on specific text

Date: 2015-12-30 22:16:49

Tags: ruby regex nokogiri mechanize

I have a page whose links look like this:

#<Mechanize::Page::Link
   "TCO11_IIIE"
   "/me/secure/ViewSample.do?id=211112">
  #<Mechanize::Page::Link
   "TCO15_IIIE"
   "/me/secure/do?id=211113">
  #<Mechanize::Page::Link
   "TCO16_IIC"
   "/me/secure/ViewSample.do?id=211114">
  #<Mechanize::Page::Link
   "TCO17_IIC"
   "/me/secure/ViewSample.do?id=211116">
  #<Mechanize::Page::Link
   "TCO17_IIIE"
   "/me/secure/ViewSample.do?id=211115">
  #<Mechanize::Page::Link
   "TCO19_IID"
   "/me/secure/ViewSample.do?id=211117">
  #<Mechanize::Page::Link
   "TCO21_IIC"
   "/me/secure/ViewSample.do?id=211118">
  #<Mechanize::Page::Link
   "TCO21_IIIE"
   "/me/secure/do?id=211119">
  #<Mechanize::Page::Link
   "TCO23_IIC"
   "/me/secure/do?id=211120">

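I'm writing a script that should follow only the links containing "ViewSample" (and then download specific links ending in fq, though that part isn't relevant to this question).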

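I'm a bit confused about how to do this, because I believe the .search and .links_with methods need an exact string for the whole link text (or is it the href?). So I think I need to use a regular expression in the first line of the code below: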

master_page.search("ViewSample") do |download_list_link|
    download_list_page = agent.get(download_list_link[:href])

    download_list_page.search("td > a") do |link|
        if link.content.include?("fq.gz")
            out_file = File.new("downloaded_file", "w")
            out_file.puts($agent.get_file(link[:href]))
            out_file.close
        end
    end
end

1 answer:

Answer 0 (score: 2)

That's what select is for:

page.links.select{|link| link.href[/ViewSample/]}

page.search('a').select{|a| a[:href][/ViewSample/]}
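
To see the filter in isolation, here is a minimal sketch that runs without a network connection or the mechanize gem. The `Link` struct is a hypothetical stand-in for `Mechanize::Page::Link` (only `text` and `href` are modelled), `view_sample_links` is an illustrative helper name, and the sample data is a subset of the links in the question plus one made-up entry with no href.

```ruby
# Hypothetical stand-in for Mechanize::Page::Link; only text and href are modelled.
Link = Struct.new(:text, :href)

links = [
  Link.new("TCO11_IIIE", "/me/secure/ViewSample.do?id=211112"),
  Link.new("TCO15_IIIE", "/me/secure/do?id=211113"),
  Link.new("TCO16_IIC",  "/me/secure/ViewSample.do?id=211114"),
  Link.new("no_href",    nil) # made-up entry: an anchor without an href
]

# Keep only the links whose href matches the regex.
# to_s guards against a nil href, which would otherwise raise NoMethodError.
def view_sample_links(links)
  links.select { |link| link.href.to_s[/ViewSample/] }
end

selected = view_sample_links(links)
selected.each { |link| puts "#{link.text} -> #{link.href}" }
```

With a real page, the same block can be passed to `page.links.select`. Mechanize's `links_with` also accepts a regular expression as a criterion, so `page.links_with(href: /ViewSample/)` should give an equivalent result.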