Getting Mechanize to go through x links and get all the titles?

Time: 2014-05-19 07:40:23

Tags: ruby mechanize

Basically, I want to use Mechanize to go through all of the a-z pages on this site: http://www.tv.com/shows/sort/a_z/

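Then, for each letter, I want to get the title of every show on all of that letter's pages. Right now I'm only trying to get it working for "a". This is what I have so far, but I don't know where to go from here: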

require 'mechanize'

agent=Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click

1 answer:

Answer 0 (score: 1)

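You just need some XPath to find what you want (the letter links, the show links, and the "next" page link) and then navigate with it.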

require 'mechanize'
require 'pp'

shows = []
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'

# Visit every letter link in the alphabet bar, skipping the currently selected entry.
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the show titles on the first page for this letter.
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }

  # Keep following the "next" link until the last page for this letter.
  while (next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]'))
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end

pp shows
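
The answer's loop repeats the title-extraction line once for the first page and once for every following page. As a minimal sketch (not part of the original answer), the per-letter work could be pulled into one helper. The name `collect_titles` and the `max_pages` safety cap are hypothetical additions. The helper only assumes that `agent` behaves like the Mechanize agent above: `get` loads a page and `page` returns the current one. The XPath defaults are copied from the answer and may not match the site's current markup.

```ruby
# Hypothetical helper: gathers the link texts from a listing page and from
# every page reachable through its "next" link.
#
# agent      - assumed to respond to get(href) and page, like Mechanize
# start_href - first page of the listing (for example one letter's page)
# max_pages  - illustrative cap so a pagination cycle cannot loop forever
def collect_titles(agent, start_href,
                   show_xpath: '//li[@class="show"]/a',
                   next_xpath: '//div[@class="_pagination"]//a[@class="next"]',
                   max_pages: 500)
  titles = []
  agent.get(start_href)

  max_pages.times do
    # Extract the titles on the page that is currently loaded.
    agent.page.search(show_xpath).each { |link| titles << link.text }

    # Stop when the page has no "next" link; otherwise follow it.
    next_link = agent.page.at(next_xpath)
    break unless next_link

    agent.get(next_link[:href])
  end

  titles
end
```

With this in place, the body of the letter loop would shrink to something like `shows.concat(collect_titles(agent, letter_link[:href]))`. Whether the cap is needed depends on how the site paginates; it is there only as a precaution.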