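Basically, I want to use mechanize to go through all the a-z pages on this site: http://www.tv.com/shows/sort/a_z/
Then, for each letter, I want to get the title of every show on all of that letter's pages. Right now I'm just trying to get it working for "a". This is what I have so far, but I don't know where to go from here.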
require 'mechanize'
agent=Mechanize.new
goog = agent.get "http://www.tv.com/shows/sort/a_z/"
search = goog.link_with(:href => "/shows/sort/a/").click
Answer 0: (score: 1)
You just need a bit of XPath to find the elements you want and navigate between pages.
require 'mechanize'
shows = Array.new
agent = Mechanize.new
agent.get 'http://www.tv.com/shows/sort/a_z/'
# Visit every letter link whose <li> is not marked "selected"
agent.page.search('//div[@class="alphabet"]//li[not(contains(@class, "selected"))]/a').each do |letter_link|
  agent.get letter_link[:href]
  # Collect the show titles on the first page for this letter
  agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  # Follow the "next" link until there is none left
  while next_page_link = agent.page.at('//div[@class="_pagination"]//a[@class="next"]') do
    agent.get next_page_link[:href]
    agent.page.search('//li[@class="show"]/a').each { |show_link| shows << show_link.text }
  end
end
require 'pp'
pp shows
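The control flow above (collect the titles on the current page, then keep following the "next" link until there is none) can be tried without network access. The following is only a rough sketch under stated assumptions: `fake_pages`, `collect_titles`, the `page2` path and the placeholder titles are all made up for illustration and are not part of Mechanize or of the real site.

```ruby
# Hypothetical in-memory stand-in for the site: each path maps to the
# titles on that page and the path of the next page (nil on the last page).
fake_pages = {
  '/shows/sort/a/'      => { titles: ['Show A1', 'Show A2'], next: '/shows/sort/a/page2' },
  '/shows/sort/a/page2' => { titles: ['Show A3'],            next: nil },
  '/shows/sort/b/'      => { titles: ['Show B1'],            next: nil }
}

# Walk one letter: take the titles from the current page, then move to
# the next page until there is no next page.
def collect_titles(start_path, pages)
  titles = []
  path = start_path
  while path
    page = pages.fetch(path)
    titles.concat(page[:titles])
    path = page[:next]
  end
  titles
end

# Same outer loop as in the answer: one pass per letter.
shows = ['/shows/sort/a/', '/shows/sort/b/'].flat_map do |letter_path|
  collect_titles(letter_path, fake_pages)
end
puts shows
```

Writing the loop this way handles the first page and the later pages in the same place, so the line that collects titles does not need to be repeated.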