Nokogiri/Mechanize scraper: passing a form value to the scraper class

Date: 2014-06-04 11:44:17

Tags: ruby-on-rails nokogiri mechanize

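As a learning exercise, I built a scraper that fetches Reddit's top headlines. A live example of the current version can be found here; it does not yet include any user-defined functionality. I would like to add a feature that lets users enter their preferred subreddit through a form input. How can I append the subreddit to the URL specified in my RedditScraper class? For example, I would like the base URL to be "http://reddit.com/r/", and then the user could enter "ruby" or whatever subreddit they like. Here is my scraper class: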

require 'nokogiri'
require 'open-uri'
require 'mechanize'


class RedditScraper

  def initialize
    @headline = []
    @agent = Mechanize.new
  end

  def fetch_reddit_headlines
    @url = 'http://www.reddit.com/r/'
    mech_page = @agent.get(@url)

    num_pages_to_scrape = 2
    count = 0

    while(num_pages_to_scrape > count)
      page = mech_page.parser

      page.css('a.title').each do |link|
        if link['href'].include?('http')
          @headline << { content: link.content, href: link['href'] }
        else
          @headline << { content: link.content, href: "http://reddit.com" + link['href'] }
        end
      end
      @headline

      count += 1
      mech_page = @agent.get(page.css('.nextprev').css('a').last.attributes["href"].value)
    end

    return @headline
  end
end

Here is my controller action:

def index
    @fetch_reddit = RedditScraper.new.fetch_reddit_headlines
end 

My form partial (I followed the example here):

<%= form_tag("/search", method: "get") do %>
<%= label_tag(:q, "Enter a Subreddit (example: ruby):") %>
<%= text_field_tag(:q) %>
<p><%= submit_tag("Retrieve") %></p>
<% end %>

Update: I tried the suggestion below, but now I get an error: [screenshot of the error]

1 Answer:

Answer 0: (score: 0)

In your controller:

def index
  @fetch_reddit = RedditScraper.new.fetch_reddit_headlines(params[:q])
end

In RedditScraper:

def fetch_reddit_headlines(subreddit = 'ruby')
  subreddit = 'ruby' if subreddit.nil?
  @url = "http://www.reddit.com/r/#{subreddit}"

  mech_page = @agent.get(@url)
  (...)

In your form:

  <%= form_tag("/", method: "get") do %>
  (...)

If you want to use '/search' in your form, you must register that route in routes.rb:

get  'search'  => 'controller#action'
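
One gap in the approach above: submitting the form with an empty field sends params[:q] as an empty string rather than nil, so the nil check alone would still produce the bare "/r/" URL. A minimal sketch of a stricter URL builder follows. The module name SubredditUrl, its constants, and the allowed-character pattern are assumptions made for illustration; they are not part of the original scraper or of Reddit's documented rules.

```ruby
# Hypothetical helper (not in the original code): builds the subreddit URL
# from raw form input and falls back to a default when the input is nil,
# blank, or contains anything other than letters, digits, and underscores.
module SubredditUrl
  BASE_URL = 'http://www.reddit.com/r/'
  DEFAULT_SUBREDDIT = 'ruby'
  # Assumed pattern for an acceptable subreddit name.
  VALID_NAME = /\A[A-Za-z0-9_]+\z/

  def self.build(input)
    name = input.to_s.strip                      # nil becomes ""
    name = DEFAULT_SUBREDDIT unless name.match?(VALID_NAME)
    BASE_URL + name
  end
end

puts SubredditUrl.build('rails')
puts SubredditUrl.build(nil)
puts SubredditUrl.build('')
```

Under these assumptions, fetch_reddit_headlines could set @url = SubredditUrl.build(subreddit) in place of the string interpolation, which also keeps stray slashes or spaces from the form out of the request URL.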