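I'm using the Ruby gem mechanize to scrape some HTML. The first time I load my page it works fine and displays the expected results. After reloading, I get this error when executing `search_results = @agent.submit(search_form)`: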
undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem
Before I post any code, does this ring a bell for anyone?
Thanks.
Code:
start = Time.now
# initial set up
@agent = Mechanize.new
Mechanize.html_parser = Hpricot
page = @agent.get("http://www.google.com/")
search_form = page.forms.first
# conduct initial search
@search_term = search_form.q = params[:search].to_s
search_results = @agent.submit(search_form)
# helper variables
search_qs = ""; @page_number = 1; i = 0; @flag = false;
# get the query string structure
search_results.links.each { |li| search_qs = li.href if li.href.match(/.*search\?q=.*start=.*/) }
# search through all paginated pages
while (i < 500)
search_qs = search_qs.gsub(/start=\d+/,"start=#{i}")
@search_url = "http://google.com#{search_qs}"
search_results = @agent.get(@search_url)
search_results.links.each { |li| @flag = true if li.text.match("All Bout Texas Tailgating") }
break if @flag
i+=10; @page_number+=1
end
@execution_time = Time.now-start
render :layout => false
View:
<h2>Query results for "<%= @search_term %>" on Google</h2>
<% if @flag %>
<p>What page is this keyword found: <b><%= @page_number %></b></p>
<p><%= link_to "Click to see page", "#{@search_url}", {:target => "_blank"} %></p>
<p>How long did this query take to run?: <%= @execution_time %> seconds</p>
<% else %>
<p>Keyword not found in Google search results</p>
<% end %>
STACK TRACE:
NoMethodError (undefined method `<=>' for {emptyelem <input name="hl" value="en" type="hidden">}:Hpricot::Elem):
mechanize (1.0.0) lib/mechanize/form/field.rb:30:in `<=>'
mechanize (1.0.0) lib/mechanize/form.rb:171:in `sort'
mechanize (1.0.0) lib/mechanize/form.rb:171:in `build_query'
mechanize (1.0.0) lib/mechanize.rb:373:in `submit'
app/controllers/admin/importer_controller.rb:24:in `check_page_rank'
/opt/local/lib/ruby/1.8/webrick/httpserver.rb:104:in `service'
/opt/local/lib/ruby/1.8/webrick/httpserver.rb:65:in `run'
/opt/local/lib/ruby/1.8/webrick/server.rb:173:in `start_thread'
/opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start'
/opt/local/lib/ruby/1.8/webrick/server.rb:162:in `start_thread'
/opt/local/lib/ruby/1.8/webrick/server.rb:95:in `start'
/opt/local/lib/ruby/1.8/webrick/server.rb:92:in `each'
/opt/local/lib/ruby/1.8/webrick/server.rb:92:in `start'
/opt/local/lib/ruby/1.8/webrick/server.rb:23:in `start'
/opt/local/lib/ruby/1.8/webrick/server.rb:82:in `start'
Rendered rescues/_trace (98.4ms)
Rendered rescues/_request_and_response (1.2ms)
Rendering rescues/layout (internal_server_error)
Answer 0 (score: 0)
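If you look at the mechanize source in form.rb, submitting a form calls a function named build_query, which sorts the fields on the form. Because sort uses the <=> operator, and that operator is not defined on Hpricot elements, you get the exception.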
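The failure mode can be reproduced without either library. The sketch below is only an illustration: `UnorderedNode`, `OrderedNode`, `Field` and `sorted_names` are hypothetical stand-ins, not Mechanize or Hpricot classes. It assumes, as the stack trace suggests (field.rb:30 in `<=>`), that a field's comparison ends up calling `<=>` on its underlying parser node.

```ruby
# Hypothetical stand-ins; these are NOT Mechanize or Hpricot classes.

# A parser node with no ordering, like Hpricot::Elem on Ruby 1.8.
# Modern Ruby gives every Object a default <=>, so it is removed here
# to mimic the Ruby 1.8 behaviour seen in the stack trace.
class UnorderedNode
  undef_method :<=>
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

# A parser node that defines an ordering (here: position in the document).
class OrderedNode
  attr_reader :name, :position

  def initialize(name, position)
    @name = name
    @position = position
  end

  def <=>(other)
    position <=> other.position
  end
end

# A form field that delegates comparison to its underlying node.
class Field
  attr_reader :node

  def initialize(node)
    @node = node
  end

  def <=>(other)
    node <=> other.node
  end
end

# Sorts the fields (Array#sort calls Field#<=>) and returns the node names.
def sorted_names(fields)
  fields.sort.map { |f| f.node.name }
end

ordered = [Field.new(OrderedNode.new("q", 2)), Field.new(OrderedNode.new("hl", 1))]
puts sorted_names(ordered).inspect

unordered = [Field.new(UnorderedNode.new("q")), Field.new(UnorderedNode.new("hl"))]
begin
  sorted_names(unordered)
rescue NoMethodError => e
  puts "sort failed: #{e.class}"
end
```

The first call sorts normally because the node type supplies `<=>`; the second raises `NoMethodError` from inside `sort`, which matches the shape of the trace in the question.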
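It seems mechanize was built to work with Nokogiri, and it may have bugs or inconsistencies with other parser implementations. I haven't dug deep into the mechanize source and I don't want to blame anyone, but you may want to try switching this project to Nokogiri, if possible. In this snippet, the only visible use of Hpricot is the `Mechanize.html_parser = Hpricot` line, so the switch may be straightforward. It does seem odd to me that mechanize would throw an exception on a hidden form field with Hpricot, but the stack trace is quite clear about it.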
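Your other main option is to dive into the mechanize source and see whether you can fix it yourself (or file a bug on the mechanize GitHub project and hope someone picks it up).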
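As a rough idea of what a local fix might look like, here is a minimal sketch of a comparison that falls back to the field's original position when the underlying nodes cannot be compared. `FieldStub` and its `index` member are hypothetical and invented for illustration; this is not mechanize's code, and a real patch would have to fit whatever form.rb and field.rb actually do.

```ruby
# Hypothetical stand-in for a form field; NOT Mechanize's actual class.
# `index` records the field's original position in the form.
FieldStub = Struct.new(:name, :node, :index) do
  def <=>(other)
    by_node = begin
      node <=> other.node
    rescue NoMethodError
      nil # the node type defines no <=> at all (the Ruby 1.8 + Hpricot case)
    end
    # Fall back to original form order when the nodes cannot be compared.
    by_node || (index <=> other.index)
  end
end

# Plain Objects stand in for parser nodes that have no meaningful ordering.
fields = [
  FieldStub.new("q", Object.new, 1),
  FieldStub.new("hl", Object.new, 0)
]
sorted_field_names = fields.sort.map(&:name)
puts sorted_field_names.inspect
```

With a fallback like this, sorting no longer depends on the parser's node class defining `<=>`, at the cost of ordering such fields by form position rather than by whatever order the parser would have given them.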
Good luck.