Ruby Mechanize form input field text

Time: 2015-05-14 18:58:32

Tags: ruby csv automation web-scraping mechanize

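SOLVED: the line "abc = list.scan(/\[([^\)]+)\]/).last.first" did extract the keyword, but it also kept the quotation marks, which the website's search form does not accept. I corrected it to "abc = list.scan(/\"([^\)]+)\"/).join".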
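To see why the first pattern kept the quotes, here is a small sketch. It assumes each CSV row reached the loop as the string form of a one-element array, such as ["ruby mechanize"] (for example via Array#to_s); the asker's actual conversion step isn't shown, so that input is an assumption.

```ruby
# Assumption: the row arrived as the string form of a one-element array.
row = ['ruby mechanize'].to_s  # => "[\"ruby mechanize\"]"

# Original pattern: captures everything between [ and ], quotes included.
with_brackets = row.scan(/\[([^\)]+)\]/).last.first  # => "\"ruby mechanize\""

# Corrected pattern: captures everything between the double quotes.
with_quotes = row.scan(/\"([^\)]+)\"/).join          # => "ruby mechanize"
```

Both patterns are greedy, so a row with several quoted fields, such as ["a", "b"], would come back as a single capture a", "b. Reading the fields with Ruby's csv library avoids the string parsing altogether.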

Thanks for your help.

I need to automate searches using a list of 100 keywords from a CSV file.

With Mechanize, I can submit a search using this example (http://mechanize.rubyforge.org/GUIDE_rdoc.html):

require 'mechanize'
require 'pp'

agent = Mechanize.new
page = agent.get('http://google.com/')
google_form = page.form('f')
google_form.q = 'ruby mechanize'
page = agent.submit(google_form)
pp page

However, when I loop through the CSV file, it raises an error (in this example, the first CSV entry would be 'ruby mechanize'):

# I have already imported the CSV list; now the code loops through the array "raw_list"

raw_list.each do |list|
  abc = list.scan(/\[([^\)]+)\]/).last.first

  # I tested "puts abc", which printed "ruby mechanize", so I don't understand why the rest of this doesn't work

  agent = Mechanize.new
  page = agent.get('http://google.com/')
  google_form = page.form('f')
  google_form.q = abc

  # Even though abc = "ruby mechanize", an error occurs.

  page = agent.submit(google_form)
  pp page
end

It seems the variable abc is not accepted, yet typing 'ruby mechanize' manually works, even though the two appear to be identical.
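One way to check whether two strings that print alike are really equal is to compare them with == and print them with p (which calls inspect) rather than puts. The value below is a hypothetical stand-in for a captured keyword that still contains literal quote characters.

```ruby
# Hypothetical stand-in: the keyword still contains literal quote characters.
abc = '"ruby mechanize"'

puts abc  # prints "ruby mechanize" (the quotes look like ordinary delimiters)
p abc     # prints "\"ruby mechanize\"" (inspect escapes them, so they are data)

same_as_typed = (abc == 'ruby mechanize')              # => false
after_delete  = (abc.delete('"') == 'ruby mechanize')  # => true
```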

The error is:

C:filename: in `block (2 levels) in <top (required)>': undefined method `text' for nil:NilClass (NoMethodError)
from C:/RailsInstaller/Ruby2.0.0/lib/ruby/gems/2.0.0/gems/mechanize-2.7.3/lib/mechanize.rb:442:in `get'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:23:in `block in <top (required)>'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `each'
from C:/Users/victor/RubymineProjects/untitled/scraper.rb:19:in `<top (required)>'
from -e:1:in `load'
from -e:1:in `<main>'

Any help would be greatly appreciated.

1 answer:

Answer 0 (score: 0)

Your error tells you that something on line 19 of your code is causing a problem on line 442 of Mechanize.

I tried your sample in IRB and it seems to work fine:

2.2.2 :001 > require 'mechanize'
 => true 
2.2.2 :002 > agent = Mechanize.new
 => #<Mechanize:...
2.2.2 :003 > page = agent.get('http://google.com/')
 => #<Mechanize::Page
  ...
2.2.2 :004 > google_form = page.form('f')
 => #<Mechanize::Form
 ...
2.2.2 :005 > google_form.q
 => "" 
2.2.2 :006 > abc = "ruby mechanize"
 => "ruby mechanize" 
2.2.2 :007 > google_form.q = abc
 => "ruby mechanize" 
2.2.2 :008 > page = agent.submit(google_form)
 => #<Mechanize::Page
 ...

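If nothing matches, scan returns an empty array; calling .last on it gives nil, and nil has no first method, so your error occurs here: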

abc = list.scan(/\[([^\)]+)\]/).last.first

http://ruby-doc.org/stdlib-2.2.0/libdoc/strscan/rdoc/StringScanner.html

You can replace it with:

abc = list.scan(/\[([^\)]+)\]/).join

That way you always get a string, although it may be just "".
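A minimal sketch of the difference, using the pattern from the question on one line that matches and one that doesn't (both input strings are made up for illustration):

```ruby
pattern = /\[([^\)]+)\]/

matched   = '[ruby mechanize]'.scan(pattern)  # => [["ruby mechanize"]]
unmatched = 'ruby mechanize'.scan(pattern)    # => [] (an empty Array, not nil)

# The original chain works when there is a match...
first_match = matched.last.first              # => "ruby mechanize"

# ...but [].last is nil, and nil.first raises NoMethodError.
error = begin
  unmatched.last.first
rescue NoMethodError => e
  e
end

# Array#join flattens the nested capture arrays and never returns nil.
joined_match   = matched.join    # => "ruby mechanize"
joined_nomatch = unmatched.join  # => ""
```

Since an empty string would still be submitted as a search, the loop may want to skip rows where the result is empty.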

http://ruby-doc.org/core-2.2.0/Array.html#method-i-join
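A possible alternative, sketched under assumptions: read each keyword directly with Ruby's standard csv library, so there are no brackets or quotes to strip, and reuse one agent for all searches. The helper names keywords_from_csv and search_all are hypothetical, the code assumes one keyword per row in the first column, and the form name 'f' and field 'q' come from the 2015 example above and may not match Google's current page.

```ruby
require 'csv'

# Hypothetical helper: returns the first column of every CSV row as a plain
# String (assumes one keyword per row, in the first column).
def keywords_from_csv(text)
  CSV.parse(text)
     .map { |row| row.first.to_s.strip } # a blank row yields nil, which becomes ""
     .reject(&:empty?)
end

# Hypothetical helper: submits one search per keyword, reusing a single agent.
# `agent` is expected to be a Mechanize instance; 'f' and 'q' are taken from
# the example above and may not match the live page.
def search_all(agent, keywords)
  keywords.map do |keyword|
    page = agent.get('http://google.com/')
    form = page.form('f')
    next nil if form.nil? # no such form on the page: skip instead of crashing
    form.q = keyword
    agent.submit(form)
  end
end
```

Usage might look like search_all(Mechanize.new, keywords_from_csv(File.read('keywords.csv'))), where keywords.csv is a placeholder path and the mechanize gem has been required.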