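Is it possible to run multiple Mechanize queries from an array? The code below works with a single query (without the array), but not with multiple queries.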
require 'nokogiri'
require 'mechanize'
agent = Mechanize.new
#User Agent masking
agent.user_agent_alias = 'Windows Mozilla'
#Array of keywords to search
search = Array.new
search << 'TAICHI 21'
search << 'S56CA'
#Take our search array, insert it into a query
agent.get("http://www.asus.com/Search/?SearchKey=#{search}")
#This handles the url
File.open("results.txt","w") do |f|
  PP.pp(page.links.find_all{|l| l.text =~ /#{search}/i},f)
end
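The interpolation inside agent.get is likely why the multi-term version fails: #{search} turns the whole array into one string, so only a single, odd-looking query is sent. This plain-Ruby snippet makes no network requests. It shows the difference between interpolating the array and iterating over it:

```ruby
search = ['TAICHI 21', 'S56CA']

# Interpolating the whole array embeds its inspect form in a single URL
combined = "http://www.asus.com/Search/?SearchKey=#{search}"
puts combined

# Iterating produces one query URL per search term instead
urls = search.map { |term| "http://www.asus.com/Search/?SearchKey=#{term}" }
urls.each { |url| puts url }
```

The same applies to the regex /#{search}/i: it is built from the array's string form, not from each term.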
Second attempt:
require 'nokogiri'
require 'mechanize'
agent = Mechanize.new
#User Agent masking
agent.user_agent_alias = 'Windows Mozilla'
#Array of keywords to search
search = [ 'S56CA', 'TAICHI 21' ]
#Take our search array, insert it into a query
agent.get("http://www.asus.com/Search/?SearchKey=#{search}")
File.open("results.txt","w")
#This handles the url
search.each do |f|
  results.txt << PP.pp(page.links.find_all{|l| l.text =~ /#{search}/i},f)
end
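In the second attempt, f is each search term rather than the file, File.open is called without a block, and results.txt << ... is not a way to write to a file in Ruby. Below is a minimal sketch of the intended pattern. It uses hypothetical link texts in place of a real Mechanize page and writes to a temporary path instead of results.txt:

```ruby
require 'pp'
require 'tmpdir'

# Hypothetical link texts standing in for page.links; no network access needed
link_texts = ['ASUS TAICHI 21', 'S56CA overview', 'Support']
search = ['S56CA', 'TAICHI 21']

path = File.join(Dir.tmpdir, 'results.txt')

# The block form opens the file, yields it as f, and closes it afterwards
File.open(path, 'w') do |f|
  search.each do |term|
    # Match one term at a time, escaping it so it is treated literally
    matches = link_texts.find_all { |text| text =~ /#{Regexp.escape(term)}/i }
    # PP.pp's second argument is the output target, here the open file
    PP.pp(matches, f)
  end
end

written = File.read(path)
puts written
```

With a real page, link_texts would come from page.links.map(&:text), fetched once per term inside the loop.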
Answer (score: 0)
When you search for S56CA with http://www.asus.com/Search/?SearchKey=S56CA, the site redirects to http://www.asus.com/Notebooks_Ultrabooks/S56CA/. Searching for TAICHI 21 behaves differently: it stays on the search results page.
Because the two searches behave differently, your code needs to handle each case separately.
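One way to tell the cases apart is the final URL after redirects. With Mechanize, that is page.uri on the page returned by agent.get. The sketch below assumes, based on the two URLs above, that the results page stays under /Search/ while a redirected search lands elsewhere. The page_kind helper is hypothetical:

```ruby
require 'uri'

# Hypothetical helper: classify a search by the final URI it landed on.
# Assumption: the search results page keeps the /Search/ path, while a
# direct hit redirects to a product page such as /Notebooks_Ultrabooks/S56CA/.
def page_kind(final_uri)
  path = URI(final_uri.to_s).path
  path.start_with?('/Search/') ? :search_results : :product_page
end

puts page_kind('http://www.asus.com/Notebooks_Ultrabooks/S56CA/')
puts page_kind('http://www.asus.com/Search/?SearchKey=TAICHI%2021')
```

Calling to_s first lets the helper accept either a string or the URI object that page.uri returns.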
I modified your original code to add Mechanize logging:
require 'nokogiri'
require 'mechanize'
# Logging
require 'logger'
mechanize_logger = Logger.new('mechanize.log')
mechanize_logger.level = Logger::INFO
agent = Mechanize.new
agent.log = mechanize_logger
#User Agent masking
agent.user_agent_alias = 'Windows Mozilla'
search_terms = [ 'S56CA', 'TAICHI 21' ]
# Open the results file once; the block form closes it automatically
File.open('results.txt', 'w') do |results_file|
  # Take each term from the search array and insert it into a query
  search_terms.each do |search|
    page = agent.get("http://www.asus.com/Search/?SearchKey=#{search}")
    # Escape the term so any regex metacharacters are matched literally
    links = page.links.find_all { |l| l.text =~ /#{Regexp.escape(search)}/i }
    links.each { |link| results_file.write("#{link.text}\n") }
  end
end
When this code runs, the only output written to results.txt is ASUS TAICHI 21.
The mechanize.log file contains the following, which shows the difference between the two searches:
$ more mechanize.log
I, [2013-01-30T17:49:49.036790 #2142] INFO -- : Net::HTTP::Get: /Search/?SearchKey=S56CA
I, [2013-01-30T17:49:53.528281 #2142] INFO -- : status: Net::HTTPFound 1.1 302 Moved Temporarily
I, [2013-01-30T17:49:53.529388 #2142] INFO -- : follow redirect to: /Notebooks_Ultrabooks/S56CA/
I, [2013-01-30T17:49:53.530106 #2142] INFO -- : Net::HTTP::Get: /Notebooks_Ultrabooks/S56CA/
I, [2013-01-30T17:49:53.939353 #2142] INFO -- : status: Net::HTTPOK 1.1 200 OK
I, [2013-01-30T17:49:54.800423 #2142] INFO -- : Net::HTTP::Get: /Search/?SearchKey=TAICHI%2021
I, [2013-01-30T17:49:55.269454 #2142] INFO -- : status: Net::HTTPOK 1.1 200 OK
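The log also shows the space in TAICHI 21 being sent as %20 (/Search/?SearchKey=TAICHI%2021). If you prefer to build the encoded query yourself rather than rely on Mechanize to do it, here is a sketch using Ruby's standard ERB::Util.url_encode. It targets the %20 form seen in the log. Whether the site also accepts + for spaces is not shown here:

```ruby
require 'erb'

search_terms = ['S56CA', 'TAICHI 21']

# Percent-encode each term so the space in "TAICHI 21" is sent as %20
urls = search_terms.map do |term|
  "http://www.asus.com/Search/?SearchKey=#{ERB::Util.url_encode(term)}"
end
urls.each { |url| puts url }
```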
Because the search results page contains the text "Did you mean to search for", you can use it to tell the two behaviors apart in your code.
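With Mechanize, page.body holds the response HTML and page.uri the final URL, so a check along these lines could branch per term. This is only a sketch. The FakePage struct, the handle helper, the sample page contents, and the assumption that product pages never contain the marker text are all hypothetical and not taken from the ASUS site:

```ruby
# Hypothetical stand-in for a fetched Mechanize page, with only the fields used here
FakePage = Struct.new(:uri, :body, :link_texts)

DID_YOU_MEAN = 'Did you mean to search for'

# Branch on whether the response is the search results page or a product page
def handle(term, page)
  if page.body.include?(DID_YOU_MEAN)
    # Search results page: keep links whose text mentions the term
    page.link_texts.grep(/#{Regexp.escape(term)}/i)
  else
    # Redirected straight to a product page: record where the search landed
    [page.uri]
  end
end

results_page = FakePage.new(
  'http://www.asus.com/Search/?SearchKey=TAICHI%2021',
  '<p>Did you mean to search for ...</p>',
  ['ASUS TAICHI 21', 'Support']
)
product_page = FakePage.new(
  'http://www.asus.com/Notebooks_Ultrabooks/S56CA/',
  '<h1>S56CA</h1>',
  ['Overview', 'Specifications']
)

p handle('TAICHI 21', results_page)
p handle('S56CA', product_page)
```

In the real loop, the fake page would be replaced by the result of agent.get, with link_texts taken from page.links.map(&:text) and uri from page.uri.to_s.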