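Thanks in advance for your time. I'm fairly new to OOP and Ruby, and after piecing together a solution from several different Stack Overflow answers, I've hit a wall.
My goal is to write a script that parses a CSV of URLs using the Nokogiri library. After trying and failing to follow redirects with open-uri and the open-uri-redirections plugin, I settled on Net::HTTP, which got me moving... until I ran into URLs that specifically return a 302 redirect.
Here is the method I'm using to fetch each URL: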
require 'Nokogiri'
require 'Net/http'
require 'csv'
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  url = URI.parse(uri_str)
  #puts "The value of uri_str is: #{ uri_str}"
  #puts "The value of URI.parse(uri_str) is #{ url }"
  req = Net::HTTP::Get.new(url.path, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # puts "THE URL IS #{url.scheme + ":" + url.host + url.path}" # just a reporter so I can see if it's mangled
  response = Net::HTTP.start(url.host, url.port, :use_ssl => url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    #puts "Problem clause!"
    response.error!
  end
end
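Further down in the script, I take the filename of the URL CSV from ARGV, do a CSV.read, encode each URL as a string, and then use Nokogiri::HTML.parse to turn the response into something I can inspect with XPath selectors and write to an output CSV.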
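To make that flow concrete, here is a rough sketch of the CSV-in, CSV-out loop. It is not the actual script: `scrape_csv`, the `url`/`result` header row, the `output.csv` filename and the `//title` XPath are all placeholder names chosen for illustration, and the fetch-and-parse step is passed in as a block so the loop itself only needs the csv library.

```ruby
require 'csv'

# Hypothetical helper (not from the original script): reads one URL per row
# from input_path, hands each URL to the block, and writes the URL plus the
# block's return value to output_path. Returns the number of rows written.
def scrape_csv(input_path, output_path)
  count = 0
  CSV.open(output_path, 'w') do |out|
    out << %w[url result]                 # assumed header row
    CSV.foreach(input_path) do |row|
      url = row[0].to_s.strip             # the URL as a plain string
      next if url.empty?                  # skip blank rows
      out << [url, yield(url)]
      count += 1
    end
  end
  count
end

# Possible usage with the fetch method above (assumes the nokogiri gem is
# installed; '//title' and 'output.csv' are placeholders):
#
#   require 'nokogiri'
#   scrape_csv(ARGV[0], 'output.csv') do |url|
#     doc = Nokogiri::HTML.parse(fetch(url).body)
#     doc.at_xpath('//title')&.text
#   end
```

Keeping the fetch and parse step in a block means the loop can be exercised with a stub before any real HTTP request is involved.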
This works beautifully... as long as I get a 200 response, which unfortunately isn't the case for every site. When I hit a 302, I get this:
C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1570:in `addr_port': undefined method `+' for nil:NilClass (NoMethodError)
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1503:in `begin_transport'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1442:in `transport_request'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:1416:in `request'
from httpcsv.rb:14:in `block in fetch'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:877:in `start'
from C:/Ruby24-x64/lib/ruby/2.4.0/Net/http.rb:608:in `start'
from httpcsv.rb:14:in `fetch'
from httpcsv.rb:17:in `fetch'
from httpcsv.rb:42:in `block in <main>'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from C:/Ruby24-x64/lib/ruby/2.4.0/csv.rb:866:in `each'
from httpcsv.rb:38:in `<main>'
I know I'm missing something, but I can't tell what I should puts to see whether it's nil. Thanks for any help.
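One candidate worth printing is the parsed `Location` value itself. This is only a guess, since the trace doesn't show what the server sent: if the header is relative (for example `/login`), `URI.parse` returns a URI with no host or port, which would fit the `nil` in `addr_port`. The sketch below uses made-up URLs and a hypothetical helper, `resolve_location`, to show the check and one possible way to resolve a relative value against the URL just requested.

```ruby
require 'uri'

# Hypothetical helper: resolves a Location header value against the URL that
# was just requested, so a relative value such as "/login" gains a host.
# Absolute values are returned unchanged.
def resolve_location(request_url, location)
  URI.join(request_url, location).to_s
end

# What a relative Location looks like after parsing (example value only).
relative = URI.parse('/login')
p relative.host   # nil
p relative.port   # nil

# Resolving against the original request URL (example URLs only).
puts resolve_location('http://example.com/old/page', '/login')
puts resolve_location('http://example.com/old/page', 'https://other.example/new')
```

If that turns out to be the cause, the redirect branch could pass `resolve_location(uri_str, response['location'])` to the recursive `fetch` call instead of the raw header.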