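I'm adding the ability to scrape XML pages from a source that requires an HTTPS connection and authentication. I'm trying to use Ryan Bates's solution from Railscast #190, but I'm getting a 401 authentication error.
Here is my test Ruby script: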
require 'rubygems'
require 'nokogiri'
require 'open-uri'
url = "https://biblesearch.americanbible.org/passages.xml?q[]=john+3:1-5&version=KJV"
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password']))
puts doc.xpath("//text_preview")
Here is the console output after running the script:
/usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect': SSL_connect returned=1 errno=0 state=SSLv3 read server certificate B: certificate verify failed (OpenSSL::SSL::SSLError)
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `block in connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:54:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/timeout.rb:99:in `timeout'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:799:in `connect'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:755:in `do_start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/net/http.rb:744:in `start'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:306:in `open_http'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:775:in `buffer_open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:203:in `block in open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `catch'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:201:in `open_loop'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:146:in `open_uri'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:677:in `open'
from /usr/local/rvm/rubies/ruby-1.9.3-p194/lib/ruby/1.9.1/open-uri.rb:33:in `open'
from scrape.rb:6:in `<main>'
In my research, I came across a post suggesting that the following option can be used in 1.9.3:
doc = Nokogiri::XML(open(url, :http_basic_authentication => ['username' ,'password'], :ssl_verify_mode => OpenSSL::SSL::VERIFY_NONE))
However, this did not work either. I'd appreciate any help solving this.
Answer 0 (score: 4)
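The given URL is redirected to /v1/KJV/passages.xml?q[]=john+3%3A1-5 with HTTP status code 302 Found. OpenURI understands redirects, but when it follows one it automatically drops the authentication header (probably for security reasons). (*)
If you access "http://biblesearch.americanbible.org/v1/KJV/passages.xml?q[]=john+3%3A1-5" directly, you will get the expected result. :-)
(*) You can find this in open-uri.rb: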
if redirect
### snip ###
if options.include? :http_basic_authentication
# send authentication only for the URI directly specified.
options = options.dup
options.delete :http_basic_authentication
end
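One way to work around this while staying with open-uri is to turn off automatic redirects and re-open the redirect target yourself, passing the credentials again. Below is a minimal sketch, assuming Ruby 2.5+ (for `URI.open`; on 1.9.3 the equivalent call is `Kernel#open`). `open_with_auth` and its `opener:` parameter are hypothetical names introduced here, not part of open-uri. Because open-uri drops the credentials on purpose, the sketch only re-sends them when the redirect stays on the original host.

```ruby
require 'open-uri'
require 'uri'

# Hypothetical helper (not part of open-uri): fetch `url` with HTTP Basic
# auth and re-send the credentials when the server answers with a redirect,
# which open-uri deliberately does not do on its own.
# `opener` defaults to URI.open (Ruby 2.5+); it is a parameter only so the
# redirect handling can be exercised without network access.
def open_with_auth(url, user, pass, max_redirects: 5,
                   opener: ->(target, opts) { URI.open(target, opts) })
  origin_host = URI(url).host
  target = url
  (max_redirects + 1).times do
    begin
      # :redirect => false makes open-uri raise OpenURI::HTTPRedirect
      # instead of following the 3xx response without credentials.
      return opener.call(target, { http_basic_authentication: [user, pass],
                                   redirect: false })
    rescue OpenURI::HTTPRedirect => e
      # e.uri is the redirect target, already resolved to an absolute URI.
      unless e.uri.host == origin_host
        raise "refusing to send credentials to #{e.uri.host}"
      end
      target = e.uri.to_s
    end
  end
  raise "too many redirects starting from #{url}"
end
```

With Nokogiri this would be used as `Nokogiri::XML(open_with_auth(url, 'username', 'password'))`, since the value returned by open-uri is an IO-like object that Nokogiri can read.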
Answer 1 (score: 2)
You can also do it this way, and it should work too:
open(url, :http_basic_authentication => [user, pass] )
doc = Nokogiri::HTML(open(url, :http_basic_authentication => [user, pass] ))
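Then you can parse the document however you like. By passing http_basic_authentication in the header again in the second request, you make up for the header that was dropped in the first request. Hope this works for you.
Answer 2 (score: 1)
You say you need to use HTTPS, but you are using the HTTP protocol: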
url = "http://biblesearch...."
OpenURI understands both HTTP and HTTPS. If you want to connect over HTTPS, change the protocol in the URL to HTTPS and then make the connection:
url = "https://biblesearch...."
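If you need more control over the HTTPS connection than open-uri offers, the same authenticated request can be prepared with Net::HTTP directly. Below is a minimal sketch, assuming the "certificate verify failed" error from the question comes from a missing CA bundle on the machine, so it keeps certificate verification on (instead of `VERIFY_NONE`) and optionally points `ca_file` at a bundle. `build_https_get` and the `ca_file:` argument are hypothetical names introduced for illustration.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper: prepare (but do not send) an authenticated GET with
# Net::HTTP. TLS is enabled for https:// URLs, certificate verification stays
# on, and `ca_file` is an optional path to a CA bundle (an assumption for
# systems whose OpenSSL cannot find one by default).
def build_https_get(url, user, pass, ca_file: nil)
  uri = URI(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == 'https')
  http.verify_mode = OpenSSL::SSL::VERIFY_PEER
  http.ca_file = ca_file if ca_file

  request = Net::HTTP::Get.new(uri.request_uri)
  request.basic_auth(user, pass) # sets the "Authorization: Basic ..." header
  [http, request]
end
```

Sending it would then look like `http, req = build_https_get(url, 'username', 'password')` followed by `res = http.start { |h| h.request(req) }`. Net::HTTP does not follow redirects by itself, so a 302 response would have to be followed manually, calling `basic_auth` again on the new request.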