Mechanize: a redirect to a malformed URI raises an invalid URI error

Asked: 2011-12-07 19:56:14

Tags: ruby uri mechanize

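I'm trying to talk to a poorly designed web server that I have to deal with anyway. The problem is that when I submit the login form, it tries to embed a message in the URI, and the URI library chokes on it.

The server redirects me to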

/path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu&msg=WELCOME+<b>Welcome,+Jonathan+Allard,+to+our+poorly+designed+Administrative+Systems!<%2Fb>Dec+07,+201102%3A27+PM

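That's right: it tries to pass unescaped HTML in the redirect URI, which I'm then supposed to request so the server can hand the message back to me. Sheesh, standards!

Now the URI library, understandably upset by such bad practice, exclaims: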

URI::InvalidURIError: bad URI(is not URI?): /path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu&msg=WELCOME+<b>Welcome,+Jonathan+Allard,+to+our+poorly+designed+Administrative+Systems!<%2Fb>Dec+07,+201102%3A27+PM
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/1.9.1/uri/generic.rb:1202:in `rescue in merge'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/1.9.1/uri/generic.rb:1199:in `merge'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page/meta_refresh.rb:32:in `parse'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page/meta_refresh.rb:41:in `from_node'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/page.rb:282:in `block in meta_refresh'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:239:in `block in each'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:238:in `upto'
from /home/jon/.rbenv/versions/1.9.3-p0/lib/ruby/gems/1.9.1/gems/nokogiri-1.5.0/lib/nokogiri/xml/node_set.rb:238:in `each'

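I feel your pain, URI lib.

Now, how do I catch this, parse the URI properly (or just strip the message entirely), and submit it back as if nothing had happened? Or is this a bug between URI and Mechanize?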

1 answer:

Answer 0: (score: 0)

After some digging through the code, I found the source of the problem.

As I explained in #177:

  

In /lib/mechanize/page/meta_refresh.rb:40

class Mechanize::Page::MetaRefresh

  def self.parse content, base_uri
    return unless content =~ CONTENT_REGEXP

    delay, refresh_uri = $1, $3

    dest = base_uri
    dest += refresh_uri if refresh_uri     # Oops! URI#+ parses refresh_uri and can raise here

    return delay, dest
  end

end

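The marked line raises URI::InvalidURIError if refresh_uri contains illegal characters (such as <). I'm not sure exactly where, but sanitization should happen somewhere.

In case you're wondering, the URI#merge in my error log is hidden behind the += operator on the "Oops!" line (URI#+ is an alias for URI#merge).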
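
For illustration, here is a minimal sketch of what that sanitization might look like. It is not Mechanize's actual fix. The helper names sanitize_uri_reference and safe_merge, the example.com base URI, and the shortened redirect string are all made up for this example. The idea is to percent-encode every character that is not legal in a URI before handing the string to URI#+.

```ruby
require 'uri'

# Characters left untouched: RFC 3986 unreserved and reserved characters,
# plus '%' so that existing escapes such as %2F are not double-encoded.
# Square brackets are deliberately encoded, so IPv6 literal hosts are not
# supported by this sketch. A stray '%' that is not followed by two hex
# digits is also not repaired.
ALLOWED_URI_CHARS = /[A-Za-z0-9\-._~:\/?\#@!$&'()*+,;=%]/

# Hypothetical helper: percent-encode every character outside the allowed set.
def sanitize_uri_reference(raw)
  raw.gsub(/./m) do |ch|
    if ch =~ ALLOWED_URI_CHARS
      ch
    else
      ch.bytes.map { |b| format('%%%02X', b) }.join
    end
  end
end

# Hypothetical helper: merge a possibly malformed reference onto a base URI.
def safe_merge(base_uri, refresh_uri)
  base_uri + sanitize_uri_reference(refresh_uri)
end

# Assumed values, shortened from the redirect in the question.
base = URI.parse('http://example.com/login')
raw  = '/path/ConvolutedNameForMenuPage.menu?name=bmenu.P_MainMnu&msg=WELCOME+<b>Welcome<%2Fb>'

dest = safe_merge(base, raw)
puts dest
```

Sanitizing unconditionally, rather than rescuing URI::InvalidURIError and retrying, keeps the behavior the same across Ruby versions, since how strictly the URI library parses has changed over time. Well-formed references pass through unchanged.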