Mechanize

Time: 2015-07-14 13:48:05

Tags: ruby-on-rails ruby mechanize

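I have a Ruby on Rails application that accesses various links on Yahoo Sports, and for some pages it raises the error below. The failures are consistent: any link that fails always fails, so it is not a case of a link sometimes working and sometimes not. The pages do exist and load fine in a browser, so I'm not sure why I get an error. Has anyone experienced this behavior before, and if so, do you have any suggestions for getting it to work?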

  

404 => Net::HTTPNotFound for http://sports.yahoo.com/mlb/players/9893/ -- unhandled response

@client = Mechanize.new()
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true

#works
#url = 'http://sports.yahoo.com/mlb/players/7307'

#doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'

result = @client.get(url)

2 answers:

Answer 0 (score: 1)

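I wasn't able to figure this out with Mechanize alone, but I was able to fetch the URL with HTTParty. If you rescue the Mechanize failure and retry with the redirect URI that HTTParty ends up at, you should be set: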

require 'mechanize'
require 'httparty'

@client = Mechanize.new()

url = 'http://sports.yahoo.com/mlb/players/9893'

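begin
  result = @client.get(url)
rescue Mechanize::ResponseCodeError
  # Mechanize raised on the 404: ask HTTParty which URI it ends up at
  # after following redirects, then retry Mechanize with that URI.
  redirect_url = HTTParty.get(url).request.last_uri.to_s
  result = @client.get(redirect_url)
end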
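
As a variation on the fallback above, the redirect lookup could probably be done with Net::HTTP from the standard library instead of adding HTTParty as a dependency. This is only a minimal sketch: resolve_final_url, its limit parameter, and the injectable fetch lambda are hypothetical names introduced here for illustration, and I have not checked whether it behaves the same as HTTParty against Yahoo.

```ruby
require 'net/http'
require 'uri'

# Hypothetical helper (not part of Mechanize or HTTParty): follows Location
# headers by hand and returns the last URL reached as a string.
# `fetch` is injectable so the logic can be exercised without a network.
def resolve_final_url(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  uri = URI.parse(url)
  limit.times do
    response = fetch.call(uri)
    location = response['location']
    # Anything that is not a redirect with a Location header means we have arrived.
    return uri.to_s unless response.is_a?(Net::HTTPRedirection) && location
    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri.to_s, location)
  end
  raise "too many redirects for #{url}"
end
```

Inside the rescue block, redirect_url = resolve_final_url(url) would then take the place of the HTTParty call. The hop limit is there so a redirect loop raises instead of spinning forever.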

Answer 1 (score: 0)

You need to handle the redirect. Mechanize provides a setting for that: follow_meta_refresh. Try adding it to your code. For example:

require 'mechanize'

@client = Mechanize.new()
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
@client.follow_meta_refresh = true
#works
#url = 'http://sports.yahoo.com/mlb/players/7307'

#doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'

result = @client.get(url)
pp result

The pp at the bottom pretty-prints the page so you can inspect it before scraping further. The content looked correct on my machine.