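I have a Ruby on Rails application that accesses various links on Yahoo Sports, and for some pages it gives me the error below. The error is consistent: any link that fails always fails. Some links work and some don't. The failing pages do exist and load fine in a browser, so I'm not sure why I'm getting an error. Has anyone seen this behavior before, and if so, do you have any suggestions for getting it to work?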
404 => Net::HTTPNotFound for http://sports.yahoo.com/mlb/players/9893/ -- unhandled response
require 'mechanize'
@client = Mechanize.new
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
#works
#url = 'http://sports.yahoo.com/mlb/players/7307'
#doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'
result = @client.get(url)
Answer 0 (score: 1)
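I couldn't get this to work with Mechanize alone, but I was able to fetch the URL with HTTParty. If you rescue the Mechanize failure, look up the redirected URI with HTTParty, and retry with that, you should be set: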
require 'mechanize'
require 'httparty'
@client = Mechanize.new()
url = 'http://sports.yahoo.com/mlb/players/9893'
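begin
  result = @client.get(url)
rescue Mechanize::ResponseCodeError => e
  # Mechanize raised on the error response; let HTTParty follow the
  # redirects, then retry Mechanize with the final URI it landed on.
  redirect_url = HTTParty.get(url).request.last_uri.to_s
  result = @client.get(redirect_url)
end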
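The rescue-and-retry pattern above can be pulled into a small reusable method. This is only a minimal sketch, not part of the original answer: `fetch_with_fallback`, `fake_fetch`, `fake_resolve`, and the example.com URLs are hypothetical names made up for illustration, and the stand-in lambdas replace the real network calls so the flow can be followed without Mechanize or HTTParty installed.

```ruby
# Hypothetical helper (not from the answer): call `fetch`; if it raises
# `error_class`, ask `resolve` for the final URL and retry exactly once.
# A failure on the retry is not rescued and propagates to the caller.
def fetch_with_fallback(url, fetch:, resolve:, error_class: StandardError)
  fetch.call(url)
rescue error_class
  fetch.call(resolve.call(url))
end

# Stand-in fetcher for demonstration only: fails on the original URL and
# succeeds on the resolved one. The example.com URLs are placeholders.
fake_fetch = lambda do |u|
  raise IOError, "404 for #{u}" if u.end_with?('/old')
  "page at #{u}"
end

# Stand-in resolver: pretends the redirect chain ends at '/new'.
fake_resolve = ->(u) { u.sub('/old', '/new') }

result = fetch_with_fallback('http://example.com/old',
                             fetch: fake_fetch,
                             resolve: fake_resolve,
                             error_class: IOError)
puts result
```

With the real libraries you would presumably pass `->(u) { @client.get(u) }` as `fetch`, `->(u) { HTTParty.get(u).request.last_uri.to_s }` as `resolve`, and `Mechanize::ResponseCodeError` as `error_class`, mirroring the snippet above.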
Answer 1 (score: 0)
You need to handle the redirect. Mechanize provides a setting for that: follow_meta_refresh. Try adding it to your code. For example:
require 'mechanize'
@client = Mechanize.new()
@client.request_headers = { "Accept-Encoding" => "" }
@client.ignore_bad_chunking = true
@client.follow_meta_refresh = true
#works
#url = 'http://sports.yahoo.com/mlb/players/7307'
#doesn't work
url = 'http://sports.yahoo.com/mlb/players/9893'
result = @client.get(url)
pp result
The pp at the bottom pretty-prints the page so you can scrape it further. On my machine the output looks like the correct content.
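For context, a meta refresh is a redirect written into the HTML itself as a `<meta http-equiv="refresh">` tag rather than sent as an HTTP 3xx status, which is why it needs its own setting. The sketch below only illustrates what such a tag looks like and how its target could be read out. `meta_refresh_target` and the sample markup are hypothetical, it is not how Mechanize implements the feature, and a regex like this is too fragile for real-world HTML.

```ruby
# Hypothetical helper, not part of Mechanize: pull the target URL out of a
# <meta http-equiv="refresh" content="N; url=..."> tag, or return nil when
# the document has no such tag. Assumes http-equiv appears before content.
META_REFRESH = /<meta[^>]*http-equiv=["']?refresh["']?[^>]*content=["']\s*\d+\s*;\s*url=([^"'>]+)["']/i

def meta_refresh_target(html)
  match = META_REFRESH.match(html)
  match && match[1].strip
end

# Placeholder markup for illustration; example.com is not the real target.
sample = '<html><head><meta http-equiv="refresh" ' \
         'content="0; url=http://example.com/new"></head></html>'

puts meta_refresh_target(sample).inspect
puts meta_refresh_target('<html><head></head></html>').inspect
```

If the failing page does not actually contain a tag like this, follow_meta_refresh would have nothing to act on, so it may be worth checking the raw response body first.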