Hello, I'm having a problem: the Mechanize gem can't connect to a website. The gem is installed. Code:
require 'mechanize'
agent = Mechanize.new
main_page = agent.get 'https://imbd.com'
list_page = main_page.link_with(text: "Top 250").click
rows = list_page.root.css(".lister-list tr")
puts rows.size
Here is the error:
C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `initialize': A connection attempt failed because the connected party did not properly respond after a period of time, or established connection failed because connected host has failed to respond. - connect(2) for "imbd.com" port 80 (Errno::ETIMEDOUT)
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `open'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:879:in `block in connect'
from C:/Ruby/lib/ruby/2.2.0/timeout.rb:73:in `timeout'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:878:in `connect'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:863:in `do_start'
from C:/Ruby/lib/ruby/2.2.0/net/http.rb:858:in `start'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:700:in `start'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:631:in `connection_for'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/net-http-persistent-2.9.4/lib/net/http/persistent.rb:994:in `request'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize/http/agent.rb:267:in `fetch'
from C:/Ruby/lib/ruby/gems/2.2.0/gems/mechanize-2.7.4/lib/mechanize.rb:464:in `get'
from C:/Ruby/Workspace/imbd.rb:4:in `<main>'
Does anyone know what's wrong? Thanks!
Answer 0 (score: 0)
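After looking at IMDb, I found that it runs a lot of JavaScript. That trips up Mechanize, because Mechanize cannot execute JS and so cannot make sense of responses that depend on it. If you are after the content or want to automate browsing, I'd recommend Capybara instead of Mechanize. Capybara combined with Poltergeist (this approach requires installing PhantomJS) will work better than Mechanize and can interact automatically with pages that load a lot of JS.
I've added an approach that might fix the error for you. If it does work, that is because Mechanize tries to fetch the page before the JS has finished running, and so cannot get valid data.
Edit: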
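require 'mechanize'

agent = Mechanize.new
agent.read_timeout = 3 # seconds to wait for data once the connection is open
begin
  main_page = agent.get 'https://imbd.com'
  list_page = main_page.link_with(text: "Top 250").click
  rows = list_page.root.css(".lister-list tr")
  puts rows.size
rescue Timeout::Error, Errno::ETIMEDOUT
  # Net::OpenTimeout and Net::ReadTimeout inherit from Timeout::Error, but a
  # connect(2) timeout reported by the OS (as in the question) is Errno::ETIMEDOUT
  puts "Timeout!"
  puts "read_timeout attribute is set to #{agent.read_timeout}s" unless agent.read_timeout.nil?
end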
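Whether a `rescue Timeout::Error` clause catches the error from the question depends on Ruby's exception hierarchy. `Net::OpenTimeout` and `Net::ReadTimeout` (raised by Net::HTTP when its own open/read timeouts fire) inherit from `Timeout::Error`, but the question's trace ends in `Errno::ETIMEDOUT`, a `SystemCallError` raised when the operating system gives up on `connect(2)`. The stdlib-only sketch below checks this by raising each class. `caught_by_timeout_rescue?` is a hypothetical helper written for illustration, and the sketch assumes Ruby 2.0 or later, where `Net::OpenTimeout` exists.

```ruby
require 'timeout'
require 'net/http' # defines Net::OpenTimeout and Net::ReadTimeout

# Hypothetical helper: raises an instance of `klass` and reports whether
# a `rescue Timeout::Error` clause would have caught it.
def caught_by_timeout_rescue?(klass)
  begin
    raise klass
  rescue Timeout::Error
    true
  end
rescue StandardError
  # Not a Timeout::Error: the exception escaped the inner rescue
  false
end

[Net::OpenTimeout, Net::ReadTimeout, Errno::ETIMEDOUT].each do |klass|
  status = caught_by_timeout_rescue?(klass) ? 'caught' : 'NOT caught'
  puts "#{klass}: #{status}"
end
# Net::OpenTimeout: caught
# Net::ReadTimeout: caught
# Errno::ETIMEDOUT: NOT caught
```

So a rescue meant for the question's error has to name `Errno::ETIMEDOUT` (or `SystemCallError`) explicitly. Note also that `read_timeout` only applies after the connection is open; Mechanize's separate `open_timeout=` setting is the one that bounds the connection phase where this error was raised.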
Answer 1 (score: 0)
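Although Mechanize does not support JavaScript, the actual problem is that the site you are trying to reach does not exist. You are requesting www.imbd.com instead of www.imdb.com, so the error message is accurate.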
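FWIW, IMDb does not want you to scrape their site:
Robots and Screen Scraping: You may not use data mining, robots, screen scraping, or similar data gathering and extraction tools on this site, except with our express written consent.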