How do I open a URL in Ruby and count image tags?

Date: 2015-12-15 07:11:06

Tags: ruby-on-rails ruby get

I'm trying to modify this code from The Bastards Book of Ruby:

require "open-uri"
url = "http://www.nytimes.com"
pattern = "<img"   

page = open(url).read
tags = page.scan(pattern)
puts "The site #{url} has #{tags.length} img tags"

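I want to change it so that the program asks for a URL and then counts the tags. I've only been programming for a few days. Here is my code; it may contain several errors: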

require "open-uri"
puts "Enter URL"
urlnew = gets
urlnew = URI.encode(urlnew)
URI.parse(urlnew)
page = open(urlnew).read
pattern = "<img"   
tags = page.scan(pattern)
puts "The site #{url} has #{tags.length} img tags"

When I run it, I get this error:

Enter URL
www.google.com
/usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `initialize': No such file or directory @ rb_sysopen - www.google.com%0A (Errno::ENOENT)
        from /usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `open'
        from /usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `open'
        from /home/ubuntu/workspace/ruby/hello.rb:6:in `<main>'

I've tried various ways of getting the URL input. The approach from "Open an IO stream from a local file or url" doesn't seem to work. Thanks if you can help.

2 answers:

Answer 0: (score: 2)

Use chomp to remove the newline character at the end of the user's input:

urlnew = gets.chomp
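
To see why this matters: gets returns the typed text including the trailing newline, and the "%0A" at the end of the file name in the error message is that newline in percent-encoded form. A small sketch, using a hard-coded string in place of real keyboard input (the variable name raw is just for illustration):

```ruby
require "uri"

# Simulates what gets returns when the user types www.google.com and presses Enter.
raw = "www.google.com\n"

p raw        # "www.google.com\n"
p raw.chomp  # "www.google.com"

# A newline is percent-encoded as %0A, which is what showed up in the error.
p URI.encode_www_form_component("\n")  # "%0A"

# chomp only strips a trailing line ending, so input without one is unchanged.
p "www.google.com".chomp  # "www.google.com"
```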

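Also, make sure you include http:// when you type the URL. Without a scheme, open treats the input as a local file name, which is why the error says "No such file or directory". Alternatively, you can add the following line to your code: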

urlnew = "http://#{urlnew}" unless urlnew.start_with?("http://", "https://")
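
The same idea can be sketched as a small helper that also leaves https:// URLs alone. The name with_scheme is hypothetical, not part of any library. URI.parse shows the difference: bare input has no scheme, so it is not recognised as a web address.

```ruby
require "uri"

# Hypothetical helper: prefixes "http://" unless the input already
# starts with an http or https scheme.
def with_scheme(input)
  input.start_with?("http://", "https://") ? input : "http://#{input}"
end

p URI.parse("www.google.com").scheme               # nil
p URI.parse(with_scheme("www.google.com")).scheme  # "http"
p with_scheme("https://example.com")               # "https://example.com"
```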

Here is the complete working program:

require "open-uri"
puts "Enter URL"
urlnew = gets.chomp
urlnew = "http://#{urlnew}" unless urlnew.start_with?("http://", "https://")
urlnew = URI.encode(urlnew)
URI.parse(urlnew)
page = open(urlnew).read
pattern = "<img"   
tags = page.scan(pattern)
puts "The site #{urlnew} has #{tags.length} img tags"

Sample run:

> ruby test.rb
Enter URL
stackoverflow.com
The site http://stackoverflow.com has 16 img tags
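
On newer Ruby versions the program may need small changes: URI.encode was removed in Ruby 3.0, and from that version open-uri no longer lets a plain open call fetch URLs, with URI.open being the replacement. A minimal sketch under those assumptions (Ruby 2.5 or later; the helper names count_img_tags and count_img_tags_at are hypothetical):

```ruby
require "open-uri"

# Hypothetical helper: counts occurrences of the literal "<img" in an HTML string.
def count_img_tags(html)
  html.scan("<img").length
end

# Hypothetical helper: downloads the page at url and counts its img tags.
# Assumes Ruby 2.5+, where URI.open is available.
def count_img_tags_at(url)
  page = URI.open(url).read
  count_img_tags(page)
end

# Interactive use (needs network access, so it is left commented out here):
#   puts "Enter URL"
#   url = gets.chomp
#   puts "The site " + url + " has " + count_img_tags_at(url).to_s + " img tags"

# Offline check against a literal string:
puts count_img_tags('<p><img src="a.png"><img src="b.png"></p>')  # prints 2
```

Keeping the counting separate from the download makes it possible to check the counting logic without a network connection.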

Answer 1: (score: 2)

To parse the HTML response body, I recommend using the Nokogiri library:

require 'nokogiri'
require "open-uri"
puts "Enter URL"
urlnew = URI.encode(gets.chomp)   # chomp strips only the trailing newline
URI.parse(urlnew)
page = open(urlnew).read
html = Nokogiri::HTML.fragment(page)
result = html.css('img').count    # count is already an Integer
puts "The site #{urlnew} has #{result} img tags"
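
To illustrate why a parser can be preferable to substring matching, here is a standard-library-only sketch on a made-up HTML snippet (the variable names are assumptions for illustration). Plain scanning is case-sensitive and also counts tags inside comments, whereas an HTML parser such as Nokogiri would be expected to report only the two real img elements.

```ruby
# Made-up snippet: one uppercase tag, one lowercase tag, one tag inside a comment.
html = '<IMG src="a.png"><img src="b.png"><!-- <img src="old.png"> -->'

# Literal scan: misses <IMG and counts the commented-out tag.
literal_count = html.scan("<img").length

# Case-insensitive regex: finds <IMG too, but still counts the comment.
regex_count = html.scan(/<img\b/i).length

puts literal_count  # prints 2
puts regex_count    # prints 3
```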