I am trying to modify this code from The Bastards Book of Ruby:
require "open-uri"
url = "http://www.nytimes.com"
pattern = "<img"
page = open(url).read
tags = page.scan(pattern)
puts "The site #{url} has #{tags.length} img tags"
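I want to modify it so that the program asks for a URL and then counts the tags. I have only been programming for a few days. Here is my code; it may contain several errors: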
require "open-uri"
puts "Enter URL"
urlnew = gets
urlnew = URI.encode(urlnew)
URI.parse(urlnew)
page = open(urlnew).read
pattern = "<img"
tags = page.scan(pattern)
puts "The site #{url} has #{tags.length} img tags"
When I run it, I get this error:
Enter URL
www.google.com
/usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `initialize': No such file or directory @ rb_sysopen - www.google.com%0A (Errno::ENOENT)
from /usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `open'
from /usr/local/rvm/rubies/ruby-2.2.1/lib/ruby/2.2.0/open-uri.rb:36:in `open'
from /home/ubuntu/workspace/ruby/hello.rb:6:in `<main>'
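I have tried various ways of reading the URL from input, including the approach from "Open an IO stream from a local file or url", but nothing seems to work. Thanks in advance for any help.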
Answer 0 (score: 2)
Use chomp to remove the newline character at the end of the user input:
urlnew = gets.chomp
Also, be sure to include http:// when you enter the URL. Alternatively, you can add the following line to your code:
urlnew = "http://#{urlnew}" unless urlnew.start_with?("http://")
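The two fixes above address the two parts of the error message: the %0A in "www.google.com%0A" is the encoded newline left by gets, and without a scheme such as http:// open-uri's open treats the string as a local file path. A minimal sketch combining both steps, assuming a hypothetical helper named normalize_url (not part of the original answer) that also accepts https://:

```ruby
require "uri"

# Hypothetical helper (not in the original answer): clean up raw user input
# and make sure it carries a scheme so that it is treated as a URL.
def normalize_url(raw)
  url = raw.strip # drops the trailing "\n" left by gets
  url = "http://#{url}" unless url.start_with?("http://", "https://")
  URI.parse(url)  # raises URI::InvalidURIError on malformed input
end

uri = normalize_url("www.google.com\n")
puts uri      # http://www.google.com
puts uri.host # www.google.com
```

Unlike the bare URI.parse call in the question, the parsed result is returned here, so the caller can inspect it (for example uri.host) before fetching anything.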
Here is the complete working program:
require "open-uri"
puts "Enter URL"
urlnew = gets.chomp
urlnew = "http://#{urlnew}" unless urlnew.start_with?("http://")
urlnew = URI.encode(urlnew)
URI.parse(urlnew)
page = open(urlnew).read
pattern = "<img"
tags = page.scan(pattern)
puts "The site #{urlnew} has #{tags.length} img tags"
Sample run:
> ruby test.rb
Enter URL
stackoverflow.com
The site http://stackoverflow.com has 16 img tags
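The counting step can be tried without a network connection. The sketch below runs scan on an inline HTML string (sample markup made up for illustration) and shows that the literal pattern "<img" is case-sensitive, so a case-insensitive regular expression may give a different count on pages that use uppercase tags:

```ruby
# Inline sample markup (an assumption; it stands in for a downloaded page).
html = '<p><img src="a.png"><IMG src="b.png"><IMG src="c.png"></p>'

plain  = html.scan("<img").length     # literal, case-sensitive match
strict = html.scan(/<img\b/i).length  # tag name followed by a word boundary, any case

puts plain  # 1
puts strict # 3
```

Neither form understands HTML structure (comments or script content containing "<img" would still be counted), so the result is best read as an approximation.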
Answer 1 (score: 2)
To parse the HTML response body, I suggest using the Nokogiri library.
require 'nokogiri'
require "open-uri"
puts "Enter URL"
urlnew = URI.encode(gets.chomp)  # chomp removes only the trailing newline
URI.parse(urlnew)
page = open(urlnew).read
html = Nokogiri::HTML.fragment(page)
result = html.css('img').count   # count already returns an Integer
puts "The site #{urlnew} has #{result} img tags"