I get an error when I enter a URI:
/Users/wiggum/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/rfc3986_parser.rb:66:in `split': bad URI(is not URI?): http://www.treasuredata.com (URI::InvalidURIError)
from /Users/wiggum/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/rfc3986_parser.rb:72:in `parse'
from /Users/wiggum/.rvm/rubies/ruby-2.2.0/lib/ruby/2.2.0/uri/common.rb:226:in `parse'
from sitecrawl.rb:11:in `<main>'
Here is my code. It runs fine on my other computer. Any suggestions?
require 'Spidr'
require 'csv'
require 'Nokogiri'
require 'open-uri'

puts "What is the website you are looking to crawl?"
site = gets

# make a filename
f2 = ".csv"
f1 = URI.parse(site).host
filename = "#{f1}#{f2}"

CSV.open(filename, "wb") do |csv|
  csv << ["Url", "Title Tag", "H1 Tags", "Meta Desc"]
  Spidr.site(site) do |spider|
    spider.every_url do |url|
      page = Nokogiri::HTML(open(url)) rescue nil
      title = page.xpath('//title') rescue nil
      desc = page.xpath("//meta[@name='description']/@content") rescue nil
      h1 = page.xpath('//h1') rescue nil
      puts "#{url} #{title}"
      puts "#{h1} #{desc}"
      csv << ["#{url}", "#{title}", "#{h1}", "#{desc}"]
    end
  end
end
Answer 0 (score: 1)
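I'm not sure why it works on your other computer; it shouldn't work anywhere. gets returns the entire line you typed, including the trailing newline, so the string you are actually trying to parse is "http://www.treasuredata.com\n", which is not a valid URI.
Change your gets to gets.chomp.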
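To see the difference without running the whole crawler, here is a minimal sketch. The string raw is a hard-coded stand-in for what gets returns after you type the URL and press Enter, and host_or_nil is a hypothetical helper added only for this illustration. It assumes a Ruby version that uses the RFC 3986 parser shown in your backtrace.

```ruby
require 'uri'

# Stand-in for the value returned by `gets`: the typed URL plus the newline
# from pressing Enter (hard-coded here so the example needs no input).
raw = "http://www.treasuredata.com\n"

# Hypothetical helper for this illustration: returns the host,
# or nil when URI.parse rejects the string.
def host_or_nil(str)
  URI.parse(str).host
rescue URI::InvalidURIError
  nil
end

# The trailing newline makes the string an invalid URI.
puts "with newline: #{host_or_nil(raw).inspect}"

# String#chomp strips the trailing newline, so parsing succeeds.
puts "after chomp:  #{host_or_nil(raw.chomp).inspect}"
```

In your script the equivalent change is site = gets.chomp, so that both URI.parse(site) and Spidr.site(site) receive the URL without the newline.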