Question

I have the following:

require 'rubygems'
require 'anemone'
require 'nokogiri'
require 'open-uri'

Anemone.crawl("http://www.findbrowsenodes.com/", :delay => 3) do |anemone|
  anemone.on_pages_like(/http:\/\/www.findbrowsenodes.com\/us\/.+\/[\d]*/) do | page |

    doc = Nokogiri::HTML(open(page.url))

    id       = doc.at_css("#n_info #clipnode").text unless doc.at_css("#n_info #clipnode").nil?

    File.open("#{node_id}.html", "wb") do |f|
      f.write(open(page).read)
    end
  end
end

So I'm trying to save each URL as an HTML file:

    File.open("#{id}.html", "wb") do |f|
      f.write(open(page).read)
    end

But I get this error:

    alex@alex-K43U:~/rails/anemone$ ruby anemone.rb
    /home/alex/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/open-uri.rb:35:in `open': can't convert Anemone::Page into String (TypeError)
        from /home/alex/.rvm/rubies/ruby-1.9.3-p0/lib/ruby/1.9.1/open-uri.rb:35:in `open'
        from anemone.rb:27:in `block (3 levels) in <main>'
        from anemone.rb:26:in `open'
        from anemone.rb:26:in `block (2 levels) in <main>'

What is the correct way to do this?

Answer 1

There are a few problems and points of confusion here:

As the error says, the open method expects a String (i.e. the URL), but you are passing it an Anemone::Page object.
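
The failure can be reproduced without Anemone or network access. In this stdlib-only sketch, FakePage is a hypothetical stand-in for Anemone::Page (just a Struct with url and body readers, and the URL is made up); passing it where a path String is expected raises the same kind of TypeError. The exact wording depends on the Ruby version: 1.9.3 says "can't convert ... into String", while newer versions say "no implicit conversion of ... into String".

```ruby
# Hypothetical stand-in for Anemone::Page: only the two readers discussed here.
FakePage = Struct.new(:url, :body)

page = FakePage.new("http://www.findbrowsenodes.com/us/example/123", "<html></html>")

error = nil
begin
  # File.open, like the plain open call in the question, needs a path String,
  # and a Page-like object cannot be converted into one.
  File.open(page) { |f| f.read }
rescue TypeError => e
  error = e
end

puts "#{error.class}: #{error.message}"
```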

This object has a url method, which you already use on line 9 of your code:
Line 9: open(page.url)
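
The reason open(page.url) works while open(page) does not is that open-uri's open only fetches a URL when its argument responds to open (as URI objects do once open-uri is loaded) or is a String that looks like a URL; anything else falls through to the ordinary file open. The check below is a simplified sketch of that dispatch, using the same hypothetical FakePage stand-in and assuming page.url holds a URI object (if it holds a String, the URL-string branch applies instead). It makes no network request. Note that from Ruby 3.0 on, open-uri no longer overrides Kernel#open, so the fetch is written URI.open(page.url).

```ruby
require 'uri'
require 'open-uri'

# Hypothetical stand-in for Anemone::Page; assumes #url returns a URI object.
FakePage = Struct.new(:url, :body)
page = FakePage.new(URI("http://www.findbrowsenodes.com/us/example/123"), "<html></html>")

# Simplified version of the check open-uri's open performs before fetching:
# the argument must respond to #open (URI::HTTP gets a public #open from
# OpenURI::OpenRead) or be a String that starts with a URL scheme.
def fetchable_by_open_uri?(obj)
  obj.respond_to?(:open) ||
    (obj.respond_to?(:to_str) && obj.to_str.match?(%r{\A[A-Za-z][A-Za-z0-9+\-.]*://}))
end

puts fetchable_by_open_uri?(page.url) # prints true: a URI object can be fetched
puts fetchable_by_open_uri?(page)     # prints false: falls through to plain file open
```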

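You have already opened the page there, so you could reuse that result. However:
According to the documentation (http://anemone.rubyforge.org/doc/classes/Anemone/Page.html), Anemone::Page has a body method that may already hold the content (I'm only guessing here; I haven't used or tried the library). If that is the case, there is no need to call open at all.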

As far as I can tell, the following untested code is probably closer to what you're looking for:

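# Parse the HTML from page.body instead of downloading the page a second time
doc = Nokogiri::HTML(page.body)

# [snip]

# Save the same body to disk; "wb" (binary mode) writes the bytes unchanged
File.open("#{node_id}.html", "wb") do |f|
  f.write(page.body)
end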
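
If body does hold the fetched HTML, saving it is a plain write with no second request. The following is a minimal sketch of just the saving step, runnable without Anemone or network access. It assumes body returns the page's HTML as a String and uses the hypothetical FakePage stand-in, a hypothetical node ID, and a temporary directory (removed afterwards) instead of the working directory. It also shows File.binwrite, the standard-library one-liner that writes in the same "wb" mode.

```ruby
require 'tmpdir'

# Hypothetical stand-in for Anemone::Page; assumes #body is the fetched HTML String.
FakePage = Struct.new(:url, :body)
page = FakePage.new("http://www.findbrowsenodes.com/us/example/123",
                    "<html><body><div id=\"clipnode\">123</div></body></html>")
node_id = "123" # hypothetical value; the crawler would read it from #clipnode

saved = nil
copy_matches = nil
Dir.mktmpdir do |dir|
  # Same pattern as the answer: write the body that is already in memory.
  path = File.join(dir, "#{node_id}.html")
  File.open(path, "wb") do |f|
    f.write(page.body)
  end
  saved = File.binread(path)

  # Standard-library one-liner with the same effect (binary-mode write).
  copy = File.join(dir, "#{node_id}-copy.html")
  File.binwrite(copy, page.body)
  copy_matches = (File.binread(copy) == saved)
end

puts saved == page.body.b # prints true: the file holds exactly the body's bytes
puts copy_matches         # prints true
```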

Correct way to read a page and save it as an HTML file?
