Question

So this is what I have:

require 'rubygems'
require 'nokogiri'
require 'open-uri'

root_url = "http://boxerbiography.blogspot.com/2006/11/table-of-contents.html"
file_path = "boxer-noko.html"

# URI.open is the open-uri entry point; Kernel#open no longer opens URLs as of Ruby 3.0
site = Nokogiri::HTML(URI.open(root_url))

titles = []
content = []

# Visit every link on the table-of-contents page
site.css(".entry a").each do |link|
  titles.push(link)

  content_url = link[:href]
  content_page = Nokogiri::HTML(URI.open(content_url))

  # Collect every paragraph under #top on the linked page
  content_page.css("#top p").each do |copy|
    content.push(copy)
  end
end

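But this behaves like an n^n loop. That is, if there are 5 links on the main page, it goes to the first link, assigns the values of all 5 links to it in content (with the current link's copy at the top), then goes back, moves on to the next link, and keeps doing the same thing.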

So each piece of content actually comes back with the content of every link, like this:

Link 1

Copy associated with Link 1.
Copy associated with Link 2.
Copy associated with Link 3.
.
.
.

Link 2

Copy associated with Link 2.
Copy associated with Link 3.
Copy associated with Link 4.
Copy associated with Link 5.
Copy associated with Link 1.
.
.
.

etc.

What I want it to return is:

Link 1

Copy associated with Link 1.

Link 2

Copy associated with Link 2.

in as efficient a way as possible.

How do I do that?
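On the efficiency part: each linked page only needs to be fetched once, so the work grows linearly with the number of links, and most of the remaining time is spent waiting on the network. One possible way to overlap those waits is a thread per link. The following is only a sketch under stated assumptions: the `pages` hash and the `fetch_copy` lambda are hypothetical stand-ins for the real download-and-parse step, and the `sleep` simulates latency.

```ruby
# Hypothetical stand-in for the network: URL => body copy of that page.
pages = {
  "http://somelink.com/1" => "Copy associated with Link 1.",
  "http://somelink.com/2" => "Copy associated with Link 2.",
  "http://somelink.com/3" => "Copy associated with Link 3."
}

# Assumption: the real version would download and parse the page here.
fetch_copy = lambda do |url|
  sleep 0.05 # simulated network latency
  pages.fetch(url)
end

urls = pages.keys

# Start one thread per link so the waits overlap, then collect the results
# in the original link order. Thread#value joins the thread and returns
# the value of its block.
threads = urls.map { |url| Thread.new { fetch_copy.call(url) } }
copies = threads.map(&:value)

urls.zip(copies).each do |url, copy|
  puts "#{url}: #{copy}"
end
```

With a long table of contents, one thread per link may be more than the remote server tolerates, so processing the links in smaller batches would probably be the safer variant.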

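Edit 1: I guess a simpler way to think about it is that, in each element of the titles array, I want to store both the link and the content associated with that link. But I'm not quite sure how to do that, because I have to open two URI connections to parse two pages and keep going back to the root.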

So I imagine it like this:

titles[0] = { :href => "http://somelink.com", :content => "Copy associated with some link" }

But I can't quite get there, so I've had to use two arrays, which seems less than ideal to me.
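The single-array idea can be sketched as one hash per link, built in the same pass that visits the link. This is a minimal illustration rather than a tested scraper: `links`, `pages`, and `fetch_paragraphs` are hypothetical stand-ins for the two network fetches. In the real script the lambda would presumably wrap something like `Nokogiri::HTML(URI.open(url)).css("#top p").map(&:text)`.

```ruby
# Hypothetical stand-in for the linked pages: URL => paragraphs on that page.
pages = {
  "http://somelink.com/1" => ["Copy associated with Link 1."],
  "http://somelink.com/2" => ["Copy associated with Link 2."]
}

# Hypothetical stand-in for the links found on the table-of-contents page.
links = [
  { text: "Link 1", href: "http://somelink.com/1" },
  { text: "Link 2", href: "http://somelink.com/2" }
]

# Assumption: in the real script this would fetch and parse the page.
fetch_paragraphs = ->(url) { pages.fetch(url, []) }

# One hash per link: each page is fetched once, and its copy stays
# attached to the link it came from.
titles = links.map do |link|
  {
    title: link[:text],
    href: link[:href],
    content: fetch_paragraphs.call(link[:href])
  }
end

# Print each link followed only by its own copy.
titles.each do |entry|
  puts entry[:title]
  entry[:content].each { |paragraph| puts paragraph }
  puts
end
```

Because the content is collected inside the hash for its own link, there is no shared flat array for paragraphs from different pages to pile up in.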

Parse all links on a page, visit them, extract the body copy, then continue traversing efficiently
