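I'm looking for the best way to solve the following structural/logical problem in Ruby:
A website needs to be crawled completely, collecting the title of every page.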
But the following (simplified) example is, of course, completely stupid:
```ruby
url = some_root_url
@title_collection = Array.new
go_to_page(url)
@title_collection << find_all_titles_on_page
urls = find_all_urls_on_page
urls.each do |url|
  go_to_page(url)
  @title_collection << find_all_titles_on_page
  urls = find_all_urls_on_page
  urls.each do |url|
    go_to_page(url)
    @title_collection << find_all_titles_on_page
    urls = find_all_urls_on_page
    urls.each do |url|
      go_to_page(url)
      @title_collection << find_all_titles_on_page
      urls = find_all_urls_on_page
      urls.each do |url|
        go_to_page(url)
        @title_collection << find_all_titles_on_page
        urls = find_all_urls_on_page
        urls.each do |url|
          go_to_page(url)
          @title_collection << find_all_titles_on_page
          urls = find_all_urls_on_page
          [...]
        end
      end
    end
  end
end
```
So how would you do this flexibly and efficiently, in a DRY ("Don't Repeat Yourself") way?
Thanks a lot!
Tom
Answer (score: 2)
Recursion is your friend:
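```ruby
require 'set'

# Visit url and, recursively, every page reachable from it, collecting titles.
# `visited` prevents infinite recursion when pages link back to each other.
def walk_tree(url, title_collection = [], visited = Set.new)
  return title_collection unless visited.add?(url) # already crawled
  go_to_page(url)
  title_collection << find_all_titles_on_page
  find_all_urls_on_page.each do |child_url|
    # Share one collection instead of appending the returned array to itself
    walk_tree(child_url, title_collection, visited)
  end
  title_collection
end

titles = walk_tree(some_root_url)
```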
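To try the recursive idea without touching the network, here is a self-contained sketch under a few assumptions: the hypothetical `SITE` hash stands in for the website (each URL maps to a title and the URLs it links to, including cycles), and `crawl_titles` is an iterative, breadth-first alternative added for comparison, not part of the answer above. Both visit every reachable page exactly once.

```ruby
require 'set'

# Hypothetical stand-in for a real website: each URL maps to its page title
# and the URLs it links to. Note the cycles ('/a' links back to '/').
SITE = {
  '/'    => { title: 'Home',    links: ['/a', '/b'] },
  '/a'   => { title: 'Page A',  links: ['/', '/a/1'] },
  '/b'   => { title: 'Page B',  links: ['/a'] },
  '/a/1' => { title: 'Page A1', links: ['/b'] }
}.freeze

# Recursive depth-first walk; `visited` stops the walk from looping on cycles.
def walk_tree(url, titles = [], visited = Set.new)
  return titles unless visited.add?(url) # add? returns nil if already present
  page = SITE.fetch(url)
  titles << page[:title]
  page[:links].each { |child| walk_tree(child, titles, visited) }
  titles
end

# Iterative breadth-first variant: an explicit queue replaces the call stack.
def crawl_titles(root)
  titles = []
  visited = Set[root]
  queue = [root]
  until queue.empty?
    url = queue.shift
    page = SITE.fetch(url)
    titles << page[:title]
    # Enqueue each link only the first time it is seen
    page[:links].each { |link| queue << link if visited.add?(link) }
  end
  titles
end

p walk_tree('/')    # => ["Home", "Page A", "Page A1", "Page B"]
p crawl_titles('/') # => ["Home", "Page A", "Page B", "Page A1"]
```

The two differ mainly in order and depth: the recursive version uses one stack frame per page in the current chain of links, so a very deep site could plausibly raise `SystemStackError`, while the queue version only grows an ordinary array. Pages come out in depth-first and breadth-first order respectively.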