Question

I need to scrape 400+ web pages with Ruby (using scrAPI), and my current code is entirely sequential:

data = urls.map {|url| scraper.scrape url }

The real code is a bit different (exception handling and so on).

How can I make it faster? How do I parallelize the downloads?

Answer 1

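threads = []
data = []
data_lock = Mutex.new  # guards data, which every thread appends to

urls.each do |url|
  # Pass url as a thread argument so each thread gets its own copy.
  threads << Thread.new(url) do |u|
    d = scraper.scrape u
    # Results land in completion order, not in the order of urls.
    data_lock.synchronize { data << d }
  end
end

# Wait for every download to finish before using data.
threads.each(&:join)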

Ta-da! (Caveat: written from memory, untested, may eat your kitten, etc.)

Edit: I figured someone must have written a generalized version of this, and so they have: http://peach.rubyforge.org/ - enjoy!
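
If you would rather not pull in a library, a generalized version is not much code. The following is only a rough sketch and not peach's implementation: `parallel_map` and `worker_count` are names made up here, and the block stands in for `scraper.scrape url`. It assumes Ruby 2.3 or later for `Queue#close`. Unlike one thread per URL, it caps concurrency with a fixed pool of workers and keeps results in input order.

```ruby
# Hypothetical helper (not part of peach or scrAPI): maps a block over items
# using at most worker_count threads and returns results in input order.
def parallel_map(items, worker_count: 8, &block)
  results = Array.new(items.size)
  results_lock = Mutex.new

  # Queue every item together with its position, then close the queue so
  # that pop returns nil once all jobs have been taken.
  queue = Queue.new
  items.each_with_index { |item, index| queue << [item, index] }
  queue.close

  workers = Array.new(worker_count) do
    Thread.new do
      while (job = queue.pop)
        item, index = job
        value = block.call(item)
        results_lock.synchronize { results[index] = value }
      end
    end
  end

  # join re-raises any exception that killed a worker.
  workers.each(&:join)
  results
end

# Intended use, assuming urls and scraper exist as in the question:
#   data = parallel_map(urls, worker_count: 10) { |url| scraper.scrape url }
```

With 400+ pages a pool of around 10 to 20 workers is probably a saner starting point than 400 simultaneous threads, but that number is a guess to tune, not a measurement.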

Answer 2

This is pretty much the example the Pickaxe uses in its explanation of threads:

http://www.rubycentral.com/pickaxe/tut_threads.html

You should be able to adapt the Pickaxe code to use your scraper without much trouble.
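
Roughly what that adaptation could look like is sketched below. This is not the Pickaxe code itself: `fetch` is a hypothetical stand-in for `scraper.scrape` that does no network I/O, and the URLs are placeholders. It starts one thread per URL and collects results with `Thread#value`, which waits for the thread and returns the value of its block.

```ruby
# Hypothetical stand-in for scraper.scrape(url); does no network I/O.
fetch = ->(url) { "contents of #{url}" }

# Placeholder URLs for illustration only.
urls = %w[http://example.com/a http://example.com/b http://example.com/c]

# Start one thread per URL; passing url as an argument gives each
# thread its own copy of the variable.
threads = urls.map do |url|
  Thread.new(url) { |u| fetch.call(u) }
end

# Thread#value joins each thread and returns its block's result, so
# data comes back in the same order as urls without a shared array.
data = threads.map(&:value)
```

If a thread raises, the exception is re-raised when `value` is called on it, so the exception handling from the sequential version probably belongs inside the thread block.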
