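I tried using multiple threads to process a large amount of data, but it did not improve performance at all. I looked into it, and it turns out that each thread takes roughly as long as the whole task takes when run without threads.
Below is a simplified version that I ran in irb.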
arr = (0...1000000000).to_a and 1 # just to prevent irb from printing arr
>> 1
lambda { from = Time.now; arr.each{}; Time.now - from }.call
>> 58.062952
arr.each_slice((arr.size / 8.0).round).to_a.map{|arrr| Thread.new{ lambda { from = Time.now; arrr.each{}; Time.now - from }.call }}.map(&:join).map(&:value)
>> [56.541044, 46.74521, 47.887555, 49.059258, 55.008338, 55.687892, 55.997382, 55.404157]
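As you can see, each thread takes about as long as the entire task does without threads, even though each thread only iterates over one eighth of the array.
Note that these times are measured around the task itself inside each thread, so they have nothing to do with communication overhead.
I tested arrays of other sizes and the results were similar: no significant difference from the task run without threads.
I am confused, and I would be glad to know why this happens.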
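For readers who want to reproduce the measurement, here is a scaled-down, multi-line sketch of the one-liner above. The array size and the `elapsed` helper are assumptions made for illustration and are not part of the original session. One likely explanation for the result, assuming the code runs on CRuby (MRI), is the Global VM Lock: only one thread executes Ruby code at a time, so eight CPU-bound threads take turns instead of running in parallel.

```ruby
# Scaled-down version of the measurement above.
# The array size is an assumption chosen so the sketch finishes quickly.
arr = (0...1_000_000).to_a

# Hypothetical helper: wall-clock seconds taken by the block,
# measured on a monotonic clock rather than Time.now.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Baseline: iterate the whole array in the main thread.
serial_time = elapsed { arr.each {} }

# Split into 8 slices and time each slice inside its own thread.
slice_size = (arr.size / 8.0).round
threads = arr.each_slice(slice_size).map do |slice|
  Thread.new { elapsed { slice.each {} } }
end

# Thread#value joins the thread and returns the block's result.
thread_times = threads.map(&:value)

puts "serial:     #{serial_time}"
puts "per thread: #{thread_times.inspect}"
```

Each per-thread figure is wall-clock time, so it also includes the time the thread spent waiting for its turn to run.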
Answer 0 (score: 0)
You can speed up the example with the parallel gem.
Run gem install parallel to install the gem, then run irb:
irb(main):006:0> require 'parallel'
=> true
irb(main):001:0> arr = (0...100000000).to_a and 1
=> 1
irb(main):002:0> lambda { from = Time.now; arr.each{}; Time.now - from }.call
=> 7.424087
irb(main):009:0> arr.each_slice((arr.size / 8.0).round).to_a.map{|arrr| Thread.new{ lambda { from = Time.now; arrr.each{}; Time.now - from }.call }}.map(&:join).map(&:value)
=> [4.425176, 3.593438, 4.000537, 4.039098, 3.817529, 3.743535, 3.683716, 3.6868]
irb(main):008:0> Parallel.map(arr.each_slice((arr.size / 8.0).round).to_a, in_threads: 4) { |arrr| from = Time.now; arrr.each{}; Time.now - from }
=> [2.364945, 2.365872, 2.358072, 2.469698, 2.396479, 2.283937, 2.154116, 1.910335]
irb(main):008:0> Parallel.map(arr.each_slice((arr.size / 8.0).round).to_a, in_processes: 4) { |arrr| from = Time.now; arrr.each{}; Time.now - from }
=> [1.45764, 1.478868, 1.475091, 1.470171, 1.707919, 1.728389, 1.735843, 1.72616]
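The in_processes timings are the lowest of the three, which fits the idea that separate processes each run their own interpreter and can use several cores at once. The sketch below shows the same idea using only the standard library, with fork and a pipe per worker. It is not a description of how the parallel gem works internally, it assumes a Unix-like platform where fork is available, and the array size and worker count are assumptions chosen for a quick run.

```ruby
# Sketch: time each slice in a separate forked process (Unix-like systems only).
# Array size and worker count are assumptions for a quick run.
arr = (0...1_000_000).to_a
slice_size = (arr.size / 4.0).round

workers = arr.each_slice(slice_size).map do |slice|
  reader, writer = IO.pipe
  pid = fork do
    reader.close # the child only writes
    start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
    slice.each {}
    seconds = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
    writer.write(Marshal.dump(seconds))
    writer.close
    exit!(0) # leave immediately, skipping at_exit handlers inherited from the parent
  end
  writer.close # the parent keeps only the read end
  [pid, reader]
end

process_times = workers.map do |pid, reader|
  data = reader.read # blocks until the child closes its write end
  reader.close
  Process.wait(pid)
  Marshal.load(data)
end

puts "per process: #{process_times.inspect}"
```

Only the measured time is sent back through the pipe. Results computed in a child process are not visible to the parent unless they are serialized and sent back like this, which is the main cost of processes compared with threads.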