Ruby: synchronizing fork pool output

Time: 2014-03-27 20:49:37

Tags: ruby fork synchronous pool

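I am trying to create a generic method for iterating over Enumerables using multiple processors. I spawn a given number of workers with fork and feed them data to process, reusing idle workers. However, I would like to synchronize the input and output order. If job 1 and job 2 are started at the same time and job 2 finishes before job 1, the results come out in the wrong order. I would like to cache the output on the fly somehow so that the output order matches the input order, but I fail to see how this can be done?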

#!/usr/bin/env ruby

require 'pp'

DEBUG = false
CPUS  = 2

module Enumerable
  # Fork each (feach) creates a fork pool with a specified number of processes
  # to iterate over the Enumerable object processing the specified  block.
  # Calling feach with :processes => 0 disables forking for debugging purposes.
  # It is possible to disable synchronized output with :synchronize => false
  # which will save some overhead.
  #
  # @example - process 10 elements using 4 processes:
  #
  # (0 ... 10).feach(:processes => 4) { |i| puts i; sleep 1 }
  def feach(options = {}, &block)
    $stderr.puts "Parent pid: #{Process.pid}" if DEBUG

    procs = options[:processes]   || 0
    # Defaults to true but honors an explicit :synchronize => false.
    sync  = options.fetch(:synchronize, true)

    if procs > 0
      workers = spawn_workers(procs, &block)
      threads = []

      self.each_with_index do |elem, index|
        $stderr.puts "elem: #{elem}    index: #{index}" if DEBUG

        threads << Thread.new do 
          worker = workers[index % procs]
          worker.process(elem)
        end

        if threads.size == procs
          threads.each { |thread| thread.join }
          threads = []
        end
      end

      threads.each { |thread| thread.join }
      workers.each { |worker| worker.terminate }
    else
      self.each do |elem|
        block.call(elem)
      end
    end
  end

  def spawn_workers(procs, &block)
    workers = []

    procs.times do 
      child_read, parent_write = IO.pipe
      parent_read, child_write = IO.pipe

      pid = Process.fork do
        begin
          parent_write.close
          parent_read.close
          call(child_read, child_write, &block)
        ensure
          child_read.close
          child_write.close
        end
      end

      child_read.close
      child_write.close

      $stderr.puts "Spawning worker with pid: #{pid}" if DEBUG

      workers << Worker.new(parent_read, parent_write, pid)
    end

    workers
  end

  def call(child_read, child_write, &block)
    while not child_read.eof?
      elem = Marshal.load(child_read)
      $stderr.puts "      call with Process.pid: #{Process.pid}" if DEBUG
      result = block.call(elem)
      Marshal.dump(result, child_write)
    end
  end

  class Worker
    attr_reader :parent_read, :parent_write, :pid

    def initialize(parent_read, parent_write, pid)
      @parent_read  = parent_read
      @parent_write = parent_write
      @pid          = pid
    end

    def process(elem)
      Marshal.dump(elem, @parent_write)
      $stderr.puts "   process with worker pid: #{@pid} and parent pid: #{Process.pid}" if DEBUG
      Marshal.load(@parent_read)
    end

    def terminate
      $stderr.puts "Terminating worker with pid: #{@pid}" if DEBUG
      Process.wait(@pid, Process::WNOHANG)
      @parent_read.close
      @parent_write.close
    end
  end
end

def fib(n) n < 2 ? n : fib(n-1)+fib(n-2); end # Lousy Fibonacci calculator <- heavy job

(0 ... 10).feach(processes: CPUS) { |i| puts "#{i}: #{fib(35)}" }

1 Answer:

Answer 0: (score: 1)

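There is no way to synchronize the output unless you force all child processes to send their output to the parent and let it sort the results, or you enforce some kind of I/O locking between the processes.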
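The first option, sending every child's result back to the parent, could look roughly like the sketch below. `ordered_fork_map` is a hypothetical helper name, not part of the question's code. It assumes the results can be serialized with Marshal, and for simplicity it forks one child per element in batches instead of reusing a pool of workers. Because the parent reads the pipes in the order the jobs were started, a job that finishes early simply waits in its pipe until its turn comes.

```ruby
# Hypothetical helper (an assumption, not the asker's API): runs the block for
# each element in a forked child and returns the results in input order.
def ordered_fork_map(items, processes: 2, &block)
  results = []

  items.each_slice(processes) do |batch|
    # Start one child per element of the batch; each child gets its own pipe.
    jobs = batch.map do |elem|
      reader, writer = IO.pipe

      pid = Process.fork do
        reader.close
        Marshal.dump(block.call(elem), writer)
        writer.close
        exit!(0) # skip at_exit handlers inherited from the parent
      end

      writer.close # the parent only reads from this pipe
      [pid, reader]
    end

    # Read the pipes in start order, not in completion order.
    jobs.each do |pid, reader|
      results << Marshal.load(reader)
      reader.close
      Process.wait(pid) # reap the child so no zombie is left behind
    end
  end

  results
end

# Usage: the second and third jobs finish first, yet the order is kept.
lines = ordered_fork_map([3, 1, 2], processes: 3) { |i| sleep(0.05 * i); "job #{i}" }
lines.each { |line| puts line }
```

The trade-off in this sketch is that each batch waits for its slowest job before the next batch starts. A pool that hands new work to idle workers would need to tag each result with its index and buffer out-of-order results in the parent.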

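Without knowing what your long-term goal is, it is hard to suggest a solution. In general, each process needs to do a lot of work before fork gives you any significant speedup, and there is no simple way to get the results back into the main program.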

Native threads (pthreads on Linux) might make more sense for what you are trying to do, but not all versions of Ruby support threading at that level. See:

Does ruby have real multithreading?
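For comparison, a thread-based version keeps the results in input order almost for free, because `Thread#value` joins the thread and returns the value of its block. The sketch below is only an illustration. `ordered_thread_map` is a hypothetical name, and it starts one thread per element rather than a fixed-size pool. Note that on MRI (CRuby) the global VM lock prevents CPU-bound Ruby code such as `fib` from running in parallel, so this helps mainly with I/O-bound work or on implementations such as JRuby.

```ruby
# Hypothetical helper (an assumption): runs the block for each element in its
# own thread and returns the results in input order.
def ordered_thread_map(items, &block)
  threads = items.map do |elem|
    # Pass elem as a thread argument so each thread gets its own copy.
    Thread.new(elem) { |e| block.call(e) }
  end

  # Thread#value waits for the thread and returns the block's result, so
  # mapping over the threads in creation order preserves input order.
  threads.map(&:value)
end

doubled = ordered_thread_map([3, 1, 2]) { |i| sleep(0.02 * i); i * 2 }
puts doubled.inspect
```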