I have a large file (hundreds of megabytes) consisting of filenames, one per line.
I need to loop over the list of filenames and fork off a process for each one. I want at most 8 forked processes running at a time, and I don't want to read the whole list of filenames into RAM at once.
I'm not even sure where to start. Can anyone help me?
Answer 0 (score: 6)
File.foreach("large_file").each_slice(8) do |eight_lines|
  # eight_lines is an array containing up to 8 lines.
  # At this point you can iterate over these filenames
  # and spawn off your processes/threads.
end
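The commented-out step above could be filled in like this. A minimal sketch: the `files.txt` sample data and the log-writing "work" inside the child are stand-ins for the real job, not part of the original answer.

```ruby
# Sample list of filenames, one per line (stand-in for the real large file).
File.write("files.txt", (1..20).map { |i| "file#{i}.dat" }.join("\n"))
File.write("done.log", "")  # fresh log for the placeholder work below

# Stream the list in groups of 8; fork one child per filename and
# wait for the whole batch to finish before reading the next slice.
File.foreach("files.txt").each_slice(8) do |eight_lines|
  pids = eight_lines.map do |line|
    Process.fork do
      filename = line.chomp
      # Placeholder work: record the name (the real per-file job goes here).
      File.open("done.log", "a") { |f| f.puts filename }
    end
  end
  pids.each { |pid| Process.wait(pid) }
end
```

Note the tradeoff: this waits for the whole batch of 8 before starting the next one, so a single slow file stalls its batch. The sliding-window approach in the next answer keeps 8 children busy continuously.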
Answer 1 (score: 4)
It sounds like the Process module would be very useful for this task. Here's something I threw together quickly as a starting point:
include Process

i = 0
for line in open('files.txt') do
  i += 1
  fork { `sleep #{rand} && echo "#{i} - #{line.chomp}" >> numbers.txt` }
  if i >= 8
    wait # join any single child process
    i -= 1
  end
end
waitall # join all remaining child processes
Output:
hello
goodbye
test1
test2
a
b
c
d
e
f
g

$ ruby b.rb
$ cat numbers.txt
1 - hello
3 -
2 - goodbye
5 - test2
6 - a
4 - test1
7 - b
8 - c
8 - d
8 - e
8 - f
8 - g
The way this works is: `fork` spawns a child shell command for each line; once 8 children have been started, `wait` blocks until one of them exits before the loop forks another; `waitall` then joins whatever is still running at the end.
Answer 2 (score: 0)
Here is Mark's solution wrapped up in a ProcessPool class, in case that helps (and please correct me if I've made any mistakes):
class ProcessPool
  def initialize(pool_size)
    @pool_size = pool_size
    @free_slots = @pool_size
  end

  # Blocks until a slot is free, then forks the given block.
  def fork(&p)
    if @free_slots == 0
      Process.wait       # reap one finished child to free a slot
      @free_slots += 1
    end
    @free_slots -= 1
    puts "Free slots: #{@free_slots}"
    Process.fork(&p)
  end

  def waitall
    Process.waitall
  end
end
pool = ProcessPool.new 8

for line in open('files.txt') do
  pool.fork { Kernel.sleep rand(10); puts line.chomp }
end

pool.waitall
puts 'finished'
Answer 3 (score: 0)
The standard library documentation for Queue has this example:
require 'thread'

queue = Queue.new

producer = Thread.new do
  5.times do |i|
    sleep rand(i) # simulate expense
    queue << i
    puts "#{i} produced"
  end
end

consumer = Thread.new do
  5.times do |i|
    value = queue.pop
    sleep rand(i/2) # simulate expense
    puts "consumed #{value}"
  end
end

consumer.join
I do find it a little verbose, though.
Wikipedia describes this as the thread pool pattern.
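Applied to the original problem, the same producer/consumer Queue gives a thread pool: one producer streams the file while a fixed number of workers consume it, and a SizedQueue bounds how far the producer can get ahead in RAM. A sketch, where the `files.txt` sample data and the uppercasing "work" are placeholder assumptions:

```ruby
# Sample input file (stand-in for the real list of filenames).
File.write("files.txt", (1..20).map { |i| "file#{i}.dat" }.join("\n"))

results = Queue.new          # collects finished work for demonstration
queue   = SizedQueue.new(8)  # bounded, so the producer can't race ahead in RAM

workers = 8.times.map do
  Thread.new do
    while (filename = queue.pop)   # nil is the stop signal
      results << filename.upcase   # placeholder for the real per-file work
    end
  end
end

# Producer: stream the file one line at a time.
IO.foreach("files.txt") { |line| queue << line.chomp }
workers.size.times { queue << nil }  # one stop signal per worker
workers.each(&:join)
```

Unlike the batch-of-8 approaches above, a worker picks up the next filename the moment it finishes its current one, so all 8 slots stay busy.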
Answer 4 (score: 0)
arr = IO.readlines("filename")
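A caveat: `IO.readlines` slurps the entire file into the array, which conflicts with the questioner's requirement not to hold the whole list in RAM. `IO.foreach` yields one line at a time instead. A minimal sketch, using a hypothetical three-line sample file:

```ruby
# Sample input (assumption); the real file would be the big filename list.
File.write("filename", %w[a.txt b.txt c.txt].join("\n"))

names = []
IO.foreach("filename") do |line|
  names << line.chomp  # one line at a time; the file is never fully in memory
end
```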