I have to perform some processing on every line of each file, and the input directory contains a lot of files. The responses from processing each line (across all the input files) must be dumped into a single result file.
The flow I've settled on is: dump all the input file names into a queue and fork 3-4 workers, where each worker picks up a unique file, reads its contents, and after processing pushes each response onto a writer queue. A separate process reads this queue and writes the results to the output file.
I've come up with this code -
import csv
import multiprocessing
import os

def write_to_csv(queue):
    # writer process: drain the queue and append each response to the CSV
    file_path = os.path.join(os.getcwd(), 'test_dir', "writer.csv")
    ofile = open(file_path, "w")
    job_writer = csv.writer(ofile, delimiter='\a')
    while 1:
        line = queue.get()
        if line == 'kill':
            print("Kill Signal received")
            break
        if line:
            job_writer.writerow([str(line).strip()])
    ofile.close()
def worker_main(file_queue, writer_queue):
    print(os.getpid(), "working")
    while not file_queue.empty():
        file_name = file_queue.get(True)
        # somewhere in process_file, writer_queue.put(line_resp) is called
        # for every line in file_name
        process_file(file_name, writer_queue)
if __name__ == "__main__":
    file_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()

    # one writer process, started via the Pool initializer
    writer_pool = multiprocessing.Pool(1, write_to_csv, (output_queue,))

    cwd = os.getcwd()
    test_dir = 'test_dir'
    file_list = os.listdir(os.path.join(cwd, test_dir))
    for file_name in file_list:
        file_queue.put(file_name)

    # three reader processes, each running worker_main via the initializer
    reader_pool = multiprocessing.Pool(3, worker_main, (file_queue, output_queue))
    reader_pool.close()
    reader_pool.join()

    output_queue.put("kill")
    print("Finished execution")
The code works fine. But I want to know whether the same thing can be done with a single multiprocessing pool, instead of the reader_pool and writer_pool used in the code above.
Answer (score: 1)
You can do this with apply_async. Also, don't set an initializer (write_to_csv or worker_main) when creating the Pool object, or it will be run by every worker by default.
if __name__ == "__main__":
    file_queue = multiprocessing.Queue()
    output_queue = multiprocessing.Queue()

    cwd = os.getcwd()
    test_dir = 'test_dir'
    file_list = os.listdir(os.path.join(cwd, test_dir))
    for file_name in file_list:
        file_queue.put(file_name)

    pool = multiprocessing.Pool(4)
    # one task for the writer, three tasks for the readers, all in one pool
    pool.apply_async(write_to_csv, (output_queue,))
    results = [pool.apply_async(worker_main, (file_queue, output_queue)) for i in range(3)]

    # wait for the readers to finish, then send the sentinel so the writer
    # task can return; otherwise write_to_csv never breaks out of its loop
    # and pool.join() hangs
    for res in results:
        res.wait()
    output_queue.put("kill")
    pool.close()
    pool.join()
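As an aside, here is a minimal alternative sketch (not part of the answer above): with a single pool you can also let the main process own the CSV writer and stream results back with imap_unordered, which removes the writer queue and the "kill" sentinel entirely. It assumes a variant of process_file that returns the list of per-line responses for one file instead of pushing them onto a queue; the uppercasing line is just a hypothetical stand-in for the real processing.
import csv
import multiprocessing
import os

def process_file(file_name):
    # hypothetical stand-in: return one response per line of the input file
    path = os.path.join(os.getcwd(), 'test_dir', file_name)
    with open(path) as f:
        return [line.strip().upper() for line in f]

if __name__ == "__main__":
    in_dir = os.path.join(os.getcwd(), 'test_dir')
    # skip the output file in case it is left over from a previous run
    file_list = [f for f in os.listdir(in_dir) if f != 'writer.csv']

    out_path = os.path.join(in_dir, 'writer.csv')
    with multiprocessing.Pool(4) as pool, open(out_path, 'w', newline='') as ofile:
        job_writer = csv.writer(ofile, delimiter='\a')
        # results are yielded as soon as each worker finishes a file
        for responses in pool.imap_unordered(process_file, file_list):
            for line_resp in responses:
                job_writer.writerow([line_resp])
Since only the main process touches the file, this also sidesteps any need to serialize writes between workers.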