Multiprocessing in Python

Date: 2015-06-02 16:12:55

Tags: python multithreading python-2.7 multiprocessing

I'm trying to write out some files after editing them, using Python (2.7) multiprocessing code. It works like a charm for a small number of files (<20), but when I try more files (20+) it goes berserk. I'm using Python 2.7.5 on CentOS 6.5 with a 4-core processor.

import sys, os
import multiprocessing

import glob
list_files = glob.glob("Protein/*.txt")

def Some_func(some_file):
    with open(some_file) as some:
        with open(file_output, "a") as output:
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                output.write(edited_lines)


pool = multiprocessing.Pool(10) # Desired number of worker processes = 10
pool.map(Some_func, list_files)
pool.close()
pool.join()

In the final written output, the files end up overlapping each other.

File 1
Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1
Lines 4 .. File 1
Lines 5 .. File 1
Lines 6 .. File 1
Lines 7 .. File 1
Lines 8 .. File 1
Lines 9 .. File 1

File 2
Lines 1 .. File 2
Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 5 .. File 2
Lines 6 .. File 2
Lines 7 .. File 2
Lines 8 .. File 2
Lines 9 .. File 2



Output:

Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1 Lines 1 .. File 2
Lines 4 .. File 1
Lines 5 .. File 1Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 6 .. File 1

1 Answer:

Answer 0 (score: 1)

The problem is that you're trying to write to a file from many processes in parallel, and the writes aren't synchronized. That means it's possible for different processes to write at the same time, which leads to the oddities you're seeing.

You can fix this either by using a single writer process, with each worker sending the lines it wants written to that one process, or by synchronizing the writes each process does with a multiprocessing.Lock.

Using a single writer:

import glob
import multiprocessing
from functools import partial
from threading import Thread

list_files = glob.glob("Protein/*.txt")

def Some_func(out_q, some_file):
    with open(some_file) as some:
        for lines in some:
            #Do Something
            #edited_lines = func(lines)

            out_q.put(edited_lines)

def write_lines(q):
    with open(file_output, "w") as output:
        for line in iter(q.get, None): # This will end when None is received
            output.write(line)

pool = multiprocessing.Pool(10) # Desired number of worker processes = 10
m = multiprocessing.Manager()
q = m.Queue()
t = Thread(target=write_lines, args=(q,))
t.start()
pool.map(partial(Some_func, q), list_files)
pool.close()
pool.join()
q.put(None)  # Shut down the writer thread
t.join()
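
As an aside (not part of the original answer), this line-oriented pattern can also skip the queue entirely: have each worker return its edited lines and let the parent do all the writing as results arrive. A minimal sketch, reusing the file_output placeholder from above and assuming a hypothetical edit_line function:

import glob
import multiprocessing

def edit_file(some_file):
    # Worker: edit every line of one file and return the edited lines.
    with open(some_file) as some:
        return [edit_line(line) for line in some]  # edit_line is a placeholder

pool = multiprocessing.Pool(10)
with open(file_output, "w") as output:
    # imap yields each file's lines as its worker finishes; only this
    # process ever touches the output file, so nothing needs synchronizing.
    for edited_lines in pool.imap(edit_file, glob.glob("Protein/*.txt")):
        output.writelines(edited_lines)
pool.close()
pool.join()

Since the parent is the only writer, no Lock or Queue is needed at all.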

Using a multiprocessing.Lock:

import glob
import multiprocessing
from functools import partial

list_files = glob.glob("Protein/*.txt")

def Some_func(lock, some_file):
    with open(some_file) as some:
        with open(file_output, "a") as output:  # append, so processes don't clobber each other
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                with lock:
                    output.write(edited_lines)
                    output.flush()  # flush before releasing the lock, so buffered data can't interleave


pool = multiprocessing.Pool(10) # Desired number of worker processes = 10
m = multiprocessing.Manager()
lock = m.Lock()
pool.map(partial(Some_func, lock), list_files)
pool.close()
pool.join()

We need to use a Manager to create the shared objects because we're passing them to a Pool, which requires them to be pickled. Plain multiprocessing.Lock / multiprocessing.Queue objects can only be passed to a multiprocessing.Process constructor, and will raise an exception when passed to methods of a Pool like map.
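
You can see this pickling restriction directly (a minimal sketch, not from the original answer):

import multiprocessing
import pickle

m = multiprocessing.Manager()
pickle.dumps(m.Lock())  # works: a Manager lock is a picklable proxy object

try:
    pickle.dumps(multiprocessing.Lock())  # a plain Lock refuses to be pickled
except RuntimeError as exc:
    print(exc)  # "Lock objects should only be shared between processes through inheritance"

This is the same error Pool.map would hit when it tries to pickle the task arguments.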