I'm trying to edit and then write out a number of files using multiprocessing in Python (2.7). It works like a charm for a small number (<20) of files, but goes haywire when I try 20+. I'm using Python 2.7.5 on CentOS 6.5 with a 4-core processor.
import sys, os
import multiprocessing
import glob

list_files = glob.glob("Protein/*.txt")

def Some_func(some_file):
    with open(some_file) as some:
        with open(file_output, "a") as output:  # file_output: shared output path, defined elsewhere
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                output.write(edited_lines)

pool = multiprocessing.Pool(10)  # Desired number of worker processes = 10
pool.map(Some_func, list_files)
pool.close()
pool.join()
The files written out end up overlapping each other. Given these two input files:
File 1
Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1
Lines 4 .. File 1
Lines 5 .. File 1
Lines 6 .. File 1
Lines 7 .. File 1
Lines 8 .. File 1
Lines 9 .. File 1
File 2
Lines 1 .. File 2
Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 5 .. File 2
Lines 6 .. File 2
Lines 7 .. File 2
Lines 8 .. File 2
Lines 9 .. File 2
Output:
Lines 1 .. File 1
Lines 2 .. File 1
Lines 3 .. File 1 Lines 1 .. File 2
Lines 4 .. File 1
Lines 5 .. File 1Lines 2 .. File 2
Lines 3 .. File 2
Lines 4 .. File 2
Lines 6 .. File 1
Answer 0 (score: 1)
The problem is that you're trying to write to a file from many processes in parallel, and those writes aren't synchronized. That means different processes may try to write at the same time, producing the oddities you're seeing.

You can solve this either by using a single writer process, with each worker sending lines to that one process, or by synchronizing the writes each process performs with a multiprocessing.Lock.
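To see why, here is a minimal standalone repro (my own sketch; shared_output.txt is a hypothetical path, not from the question): two processes appending to the same file without synchronization flush their write buffers at arbitrary points, so their chunks can interleave, sometimes in the middle of a line.

import multiprocessing

def writer(tag):
    # Many small buffered writes; the file buffer flushes at buffer-size
    # boundaries, not at line boundaries, so the flushed chunks from the
    # two processes can land in the file interleaved mid-line.
    with open("shared_output.txt", "a") as f:
        for i in range(10000):
            f.write("line %d from %s\n" % (i, tag))

if __name__ == "__main__":
    procs = [multiprocessing.Process(target=writer, args=(tag,))
             for tag in ("A", "B")]
    for p in procs:
        p.start()
    for p in procs:
        p.join()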
Using a single writer:
import glob
import multiprocessing
from functools import partial
from threading import Thread

list_files = glob.glob("Protein/*.txt")

def Some_func(out_q, some_file):
    with open(some_file) as some:
        for lines in some:
            #Do Something
            #edited_lines = func(lines)
            out_q.put(edited_lines)

def write_lines(q):
    with open(file_output, "w") as output:
        for line in iter(q.get, None):  # This will end when None is received
            output.write(line)

pool = multiprocessing.Pool(10)  # Desired number of worker processes = 10
m = multiprocessing.Manager()
q = m.Queue()
t = Thread(target=write_lines, args=(q,))
t.start()

pool.map(partial(Some_func, q), list_files)
pool.close()
pool.join()

q.put(None)  # Shut down the writer thread
t.join()
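One caveat with this pattern (my note, not part of the original answer): each queue item is written out atomically, so lines never merge mid-line, but items coming from different files can still alternate in the output. If each file's lines need to stay contiguous, a worker can queue its whole result as one item, along these lines:

def Some_func(out_q, some_file):
    edited = []
    with open(some_file) as some:
        for lines in some:
            edited.append(func(lines))  # func = the asker's per-line edit
    out_q.put("".join(edited))  # one queue item per file keeps its lines together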
Using a multiprocessing.Lock:
import glob
import multiprocessing
from functools import partial

list_files = glob.glob("Protein/*.txt")

def Some_func(lock, some_file):
    with open(some_file) as some:
        with open(file_output, "a") as output:  # append mode: "w" would truncate the shared file in every worker
            for lines in some:
                #Do Something
                #edited_lines = func(lines)
                with lock:
                    output.write(edited_lines)
                    output.flush()  # flush inside the lock so the buffer can't spill unsynchronized later

pool = multiprocessing.Pool(10)  # Desired number of worker processes = 10
m = multiprocessing.Manager()
lock = m.Lock()
pool.map(partial(Some_func, lock), list_files)
pool.close()
pool.join()
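Holding the lock only around the write and flush keeps the reading and editing fully parallel; each write lands intact, although, as with the queue approach, lines from different files may still alternate in the output file.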
We need to use a Manager to create the shared objects because you're passing them to a Pool, which requires pickling them. Ordinary multiprocessing.Lock / multiprocessing.Queue objects can only be passed to the multiprocessing.Process constructor, and will raise an exception when passed to Pool methods like map.
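A quick way to see this restriction in action (a sketch of my own, not from the original answer; the exact error message may vary by Python version):

import multiprocessing
import pickle

if __name__ == "__main__":
    try:
        pickle.dumps(multiprocessing.Lock())
    except RuntimeError as e:
        # Typically: "Lock objects should only be shared between
        # processes through inheritance"
        print("plain Lock is not picklable: %s" % e)

    m = multiprocessing.Manager()
    pickle.dumps(m.Lock())  # the Manager's proxy object pickles fine
    print("Manager().Lock() proxy pickled OK")

As an aside, a plain Lock can still be shared with Pool workers by passing it through the Pool's initializer/initargs arguments, where it is inherited at worker startup rather than pickled with each task.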