I am trying to create a number of files that will each be analyzed by a standalone program, as part of a high-throughput analysis written in Python.
for foo in X:
    write foo_file
    os.system(run_program foo_file)
With 15,000 separate individual files, this would run much faster if I could spread the runs across multiple cores, but I don't want to swamp my server. How can I set up multiple threads to run in the OS, but with a cap on how many are open at once? I'm not worried about the speed of spawning the processes, since the runtime is set by an external program that is the standard in my field.
I have looked at the documentation for threading and multiprocessing and was overwhelmed.
Answer 0 (score: 4)
A simple way to limit the total number of processes spawned is to use a multiprocessing pool.
A minimal example demonstrating a multiprocessing pool:
test.py
from multiprocessing.pool import Pool

# @NOTE: The two imports below are for demo purposes and won't be necessary in
# your final program
import random
import time


def writeOut(index):
    """ A function which prints a start message, delays for a random interval
    and then prints a finish message
    """
    delay = random.randint(1, 5)
    print("Starting process #{0}".format(index))
    time.sleep(delay)
    print("Finished process #{0} which delayed for {1}s.".format(index, delay))

# Create a process pool with a maximum of 10 worker processes
pool = Pool(processes=10)

# Map our function to a data set - numbers 0 through 19
pool.map(writeOut, range(20))
Which should give you output similar to this:
[mike@tester ~]$ python test.py
Starting process #0
Starting process #2
Starting process #3
Starting process #1
Starting process #4
Starting process #5
Starting process #6
Starting process #7
Starting process #8
Starting process #9
Finished process #2 which delayed for 1s.
Starting process #10
Finished process #7 which delayed for 1s.
Finished process #6 which delayed for 1s.
Starting process #11
Starting process #12
Finished process #9 which delayed for 2s.
Finished process #12 which delayed for 1s.
Starting process #13
Starting process #14
Finished process #1 which delayed for 3s.
Finished process #5 which delayed for 3s.
Starting process #15
Starting process #16
Finished process #8 which delayed for 3s.
Starting process #17
Finished process #4 which delayed for 4s.
Starting process #18
Finished process #10 which delayed for 3s.
Finished process #13 which delayed for 2s.
Starting process #19
Finished process #0 which delayed for 5s.
Finished process #3 which delayed for 5s.
Finished process #11 which delayed for 4s.
Finished process #15 which delayed for 2s.
Finished process #16 which delayed for 2s.
Finished process #18 which delayed for 2s.
Finished process #14 which delayed for 4s.
Finished process #17 which delayed for 5s.
Finished process #19 which delayed for 5s.
As you can see, the first ten processes start immediately, and each subsequent process starts only once another pool worker finishes (becomes available). Using multiple processes (rather than multiple threads) bypasses the global interpreter lock (GIL).
To make this example code work for your task, write a function that does your file output, and pass it to pool.map() along with an iterable of the file data to write, in place of writeOut and range(20).
Answer 1 (score: 1)
Try this:
import os
import threading
import Queue  # "queue" in Python 3


class ThreadWriteFile(threading.Thread):
    def __init__(self, queue_to_write, queue_to_run):
        threading.Thread.__init__(self)
        self.queue_to_write = queue_to_write
        self.queue_to_run = queue_to_run

    def run(self):
        while True:
            foo_file = self.queue_to_write.get()
            write_file(foo_file)  # placeholder: your file-writing code
            self.queue_to_run.put(foo_file)
            self.queue_to_write.task_done()


class ThreadRunProgram(threading.Thread):
    def __init__(self, queue_to_run):
        threading.Thread.__init__(self)
        self.queue_to_run = queue_to_run

    def run(self):
        while True:
            foo_file = self.queue_to_run.get()
            os.system('run_program ' + foo_file)  # your external command
            self.queue_to_run.task_done()


queue_to_write = Queue.Queue()
queue_to_run = Queue.Queue()

# Note: this starts one writer and one runner thread per item; to cap
# concurrency at N, create a fixed number of worker threads instead.
for foo in X:
    twf = ThreadWriteFile(queue_to_write, queue_to_run)
    twf.daemon = True
    twf.start()
    queue_to_write.put(foo)

    trf = ThreadRunProgram(queue_to_run)
    trf.daemon = True
    trf.start()

queue_to_write.join()
queue_to_run.join()
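On Python 3 (or Python 2 with the futures backport), the same "at most N at once" behaviour can be had with far less code via concurrent.futures; a minimal sketch, with a harmless stand-in for the external command:

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(filename):
    # Stand-in command that exits 0; replace with your real program call.
    return subprocess.call([sys.executable, "-c", "import sys; sys.exit(0)",
                            filename])


filenames = ["foo_{0}.txt".format(i) for i in range(5)]

# Threads are fine here because the heavy work is an external process,
# not Python bytecode, so the GIL is not a bottleneck.
with ThreadPoolExecutor(max_workers=2) as executor:  # cap: 2 concurrent jobs
    codes = list(executor.map(run_one, filenames))

print(codes)
```

executor.map() blocks until all jobs complete and returns their results (here, the exit codes) in input order, so no manual queue bookkeeping is needed.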