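I'm trying to use a multiprocessing pool to speed up a simple Python program, specifically the imap_unordered function.
In my case, I'm searching for a specific object with a certain property. Checking this property takes a long time, so I want to spread the load across my CPU cores.
I wrote the following code: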
from multiprocessing import Pool as ThreadPool

pool = ThreadPool(4)
some_iterator = (create_item() for _ in range(100000))
results = pool.imap_unordered(my_function, some_iterator)
for result in results:
    if is_favourable(result):
        break
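Unfortunately, after break is called there is still a lot of activity in the threads (as I can see in my computer's activity monitor). How should I search through the results only until I find a favourable one, or how do I stop the imap_unordered iterator from working through all of the remaining items?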
Answer 0 (score: 1)
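Pool.terminate() stops the worker processes immediately, whereas Pool.close() stops any further tasks from being submitted, and the worker processes exit once the outstanding tasks have finished.
Pool.terminate() is also called when the Pool instance is garbage collected, or when the pool is used in a with statement, so the following is one solution: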
import multiprocessing as mp
import time

def my_function(item):
    print(mp.current_process().name, item)
    time.sleep(2)  # imitate a long process
    return item * 2

def is_favourable(item):
    return item == 20  # something to look for (result of item 10)

def find():
    with mp.Pool() as pool:
        some_iterator = range(100)
        results = pool.imap_unordered(my_function, some_iterator)
        for result in results:
            print(result)
            if is_favourable(result):
                return result  # pool is terminated on exiting the with block

if __name__ == '__main__':
    start = time.time()
    find()
    print(time.time() - start)
A single thread would find item 10 in 22 seconds (11 tasks at 2 seconds each). On my 8-core system it is found in ~4 seconds:
SpawnPoolWorker-2 0
SpawnPoolWorker-3 1
SpawnPoolWorker-1 2
SpawnPoolWorker-5 3
SpawnPoolWorker-4 4
SpawnPoolWorker-8 5
SpawnPoolWorker-7 6
SpawnPoolWorker-6 7
SpawnPoolWorker-1 8
SpawnPoolWorker-3 9
SpawnPoolWorker-2 10
4
2
0
8
SpawnPoolWorker-4 11
SpawnPoolWorker-8 12
10
SpawnPoolWorker-5 13
6
12
SpawnPoolWorker-7 14
SpawnPoolWorker-6 15
14
SpawnPoolWorker-3 16
18
SpawnPoolWorker-1 17
SpawnPoolWorker-2 18
16
20
4.203129768371582
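The 22-second and ~4-second figures can be checked with a rough back-of-the-envelope model. The sketch below assumes an idealized pool in which tasks are handed out in order, every task takes exactly 2 seconds, and startup overhead is ignored. The helper name estimated_seconds is made up for this illustration and is not part of multiprocessing.

```python
import math

def estimated_seconds(target_index, workers, seconds_per_task=2.0):
    """Idealized wall-clock time until the task at target_index has finished."""
    tasks_needed = target_index + 1             # items 0..target_index must be started
    rounds = math.ceil(tasks_needed / workers)  # each round runs `workers` tasks in parallel
    return rounds * seconds_per_task

print(estimated_seconds(10, workers=1))  # 22.0
print(estimated_seconds(10, workers=8))  # 4.0
```

Item 10 falls into the second round of eight tasks, hence two rounds of 2 seconds. The extra ~0.2 seconds in the measured run is presumably the cost of starting the worker processes.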
Answer 1 (score: 1)
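For starters, your sample code doesn't use a multiprocessing ThreadPool, because your import statement is wrong: it effectively just renames the regular Pool class to ThreadPool.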
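To make the import point concrete, here is a small check. The name AliasedPool is only used for this illustration and stands for the alias created by the import in the question.

```python
import multiprocessing
import multiprocessing.pool
from multiprocessing import Pool as AliasedPool   # same kind of import as in the question
from multiprocessing.pool import ThreadPool       # the actual thread-based pool

# The alias is just another name for the process pool.
alias_is_process_pool = AliasedPool is multiprocessing.Pool
# It is not the thread pool, whatever it is called locally.
alias_is_thread_pool = AliasedPool is ThreadPool
# ThreadPool shares the Pool API because it subclasses the Pool class.
thread_pool_is_pool_subclass = issubclass(ThreadPool, multiprocessing.pool.Pool)

print(alias_is_process_pool, alias_is_thread_pool, thread_pool_is_pool_subclass)
```

So with the import from the question the work runs in separate processes, which matches the per-core activity the asker saw in the activity monitor.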
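In any case, since Python 3.3 you can use a Pool or ThreadPool as a context manager and put the loop inside it. Its terminate() method is then called automatically when the context is exited (here, because of the break statement in the example below).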
from multiprocessing import current_process
from multiprocessing.pool import ThreadPool
from random import randint
import time

def create_item():
    return randint(0, 20)

def is_favourable(value):
    return value < 20

def my_function(value):
    print(current_process().name, value)
    time.sleep(2)
    return value * 2

if __name__ == '__main__':
    with ThreadPool(4) as pool:  # Use as context manager (Python 3.3+)
        some_iterator = (create_item() for _ in range(10000))
        start = time.time()
        results = pool.imap_unordered(my_function, some_iterator)
        for result in results:
            print('result:', result)
            if is_favourable(result):
                break  # Stop loop and exit Pool context.
        print('done')
    print(time.time() - start)

If you're using an older version of Python, you can instead call pool.terminate() explicitly immediately before the break statement (rather than using a with statement).
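A minimal sketch of that explicit approach follows. It is a slight variation on the suggestion: terminate() sits in a finally block rather than directly before the break, so it also runs if the loop ends without a match or raises. The names slow_double, find_first and started are invented for this example, and the 0.05-second sleep merely stands in for the expensive check.

```python
from multiprocessing.pool import ThreadPool
import threading
import time

started = []                      # inputs whose task actually began running
started_lock = threading.Lock()

def slow_double(value):
    with started_lock:
        started.append(value)
    time.sleep(0.05)              # stand-in for an expensive check
    return value * 2

def find_first(values, wanted):
    pool = ThreadPool(4)          # no with statement, as on older Pythons
    try:
        for result in pool.imap_unordered(slow_double, values):
            if result == wanted:
                return result     # the finally block below still runs
        return None
    finally:
        pool.terminate()          # stop handing out the remaining tasks
        pool.join()               # wait for the pool's threads to wind down

found = find_first(range(200), wanted=20)
print(found, len(started))        # far fewer than 200 tasks should have started
```

The exact number of started tasks depends on timing, so the example prints it rather than assuming a value. With a thread pool, a task that is already running cannot be interrupted and simply finishes.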