I have two versions of a multiprocessing program that use concurrent.futures.ProcessPoolExecutor (Python 3.6, Linux), and despite what looks like a tiny change between them, the difference in speed is surprising (one is about 3x slower than the other). Each child process runs a simple function that reads from a large dict (it never modifies it) and returns a result.

Variable passed in via executor.submit():
#!/usr/bin/env python3
import concurrent.futures, pstats, sys, cProfile

BIG_DICT = {i: 2*i for i in range(10000)}

def foo(d):
    return d[0]

with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
    tasks = [executor.submit(foo, BIG_DICT) for _ in range(100000)]
    for task in concurrent.futures.as_completed(tasks):
        task.result()
Read from a global variable:
#!/usr/bin/env python3
import concurrent.futures, pstats, sys, cProfile

BIG_DICT = {i: 2*i for i in range(10000)}

def foo():
    return BIG_DICT[0]

with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
    tasks = [executor.submit(foo) for _ in range(100000)]
    for task in concurrent.futures.as_completed(tasks):
        task.result()
I have profiled both versions with cProfile, and most of the execution time seems to be spent waiting on locks. The global-variable version waits only about 10 seconds, while the passed-in version waits about 80 seconds!
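The exact profiling driver isn't shown above; a minimal sketch (assuming cProfile/pstats were used, as the imports suggest) that produces output in the same shape, with the same sort keys and the 0.05 restriction seen in the tables further down, would be:

import concurrent.futures, cProfile, pstats, sys

BIG_DICT = {i: 2*i for i in range(10000)}

def foo(d):
    return d[0]

def main():
    # the "passed-in" version; the global-variable version is profiled the same way
    with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor:
        tasks = [executor.submit(foo, BIG_DICT) for _ in range(100000)]
        for task in concurrent.futures.as_completed(tasks):
            task.result()

profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

stats = pstats.Stats(profiler, stream=sys.stdout)
stats.sort_stats('tottime', 'cumtime')  # "Ordered by: internal time, cumulative time"
stats.print_stats(.05)                  # "List reduced ... due to restriction <0.05>"
stats.print_callers(.05)                # the "Function was called by..." tables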
As I understand it, when a process is forked it gets a copy of its parent's memory. Since the program is multi-process and BIG_DICT is never actually modified after it is created, no locking should be needed between submits to keep state consistent. BIG_DICT has to end up in each child process's memory space in both versions, so why is there such a large difference in execution time?
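One thing that is easy to check directly: an argument given to executor.submit() is pickled and sent to the workers separately for every task, whereas a module-level global is only copied into each worker once at fork time. A back-of-the-envelope measurement of what each submit() in the passed-in version has to serialize (this snippet is not part of the scripts above):

import pickle

BIG_DICT = {i: 2*i for i in range(10000)}

# ProcessPoolExecutor pickles submit() arguments and pushes them through its
# call queue, so this payload is serialized and written to a pipe for each of
# the 100000 tasks in the passed-in version.
payload = pickle.dumps(BIG_DICT)
print(len(payload), "bytes per task")  # on the order of tens of kB for this dict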
Some ideas I have: the implementation details of ProcessPoolExecutor.

Profile of the passed-in version:
7672287 function calls in 92.434 seconds
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
ncalls tottime percall cumtime percall filename:lineno(function)
460133 75.428 0.000 75.428 0.000 {method 'acquire' of '_thread.lock' objects}
100001 7.034 0.000 7.034 0.000 {built-in method posix.write}
100001 2.490 0.000 2.490 0.000 {method '__enter__' of '_multiprocessing.SemLock' objects}
100001 0.686 0.000 78.344 0.001 _base.py:196(as_completed)
90033 0.553 0.000 75.879 0.001 threading.py:263(wait)
100000 0.548 0.000 13.639 0.000 process.py:449(submit)
190033 0.366 0.000 0.713 0.000 _base.py:174(_yield_finished_futures)
100000 0.351 0.000 0.598 0.000 _base.py:312(__init__)
90033 0.327 0.000 76.335 0.001 threading.py:533(wait)
100001 0.261 0.000 7.617 0.000 connection.py:181(send_bytes)
480065 0.260 0.000 0.382 0.000 threading.py:239(__enter__)
100001 0.258 0.000 11.329 0.000 queues.py:339(put)
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
Function was called by...
ncalls tottime cumtime
{method 'acquire' of '_thread.lock' objects} <- 90033 0.078 0.078 threading.py:251(_acquire_restore)
190033 0.391 0.391 threading.py:254(_is_owned)
180066 74.956 74.956 threading.py:263(wait)
1 0.003 0.003 threading.py:1062(_wait_for_tstate_lock)
{built-in method posix.write} <- 100001 7.034 7.034 connection.py:365(_send)
{method '__enter__' of '_multiprocessing.SemLock' objects} <- 100001 2.490 2.490 synchronize.py:95(__enter__)
_base.py:196(as_completed) <-
threading.py:263(wait) <- 90033 0.553 75.879 threading.py:533(wait)
process.py:449(submit) <- 100000 0.548 13.639 local.py:13(<listcomp>)
_base.py:174(_yield_finished_futures) <- 190033 0.366 0.713 _base.py:196(as_completed)
_base.py:312(__init__) <- 100000 0.351 0.598 process.py:449(submit)
threading.py:533(wait) <- 90032 0.327 76.334 _base.py:196(as_completed)
1 0.000 0.001 threading.py:828(start)
connection.py:181(send_bytes) <- 100001 0.261 7.617 queues.py:339(put)
threading.py:239(__enter__) <- 100000 0.070 0.116 _base.py:174(_yield_finished_futures)
100000 0.033 0.051 _base.py:405(result)
100000 0.083 0.108 queue.py:115(put)
90032 0.040 0.058 threading.py:523(clear)
90033 0.034 0.050 threading.py:533(wait)
queues.py:339(put) <- 100000 0.258 11.329 process.py:449(submit)
1 0.000 0.000 process.py:499(shutdown)
Profile of the global-variable version:
5949819 function calls in 27.158 seconds
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
ncalls tottime percall cumtime percall filename:lineno(function)
160569 10.072 0.000 10.072 0.000 {method 'acquire' of '_thread.lock' objects}
100001 5.453 0.000 5.453 0.000 {method '__enter__' of '_multiprocessing.SemLock' objects}
100001 5.338 0.000 5.338 0.000 {built-in method posix.write}
100000 0.883 0.000 1.163 0.000 _base.py:312(__init__)
100000 0.477 0.000 15.671 0.000 process.py:449(submit)
100001 0.438 0.000 6.133 0.000 connection.py:181(send_bytes)
100001 0.304 0.000 12.921 0.000 queues.py:339(put)
100000 0.304 0.000 0.304 0.000 process.py:116(__init__)
100001 0.277 0.000 0.432 0.000 reduction.py:38(__init__)
100000 0.267 0.000 0.333 0.000 threading.py:334(notify)
100000 0.240 0.000 0.747 0.000 queue.py:115(put)
100006 0.238 0.000 0.280 0.000 threading.py:215(__init__)
Ordered by: internal time, cumulative time
List reduced from 247 to 12 due to restriction <0.05>
Function was called by...
ncalls tottime cumtime
{method 'acquire' of '_thread.lock' objects} <- 15142 0.007 0.007 threading.py:251(_acquire_restore)
115142 0.038 0.038 threading.py:254(_is_owned)
30284 10.022 10.022 threading.py:263(wait)
1 0.004 0.004 threading.py:1062(_wait_for_tstate_lock)
{method '__enter__' of '_multiprocessing.SemLock' objects} <- 100001 5.453 5.453 synchronize.py:95(__enter__)
{built-in method posix.write} <- 100001 5.338 5.338 connection.py:365(_send)
_base.py:312(__init__) <- 100000 0.883 1.163 process.py:449(submit)
process.py:449(submit) <- 100000 0.477 15.671 global.py:13(<listcomp>)
connection.py:181(send_bytes) <- 100001 0.438 6.133 queues.py:339(put)
queues.py:339(put) <- 100000 0.304 12.921 process.py:449(submit)
1 0.000 0.000 process.py:499(shutdown)
process.py:116(__init__) <- 100000 0.304 0.304 process.py:449(submit)
reduction.py:38(__init__) <- 100001 0.277 0.432 reduction.py:48(dumps)
threading.py:334(notify) <- 100000 0.267 0.333 queue.py:115(put)
queue.py:115(put) <- 100000 0.240 0.747 process.py:449(submit)
threading.py:215(__init__) <- 100000 0.238 0.280 _base.py:312(__init__)
3 0.000 0.000 queue.py:27(__init__)
1 0.000 0.000 queues.py:67(_after_fork)
2 0.000 0.000 threading.py:498(__init__)