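My understanding is that concurrent.futures relies on pickling the arguments so that they can run in a different process (or thread). Shouldn't pickling create a copy of an argument? On Linux it does not seem to do so, i.e. I have to pass a copy explicitly.
I am trying to make sense of the following results: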
<0> rands before submission: [17, 72, 97, 8, 32, 15, 63, 97, 57, 60]
<1> rands before submission: [97, 15, 97, 32, 60, 17, 57, 72, 8, 63]
<2> rands before submission: [15, 57, 63, 17, 97, 97, 8, 32, 60, 72]
<3> rands before submission: [32, 97, 63, 72, 17, 57, 97, 8, 15, 60]
in function 0 [97, 15, 97, 32, 60, 17, 57, 72, 8, 63]
in function 1 [97, 32, 17, 15, 57, 97, 63, 72, 60, 8]
in function 2 [97, 32, 17, 15, 57, 97, 63, 72, 60, 8]
in function 3 [97, 32, 17, 15, 57, 97, 63, 72, 60, 8]
Here is the code:
from __future__ import print_function
import time
import random
try:
    from concurrent import futures
except ImportError:
    import futures

def work_with_rands(i, rands):
    print('in function', i, rands)

def main():
    random.seed(1)
    rands = [random.randrange(100) for _ in range(10)]

    # sequence 1 and sequence 2 should give the same results but they don't
    # only difference is that one uses a copy of rands (i.e., rands.copy())

    # sequence 1
    with futures.ProcessPoolExecutor() as ex:
        for i in range(4):
            print("<{}> rands before submission: {}".format(i, rands))
            ex.submit(work_with_rands, i, rands)
            random.shuffle(rands)

    print('-' * 30)

    random.seed(1)
    rands = [random.randrange(100) for _ in range(10)]
    # sequence 2
    print("initial sequence: ", rands)
    with futures.ProcessPoolExecutor() as ex:
        for i in range(4):
            print("<{}> rands before submission: {}".format(i, rands))
            ex.submit(work_with_rands, i, rands[:])
            random.shuffle(rands)

if __name__ == "__main__":
    main()
Where does [97, 32, 17, 15, 57, 97, 63, 72, 60, 8] come from? It is not even one of the sequences passed to submit.
Under Python 2 the results are slightly different.
Answer 0 (score: 2)
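You are sharing the same list across all the threads, and it is being mutated. This is hard to debug because adding a print changes the behaviour. But [97, 32, 17, 15, 57, 97, 63, 72, 60, 8] must be a state the list had inside shuffle. shuffle holds the list (the same list that exists in all the threads) and changes it several times. At the moment the threads were called, the state was [97, 32, 17, 15, 57, 97, 63, 72, 60, 8]. The values are not copied inline; they are copied in another thread, so you have no guarantee about when they are copied.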
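To make the timing point concrete, here is a minimal sketch that uses the standard pickle module directly, with no executor involved. Pickling does produce an independent copy, but the copy reflects the list as it is at the moment pickle.dumps runs, so a list that is shuffled before it gets pickled arrives in its shuffled state. The function name snapshot_vs_deferred and the local random.Random instance are made up for this illustration.

```python
import pickle
import random


def snapshot_vs_deferred(seed=1):
    """Compare pickling before and after a mutation (illustrative helper)."""
    rng = random.Random(seed)  # local generator, so global state is untouched
    rands = [rng.randrange(100) for _ in range(10)]
    before = list(rands)

    early = pickle.dumps(rands)   # serialized now: captures the current order
    rng.shuffle(rands)            # mutate the very same list object in place
    late = pickle.dumps(rands)    # serialized later: captures the shuffled order

    return before, pickle.loads(early), pickle.loads(late), rands


if __name__ == "__main__":
    before, early, late, current = snapshot_vs_deferred()
    print("before shuffle:", before)
    print("pickled early: ", early)
    print("pickled late:  ", late)
```

The early pickle matches the list as it was before the shuffle, while the late one matches the list after it, even though both came from the same object.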
An example of the states shuffle produces before it finishes:
[31, 64, 88, 7, 68, 85, 69, 3, 15, 47] # initial value (rands)
# ex.submit() is called here
# shuffle() is called here
# shuffle starts changing rand to:
[31, 64, 88, 47, 68, 85, 69, 3, 15, 7]
[31, 64, 15, 47, 68, 85, 69, 3, 88, 7]
[31, 64, 15, 47, 68, 85, 69, 3, 88, 7]
[31, 64, 69, 47, 68, 85, 15, 3, 88, 7]
[31, 64, 85, 47, 68, 69, 15, 3, 88, 7] # threads may be called here
[31, 64, 85, 47, 68, 69, 15, 3, 88, 7] # or here
[31, 64, 85, 47, 68, 69, 15, 3, 88, 7] # or here
[31, 85, 64, 47, 68, 69, 15, 3, 88, 7]
[85, 31, 64, 47, 68, 69, 15, 3, 88, 7] # value when the shuffle has finished
The shuffle source code:
def shuffle(self, x, random=None):
    if random is None:
        randbelow = self._randbelow
        for i in reversed(range(1, len(x))):
            # pick an element in x[:i+1] with which to exchange x[i]
            j = randbelow(i+1)
            x[i], x[j] = x[j], x[i]
            # added this print here. that's what prints the output above
            # your threads are probably being called when this is still pending
            print(x)
    # ... other stuff here
So if your input is [17, 72, 97, 8, 32, 15, 63, 97, 57, 60] and your output is [97, 15, 97, 32, 60, 17, 57, 72, 8, 63], the shuffle goes through steps "in between". Your threads are being called during one of those "in between" steps.
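The intermediate states can also be observed without editing the standard library. The sketch below re-implements the same swap loop as a hypothetical helper, shuffle_steps, and records a copy of the list after every swap. It is an illustration of the loop shown above, not random.shuffle itself, so it is not guaranteed to produce the same permutation as random.shuffle for a given seed.

```python
import random


def shuffle_steps(x, rng):
    """Shuffle x in place and return a snapshot of x after every swap."""
    states = []
    for i in reversed(range(1, len(x))):
        # pick an element in x[:i+1] with which to exchange x[i]
        j = rng.randrange(i + 1)
        x[i], x[j] = x[j], x[i]
        states.append(list(x))  # copy, so later swaps do not alter the record
    return states


if __name__ == "__main__":
    rands = [17, 72, 97, 8, 32, 15, 63, 97, 57, 60]
    for state in shuffle_steps(rands, random.Random(1)):
        print(state)
```

Anything that reads the list while this loop is running sees one of the recorded states rather than the input or the final result.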
An example without mutation. In general, try to avoid sharing data between threads, because it is hard to get right:
import random
from concurrent import futures

def work_with_rands(i, rands):
    print('in function', i, rands)

def foo(a):
    # build a new shuffled list; the input list a is never modified
    random.seed(random.randrange(999912)/9)
    x = [None]*len(a)
    for i in a:
        _rand = random.randrange(len(a))
        while x[_rand] is not None:
            _rand = random.randrange(len(a))
        x[_rand] = i
    return x

def main():
    rands = [random.randrange(100) for _ in range(10)]
    with futures.ProcessPoolExecutor() as ex:
        for i in range(4):
            new_rands = foo(rands)
            print("<{}> rands before submission: {}".format(i, new_rands))
            ex.submit(work_with_rands, i, new_rands)

if __name__ == "__main__":
    main()
<0> rands before submission: [84, 12, 93, 47, 40, 53, 74, 38, 52, 62]
<1> rands before submission: [74, 53, 93, 12, 38, 47, 52, 40, 84, 62]
<2> rands before submission: [84, 12, 93, 38, 62, 52, 53, 74, 47, 40]
<3> rands before submission: [53, 62, 52, 12, 84, 47, 93, 40, 74, 38]
in function 0 [84, 12, 93, 47, 40, 53, 74, 38, 52, 62]
in function 1 [74, 53, 93, 12, 38, 47, 52, 40, 84, 62]
in function 2 [84, 12, 93, 38, 62, 52, 53, 74, 47, 40]
in function 3 [53, 62, 52, 12, 84, 47, 93, 40, 74, 38]
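A shorter way to get the same effect, offered as a sketch rather than as part of the original answer: random.sample(rands, len(rands)) returns a new list containing the elements in random order and leaves its input untouched, so every submission gets its own object. The helper name shuffled_copy is an assumption made for this example.

```python
import random


def shuffled_copy(seq, rng=random):
    """Return a new shuffled list; seq itself is never modified."""
    return rng.sample(seq, len(seq))


if __name__ == "__main__":
    rands = [random.randrange(100) for _ in range(10)]
    for i in range(4):
        new_rands = shuffled_copy(rands)
        print("<{}> independent list: {}".format(i, new_rands))
    print("original is unchanged:", rands)
```

In main above, new_rands = shuffled_copy(rands) could stand in for foo(rands).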
Answer 1 (score: 1)
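Basically, the ProcessPoolExecutor.submit() method puts the function and its arguments into a "work items" dict (without any pickling), which is shared with another thread (_queue_management_worker). That thread passes WorkItems from the dict to a queue that the actual worker processes read from.
There is a comment in the source code that describes the architecture of the concurrent module: http://hg.python.org/cpython/file/16207b8495bf/Lib/concurrent/futures/process.py#l6
As it turns out, there is not enough time between submit calls for _queue_management_worker to be notified about the new items.
So that thread keeps waiting here: (http://hg.python.org/cpython/file/16207b8495bf/Lib/concurrent/futures/process.py#l226) and wakes up only on ProcessPoolExecutor.shutdown (when the ProcessPoolExecutor context exits).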
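The following toy model mimics that hand-off with a plain thread and the standard threading, queue, and pickle modules. It is a simplified illustration of the architecture described above, not the real concurrent.futures code: the class ToyExecutor and its attributes are invented for this sketch, and the wake-up is forced to happen only at shutdown so that the outcome is deterministic.

```python
import pickle
import queue
import random
import threading


class ToyExecutor:
    """Toy stand-in: submit() stores references, a thread pickles them later."""

    def __init__(self):
        self._pending = {}                 # work id -> args, stored by reference
        self._wakeup = threading.Event()
        self.call_queue = queue.Queue()    # what a worker process would read
        self._manager = threading.Thread(target=self._manage)
        self._manager.start()

    def submit(self, *args):
        self._pending[len(self._pending)] = args   # no pickling, no copy

    def _manage(self):
        self._wakeup.wait()                # sleeps until shutdown() in this model
        for work_id in sorted(self._pending):
            self.call_queue.put(pickle.dumps(self._pending[work_id]))

    def shutdown(self):
        self._wakeup.set()
        self._manager.join()


def run_toy(seed=1):
    rng = random.Random(seed)
    rands = [rng.randrange(100) for _ in range(10)]
    ex = ToyExecutor()
    for i in range(4):
        ex.submit(i, rands)
        rng.shuffle(rands)                 # mutates the list every item refers to
    ex.shutdown()
    received = []
    while not ex.call_queue.empty():
        received.append(pickle.loads(ex.call_queue.get()))
    return received, rands


if __name__ == "__main__":
    received, final = run_toy()
    for i, seen in received:
        print("in function", i, seen)
```

In this model every item is pickled after the last shuffle, so all four calls receive the final state of the list. That resembles the question's output, where functions 1 to 3 all print the same sequence.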
If you put some delay into the first sequence, like this:
with futures.ProcessPoolExecutor() as ex:
    for i in range(4):
        print("<{}> rands before submission: {}".format(i, rands))
        ex.submit(work_with_rands, i, rands)
        random.shuffle(rands)
        time.sleep(0.01)
you will see that _queue_management_worker wakes up and passes the calls to the worker processes, and work_with_rands prints different values.
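A delay only makes it more likely that the arguments are serialized before the next shuffle; it does not guarantee it. One way to remove the dependence on timing is to copy the arguments in the calling thread before handing them over, as sequence 2 in the question does with rands[:]. The sketch below wraps that idea in a hypothetical helper, submit_snapshot, and uses ThreadPoolExecutor only to keep the demo small; the same helper should work with a ProcessPoolExecutor as long as the arguments can be deep-copied and pickled.

```python
import copy
import random
from concurrent import futures


def submit_snapshot(executor, fn, *args):
    """Deep-copy args in the caller's thread, then submit the copies."""
    frozen = copy.deepcopy(args)
    return executor.submit(fn, *frozen)


def echo(i, rands):
    """Return the arguments exactly as the task received them."""
    return i, rands


def run_demo(seed=1):
    rng = random.Random(seed)
    rands = [rng.randrange(100) for _ in range(10)]
    expected, pending = [], []
    with futures.ThreadPoolExecutor(max_workers=2) as ex:
        for i in range(4):
            expected.append((i, list(rands)))
            pending.append(submit_snapshot(ex, echo, i, rands))
            rng.shuffle(rands)  # safe: the task holds its own copy
    return expected, [f.result() for f in pending]


if __name__ == "__main__":
    expected, results = run_demo()
    for i, seen in results:
        print("in function", i, seen)
```

Each task sees the list exactly as it was when it was submitted, regardless of when the executor gets around to running it.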