I'm exploring multitasking in Python. After reading this article, I wrote an example to compare the performance of multithreading and multiprocessing:
import time
from multiprocessing import Process
from threading import Thread

dummy_data = ''.join(['0' for i in range(1048576)])  # around 1 MB of data

def do_something(num):
    # Append num references to the same 1 MB string
    l = []
    for i in range(num):
        l.append(dummy_data)

def test(use_thread):
    if use_thread:
        title = 'Thread'
    else:
        title = 'Process'
    num = 1000
    jobs = []
    for i in range(4):  # the test machine has 4 cores
        if use_thread:
            j = Thread(target=do_something, args=(num,))
        else:
            j = Process(target=do_something, args=(num,))
        jobs.append(j)
    start = time.time()
    for j in jobs:
        j.start()
    for j in jobs:
        j.join()
    end = time.time()
    print('{0}: {1}'.format(title, str(end - start)))

if __name__ == '__main__':
    test(False)
    test(True)
The results are:
Process: 0.0416989326477
Thread: 0.149359941483
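I take this to mean that using Process gives better performance, because it makes use of the available cores.
However, if I change the implementation of do_something to: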
def do_something_1(num):
    l = ''.join([dummy_data for i in range(num)])
then using processes is suddenly worse than using threads (I reduced num to 1000 because of a MemoryError):
Process: 14.6903309822
Thread: 4.30753493309
Can someone explain why the second implementation of do_something makes Process perform worse than Thread?
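For reference, here is a quick check of how differently the two implementations use memory. It is only a minimal sketch, assuming CPython and a scaled-down 1 KB dummy_data (my own substitution so that it runs quickly); the names as_list, as_joined, distinct_objects, list_bytes and joined_bytes are made up for this illustration. It does not explain the timings by itself, but it shows that the list version only stores references to one string, while the join version copies the data into a new string.

```python
import sys

# Assumption: 1 KB instead of the 1 MB used in the question, to keep this cheap.
dummy_data = '0' * 1024
num = 1000

# First implementation: the list holds num references to the same string object.
as_list = []
for i in range(num):
    as_list.append(dummy_data)

# Second implementation: join copies the characters into one new string.
as_joined = ''.join([dummy_data for i in range(num)])

distinct_objects = len(set(id(s) for s in as_list))
list_bytes = sys.getsizeof(as_list)      # the list container only, not its items
joined_bytes = sys.getsizeof(as_joined)  # the newly built string

print('distinct strings in list: {0}'.format(distinct_objects))
print('list container bytes: {0}'.format(list_bytes))
print('joined string bytes: {0}'.format(joined_bytes))
```

With the sizes from the question (1048576 characters times num = 1000), the joined string would hold about 1 GB of data per worker, whereas the list version allocates almost nothing new.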