I can't seem to get my head around multiprocessing. I'm trying to do something basic, but the multiprocessing script seems to take forever.
import multiprocessing, time, psycopg2

class Consumer(multiprocessing.Process):

    def __init__(self, task_queue, result_queue):
        multiprocessing.Process.__init__(self)
        self.task_queue = task_queue
        self.result_queue = result_queue

    def run(self):
        proc_name = self.name
        while True:
            next_task = self.task_queue.get()
            if next_task is None:
                print('Tasks Complete')
                self.task_queue.task_done()
                break
            answer = next_task()
            self.task_queue.task_done()
            self.result_queue.put(answer)
        return

class Task(object):

    def __init__(self, a):
        self.a = a

    def __call__(self):
        # Some more work will go in here, but for now just return the value
        return self.a

    def __str__(self):
        return 'ARC'

    def run(self):
        print('IN')

if __name__ == '__main__':
    start_time = time.time()
    numberList = []
    for x in range(1000000):
        numberList.append(x)
    result = []
    counter = 0
    total = 0
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print("Finished in Seconds: %s" % (time.time() - start_time))

    ###########################################################################
    # Multiprocessing starts here....
    ###########################################################################

    start_time = time.time()
    tasks = multiprocessing.JoinableQueue()
    results = multiprocessing.Queue()
    num_consumers = multiprocessing.cpu_count()
    consumers = [Consumer(tasks, results) for i in range(num_consumers)]
    for w in consumers:
        w.start()

    num_jobs = len(numberList)
    for i in range(num_jobs):
        tasks.put(Task(numberList[i]))
    for i in range(num_consumers):
        tasks.put(None)

    print("So far: %s" % (time.time() - start_time))

    result = []
    while num_jobs:
        result.append(results.get())
        num_jobs -= 1
    print(len(result))
    print("Finished in Seconds: %s" % (time.time() - start_time))
The original script came from here.

The first basic loop finishes in about 0.4 seconds on average, while the multiprocessing version finishes in 56 seconds, when I expected it to be the other way around. Is there some logic missing, or is it actually slower? Also, how can I structure this so that it is faster than the standard loop?
Answer 0 (score: 5)
Passing each object from a process through a queue adds overhead, and you have just measured that overhead at 56 seconds for a million objects. Passing fewer, larger objects reduces the overhead but does not eliminate it. To benefit from multiprocessing, the computation each task performs should be relatively heavy compared to the amount of data that has to be transferred.
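To make that concrete, here is a minimal sketch (not from the answer; the worker function and the batch size of 10000 are illustrative choices) that puts lists of numbers on the queue instead of one task per number, so each queue round-trip is amortized over many items:

import multiprocessing
import time

BATCH_SIZE = 10000  # arbitrary; bigger batches mean fewer queue round-trips

def worker(task_queue, result_queue):
    # Each task is a whole list of numbers, not a single number.
    while True:
        batch = task_queue.get()
        if batch is None:
            break
        result_queue.put(sum(batch))

if __name__ == '__main__':
    start_time = time.time()
    numbers = list(range(1000000))
    tasks = multiprocessing.Queue()
    results = multiprocessing.Queue()
    num_consumers = multiprocessing.cpu_count()
    workers = [multiprocessing.Process(target=worker, args=(tasks, results))
               for _ in range(num_consumers)]
    for w in workers:
        w.start()

    # One queue item per batch: 100 puts instead of 1,000,000.
    batches = [numbers[i:i + BATCH_SIZE]
               for i in range(0, len(numbers), BATCH_SIZE)]
    for batch in batches:
        tasks.put(batch)
    for _ in range(num_consumers):
        tasks.put(None)  # poison pill, one per worker

    total = sum(results.get() for _ in range(len(batches)))
    for w in workers:
        w.join()
    print(total)
    print("Finished in Seconds: %s" % (time.time() - start_time))

The structure mirrors the question's consumer pattern, but the queue traffic drops by four orders of magnitude, which is where the 56 seconds were going.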
Answer 1 (score: 5)
Your multiprocessing code is over-engineered, and it doesn't actually do the work it is supposed to do. I rewrote it to be simpler, made it actually do what it is supposed to, and now it is faster than the plain loop:
import multiprocessing
import time

def add_list(l):
    # Sum a list and count its elements in a single pass.
    total = 0
    counter = 0
    for ent in l:
        total += ent
        counter += 1
    return (total, counter)

def split_list(l, n):
    # Split `l` into `n` roughly equal lists by striding.
    # Borrowed from http://stackoverflow.com/a/2136090/2073595
    return [l[i::n] for i in range(n)]

if __name__ == '__main__':
    start_time = time.time()
    numberList = range(1000000)
    counter = 0
    total = 0
    for id in numberList:
        total += id
        counter += 1
    print(counter)
    print(total)
    print("Finished in Seconds: %s" % (time.time() - start_time))

    start_time = time.time()
    num_consumers = multiprocessing.cpu_count()
    # Split the list up so that each consumer can add up a subsection of it.
    lists = split_list(numberList, num_consumers)
    p = multiprocessing.Pool(num_consumers)
    results = p.map(add_list, lists)
    total = 0
    counter = 0
    # Combine the results each worker returned.
    for t, c in results:
        total += t
        counter += c
    print(counter)
    print(total)
    print("Finished in Seconds: %s" % (time.time() - start_time))
Here's the output:
Standard:
1000000
499999500000
Finished in Seconds: 0.272150039673
Multiprocessing:
1000000
499999500000
Finished in Seconds: 0.238755941391
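A note on the design choice here: instead of splitting the list by hand, `Pool.map` also accepts a `chunksize` argument that batches items for transfer between processes, e.g. `p.map(per_item_func, numberList, chunksize=10000)` (the function name is hypothetical). The manual `split_list` approach goes one step further, though: each worker makes a single call over its whole slice, so there is no per-item function-call overhead at all.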
As @aruisdante pointed out, your workload is very small, so the benefit of multiprocessing isn't really felt here. If you do heavier processing per item, you will see a much bigger difference.
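For example, a rough sketch of what "heavier processing" means (the `heavy` function and its loop count are made up purely to burn CPU; actual timings will vary by machine):

import multiprocessing
import time

def heavy(n):
    # Simulate CPU-bound work per item (hypothetical workload).
    total = 0
    for i in range(10000):
        total += (n * i) % 97
    return total

if __name__ == '__main__':
    items = list(range(2000))

    start_time = time.time()
    serial = [heavy(n) for n in items]
    print("Serial: %s" % (time.time() - start_time))

    start_time = time.time()
    with multiprocessing.Pool(multiprocessing.cpu_count()) as p:
        parallel = p.map(heavy, items)
    print("Parallel: %s" % (time.time() - start_time))

    assert serial == parallel

Here each item costs thousands of operations but transfers only a single integer each way, so the compute-to-transfer ratio finally favors the pool.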