I have been using the script below to run some tasks in parallel on an Ubuntu server with 16 processors. It does work, but I have a few questions:
How can I improve it?
#!/usr/bin/env python
from multiprocessing import Process, Queue
from executable import run_model
from database import DB
import numpy as np


def worker(work_queue, db_conection):
    try:
        for phone in iter(work_queue.get, 'STOP'):
            registers_per_number = retrieve_CDRs(phone, db_conection)
            run_model(np.array(registers_per_number), db_conection)
            #print("The phone %s was already run" % (phone))
    except Exception:
        pass
    return True


def retrieve_CDRs(phone, db_conection):
    return db_conection.retrieve_data_by_person(phone)


def main():
    phone_numbers = np.genfromtxt("../listado.csv", dtype="int")[:2000]
    workers = 16
    work_queue = Queue()
    processes = []
    #print("Process started with %s" % (workers))
    for phone in phone_numbers:
        work_queue.put(phone)
        #print("Phone %s put at the queue" % (phone))
    #print("The queue %s" % (work_queue))
    for w in xrange(workers):
        #print("The worker %s" % (w))
        # new connection to the database
        db_conection = DB()
        p = Process(target=worker, args=(work_queue, db_conection))
        p.start()
        #print("Process %s started" % (p))
        processes.append(p)
        work_queue.put('STOP')
    for p in processes:
        p.join()


if __name__ == '__main__':
    main()
Cheers!
Answer 0 (score: 0)
First, starting with the main function:
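I hope this helps you understand the code. What you are attempting is parallel processing with a set of workers; note that the script uses multiprocessing, so the workers are separate processes rather than threads. In general, the more workers you use, the faster it gets, up to a limit. By my reckoning the upper bound is 2000 workers, one per phone number; beyond that the master/worker model makes no sense, because the extra workers would have nothing to do. Parallel processing also calls for keeping the number of idle processors/workers to a minimum: with more than 2000 workers, some would sit idle and hurt performance. On a server with 16 processors, the processor count is likely the tighter limit if the work is CPU-bound. Finally, improving parallel processing comes down to improving on this idea.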
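The point about idle workers can be sketched in code. This is a minimal sketch, assuming Python 3 and a CPU-bound job; it is not the asker's script. `choose_worker_count`, `process_phone`, and `run_all` are hypothetical names, and `process_phone` is only a placeholder for the `retrieve_CDRs` + `run_model` step. It uses `multiprocessing.Pool` in place of the hand-built queue and caps the worker count at both the number of tasks and the number of CPUs.

```python
from multiprocessing import Pool, cpu_count


def choose_worker_count(n_tasks, n_cpus=None):
    """Return a worker count bounded by both the task count and the CPU count."""
    if n_cpus is None:
        n_cpus = cpu_count()
    # At least one worker; never more workers than tasks or than CPUs,
    # so that no worker is created only to sit idle.
    return max(1, min(n_tasks, n_cpus))


def process_phone(phone):
    # Hypothetical stand-in for retrieve_CDRs + run_model from the question.
    return phone * 2


def run_all(phones):
    workers = choose_worker_count(len(phones))
    # Pool distributes the tasks across the workers; the with-block
    # shuts the pool down once map() has returned all results.
    with Pool(processes=workers) as pool:
        return pool.map(process_phone, phones)


if __name__ == "__main__":
    print(run_all(list(range(8))))
```

With 2000 phone numbers on a 16-processor machine, `choose_worker_count(2000, 16)` gives 16, which matches the `workers = 16` setting in the question. If the work were dominated by waiting on the database rather than on computation, a somewhat larger count might still pay off; that would need measuring.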
Hope that helps. Cheers!