I want to parallelize a task in Python, so I read about pool.map, where the data is split into chunks and handled by each process (thread). I have a huge dictionary (2 million words) and a text file of sentences; the idea is to split the sentences into words, match each word against the existing dictionary, and do further processing depending on the result of the lookup. Before doing that, I wrote a dummy program to check how pool.map behaves, but it did not work as expected: the single-process run took less time than the multi-process runs. (I may use "process" and "thread" interchangeably, since I think each thread is nothing but a process.)
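For reference, the real task would look roughly like this. This is only a sketch; big_dict, sentences.txt, and process_match are placeholder names, not my actual code:

from multiprocessing.dummy import Pool as ThreadPool

# Placeholder: the real dictionary holds about 2 million words.
big_dict = {'hello': 1, 'world': 2}

def process_match(word):
    # Match the word against the dictionary; the further
    # processing would depend on whether the lookup succeeds.
    return big_dict.get(word)

def main():
    # Split the sentence file into individual words.
    with open('sentences.txt') as f:
        words = [w for line in f for w in line.split()]
    pool = ThreadPool(4)
    results = pool.map(process_match, words)
    pool.close()
    pool.join()

The dummy program: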
import time
from multiprocessing.dummy import Pool as ThreadPool

def add_1(x):
    return x * x + x

def main():
    iter = 10000000
    num = [i for i in xrange(iter)]
    threads = 4
    pool = ThreadPool(threads)
    start = time.time()
    # chunksize = iter/threads, so each worker gets one large chunk
    results = pool.map(add_1, num, iter / threads)
    pool.close()
    pool.join()
    end = time.time()
    print('Total Time Taken = %f' % (end - start))
Total times (varying the thread count):
threads = 1: Total Time Taken = 2.252361
threads = 2: Total Time Taken = 2.560798
threads = 3: Total Time Taken = 2.938640
threads = 4: Total Time Taken = 3.048179
Using just pool = ThreadPool() with the default number of workers:
def main():
    iter = 10000000
    num = [i for i in xrange(iter)]
    # pool = ThreadPool(threads)
    pool = ThreadPool()  # default worker count
    start = time.time()
    # results = pool.map(add_1, num, iter/threads)
    results = pool.map(add_1, num)  # default chunksize
    pool.close()
    pool.join()
    end = time.time()
    print('Total Time Taken = %f' % (end - start))
Total time = 3.031125
A plain sequential loop runs fine:
def main():
    iter = 10000000
    start = time.time()
    for k in xrange(iter):
        add_1(k)
    end = time.time()
    print('Total Time normally = %f' % (end - start))
Total Time normally = 1.087591
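For comparison, I am also considering a process-based pool, since I have read that CPU-bound work may not benefit from threads because of the GIL. This is just an untested sketch of what I would try, not something I have measured:

import time
from multiprocessing import Pool

def add_1(x):
    return x * x + x

def main():
    iter = 10000000
    num = [i for i in xrange(iter)]
    pool = Pool(4)  # 4 worker processes instead of threads
    start = time.time()
    results = pool.map(add_1, num)
    pool.close()
    pool.join()
    end = time.time()
    print('Total Time Taken = %f' % (end - start))

if __name__ == '__main__':
    main()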
Configuration: I am using Python 2.7.6.