Parallelize a for loop using a multiprocessing Pool

Date: 2018-06-05 15:15:33

Tags: python for-loop parallel-processing kdtree

I am trying to follow the example in this post:

How to use threading in Python?

I have a sample DataFrame (df) like this:

segment  x_coord  y_coord
a        1        1
a        2        4
a        1        7
b        2        3
b        4        3
b        8        3
c        4        4
c        2        5
c        7        8
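To make the question reproducible, the nine-row sample frame above can be built directly with pandas. This is a minimal sketch that simply transcribes the table; nothing beyond the values shown is assumed.

```python
import pandas as pd

# The sample data from the table above, one row per point.
df = pd.DataFrame({
    'segment': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'x_coord': [1, 2, 1, 2, 4, 8, 4, 2, 7],
    'y_coord': [1, 4, 7, 3, 3, 3, 4, 5, 8],
})
print(df)
```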

I use a for loop to build a k-d tree for each segment, like this:

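from scipy import spatial

dist_name = df['segment'].unique()
tree = {}  # maps loop index i -> k-d tree for segment dist_name[i]
for i in range(len(dist_name)):
    a = df[df['segment'] == dist_name[i]]
    tree[i] = spatial.cKDTree(a[['x_coord', 'y_coord']])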
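Independently of parallelism, the loop above re-scans the whole frame with a boolean mask once per segment. A sketch of a swapped-in alternative (not from the original post) uses `DataFrame.groupby` to split the frame in a single pass and keys each tree by its segment label instead of a loop index. It assumes the nine-row sample data from the question.

```python
import pandas as pd
from scipy import spatial

df = pd.DataFrame({
    'segment': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'x_coord': [1, 2, 1, 2, 4, 8, 4, 2, 7],
    'y_coord': [1, 4, 7, 3, 3, 3, 4, 5, 8],
})

# Split df into one group per segment in a single pass, then build one
# k-d tree per group, keyed by the segment label.
trees = {
    name: spatial.cKDTree(group[['x_coord', 'y_coord']].to_numpy())
    for name, group in df.groupby('segment')
}

# Nearest point in segment 'b' to (7, 3): returns the distance and the
# row position of that point within segment 'b'.
dist, idx = trees['b'].query([7, 3])
print(dist, idx)
```

Keying by label means a tree can be looked up as `trees['b']` without keeping a separate index-to-name mapping.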

How can I parallelize the tree creation following the example in the link, where this loop:

import urllib2  # Python 2; in Python 3 this module is urllib.request

results = []
for url in urls:
    result = urllib2.urlopen(url)
    results.append(result)

is parallelized as:

from multiprocessing.dummy import Pool as ThreadPool  # thread-based Pool

pool = ThreadPool(4)  # 4 worker threads
results = pool.map(urllib2.urlopen, urls)
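The pattern in the linked example is: write the loop body as a function of one item, then let `pool.map` call it for every item. A minimal runnable sketch, where the hypothetical `square` function stands in for `urllib2.urlopen` so no network access is needed:

```python
from multiprocessing.dummy import Pool as ThreadPool


def square(n):
    # Hypothetical stand-in for any per-item work, e.g. urlopen(url).
    return n * n


items = [1, 2, 3, 4, 5]

# pool.map applies square to every item using 4 worker threads and
# returns a list whose order matches the input order.
with ThreadPool(4) as pool:
    results = pool.map(square, items)

print(results)  # [1, 4, 9, 16, 25]
```

Using the pool as a context manager shuts the worker threads down once the block exits.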

My attempt:

import pandas as pd
import time
from scipy import spatial
import random
from multiprocessing.dummy import Pool as ThreadPool 


dist_name=['a','b','c','d','e','f','g','h']

# Build one 1000-point frame per segment, then concatenate once
# (DataFrame.append was removed in pandas 2.0).
frames = []
for name in dist_name:
    tmp = pd.DataFrame()
    tmp['x_coord'] = random.sample(range(1, 10000), 1000)
    tmp['y_coord'] = random.sample(range(1, 10000), 1000)
    tmp['segment'] = name
    frames.append(tmp)
df = pd.concat(frames, ignore_index=True)



start_time = time.time()
for i in range(len(dist_name)):
    a=df[df['segment']==dist_name[i]]
    tree = spatial.cKDTree(a[['x_coord','y_coord']])

print("--- %s seconds ---" % (time.time() - start_time))

--- 0.0312347412109375 seconds ---

def func(name):
    a = df[df['segment'] == name]
    return spatial.cKDTree(a[['x_coord','y_coord']])

pool = ThreadPool(4) 

start_time = time.time()
tree = pool.map(func, dist_name)
print("--- %s seconds ---" % (time.time() - start_time))

--- 0.031250953674316406 seconds ---
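Both timings are about 0.031 s, so the thread pool shows no visible speedup. With only 8 × 1000 points, per-call overhead and timer noise can easily dominate. Whether threads can help at all depends on whether SciPy releases the GIL while building a `cKDTree`, which is not established here; a process pool (`multiprocessing.Pool`) sidesteps the GIL but pays to pickle the data and the resulting trees. A sketch for comparing on a larger, hypothetical workload (8 segments of 50,000 random points, an assumption chosen for illustration), using `time.perf_counter` and the best of several runs:

```python
import time

import numpy as np
from scipy import spatial
from multiprocessing.dummy import Pool as ThreadPool

rng = np.random.default_rng(0)
# Hypothetical workload: 8 segments of 50,000 random 2-D points each.
arrays = [rng.uniform(0, 10000, size=(50_000, 2)) for _ in range(8)]


def build(points):
    # Build one k-d tree from an (n, 2) coordinate array.
    return spatial.cKDTree(points)


def best_of(fn, repeats=3):
    # Return the fastest wall-clock time over several runs, plus the last result.
    best = float('inf')
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = fn()
        best = min(best, time.perf_counter() - start)
    return best, result


seq_time, seq_trees = best_of(lambda: [build(p) for p in arrays])
with ThreadPool(4) as pool:
    par_time, par_trees = best_of(lambda: pool.map(build, arrays))

print(f"sequential: {seq_time:.4f} s, ThreadPool(4): {par_time:.4f} s")
```

The relative numbers will vary by machine and SciPy version, so treat any single run as indicative only.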

1 answer:

Answer 0 (score: 0)

Your code:

dist_name = df['segment'].unique()
tree = {}
for i in range(len(dist_name)):
    a = df[df['segment'] == dist_name[i]]
    tree[i] = spatial.cKDTree(a[['x_coord', 'y_coord']])

needs to be turned into a function of one segment name:

dist_name=df['segment'].unique()

def func(name):
    a = df[df['segment'] == name]
    return spatial.cKDTree(a[['x_coord','y_coord']])

and then you pass that function to pool.map:

pool = ThreadPool(4) 
tree = pool.map(func, dist_name)
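Putting the answer's pieces together gives one self-contained script. This is a sketch assuming the nine-row sample df from the question; the `dict(zip(...))` pairing and the example query are illustrative additions, not part of the answer.

```python
import pandas as pd
from scipy import spatial
from multiprocessing.dummy import Pool as ThreadPool

df = pd.DataFrame({
    'segment': ['a', 'a', 'a', 'b', 'b', 'b', 'c', 'c', 'c'],
    'x_coord': [1, 2, 1, 2, 4, 8, 4, 2, 7],
    'y_coord': [1, 4, 7, 3, 3, 3, 4, 5, 8],
})

dist_name = df['segment'].unique()


def func(name):
    # Select one segment's rows and build its k-d tree from the coordinates.
    a = df[df['segment'] == name]
    return spatial.cKDTree(a[['x_coord', 'y_coord']])


with ThreadPool(4) as pool:
    # One tree per segment, in the same order as dist_name.
    trees = pool.map(func, dist_name)

# Pair each tree with its segment label so it can be looked up by name.
tree_by_segment = dict(zip(dist_name, trees))

# Nearest point in segment 'c' to (7, 7): (distance, row position in 'c').
print(tree_by_segment['c'].query([7, 7]))
```

Because `pool.map` preserves input order, zipping `dist_name` with the returned list pairs every segment with its own tree.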