For循环的Python多处理

时间:2015-12-25 14:49:50

标签: python parallel-processing multiprocessing

首先,我知道python上有很多关于多处理的线程,但这些线程似乎都无法解决我的问题。

这是我的问题: 我想实现随机森林算法,一个天真的方法就是这样:

def random_tree(Data):
    tree = calculation(Data)
    forest.append(tree)

forest = list()
for i in range(300):
    random_tree(Data)

forest有300"树"里面是我的最终结果。在这种情况下,如何将此代码转换为多处理版本?

更新: 我只是用一个非常简化的脚本尝试了Mukund M K的方法:

from multiprocessing import Pool

def f(x):
    return 2*x

data = np.array([1,2,5])

pool = Pool(processes=4)
forest = pool.map(f, (data for i in range(4))) 
# I use range() instead of xrange() because I am using Python 3.4

现在......脚本就像永远一样运行.....我打开一个python shell并逐行输入脚本,这就是我得到的消息:

> Process SpawnPoolWorker-1:  
> Process SpawnPoolWorker-2:  
> Traceback (most recent call last):  
> Process SpawnPoolWorker-3:  
> Traceback (most recent call last):  
> Process SpawnPoolWorker-4:  
> Traceback (most recent call last):  
> Traceback (most recent call last):  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)  
> File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)  
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()  
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()  
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()  
> File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()  
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)  
> File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)  
> AttributeError: Can't get attribute 'f' on   
> AttributeError: Can't get attribute 'f' on 
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)  
> AttributeError: Can't get attribute 'f' on 
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)  
> AttributeError: Can't get attribute 'f' on 

更新:我根据其他一些示例代码编辑了我的示例代码:

from multiprocessing import Pool
import numpy as np

def f(x):
    return 2*x

if __name__ == '__main__':
    data = np.array([1,2,3])
    with Pool(5) as p:
        result = p.map(f, (data for i in range(300)))

它现在有效。我现在需要做的是现在用更复杂的算法填写它。
我脑海中的另一个问题是:为什么这个代码可以工作,而以前的版本不能?

2 个答案:

答案 0 :(得分:0)

包裹处理可能会对您有所帮助。请查看here

答案 1 :(得分:0)

你可以通过这种方式进行多处理:

Elements inputs = doc.select("input[type=text], input[type=password], input[type=email]");