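First of all, I know there are many threads about multiprocessing in Python, but none of them seem to solve my problem.
Here is my problem: I want to implement the random forest algorithm, and a naive approach looks like this: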
def random_tree(Data):
    # calculation() builds one tree from Data; both are defined elsewhere (not shown)
    tree = calculation(Data)
    forest.append(tree)

forest = list()
for i in range(300):
    random_tree(Data)
The list forest, with 300 "trees" in it, is my final result. In this case, how can I convert this code into a multiprocessing version?
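A minimal sketch of one way to do this, assuming each tree can be built independently from the data plus its own random seed. One detail matters for the naive version: every worker process has its own copy of module globals, so forest.append(tree) inside a worker would not update the parent's list. Having the function return its tree and letting Pool.map collect the results (in task order) avoids that. The names build_tree, build_forest, predict and mean_or are hypothetical, and build_tree is only a toy stand-in for calculation(Data): it fits a one-split regression stump on a bootstrap sample (plain bagging, without the per-split feature subsampling of a real random forest).

```python
import random
from multiprocessing import Pool


def mean_or(values, default):
    """Average of values, or default when the list is empty."""
    return sum(values) / len(values) if values else default


def build_tree(task):
    """Hypothetical stand-in for calculation(Data): fit a one-split
    regression stump on a bootstrap sample of (x, y) pairs."""
    data, seed = task
    rng = random.Random(seed)  # per-tree RNG: reproducible, independent of other workers
    sample = [rng.choice(data) for _ in data]  # bootstrap: len(data) rows drawn with replacement
    overall = mean_or([y for _, y in sample], 0.0)
    threshold = mean_or([x for x, _ in sample], 0.0)
    left = mean_or([y for x, y in sample if x <= threshold], overall)
    right = mean_or([y for x, y in sample if x > threshold], overall)
    return (threshold, left, right)


def build_forest(data, n_trees=300, processes=4):
    """Build n_trees stumps in parallel; Pool.map returns them in task order."""
    tasks = [(data, seed) for seed in range(n_trees)]
    with Pool(processes) as pool:
        return pool.map(build_tree, tasks)


def predict(forest, x):
    """Average the predictions of all stumps in the forest."""
    votes = [left if x <= threshold else right for threshold, left, right in forest]
    return sum(votes) / len(votes)


if __name__ == "__main__":
    # Toy data: y = 2x, mirroring the f(x) = 2*x test script in the question.
    data = [(x, 2.0 * x) for x in range(10)]
    forest = build_forest(data, n_trees=300, processes=4)
    print(len(forest))  # 300
    print(predict(forest, 8))
```

Because each task carries its own seed, the forest comes out the same whether the trees are built in workers or in a plain loop. Note that this sends a copy of data with every task; for a large dataset, Pool's initializer and initargs parameters can be used to hand the data to each worker once instead.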
Update: I just tried Mukund M K's method with a very simplified script:
from multiprocessing import Pool
import numpy as np

def f(x):
    return 2*x

data = np.array([1,2,5])
pool = Pool(processes=4)
forest = pool.map(f, (data for i in range(4)))
# I use range() instead of xrange() because I am using Python 3.4
Now... the script seems to run forever... I opened a Python shell and entered the script line by line, and this is the message I got:
Process SpawnPoolWorker-1:
Process SpawnPoolWorker-2:
Traceback (most recent call last):
Process SpawnPoolWorker-3:
Traceback (most recent call last):
Process SpawnPoolWorker-4:
Traceback (most recent call last):
Traceback (most recent call last):
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 254, in _bootstrap
    self.run()
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda3\lib\multiprocessing\process.py", line 93, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "E:\Anaconda3\lib\multiprocessing\pool.py", line 108, in worker
    task = get()
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'f' on
AttributeError: Can't get attribute 'f' on
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'f' on
  File "E:\Anaconda3\lib\multiprocessing\queues.py", line 357, in get
    return ForkingPickler.loads(res)
AttributeError: Can't get attribute 'f' on
Update: I edited my sample code based on some other example code:
from multiprocessing import Pool
import numpy as np

def f(x):
    return 2*x

if __name__ == '__main__':
    data = np.array([1,2,3])
    with Pool(5) as p:
        result = p.map(f, (data for i in range(300)))
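It works now. What I need to do next is fill it in with the more complex algorithm.
Another question on my mind: why does this code work when the previous version did not?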
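A likely explanation, sketched under the assumption that this runs on Windows (the E:\ paths and SpawnPoolWorker names suggest so): there multiprocessing starts workers with the "spawn" method, which launches a fresh Python interpreter per worker. Pool.map sends f to the workers by pickling it, and pickle stores a function only as a reference (module name plus function name), not its code, so each worker must find f in its own re-imported copy of the main module. A saved script can be re-imported, so f exists there; a function typed into an interactive shell has no file behind it, which matches the "Can't get attribute 'f'" error. The if __name__ == '__main__': guard keeps that re-import in each worker from creating yet another Pool. The small demonstration below (variable names are illustrative) simulates the missing-name case in a single process:

```python
import pickle


def f(x):
    return 2 * x


# pickle records only where to find f (its module and name), not its code.
payload = pickle.dumps(f)

# Simulate a fresh worker in which f was never defined, as happens when f
# only exists in an interactive session: remove the name, then unpickle.
saved_f = f
del f
try:
    pickle.loads(payload)
    error = None
except (AttributeError, pickle.UnpicklingError) as exc:
    # The exact exception type and message may vary across Python versions.
    error = exc
print("lookup failed:", repr(error))

# Once a function named f exists again, the same bytes load without trouble.
f = saved_f
print(pickle.loads(payload)(21))  # 42
```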
Answer 0 (score: 0)
A processing package may help you. Please take a look here.
Answer 1 (score: 0)
You can do multiprocessing this way:
Elements inputs = doc.select("input[type=text], input[type=password], input[type=email]");