I am trying to parallelize (in some simple way) my machine learning code, which originally uses the Shogun machine learning toolbox. The training has many possible configurations, so processing them sequentially is not a suitable approach. I have a learning-machine object named mkl_object whose parameters are updated from the list of grid-parameter paths (paths) produced by a path generator I programmed, gridObj.generateRandomGridPaths(). I want a multiprocessing setup in which mkl_object learns one model per path. That is, for a list of three paths such as

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

three models should be learned, each on a separate core. Please see the code below and its error output:
from multiprocessing import Pool
#from functools import partial  # I already tried with partial and parmap
#import parmap as par

# My machine learning and random grid search modules:
from mklObj import *
from gridObj import *

# The input training and test data subsets are Shogun feature objects.
[feats_train,
 feats_test,
 labelsTr,
 labelsTs] = load_multiclassToy('../shogun-data/toy/',    # Directory
                                'train_multiclass.dat',   # Sample dataset file name
                                'label_multiclass.dat')   # Multi-class labels file name

mkl_object = mklObj()  # Learning machine global instantiation

# Function for mapping:
def mkPool(path):  # path: a list of learning parameters
    global feats_train  # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs
    global mkl_object
    # path indices used below: path[0] = (kernel family, [kernel params]),
    # path[2] = number of kernels, path[3] = hyperparameter distribution,
    # path[4] = weight regularization norm, path[5] = MKL C.
    if path[0][0] == 'gaussian':
        a = 2*path[0][1][0]**2
        b = 2*path[0][1][1]**2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each list element (paths[i]) as learning parameters:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                          targetsTr=labelsTr,
                          featsTs=feats_test,
                          targetsTs=labelsTs,
                          kernelFamily=path[0][0],
                          randomRange=[a, b],
                          randomParams=[(a + b)/2, 1.0],
                          hyper=path[3],
                          pKers=path[2])
    # Return the test error:
    return mkl_object.testerr

if __name__ == '__main__':
    p = Pool(3)
    # Loading the experimentation grid of parameters.
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already made tests passing 'paths' and '[paths]'; the error is the same.
    print p.map(mkPool, [a, b, c])
Output error:

/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
Traceback (most recent call last):
The entered hyperparameter distribution is not allowed: weibull
  File "../mklCall.py", line 76, in <module>
The entered hyperparameter distribution is not allowed: linear
    print p.map(mkPool, [a, b, c])
The entered hyperparameter distribution is not allowed: triangular
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
The above custom exception should not occur, because weibull (and the other distributions that appear) are valid strings (input parameters), so there seems to be some obstacle of unknown origin at execution time. The error is repeated len(paths) times. If I run the training for a single path, without using Pool.map(), there is no error.
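One thing I could do to see where the exception really comes from is to wrap the worker so that each process returns its formatted traceback instead of letting the pool swallow it. This is only a debugging sketch meant to live in the same script as mkPool above (mkPoolSafe is a hypothetical wrapper name; mkPool, gridObj, and Pool are as defined there):

import traceback

def mkPoolSafe(path):
    # Same work as mkPool (defined above), but catch any worker-side exception
    # and return its formatted traceback so the parent process can print it.
    try:
        return ('ok', mkPool(path))
    except Exception:
        return ('error', traceback.format_exc())

if __name__ == '__main__':
    p = Pool(3)
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    for status, result in p.map(mkPoolSafe, paths):
        print status
        print result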
I also ran the code sequentially for some paths, and there were no errors.
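That sequential run is essentially a plain loop over the same worker function; a minimal sketch of it (the exact snippet may differ from what I actually ran) looks like this:

# Sequential version: train one model per path in the main process, no Pool.
grid = gridObj(file='gridParameterDic.txt')
paths = grid.generateRandomGridPaths(trials=3)
test_errors = []
for path in paths:
    test_errors.append(mkPool(path))  # same worker function as above
print 'Test errors per path: ', test_errors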
I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html. Advice, examples, or possible solutions are very much appreciated.
Thanks in advance.