I am trying to parallelize (in some simple way) my machine learning code, which originally uses the Shogun machine learning toolbox. The training has many possible configurations, so processing them sequentially is not a suitable approach. I have a learning-machine object named mkl_object whose parameters are updated from the list of grid-parameter paths (paths) produced by a path generator I programmed, gridObj.generateRandomGridPaths(). I want a multiprocessing setup in which mkl_object learns one model per path. That is, for a list of three paths such as

paths = [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]

three models should be learned, each on a separate core. Please see the code below and its error output:
from multiprocessing import Pool
#from functools import partial  # I already tried with partial and parmap
#import parmap as par

# My machine learning and random grid search modules:
from mklObj import *
from gridObj import *

# The input training and test data subsets are Shogun feature objects.
[feats_train,
 feats_test,
 labelsTr,
 labelsTs] = load_multiclassToy('../shogun-data/toy/',    # Directory
                                'train_multiclass.dat',   # Sample dataset file name
                                'label_multiclass.dat')   # Multi-class labels file name

mkl_object = mklObj()  # Learning machine global instantiation

# Function for mapping:
def mkPool(path):  # path: a list of learning parameters
    global feats_train  # Train and test data produced above
    global labelsTr
    global feats_test
    global labelsTs
    global mkl_object
    # path indices used below: path[0] = (kernel family, [kernel params]),
    # path[2] = number of kernels, path[3] = hyperparameter distribution,
    # path[4] = weight regularization norm, path[5] = MKL C.
    if path[0][0] == 'gaussian':
        a = 2*path[0][1][0]**2
        b = 2*path[0][1][1]**2
    else:
        a = path[0][1][0]
        b = path[0][1][1]
    # Setting each list element (paths[i]) as learning parameters:
    mkl_object.mklC = path[5]
    mkl_object.weightRegNorm = path[4]
    mkl_object.fit_kernel(featsTr=feats_train,
                          targetsTr=labelsTr,
                          featsTs=feats_test,
                          targetsTs=labelsTs,
                          kernelFamily=path[0][0],
                          randomRange=[a, b],
                          randomParams=[(a + b)/2, 1.0],
                          hyper=path[3],
                          pKers=path[2])
    # Return the test error:
    return mkl_object.testerr

if __name__ == '__main__':
    p = Pool(3)
    # Loading the experimentation grid of parameters.
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    print 'See the path list: ', paths
    [a, b, c] = paths
    # I already made tests passing 'paths' and '[paths]'; the error is the same.
    print p.map(mkPool, [a, b, c])
Output error:

/usr/bin/python2.7 /home/.../mklCall.py
See the path list: [[('wave', [0.5, 100]), '2-gr', 7, 'weibull', 1.0, 0.4], [('spherical', [0.5, 50]), '2-gr_tfidf', 26, 'linear', 5.0, 20.0], [('exponential', [0.5, 50]), '3-gr_tfidf', 22, 'triangular', 1.5, 1.3]]
Traceback (most recent call last):
The entered hyperparameter distribution is not allowed: weibull
  File "../mklCall.py", line 76, in <module>
The entered hyperparameter distribution is not allowed: linear
    print p.map(mkPool, [a, b, c])
The entered hyperparameter distribution is not allowed: triangular
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 251, in map
    return self.map_async(func, iterable, chunksize).get()
  File "/usr/lib/python2.7/multiprocessing/pool.py", line 558, in get
    raise self._value
TypeError: 'NoneType' object is not iterable

Process finished with exit code 1
The above custom exception should not occur, because weibull (and the other distributions that appear) are valid strings (input parameters), so there seems to be some obstacle of unknown origin at execution time. The error is repeated len(paths) times. If I run the training for a single path, without using Pool.map(), there is no error.
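One thing I could do to see where the exception really comes from is to wrap the worker so that each process returns its formatted traceback instead of letting the pool swallow it. This is only a debugging sketch meant to live in the same script as mkPool above (mkPoolSafe is a hypothetical wrapper name; mkPool, gridObj, and Pool are as defined there):

import traceback

def mkPoolSafe(path):
    # Same work as mkPool (defined above), but catch any worker-side exception
    # and return its formatted traceback so the parent process can print it.
    try:
        return ('ok', mkPool(path))
    except Exception:
        return ('error', traceback.format_exc())

if __name__ == '__main__':
    p = Pool(3)
    grid = gridObj(file='gridParameterDic.txt')
    paths = grid.generateRandomGridPaths(trials=3)
    for status, result in p.map(mkPoolSafe, paths):
        print status
        print result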
I also ran the code sequentially for some paths, and there were no errors.
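That sequential run is essentially a plain loop over the same worker function; a minimal sketch of it (the exact snippet may differ from what I actually ran) looks like this:

# Sequential version: train one model per path in the main process, no Pool.
grid = gridObj(file='gridParameterDic.txt')
paths = grid.generateRandomGridPaths(trials=3)
test_errors = []
for path in paths:
    test_errors.append(mkPool(path))  # same worker function as above
print 'Test errors per path: ', test_errors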
I followed the Python documentation at https://docs.python.org/2/library/multiprocessing.html. Advice, examples, or possible solutions are very much appreciated.
Thanks in advance.