Passing an IPython parallel cluster object to a custom class for batch execution

Time: 2014-10-10 00:35:34

Tags: python parallel-processing batch-processing ipython-parallel

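I am a novice programmer trying to use Python for scientific programming. I think these posts ("How to work with interactively-defined classes in IPython.parallel?" and "ipython parallel push custom object") touch on a similar problem, but they did not help me. I want to run my code as a script (for a PBS or SGE queueing scheduler), and I do not know how to use dill.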

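Basically, I am trying to use an IPython parallel cluster to split up a computation that is defined in a method of a custom class.

I want to pass a cluster object to an instance of my custom class, and then use the cluster to split up a computation that operates on data stored as a member of that instance.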

  1. Start the cluster using ipcluster (connection file: /path/to/ipcontroller-client.json),
  2. then I want to run python test_parallel.py,
  3. where test_parallel.py is:

    class Foo(object):
        def __init__(self):
            from numpy import arange
            self.data = arange(10)*10
    
        def A(self, y):
            print "in A:", y
            self.data[y]
    
        def parallelA(self, z, cl):
            print "in parallelA:", cl[:].map_sync(self.A, z)
    
        def serialA(self, z):
            print "in serialA:", map(self.A, z)
    
    if __name__ == "__main__":
    
        from IPython.parallel import Client
        f = '/path/to/security/ipcontroller-client.json'
        c = Client(f)
    
        asdf = Foo()
        asdf.serialA([1, 3, 5])      ## works
        asdf.parallelA([1, 3, 5], c) ## doesn't work
    

Output:


    $ ~/Projects/parcellation$ python test_parallel.py 
    in serialA: in A: 1
    in A: 3
    in A: 5
    [None, None, None]
    in parallelA:
    Traceback (most recent call last):
      File "test_parallel.py", line 24, in <module>
        asdf.parallelA([1, 3, 5], c) ## doesn't work
      File "test_parallel.py", line 11, in parallelA
        print "in parallelA:", cl[:].map_sync(self.A, z)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 366, in map_sync
        return self.map(f,*sequences,**kwargs)
      File "<string>", line 2, in map
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 66, in sync_results
        ret = f(self, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 624, in map
        return pf.map(*sequences)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/remotefunction.py", line 271, in map
        ret = self(*sequences)
      File "<string>", line 2, in __call__
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/remotefunction.py", line 78, in sync_view_results
        return f(self, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/remotefunction.py", line 243, in __call__
        ar = view.apply(f, *args)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 233, in apply
        return self._really_apply(f, args, kwargs)
      File "<string>", line 2, in _really_apply
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 66, in sync_results
        ret = f(self, *args, **kwargs)
      File "<string>", line 2, in _really_apply
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 51, in save_ids
        ret = f(self, *args, **kwargs)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/view.py", line 567, in _really_apply
        ident=ident)
      File "/usr/local/lib/python2.7/dist-packages/IPython/parallel/client/client.py", line 1263, in send_apply_request
        item_threshold=self.session.item_threshold,
      File "/usr/local/lib/python2.7/dist-packages/IPython/kernel/zmq/serialize.py", line 145, in pack_apply_message
        arg_bufs = flatten(serialize_object(arg, buffer_threshold, item_threshold) for arg in args)
      File "/usr/local/lib/python2.7/dist-packages/IPython/utils/data.py", line 30, in flatten
        return [x for subseq in seq for x in subseq]
      File "/usr/local/lib/python2.7/dist-packages/IPython/kernel/zmq/serialize.py", line 145, in <genexpr>
        arg_bufs = flatten(serialize_object(arg, buffer_threshold, item_threshold) for arg in args)
      File "/usr/local/lib/python2.7/dist-packages/IPython/kernel/zmq/serialize.py", line 89, in serialize_object
        buffers.insert(0, pickle.dumps(cobj, PICKLE_PROTOCOL))
    cPickle.PicklingError: Can't pickle <type 'instancemethod'>: attribute lookup __builtin__.instancemethod failed
    

Help understanding why this does not work, and a fix that requires minimal code changes, would be very useful.

Thanks!

1 Answer:

Answer 0 (score: 1)

I figured out a solution:

    class Foo(object):
        def __init__(self):
            from numpy import arange
            self.data = arange(10)*10

        # A static method takes no self, so only a plain function is sent to the engines.
        @staticmethod
        def A(data, y):
            print "in A:", y ## doesn't produce an output (it is printed on the engine, not on the client)
            return data[y]

        def parallelA(self, z, cl):
            # Pass the data explicitly: one copy of self.data for each element of z.
            print "in parallelA:", cl[:].map_sync(self.A, [self.data]*len(z), z)

    if __name__ == "__main__":

        from IPython.parallel import Client
        f = '/path/to/security/ipcontroller-client.json'
        c = Client(f)

        asdf = Foo()
        asdf.parallelA([1, 3, 5], c)

Output when running the code above:

    $ python test_parallel.py
    in parallelA: [10, 30, 50]
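
A note on why this likely works, plus a sanity check that needs no cluster. Judging from the traceback in the question, map_sync has to pickle the callable it sends to the engines, and under Python 2.7 the bound method self.A cannot be pickled. A static method looked up through the instance is just a plain function, and the data it needs arrives as an argument instead of through self. The sketch below is not part of the original script and makes some assumptions: it replaces the cluster with the builtin map (which pairs up several sequences element by element, as the map_sync call above is expected to), it uses a plain list in place of the numpy array so it has no dependencies, and serialA is a hypothetical helper added for illustration. It is written to run on both Python 2 and Python 3.

```python
class Foo(object):
    def __init__(self):
        # Assumption: a plain list stands in for numpy's arange(10)*10.
        self.data = [i * 10 for i in range(10)]

    @staticmethod
    def A(data, y):
        # No self here: everything the function needs is passed as an argument.
        return data[y]

    def serialA(self, z):
        # Hypothetical helper: same argument layout as
        # cl[:].map_sync(self.A, [self.data]*len(z), z),
        # but evaluated locally with the builtin map.
        return list(map(self.A, [self.data] * len(z), z))


result = Foo().serialA([1, 3, 5])
print(result)
```

If this prints [10, 30, 50], the argument layout matches the output reported above. It does not check anything about the engines themselves, such as imports on the engine side or the cost of sending one copy of the data per element.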