cPickle.PicklingError: Could not serialize object: NotImplementedError

Date: 2019-10-25 11:51:14

Tags: python tensorflow keras pyspark elephas

I get an error when running the Elephas example without any modification. (The error also occurs with the git version: pip install --no-cache-dir git+git://github.com/maxpumperla/elephas.git@master)

The example I am using: https://github.com/maxpumperla/elephas/blob/master/examples/ml_pipeline_otto.py

(I tried enabling tf.compat.v1.enable_eager_execution(), but the rest of the code does not work with that setting.)

pyspark_1      | 19/10/25 10:23:03 INFO SparkContext: Created broadcast 12 from broadcast at NativeMethodAccessorImpl.java:0
pyspark_1      | Traceback (most recent call last):
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/serializers.py", line 590, in dumps
pyspark_1      |     return cloudpickle.dumps(obj, 2)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 863, in dumps
pyspark_1      |     cp.dump(obj)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 260, in dump
pyspark_1      |     return Pickler.dump(self, obj)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 224, in dump
pyspark_1      |     self.save(obj)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 568, in save_tuple
pyspark_1      |     save(element)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 406, in save_function
pyspark_1      |     self.save_function_tuple(obj)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 549, in save_function_tuple
pyspark_1      |     save(state)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 655, in save_dict
pyspark_1      |     self._batch_setitems(obj.iteritems())
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 687, in _batch_setitems
pyspark_1      |     save(v)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 606, in save_list
pyspark_1      |     self._batch_appends(iter(obj))
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 642, in _batch_appends
pyspark_1      |     save(tmp[0])
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/cloudpickle.py", line 660, in save_instancemethod
pyspark_1      |     obj=obj)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 401, in save_reduce
pyspark_1      |     save(args)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 554, in save_tuple
pyspark_1      |     save(element)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 331, in save
pyspark_1      |     self.save_reduce(obj=obj, *rv)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 425, in save_reduce
pyspark_1      |     save(state)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 655, in save_dict
pyspark_1      |     self._batch_setitems(obj.iteritems())
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 687, in _batch_setitems
pyspark_1      |     save(v)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 606, in save_list
pyspark_1      |     self._batch_appends(iter(obj))
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 642, in _batch_appends
pyspark_1      |     save(tmp[0])
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 331, in save
pyspark_1      |     self.save_reduce(obj=obj, *rv)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 425, in save_reduce
pyspark_1      |     save(state)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 286, in save
pyspark_1      |     f(self, obj) # Call unbound method with explicit self
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 655, in save_dict
pyspark_1      |     self._batch_setitems(obj.iteritems())
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 687, in _batch_setitems
pyspark_1      |     save(v)
pyspark_1      |   File "/usr/lib/python2.7/pickle.py", line 306, in save
pyspark_1      |     rv = reduce(self.proto)
pyspark_1      |   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 1152, in __reduce__
pyspark_1      |     initial_value=self.numpy(),
pyspark_1      |   File "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py", line 906, in numpy
pyspark_1      |     "numpy() is only available when eager execution is enabled.")
pyspark_1      | NotImplementedError: numpy() is only available when eager execution is enabled.
pyspark_1      | Traceback (most recent call last):
pyspark_1      |   File "/home/ubuntu/./spark.py", line 169, in <module>
pyspark_1      |     fitted_pipeline = pipeline.fit(train_df)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/ml/base.py", line 132, in fit
pyspark_1      |     return self._fit(dataset)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/ml/pipeline.py", line 109, in _fit
pyspark_1      |     model = stage.fit(dataset)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/ml/base.py", line 132, in fit
pyspark_1      |     return self._fit(dataset)
pyspark_1      |   File "/usr/local/lib/python2.7/dist-packages/elephas/ml_model.py", line 92, in _fit
pyspark_1      |     validation_split=self.get_validation_split())
pyspark_1      |   File "/usr/local/lib/python2.7/dist-packages/elephas/spark_model.py", line 151, in fit
pyspark_1      |     self._fit(rdd, epochs, batch_size, verbose, validation_split)
pyspark_1      |   File "/usr/local/lib/python2.7/dist-packages/elephas/spark_model.py", line 188, in _fit
pyspark_1      |     gradients = rdd.mapPartitions(worker.train).collect()
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/rdd.py", line 816, in collect
pyspark_1      |     sock_info = self.ctx._jvm.PythonRDD.collectAndServe(self._jrdd.rdd())
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/rdd.py", line 2532, in _jrdd
pyspark_1      |     self._jrdd_deserializer, profiler)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/rdd.py", line 2434, in _wrap_function
pyspark_1      |     pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/rdd.py", line 2420, in _prepare_for_python_RDD
pyspark_1      |     pickled_command = ser.dumps(command)
pyspark_1      |   File "/home/ubuntu/spark-2.4.4-bin-hadoop2.7/python/pyspark/serializers.py", line 600, in dumps
pyspark_1      |     raise pickle.PicklingError(msg)
pyspark_1      | cPickle.PicklingError: Could not serialize object: NotImplementedError: numpy() is only available when eager execution is enabled.

2 Answers:

Answer 0 (score: 1):

The problem appears to center on this line in spark_model.py's _fit, which works with RDDs and SparkWorker-s before handing off to TF's resource_variable_ops.py:

gradients = rdd.mapPartitions(worker.train).collect()

Whenever the data goes through multithreading or some other layer of abstraction, the TF runtime gets intercepted at a point where TF believes it is running eagerly and calls an eager-only method (.numpy()), when in fact it is not - hence the error. I strongly doubt there is an "external" workaround, but the edit to the TF source code shown below should do the trick.
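For the record, the root failure reproduces without Spark at all. A minimal sketch (assuming TF 2.x with graph mode active, which is the context the Spark workers run in):

# Minimal repro sketch: pickling a ResourceVariable outside eager mode hits
# __reduce__, which calls the eager-only self.numpy() - the same path as in
# the traceback above.
import pickle
import tensorflow as tf

tf.compat.v1.disable_eager_execution()  # graph mode, matching the Spark workers
v = tf.Variable([1.0, 2.0], name="w")   # a ResourceVariable in TF 2.x
pickle.dumps(v)  # NotImplementedError: numpy() is only available when eager execution is enabled.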

Basically, the way it works is by brute-forcing nearly every possible combination of eager and non-eager operations to evaluate the tensor, both inside and outside the graph.


Let me know if it works.


# "/usr/local/lib/python2.7/dist-packages/tensorflow/python/ops/resource_variable_ops.py"
# line 1152
def __reduce__(self):
    # The implementation mirrors that of __deepcopy__.
    def K_eval(x, K):
        try:
            return K.get_value(K.to_dense(x))
        except:
            try:
                eval_fn = K.function([], [x])
                return eval_fn([])[0]
            except:
                return K.eval(x)
    try:
        import keras.backend as K
        initial_value = K_eval(self, K)
    except:
        import tensorflow.keras.backend as K
        initial_value = K_eval(self, K)

    return functools.partial(
        ResourceVariable,
        initial_value=initial_value,
        trainable=self.trainable,
        name=self._shared_name,
        dtype=self.dtype,
        constraint=self.constraint,
        distribute_strategy=self._distribute_strategy), ()
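
If you would rather not edit the installed TF sources in place, the same override can in principle be applied as a runtime monkey-patch before building the pipeline. A sketch under the same assumptions (it reaches into TF internals, so it may break across versions):

# Sketch: install the patched __reduce__ at runtime instead of editing
# resource_variable_ops.py in site-packages. Relies on TF-internal names.
import functools
from tensorflow.python.ops.resource_variable_ops import ResourceVariable

def _patched_reduce(self):
    import keras.backend as K  # or tensorflow.keras.backend, as above

    def K_eval(x):
        try:
            return K.get_value(K.to_dense(x))
        except Exception:
            try:
                return K.function([], [x])([])[0]
            except Exception:
                return K.eval(x)

    return functools.partial(
        ResourceVariable,
        initial_value=K_eval(self),
        trainable=self.trainable,
        name=self._shared_name,
        dtype=self.dtype,
        constraint=self.constraint,
        distribute_strategy=self._distribute_strategy), ()

ResourceVariable.__reduce__ = _patched_reduce  # must run before pipeline.fit()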

Answer 1 (score: 0):

This issue occurs because metrics (https://github.com/maxpumperla/elephas/blob/master/elephas/spark_model.py#L44) is a list of MeanMetricWrapper objects (https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/keras/metrics.py#L599), which leads to exactly what @OverLordGoldDragon described above. The issue has been resolved in the latest Elephas release (1.0.0): https://github.com/danielenricocahall/elephas/releases/tag/1.0.0
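
To see what this refers to, a small sketch (assuming TF 2.x; CategoricalAccuracy is just an illustrative choice of metric):

# Sketch: Keras metric objects are MeanMetricWrapper instances that own
# tf.Variables - those variables are what cloudpickle ultimately fails to
# serialize in graph mode.
import tensorflow as tf

acc = tf.keras.metrics.CategoricalAccuracy()
print([cls.__name__ for cls in type(acc).__mro__[:3]])
# ['CategoricalAccuracy', 'MeanMetricWrapper', 'Mean']
print(acc.variables)  # the total/count variables that reach ResourceVariable.__reduce__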