Spark cannot serialize a recursive function, raising a PicklingError

Asked: 2017-04-30 02:55:16

Tags: apache-spark pyspark pickle

I am writing a PySpark program that contains a recursive function. When I run it, I get the following error:

Traceback (most recent call last):
  File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1578, in <module>
    globals = debugger.run(setup['file'], None, None, is_module)
  File "/Applications/PyCharm CE.app/Contents/helpers/pydev/pydevd.py", line 1015, in run
    pydev_imports.execfile(file, globals, locals)  # execute the script
  File "/Users/Documents/repos/
  File "/Users/Documents/repos/main.py", line 62, in main
    run_by_date(dt.datetime.today() - dt.timedelta(days=1))
  File "/Users/Documents/repos/main.py", line 50, in run_by_date
    parsed_rdd.repartition(1).saveAsTextFile(save_path)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/rdd.py", line 2058, in repartition
    return self.coalesce(numPartitions, shuffle=True)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/rdd.py", line 2075, in coalesce
    jrdd = selfCopy._jrdd.coalesce(numPartitions, shuffle)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/rdd.py", line 2439, in _jrdd
    self._jrdd_deserializer, profiler)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/rdd.py", line 2372, in _wrap_function
    pickled_command, broadcast_vars, env, includes = _prepare_for_python_RDD(sc, command)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/rdd.py", line 2358, in _prepare_for_python_RDD
    pickled_command = ser.dumps(command)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/serializers.py", line 440, in dumps
    return cloudpickle.dumps(obj, 2)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/cloudpickle.py", line 667, in dumps
    cp.dump(obj)
  File "/Users/Documents/tools/spark-2.1.0/python/pyspark/cloudpickle.py", line 111, in dump
    raise pickle.PicklingError(msg)
pickle.PicklingError: Could not pickle object as excessively deep recursion required.
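
For context, here is a minimal sketch of the kind of code that can trigger this error. The actual source of main.py is not shown above, so every name below (parse_record, run_by_date, the input and output paths) is a hypothetical stand-in, not the real program:

import datetime as dt
from pyspark import SparkContext

def parse_record(node):
    # Recursive parser: to ship this function to the executors,
    # cloudpickle must serialize it together with its reference
    # to itself, which is where deep recursion can occur.
    if not isinstance(node, dict):
        return [node]
    out = []
    for child in node.values():
        out.extend(parse_record(child))
    return out

def run_by_date(date):
    sc = SparkContext.getOrCreate()
    raw_rdd = sc.textFile("/data/" + date.strftime("%Y-%m-%d"))
    parsed_rdd = raw_rdd.map(parse_record)
    # The PicklingError is raised here, while the closure is serialized:
    parsed_rdd.repartition(1).saveAsTextFile("/out/" + date.strftime("%Y-%m-%d"))

if __name__ == "__main__":
    run_by_date(dt.datetime.today() - dt.timedelta(days=1))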

I understand this is probably because serializing the recursion requires a very large depth. How should I deal with this problem? Thank you very much.
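
One common workaround, sketched below under the assumption that the recursion lives in the function passed to map, is to rewrite it with an explicit stack, so that the function being serialized no longer calls itself:

def parse_record_iterative(root):
    # An explicit stack replaces the call stack, so the function
    # contains no recursive self-reference for cloudpickle to chase.
    out = []
    stack = [root]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            stack.extend(node.values())
        else:
            out.append(node)
    return out

# Hypothetical usage, mirroring the sketch above:
# parsed_rdd = raw_rdd.map(parse_record_iterative)

Two other options worth trying: raise the driver's recursion limit with sys.setrecursionlimit(...) before the job is submitted, or move the recursive function into its own module distributed with sc.addPyFile(...), since functions imported from a module are pickled by reference rather than by value.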

0 Answers:

No answers yet.