I have a standalone Spark cluster and I am trying to run TensorFlowOnSpark on it with Python. So far I have only tried very simple examples, but I keep hitting the same problem: every worker crashes with the same error message:
AttributeError: Can't pickle local object 'start.<locals>.<lambda>'
A new worker is then allocated, and I eventually end up stuck in an endless Waiting for reservations... loop. My program does no explicit pickling, so I assume it must happen somewhere inside the TensorFlowOnSpark pipeline. Without the Spark wrapper, my TensorFlow application runs fine. I have reproduced both behaviors (the crash and the endless loop) on Windows 7 and on CentOS Linux 7.
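For reference, this is roughly what my driver script looks like (a minimal sketch, not my exact code; main_fun, the app name, and the cluster sizes are placeholders):

from pyspark import SparkContext, SparkConf
from tensorflowonspark import TFCluster

def main_fun(argv, ctx):
    # placeholder for the per-executor TensorFlow code
    import tensorflow as tf
    print("node {} of job {}".format(ctx.task_index, ctx.job_name))

if __name__ == "__main__":
    sc = SparkContext(conf=SparkConf().setAppName("tfos_minimal"))
    num_executors = 2   # placeholder cluster size
    num_ps = 1
    cluster = TFCluster.run(sc, main_fun, None, num_executors, num_ps,
                            False, TFCluster.InputMode.TENSORFLOW)
    cluster.shutdown()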
The (almost) complete error output follows:
17/08/15 16:40:36 ERROR Executor: Exception in task 0.2 in stage 0.0 (TID 5)
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
File "D:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 177, in main
File "D:\Spark\python\lib\pyspark.zip\pyspark\worker.py", line 172, in process
File "C:\Program Files\Anaconda3\lib\site-packages\pyspark\rdd.py", line 2423, in pipeline_func
return func(split, prev_func(split, iterator))
File "C:\Program Files\Anaconda3\lib\site-packages\pyspark\rdd.py", line 2423, in pipeline_func
return func(split, prev_func(split, iterator))
File "C:\Program Files\Anaconda3\lib\site-packages\pyspark\rdd.py", line 2423, in pipeline_func
return func(split, prev_func(split, iterator))
File "C:\Program Files\Anaconda3\lib\site-packages\pyspark\rdd.py", line 346, in func
return f(iterator)
File "C:\Program Files\Anaconda3\lib\site-packages\pyspark\rdd.py", line 794, in func
r = f(it)
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflowonspark\TFSparkNode.py", line 290, in _mapfn
TFSparkNode.mgr = TFManager.start(authkey, ['control'], 'remote')
File "C:\Program Files\Anaconda3\lib\site-packages\tensorflowonspark\TFManager.py", line 41, in start
mgr.start()
File "C:\Program Files\Anaconda3\lib\multiprocessing\managers.py", line 513, in start
self._process.start()
File "C:\Program Files\Anaconda3\lib\multiprocessing\process.py", line 105, in start
self._popen = self._Popen(self)
File "C:\Program Files\Anaconda3\lib\multiprocessing\context.py", line 322, in _Popen
return Popen(process_obj)
File "C:\Program Files\Anaconda3\lib\multiprocessing\popen_spawn_win32.py", line 65, in __init__
reduction.dump(process_obj, to_child)
File "C:\Program Files\Anaconda3\lib\multiprocessing\reduction.py", line 60, in dump
ForkingPickler(file, protocol).dump(obj)
AttributeError: Can't pickle local object 'start.<locals>.<lambda>'
at org.apache.spark.api.python.PythonRunner$$anon$1.read(PythonRDD.scala:193)
at org.apache.spark.api.python.PythonRunner$$anon$1.<init>(PythonRDD.scala:234)
at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:152)
at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
at java.lang.Thread.run(Unknown Source)
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 105, in spawn_main
exitcode = _main(fd)
File "C:\Program Files\Anaconda3\lib\multiprocessing\spawn.py", line 115, in _main
self = reduction.pickle.load(from_parent)
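As far as I can tell, the traceback points at a generic Python limitation rather than at my own code: on Windows, multiprocessing uses the spawn start method, which has to pickle the Process object before launching the child, and a lambda defined inside a function (here inside TFManager.start) cannot be pickled. A standalone sketch of the same failure mode, with no Spark or TensorFlow involved:

import multiprocessing

def start():
    # a local lambda, analogous to the one created inside TFManager.start
    target = lambda: print("hello from the child process")
    p = multiprocessing.Process(target=target)
    p.start()   # under "spawn" this pickles p, including the local lambda
    p.join()

if __name__ == "__main__":
    # "spawn" is the default (and only) start method on Windows;
    # Linux defaults to "fork", which skips the pickling step
    multiprocessing.set_start_method("spawn")
    start()   # AttributeError: Can't pickle local object 'start.<locals>.<lambda>'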
It seems this problem is known, but no solution is given there. Any hints would be appreciated, as I am out of ideas.