Py4JJavaError: an error occurred while calling org.apache.spark.api.python.PythonRDD.collectAndServe, job aborted

Time: 2018-05-19 08:06:37

Tags: apache-spark pyspark rdd windows-7-x64

# convert each pandas row to a plain list, dropping the index field,
# the first column, and the last column
rdd_data = sc.parallelize([list(r)[2:-1] for r in data.itertuples()])
rdd_data.count()

I am running this on a standalone cluster, on Windows 7 with Python 3.6.
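For reference, a minimal self-contained sketch of what I am running is below; the DataFrame contents, app name, and master URL are placeholders for my real setup:

# Minimal sketch of my script; the DataFrame and the master URL
# are placeholders, not my real data or cluster address.
import pandas as pd
from pyspark import SparkConf, SparkContext

conf = SparkConf().setAppName("repro").setMaster("spark://master:7077")
sc = SparkContext(conf=conf)

data = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6], "d": [7, 8]})

# itertuples() yields (Index, a, b, c, d); [2:-1] drops the index,
# the first column, and the last column of every row
rdd_data = sc.parallelize([list(r)[2:-1] for r in data.itertuples()])
print(rdd_data.count())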

It gives me this error:

~\Anaconda2\envs\py36\lib\site-packages\py4j\protocol.py in get_return_value(answer, gateway_client, target_id, name)
    318                 raise Py4JJavaError(
    319                     "An error occurred while calling {0}{1}{2}.\n".
--> 320                     format(target_id, ".", name), value)
    321             else:
    322                 raise Py4JError(

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.SparkException: Python worker did not connect back in time
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:138)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketTimeoutException: Accept timed out
    at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
    at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)
    ... 12 more

Driver stacktrace:
    at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1517)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1505)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1504)
    at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
    at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1504)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:814)
    at scala.Option.foreach(Option.scala:257)
    at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:814)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:1732)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1687)
    at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1676)
    at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
    at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:630)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2029)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2050)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2069)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:2094)
    at org.apache.spark.rdd.RDD$$anonfun$collect$1.apply(RDD.scala:936)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
    at org.apache.spark.rdd.RDD.withScope(RDD.scala:362)
    at org.apache.spark.rdd.RDD.collect(RDD.scala:935)
    at org.apache.spark.api.python.PythonRDD$.collectAndServe(PythonRDD.scala:467)
    at org.apache.spark.api.python.PythonRDD.collectAndServe(PythonRDD.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357)
    at py4j.Gateway.invoke(Gateway.java:280)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:214)
    at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.spark.SparkException: Python worker did not connect back in time
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:138)
    at org.apache.spark.api.python.PythonWorkerFactory.create(PythonWorkerFactory.scala:67)
    at org.apache.spark.SparkEnv.createPythonWorker(SparkEnv.scala:117)
    at org.apache.spark.api.python.PythonRunner.compute(PythonRDD.scala:128)
    at org.apache.spark.api.python.PythonRDD.compute(PythonRDD.scala:63)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
    at org.apache.spark.scheduler.Task.run(Task.scala:108)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
Caused by: java.net.SocketTimeoutException: Accept timed out
    at java.net.DualStackPlainSocketImpl.waitForNewConnection(Native Method)
    at java.net.DualStackPlainSocketImpl.socketAccept(DualStackPlainSocketImpl.java:135)
    at java.net.AbstractPlainSocketImpl.accept(AbstractPlainSocketImpl.java:409)
    at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:199)
    at java.net.ServerSocket.implAccept(ServerSocket.java:545)
    at java.net.ServerSocket.accept(ServerSocket.java:513)
    at org.apache.spark.api.python.PythonWorkerFactory.createSimpleWorker(PythonWorkerFactory.scala:133)
    ... 12 more
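Reading the trace, the underlying failure is the accept timeout while the JVM waits for the Python worker process to connect back (java.net.SocketTimeoutException: Accept timed out). In case the interpreter wiring matters, this is roughly how I point Spark at the py36 Anaconda environment shown in the traceback before creating the context; the exact install path below is a placeholder for the one on my machine:

# Point both driver and workers at the same interpreter (the py36
# Anaconda env from the traceback); the path is a placeholder.
import os
os.environ["PYSPARK_PYTHON"] = r"C:\Anaconda2\envs\py36\python.exe"
os.environ["PYSPARK_DRIVER_PYTHON"] = r"C:\Anaconda2\envs\py36\python.exe"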

0 Answers:

No answers