Spark action stuck with EOFException

Date: 2019-03-06 11:33:13

Tags: apache-spark

I'm trying to execute a Spark action which gets stuck. The corresponding executor throws the following exception:

    2019-03-06 11:18:16 ERROR Inbox:91 - Ignoring error
    java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:197)
        at java.io.DataInputStream.readUTF(DataInputStream.java:609)
        at java.io.DataInputStream.readUTF(DataInputStream.java:564)
        at org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:131)
        at org.apache.spark.scheduler.TaskDescription$$anonfun$decode$1.apply(TaskDescription.scala:130)
        at scala.collection.immutable.Range.foreach(Range.scala:160)
        at org.apache.spark.scheduler.TaskDescription$.decode(TaskDescription.scala:130)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$receive$1.applyOrElse(CoarseGrainedExecutorBackend.scala:96)
        at org.apache.spark.rpc.netty.Inbox$$anonfun$process$1.apply$mcV$sp(Inbox.scala:117)
        at org.apache.spark.rpc.netty.Inbox.safelyCall(Inbox.scala:205)
        at org.apache.spark.rpc.netty.Inbox.process(Inbox.scala:101)
        at org.apache.spark.rpc.netty.Dispatcher$MessageLoop.run(Dispatcher.scala:221)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
        at java.lang.Thread.run(Thread.java:748)

My environment is a standalone Spark cluster on Docker with Zeppelin as Spark driver. The connection to the cluster is working fine.

My Spark action is a simple show() on a JDBC database read:

spark.read.jdbc(jdbcString, "table", props).show()

I can print the schema of the table, so there shouldn't be a problem with the connection.

3 Answers:

Answer 0 (score: 0)

Please check your environment: the Java, Python, and PySpark paths and versions must be identical on the master and the workers.

Answer 1 (score: 0)

Our driver machine had a different Java version than the standalone Spark cluster. When we tried another machine with the same Java version, it worked.
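The driver/worker version comparison can be sketched as a small helper that normalizes Java version strings (so that the legacy `1.8.0_191` style and the modern `11.0.2` style compare correctly). This is a minimal illustration; the sample version strings are assumptions, not values from the question:

```scala
// Sketch: compare the Java *major* versions of driver and worker.
// Normalizes "1.8.0_191" -> "8" and "11.0.2" -> "11".
object JavaVersionCheck {
  // Extract the major version component from a java.version string.
  def major(version: String): String = {
    val parts = version.split("[._]")
    if (parts(0) == "1") parts(1) else parts(0)
  }

  def sameMajor(driver: String, worker: String): Boolean =
    major(driver) == major(worker)

  def main(args: Array[String]): Unit = {
    // Illustrative version strings only.
    println(sameMajor("1.8.0_191", "1.8.0_202")) // both major 8: true
    println(sameMajor("1.8.0_191", "11.0.2"))    // 8 vs 11: false
  }
}
```

On each machine, the string to compare is what `System.getProperty("java.version")` (or `java -version`) reports.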

Answer 2 (score: 0)

I ran into the same problem with one of the folders on S3. The data was stored as Snappy-compressed Parquet. When I changed it to ORC (still with Snappy compression), it worked like a charm.
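That workaround amounts to a one-off rewrite job. A sketch under assumed names (the `s3a://` paths are hypothetical placeholders, and `spark` is an existing SparkSession; this fragment needs a live cluster and S3 credentials to run):

```scala
// Sketch: re-write Snappy-compressed Parquet as Snappy-compressed ORC.
// Paths are hypothetical placeholders.
val df = spark.read.parquet("s3a://bucket/table-parquet")

df.write
  .option("compression", "snappy") // keep Snappy, switch only the container format
  .orc("s3a://bucket/table-orc")

// Downstream jobs then read with spark.read.orc("s3a://bucket/table-orc")
```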