我正在尝试在我使用他们提供的Spark-ec2脚本创建的Spark集群上运行我的Spark作业。我可以运行SparkPi示例,但每当我运行我的工作时,我都会遇到此异常:
Exception in thread "main" java.io.IOException: Call to ec2-XXXXXXXXXX.compute-1.amazonaws.com/10.XXX.YYY.ZZZZ:9000 failed on local exception: java.io.EOFException
at org.apache.hadoop.ipc.Client.wrapException(Client.java:1107)
at org.apache.hadoop.ipc.Client.call(Client.java:1075)
at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:225)
at com.sun.proxy.$Proxy6.setPermission(Unknown Source)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)
at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)
at com.sun.proxy.$Proxy6.setPermission(Unknown Source)
at org.apache.hadoop.hdfs.DFSClient.setPermission(DFSClient.java:1042)
at org.apache.hadoop.hdfs.DistributedFileSystem.setPermission(DistributedFileSystem.java:531)
at org.apache.spark.util.FileLogger.createLogDir(FileLogger.scala:93)
at org.apache.spark.util.FileLogger.start(FileLogger.scala:70)
at org.apache.spark.scheduler.EventLoggingListener.start(EventLoggingListener.scala:71)
at org.apache.spark.SparkContext.<init>(SparkContext.scala:252)
at com.here.traffic.collection.archiver.IsoCcMergeJob$.isoMerge(IsoCcMergeJob.scala:55)
at com.here.traffic.collection.archiver.IsoCcMergeJob$.main(IsoCcMergeJob.scala:11)
at com.here.traffic.collection.archiver.IsoCcMergeJob.main(IsoCcMergeJob.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.SparkSubmit$.launch(SparkSubmit.scala:292)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:55)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:392)
at org.apache.hadoop.ipc.Client$Connection.receiveResponse(Client.java:804)
at org.apache.hadoop.ipc.Client$Connection.run(Client.java:749)
从我在互联网上寻找解决方案看起来,它看起来可能与Hadoop lib版本不匹配,但我确认Spark使用1.0.4并且我的作业是使用相同版本编译的。
为了提供更多上下文,我的工作是在两个文件的左外连接中生成S3,并将结果再次放入S3。
任何想法可能出错?
答案 0 :(得分:1)
我有使用ec2脚本的类似经验,一旦我们使用cloudera发行版(5.1)进行集群(通过一个简单的apt-get)和jar依赖项,几乎所有的版本问题就消失了。
添加spark作为依赖项(搜索文本“spark”):