java.io.EOFException when using spark-submit with yarn as the cluster master

Time: 2018-04-06 09:41:57

Tags: apache-spark hadoop yarn httpserver eofexception

I am trying to run a jar file with this spark-submit command:

spark-submit --master yarn --deploy-mode cluster --executor-memory 3g --class my.package.Main my-jar-file.jar

Main is the jar's main class. Here is its content (everything is written in Scala):

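import java.net.InetSocketAddress
import com.sun.net.httpserver.HttpServer

object Main {
    def main(args: Array[String]): Unit = {
        // Bind the HTTP server to host "master" on port 8000 (backlog 0 = system default)
        val server = HttpServer.create(new InetSocketAddress("master", 8000), 0)
        val backend = new MainProcess()
        val handlerRoot = new RootHandler()
        handlerRoot.initProcess(backend)

        server.createContext("/", handlerRoot)
        // A null executor makes the server use its default executor
        server.setExecutor(null)

        server.start()
        println("Server is started at " + server.getAddress().getHostString() + ":" + server.getAddress().getPort())
    }
}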

The MainProcess class does the actual work: it uses Spark and the Spark GraphX library on files read from HDFS. This is how I configure the SparkContext in the MainProcess class:

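import org.apache.spark.{SparkConf, SparkContext}

class MainProcess {
    // No master is set here; it comes from the spark-submit --master option
    val config = new SparkConf()
    config.setAppName("Final GraphX App - Main")
    val sc = new SparkContext(config)
   ...
}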

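The application seems to run fine and the final status is reported as SUCCEEDED, but the application simply shuts down instead of running continuously, as it should since it is meant to be a running server. I can open the link master:8000 only once; when I try to refresh the page, it can no longer connect. Here is the log from running the application: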

18/04/06 15:45:59 ERROR yarn.YarnAllocator: Failed to launch executor 2 on container container_1522920902032_0027_01_000003
org.apache.spark.SparkException: Exception while starting container container_1522920902032_0027_01_000003 on host slave2
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:125)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.run(ExecutorRunnable.scala:65)
    at org.apache.spark.deploy.yarn.YarnAllocator$$anonfun$runAllocatedContainers$1$$anon$1.run(YarnAllocator.scala:523)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.io.IOException: Failed on local exception: java.io.IOException: java.io.EOFException; Host Details : local host is: "master/10.100.69.207"; destination host is: "slave2":57914; 
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:776)
    at org.apache.hadoop.ipc.Client.call(Client.java:1479)
    at org.apache.hadoop.ipc.Client.call(Client.java:1412)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
    at com.sun.proxy.$Proxy19.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.api.impl.pb.client.ContainerManagementProtocolPBClientImpl.startContainers(ContainerManagementProtocolPBClientImpl.java:96)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:191)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:102)
    at com.sun.proxy.$Proxy20.startContainers(Unknown Source)
    at org.apache.hadoop.yarn.client.api.impl.NMClientImpl.startContainer(NMClientImpl.java:201)
    at org.apache.spark.deploy.yarn.ExecutorRunnable.startContainer(ExecutorRunnable.scala:122)
    ... 5 more
Caused by: java.io.IOException: java.io.EOFException
    at org.apache.hadoop.ipc.Client$Connection$1.run(Client.java:687)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Client$Connection.handleSaslConnectionFailure(Client.java:650)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:737)
    at org.apache.hadoop.ipc.Client$Connection.access$2900(Client.java:375)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1528)
    at org.apache.hadoop.ipc.Client.call(Client.java:1451)
    ... 18 more
Caused by: java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)
    at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:367)
    at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:560)
    at org.apache.hadoop.ipc.Client$Connection.access$1900(Client.java:375)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:729)
    at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:725)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:725)
    ... 21 more
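
A possible factor in the shutdown described above, offered as an assumption rather than a confirmed diagnosis: in YARN cluster mode the driver runs inside the ApplicationMaster, and the application is generally treated as finished once the user's main method returns. In the Main object, main returns right after server.start(). The following is a minimal sketch of keeping main blocked until a shutdown is requested. The names KeepAliveSketch, shutdownLatch, startServer and the "/stop" path are hypothetical and not part of the original code, and this sketch does not address the EOFException in the log.

```scala
import java.net.InetSocketAddress
import java.util.concurrent.CountDownLatch
import com.sun.net.httpserver.{HttpExchange, HttpHandler, HttpServer}

object KeepAliveSketch {
  // Hypothetical latch: main stays blocked until something counts it down.
  val shutdownLatch = new CountDownLatch(1)

  // Writes a short plain-text body and closes the exchange.
  private def reply(exchange: HttpExchange, text: String): Unit = {
    val body = text.getBytes("UTF-8")
    exchange.sendResponseHeaders(200, body.length)
    val out = exchange.getResponseBody
    try out.write(body) finally out.close()
  }

  // Starts a server on the given port (0 lets the OS pick a free port).
  def startServer(port: Int): HttpServer = {
    val server = HttpServer.create(new InetSocketAddress(port), 0)
    server.createContext("/", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = reply(exchange, "ok")
    })
    // Hypothetical endpoint that releases the latch so main can return.
    server.createContext("/stop", new HttpHandler {
      override def handle(exchange: HttpExchange): Unit = {
        reply(exchange, "stopping")
        shutdownLatch.countDown()
      }
    })
    server.setExecutor(null)
    server.start()
    server
  }

  def main(args: Array[String]): Unit = {
    val server = startServer(8000)
    // Block here instead of returning, so the driver's main method stays alive.
    shutdownLatch.await()
    server.stop(0)
  }
}
```

With this shape, main only returns after a request to the hypothetical /stop path, so the server keeps accepting requests in between.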

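This application is basically a web application built with the Java HTTP server (com.sun.net.httpserver.HttpServer) that uses Spark to process big data. Each incoming request is accepted by a handler class, which creates a new thread to run the Spark job in the background. The user can send another request to check whether the Spark job has finished, so that the finished result can be displayed on the web page. The problem is that the server gets "killed" every time Spark claims to have finished a job (although in this case the job failed). I am using Spark 2.2.0 built for Hadoop 2.7, and Hadoop 2.7.1. All data files are in HDFS.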
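
To make the request flow described above concrete, here is a minimal sketch of the submit-then-poll pattern. It is an illustration under stated assumptions, not the actual handler code: JobTracker, JobStatus, submit and status are hypothetical names, and the background work is a plain function standing in for the Spark job.

```scala
import java.util.concurrent.atomic.AtomicLong
import scala.collection.concurrent.TrieMap
import scala.concurrent.{ExecutionContext, Future}
import scala.util.{Failure, Success}

// Hypothetical job states that a status request could report.
sealed trait JobStatus
case object Running extends JobStatus
final case class Finished(result: String) extends JobStatus
final case class Failed(message: String) extends JobStatus

class JobTracker()(implicit ec: ExecutionContext) {
  private val nextId = new AtomicLong(0)
  private val jobs = TrieMap.empty[Long, JobStatus]

  // Starts the work in the background and returns a job id immediately.
  def submit(work: () => String): Long = {
    val id = nextId.incrementAndGet()
    // Record Running before the Future starts, so completion always overwrites it.
    jobs.put(id, Running)
    Future(work()).onComplete {
      case Success(result) => jobs.put(id, Finished(result))
      case Failure(error)  => jobs.put(id, Failed(String.valueOf(error.getMessage)))
    }
    id
  }

  // Looks up the current state of a job; None if the id is unknown.
  def status(id: Long): Option[JobStatus] = jobs.get(id)
}
```

A handler for the first request would call submit and send the id back; a handler for the follow-up request would call status with that id and render the result once it is Finished.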

0 answers:

No answers