Spark 2.0: Connection refused

Asked: 2016-10-16 11:59:16

Tags: apache-spark emr amazon-emr

I am trying to upgrade from Spark 1.6 to Spark 2.0 on EMR, running in cluster mode.
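For context (the question does not include the job code), here is a minimal sketch of what a Spark 2.0 entry point might look like after the upgrade; the 'MyWorkload' object and app name are hypothetical. In 2.0 the SparkSession builder replaces the 1.6-era SQLContext/HiveContext setup, while the YARN cluster-mode settings stay on the spark-submit command line:

    import org.apache.spark.sql.SparkSession

    // Hypothetical entry point, not the asker's actual job.
    object MyWorkload {
      def main(args: Array[String]): Unit = {
        // Spark 2.0: SparkSession is the unified entry point. In YARN
        // cluster mode the master/deploy-mode come from spark-submit
        // (--master yarn --deploy-mode cluster), not from code.
        val spark = SparkSession.builder()
          .appName("my-workload")
          .getOrCreate()

        // ... job logic, e.g. the sortByKey stage mentioned below ...

        spark.stop()
      }
    }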

I hit the following error while running my workload:

    Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
    Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
        at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
        at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
        at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
        at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
        at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:188)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        ... 4 more
    Caused by: java.io.IOException: Failed to connect to /172.x.x.x:33190
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
        at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
        at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
        at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)
    Caused by: java.net.ConnectException: Connection refused: /172.31.32.131:33190
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
        at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
        at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
        at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
        at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
        at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
        at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
        ... 1 more

Please note that the IP in the trace above is that of the worker machine which was executing the 'sortByKey' step.
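For reference, sortByKey is a wide transformation: Spark samples the keys to build a range partitioner and then shuffles rows between executors, so every executor must be reachable by its peers (and must reach the driver's RPC endpoint) on dynamically assigned ports like the 33190 above. A minimal spark-shell sketch with made-up data:

    // spark-shell snippet; 'sc' is the shell's SparkContext, data is made up.
    val pairs = sc.parallelize(Seq(("b", 2), ("c", 3), ("a", 1)))

    // sortByKey triggers a shuffle: rows move between executors over the
    // network, so a "Connection refused" in this stage usually means a
    // peer's RPC/shuffle port was not reachable at that moment.
    val sorted = pairs.sortByKey()
    println(sorted.collect().mkString(", "))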

I have verified that I can reach the worker from the master and vice versa.
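For what it is worth, a host-level ping or SSH check only proves the machines can see each other; a stricter test is whether anything is listening on the exact port from the trace. A throwaway Scala probe (note the port is ephemeral, so it is only open while the owning driver/executor process is alive):

    import java.net.{InetSocketAddress, Socket}

    // Quick TCP probe: true only if a process is listening on host:port.
    def canConnect(host: String, port: Int, timeoutMs: Int = 2000): Boolean = {
      val socket = new Socket()
      try {
        socket.connect(new InetSocketAddress(host, port), timeoutMs)
        true
      } catch {
        case _: java.io.IOException => false // refused, timed out, or unreachable
      } finally {
        socket.close()
      }
    }

    // Host and port taken from the "Connection refused" line in the trace.
    println(canConnect("172.31.32.131", 33190))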

Update

Some more information, from the YARN NodeManager logs:

    Stack trace: ExitCodeException exitCode=1:
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell.run(Shell.java:456)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.lang.Thread.run(Thread.java:745)

I am guessing this is due to the connection refused error on the worker. Also, I am not quite sure why the worker is trying to connect to itself using its private IP rather than 127.0.0.1.
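On the 127.0.0.1 point: Spark deliberately binds and advertises a routable address (on EC2, the node's private IP) rather than loopback, because executors on other nodes must be able to connect to the same endpoint. If the address/port selection itself is in question, these are the standard Spark 2.0 knobs that control it; this is a sketch with placeholder values, not a verified fix for this cluster:

    import org.apache.spark.SparkConf

    // Sketch only; the hostname and port values below are placeholders.
    val conf = new SparkConf()
      // Address the driver advertises to executors (normally auto-detected).
      .set("spark.driver.host", "ip-172-31-32-131.ec2.internal")
      // Pin the driver RPC and block-manager ports instead of letting Spark
      // pick ephemeral ones, so security groups can allow them explicitly.
      .set("spark.driver.port", "40000")
      .set("spark.blockManager.port", "40010")
      // How many successive ports Spark tries if the chosen one is taken.
      .set("spark.port.maxRetries", "32")
    // The SPARK_LOCAL_IP environment variable controls the bind address;
    // leaving it unset lets Spark resolve the node's own routable IP.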

0 Answers:

There are no answers yet.