I am trying to upgrade from Spark 1.6 to Spark 2.0 on EMR (cluster mode).
I am hitting the following error when running my workload:
Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1672)
    at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:174)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:270)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:77)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$1.applyOrElse(RpcTimeout.scala:75)
    at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:36)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at org.apache.spark.rpc.RpcTimeout$$anonfun$addMessageIfTimeout$1.applyOrElse(RpcTimeout.scala:59)
    at scala.PartialFunction$OrElse.apply(PartialFunction.scala:167)
    at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:83)
    at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:88)
    at org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:188)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
    at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
    ... 4 more
Caused by: java.io.IOException: Failed to connect to /172.x.x.x:33190
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:228)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:179)
    at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:197)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:191)
    at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:187)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.net.ConnectException: Connection refused: /172.31.32.131:33190
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at io.netty.channel.socket.nio.NioSocketChannel.doFinishConnect(NioSocketChannel.java:224)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.finishConnect(AbstractNioChannel.java:289)
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:528)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
Please note that the IP mentioned in the stack trace above is that of the worker machine, which was executing the 'sortByKey' step.
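For context, here is a minimal sketch of the shape of the job that hits this. The input/output paths and the key extraction are placeholders, not the actual workload:

    import org.apache.spark.sql.SparkSession

    object SortByKeyRepro {
      def main(args: Array[String]): Unit = {
        // Spark 2.0 entry point (was SparkContext/SQLContext in 1.6)
        val spark = SparkSession.builder().appName("SortByKeyRepro").getOrCreate()
        val sc = spark.sparkContext

        // Hypothetical input path and tab-delimited key; the real job differs
        val pairs = sc.textFile("s3://my-bucket/input/")
          .map(line => (line.split('\t')(0), line))

        // sortByKey forces a shuffle; this is the step that was running
        // when the executor in the stack trace above failed to come up
        pairs.sortByKey().saveAsTextFile("s3://my-bucket/output/")

        spark.stop()
      }
    }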
I have already verified that I can connect from the master to the worker and vice versa.
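The check amounts to a plain TCP connect in both directions. A minimal sketch of that probe, using the worker IP and port from the stack trace (the port is ephemeral, so this is only meaningful while the failing endpoint is up):

    import java.net.{InetSocketAddress, Socket}

    object PortProbe {
      def main(args: Array[String]): Unit = {
        val host = "172.31.32.131" // worker IP from the stack trace
        val port = 33190           // ephemeral RPC port from the stack trace
        val socket = new Socket()
        try {
          // 5-second connect timeout; if nothing is listening this prints
          // the same "Connection refused" the executor sees
          socket.connect(new InetSocketAddress(host, port), 5000)
          println(s"Connected to $host:$port")
        } catch {
          case e: Exception => println(s"Failed to connect to $host:$port: ${e.getMessage}")
        } finally {
          socket.close()
        }
      }
    }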
Update:
Some more information, from the YARN NodeManager logs:
Stack trace: ExitCodeException exitCode=1:
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell.runCommand(Shell.java:545)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell.run(Shell.java:456)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:722)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:212)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.FutureTask.run(FutureTask.java:266)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
2016-10-17 06:27:43,468 INFO org.apache.hadoop.yarn.server.nodemanager.ContainerExecutor (ContainersLauncher#66): at java.lang.Thread.run(Thread.java:745)
My guess is that this is caused by the connection-refused error on the worker. Also, I am not quite sure why the worker is trying to connect to itself using its IP rather than 127.0.0.1.
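A quick way to check which address the worker advertises for itself is to look at how the machine resolves its own hostname. My understanding (an assumption, worth verifying) is that Spark derives the bind/advertise address from this unless SPARK_LOCAL_IP overrides it:

    import java.net.InetAddress

    object WhoAmI {
      def main(args: Array[String]): Unit = {
        // On EC2, the hostname typically resolves to the private IP
        // (172.31.x.x) rather than 127.0.0.1, which matches the trace above
        val local = InetAddress.getLocalHost
        println(s"hostname       = ${local.getHostName}")
        println(s"resolved IP    = ${local.getHostAddress}")
        println(s"SPARK_LOCAL_IP = ${sys.env.getOrElse("SPARK_LOCAL_IP", "<unset>")}")
      }
    }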