When I run MLlib on a file with more than 1 partition on our cluster, I get the following exception:
16/08/14 12:43:23 WARN TaskSetManager: Lost task 2.0 in stage 2.1 (TID 49, da06.qcri.org): FetchFailed(BlockManagerId(3, da08.qcri.org, 33322), shuffleId=0, mapId=5, reduceId=2, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to da08.qcri.org:33322
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:152)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:58)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Failed to connect to ***.org:33322
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more
Caused by: java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:123)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
    at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
    at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
    at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
    at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more
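In the slaves configuration file I list the nodes by IP address, not by hostname. Also, when I ping the machines from the master node using their hostnames, it works without any problem.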
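The root cause at the bottom of the trace, java.nio.channels.UnresolvedAddressException, is what SocketChannel.connect throws when it is handed an InetSocketAddress whose host name could not be resolved. Note that the failing task ran on da06.qcri.org while fetching shuffle data from da08.qcri.org (the block manager is identified by hostname even though the slaves file lists IPs), so a successful ping from the master does not by itself show that the workers can resolve each other's names. The following is a minimal diagnostic sketch, assuming you compile and run it on each worker node; the class name ResolveCheck and its helper methods are hypothetical, and the hosts to check are passed on the command line. It reports whether the local resolver can resolve each host, and reproduces the same exception by connecting to a deliberately unresolved address.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.channels.SocketChannel;
import java.nio.channels.UnresolvedAddressException;

// Hypothetical diagnostic class: run on each worker node, not only on the master.
class ResolveCheck {

    /** Returns true if this JVM's resolver can turn the host name into an IP address. */
    static boolean resolves(String host, int port) {
        // The InetSocketAddress(String, int) constructor attempts a lookup;
        // if the lookup fails, the address is flagged as unresolved instead of throwing.
        return !new InetSocketAddress(host, port).isUnresolved();
    }

    /** Reproduces the root cause from the trace: connecting to an unresolved address. */
    static boolean connectThrowsUnresolved(String host, int port) {
        try (SocketChannel ch = SocketChannel.open()) {
            // createUnresolved skips the lookup entirely, mimicking a name the
            // node's DNS or /etc/hosts could not resolve.
            ch.connect(InetSocketAddress.createUnresolved(host, port));
            return false;
        } catch (UnresolvedAddressException e) {
            return true; // same exception type as the bottom of the Spark trace
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        int port = 33322; // shuffle port taken from the trace; any valid port works for the lookup
        String[] hosts = args.length > 0 ? args : new String[] {"localhost"};
        for (String h : hosts) {
            System.out.println(h + " resolves on this node: " + resolves(h, port));
        }
        System.out.println("connect() to an unresolved address throws UnresolvedAddressException: "
                + connectThrowsUnresolved("da08.qcri.org", port));
    }
}
```

For example, `java ResolveCheck da06.qcri.org da08.qcri.org` run on every worker; a `false` on any node likely points to a missing DNS or /etc/hosts entry on that particular machine.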
Has anyone run into a similar problem, or does anyone have an idea how to solve it?