MLLib / Spark

Time: 2016-08-14 09:57:37

Tags: java networking apache-spark apache-spark-mllib

When I run MLlib on files with more than 1 partition in our cluster, I get the following exception:

  

16/08/14 12:43:23 WARN TaskSetManager: Lost task 2.0 in stage 2.1 (TID 49, da06.qcri.org): FetchFailed(BlockManagerId(3, da08.qcri.org, 33322), shuffleId=0, mapId=5, reduceId=2, message=
org.apache.spark.shuffle.FetchFailedException: Failed to connect to da08.qcri.org:33322
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.throwFetchFailedException(ShuffleBlockFetcherIterator.scala:323)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:300)
    at org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:51)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
    at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
    at org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:32)
    at org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
    at org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:152)
    at org.apache.spark.Aggregator.combineCombinersByKey(Aggregator.scala:58)
    at org.apache.spark.shuffle.BlockStoreShuffleReader.read(BlockStoreShuffleReader.scala:83)
    at org.apache.spark.rdd.ShuffledRDD.compute(ShuffledRDD.scala:98)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
    at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
    at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
    at org.apache.spark.scheduler.Task.run(Task.scala:89)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:227)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)

Caused by: java.io.IOException: Failed to connect to ***.org:33322
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:216)
    at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
    at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher.access$200(RetryingBlockFetcher.java:43)
    at org.apache.spark.network.shuffle.RetryingBlockFetcher$1.run(RetryingBlockFetcher.java:170)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    ... 3 more

Caused by: java.nio.channels.UnresolvedAddressException
    at sun.nio.ch.Net.checkAddress(Net.java:123)
    at sun.nio.ch.SocketChannelImpl.connect(SocketChannelImpl.java:621)
    at io.netty.channel.socket.nio.NioSocketChannel.doConnect(NioSocketChannel.java:209)
    at io.netty.channel.nio.AbstractNioChannel$AbstractNioUnsafe.connect(AbstractNioChannel.java:207)
    at io.netty.channel.DefaultChannelPipeline$HeadContext.connect(DefaultChannelPipeline.java:1097)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.ChannelOutboundHandlerAdapter.connect(ChannelOutboundHandlerAdapter.java:47)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.ChannelDuplexHandler.connect(ChannelDuplexHandler.java:50)
    at io.netty.channel.AbstractChannelHandlerContext.invokeConnect(AbstractChannelHandlerContext.java:471)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:456)
    at io.netty.channel.AbstractChannelHandlerContext.connect(AbstractChannelHandlerContext.java:438)
    at io.netty.channel.DefaultChannelPipeline.connect(DefaultChannelPipeline.java:908)
    at io.netty.channel.AbstractChannel.connect(AbstractChannel.java:203)
    at io.netty.bootstrap.Bootstrap$2.run(Bootstrap.java:166)
    at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:357)
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:357)
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:111)
    ... 1 more

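In the slaves configuration file I list the nodes by IP address, not by hostname. Also, when I ping the machines from the master node using their hostnames, it works without any problem.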
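One thing that may be worth checking: the log shows the failing fetch running on da06.qcri.org and targeting da08.qcri.org, so a successful ping from the master does not by itself show that the worker's JVM can resolve that name. A minimal sketch of such a check follows, assuming it is compiled and run on each worker node. The class name `ResolveCheck` and the helper `isUnresolved` are made up for illustration and are not part of Spark. The check relies on `java.net.InetSocketAddress`, whose constructor attempts name resolution and marks the address as unresolved on failure, which is the condition that makes `SocketChannel.connect` throw `UnresolvedAddressException`.

```java
import java.net.InetSocketAddress;

// Hypothetical helper, not part of Spark: reports whether this machine's JVM
// can resolve the given host names.
public class ResolveCheck {

    // Returns true when the host name cannot be resolved from this machine.
    static boolean isUnresolved(String host, int port) {
        // The constructor tries to resolve the name; on failure the address
        // is flagged as unresolved instead of throwing.
        return new InetSocketAddress(host, port).isUnresolved();
    }

    public static void main(String[] args) {
        // Defaults are the host names that appear in the log; pass others as arguments.
        String[] hosts = args.length > 0
                ? args
                : new String[] {"da06.qcri.org", "da08.qcri.org"};
        int port = 33322; // port from the log; it does not affect name resolution

        for (String host : hosts) {
            String status = isUnresolved(host, port) ? "UNRESOLVED" : "resolved";
            System.out.println(host + " -> " + status);
        }
    }
}
```

If a name is reported as UNRESOLVED on some worker, that node's name resolution (for example its hosts file or DNS settings) would be the first thing to look at. This is only a diagnostic under the assumptions above, not a confirmed cause.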

Has anyone run into a similar problem, or does anyone have an idea how to solve it?

0 answers:

No answers