我有一个火花流媒体工作,从gnip读取推文流并将其写入Kafak。
Spark和kafka正在同一个集群上运行。
我的群集由5个节点组成。 Kafka-b01 ... Kafka-b05
Spark master正在Kafak-b05上运行。
以下是我们提交点火作业的方式
nohup sh $ SPZRK_HOME / bin / spark-submit --total-executor-cores 5 --class com.test.java.gnipStreaming.GnipSparkStreamer --master spark:// kafka-b05:7077 GnipStreamContainer。 jar powertrack kafka-b01,kafka-b02,kafka-b03,kafka-b04,kafka-b05 gnip_live_stream 2&
大约1小时后火花工作被杀死
nohub文件中的日志显示以下异常
org.apache.spark.storage.BlockFetchException: Failed to fetch block from 2 locations. Most recent failure cause:
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:595)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:585)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at org.apache.spark.storage.BlockManager.doGetRemote(BlockManager.scala:585)
at org.apache.spark.storage.BlockManager.getRemote(BlockManager.scala:570)
at org.apache.spark.storage.BlockManager.get(BlockManager.scala:630)
at org.apache.spark.rdd.BlockRDD.compute(BlockRDD.scala:48)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:306)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:270)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
Caused by: io.netty.channel.ChannelException: Unable to create Channel from class class io.netty.channel.socket.nio.NioSocketChannel
at io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:455)
at io.netty.bootstrap.AbstractBootstrap.initAndRegister(AbstractBootstrap.java:306)
at io.netty.bootstrap.Bootstrap.doConnect(Bootstrap.java:134)
at io.netty.bootstrap.Bootstrap.connect(Bootstrap.java:116)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:211)
at org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:167)
at org.apache.spark.network.netty.NettyBlockTransferService$$anon$1.createAndStart(NettyBlockTransferService.scala:90)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.fetchAllOutstanding(RetryingBlockFetcher.java:140)
at org.apache.spark.network.shuffle.RetryingBlockFetcher.start(RetryingBlockFetcher.java:120)
at org.apache.spark.network.netty.NettyBlockTransferService.fetchBlocks(NettyBlockTransferService.scala:99)
at org.apache.spark.network.BlockTransferService.fetchBlockSync(BlockTransferService.scala:89)
at org.apache.spark.storage.BlockManager$$anonfun$doGetRemote$2.apply(BlockManager.scala:588)
... 15 more
Caused by: io.netty.channel.ChannelException: Failed to open a socket.
at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:62)
at io.netty.channel.socket.nio.NioSocketChannel.<init>(NioSocketChannel.java:72)
at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
at java.lang.Class.newInstance(Class.java:442)
at io.netty.bootstrap.AbstractBootstrap$BootstrapChannelFactory.newChannel(AbstractBootstrap.java:453)
... 26 more
Caused by: java.net.SocketException: Too many open files
at sun.nio.ch.Net.socket0(Native Method)
at sun.nio.ch.Net.socket(Net.java:411)
at sun.nio.ch.Net.socket(Net.java:404)
at sun.nio.ch.SocketChannelImpl.<init>(SocketChannelImpl.java:105)
at sun.nio.ch.SelectorProviderImpl.openSocketChannel(SelectorProviderImpl.java:60)
at io.netty.channel.socket.nio.NioSocketChannel.newSocket(NioSocketChannel.java:60)
... 33 more
我已将打开文件的最大数量增加到3275782(旧数字几乎是此数字的一半),但我仍面临同样的问题。
当我从spark web界面检查了工作人员的stderr
日志时,我发现了另一个例外。
java.nio.channels.ClosedChannelException
at kafka.network.BlockingChannel.send(BlockingChannel.scala:110)
at kafka.producer.SyncProducer.liftedTree1$1(SyncProducer.scala:75)
at kafka.producer.SyncProducer.kafka$producer$SyncProducer$$doSend(SyncProducer.scala:74)
at kafka.producer.SyncProducer.send(SyncProducer.scala:119)
at kafka.client.ClientUtils$.fetchTopicMetadata(ClientUtils.scala:59)
at kafka.producer.BrokerPartitionInfo.updateInfo(BrokerPartitionInfo.scala:82)
at kafka.producer.BrokerPartitionInfo.getBrokerPartitionInfo(BrokerPartitionInfo.scala:49)
at kafka.producer.async.DefaultEventHandler.kafka$producer$async$DefaultEventHandler$$getPartitionListForTopic(DefaultEventHandler.scala:188)
at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:152)
at kafka.producer.async.DefaultEventHandler$$anonfun$partitionAndCollate$1.apply(DefaultEventHandler.scala:151)
at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
at kafka.producer.async.DefaultEventHandler.partitionAndCollate(DefaultEventHandler.scala:151)
at kafka.producer.async.DefaultEventHandler.dispatchSerializedData(DefaultEventHandler.scala:96)
at kafka.producer.async.DefaultEventHandler.handle(DefaultEventHandler.scala:73)
at kafka.producer.Producer.send(Producer.scala:77)
at kafka.javaapi.producer.Producer.send(Producer.scala:33)
at com.test.java.gnipStreaming.GnipSparkStreamer$1$1.call(GnipSparkStreamer.java:59)
at com.test.java.gnipStreaming.GnipSparkStreamer$1$1.call(GnipSparkStreamer.java:51)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:225)
at org.apache.spark.api.java.JavaRDDLike$$anonfun$foreachPartition$1.apply(JavaRDDLike.scala:225)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$33.apply(RDD.scala:920)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:1858)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:66)
at org.apache.spark.scheduler.Task.run(Task.scala:89)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:214)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
第二个例外(看起来似乎)与卡夫卡不相关。
您认为问题是什么?
修改
基于Yuval Itzchakov的评论这里是流媒体的代码
答案 0 :(得分:1)
问题是您在Producer
的迭代中实例化DStream.foreachPartition
的新实例。如果您有数据密集型流,这可能会导致大量生产者被分配并尝试连接到Kafka。
我首先要确保的是,在您使用finally
块发送数据并调用producer.close
后,您正在关闭流:
public Void call(JavaRDD<String> rdd) throws Exception {
rdd.foreachPartition(new VoidFunction<Iterator<String>>() {
@Override
public void call(Iterator<String> itr) throws Exception {
try
{
Producer<String, String> producer = getProducer(hosts);
while(itr.hasNext()) {
try {
KeyedMessage<String, String> message =
new KeyedMessage<String, String>(topic, itr.next());
producer.send(message);
} catch (Exception e) {
e.printStackTrace();
}
} finally {
producer.close()
}
}
});
return null;
}
如果仍然无效并且您看到太多连接,我会为Kafka生产者创建一个对象池,您可以根据需要进行池化。这样,您可以明确控制正在使用的可用生产者数量以及您打开的套接字数量。