This is a Spark Streaming application running in YARN cluster mode that produces messages to three Kafka brokers.
It fails once it reaches 150K open files:
There is insufficient memory for the Java Runtime Environment to continue
Native memory allocation (mmap) failed to map 12288 bytes for committing reserved memory.
Job aborted due to stage failure ... :
org.apache.kafka.common.KafkaException: Failed to construct kafka producer
.....
Caused by: java.lang.OutOfMemoryError: unable to create new native thread
When I run lsof -p <PID> on the Java process that runs the executor, I can see a huge number of TCP connections (up to 90K) from the host server to the Kafka brokers:
host:portXXX->kafkabroker1:XmlIpcRegSvc (ESTABLISHED)
host:portYYY->kafkabroker2:XmlIpcRegSvc (ESTABLISHED)
host:portZZZ->kafkabroker3:XmlIpcRegSvc (ESTABLISHED)
I tried reducing the number of executor cores from 8 to 6, but it made no difference in the number of open files (it still reached 150K) and the application kept failing.
The libraries used to connect Spark Streaming to Kafka are the following (a minimal sketch of how they fit together follows the list):
org.apache.spark.streaming.kafka010.KafkaUtils
org.apache.spark.streaming.dstream.InputDStream
org.apache.kafka.clients.producer.KafkaProducer
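For reference, this is roughly how those pieces are wired together on the consuming side, in the style of the Spark/Kafka 0-10 integration guide. The topic name, group id, broker addresses/ports, and batch interval below are illustrative placeholders, not the actual configuration:

    import org.apache.kafka.clients.consumer.ConsumerRecord
    import org.apache.kafka.common.serialization.StringDeserializer
    import org.apache.spark.SparkConf
    import org.apache.spark.streaming.{Seconds, StreamingContext}
    import org.apache.spark.streaming.dstream.InputDStream
    import org.apache.spark.streaming.kafka010.KafkaUtils
    import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
    import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

    val conf = new SparkConf().setAppName("streaming-to-kafka")
    val ssc  = new StreamingContext(conf, Seconds(10))   // batch interval is a placeholder

    // Consumer configuration; broker list and group id are placeholders
    val kafkaParams = Map[String, Object](
      "bootstrap.servers" -> "kafkabroker1:9092,kafkabroker2:9092,kafkabroker3:9092",
      "key.deserializer" -> classOf[StringDeserializer],
      "value.deserializer" -> classOf[StringDeserializer],
      "group.id" -> "example-group",
      "auto.offset.reset" -> "latest",
      "enable.auto.commit" -> (false: java.lang.Boolean)
    )

    // Direct stream over the input topic (topic name is a placeholder)
    val stream: InputDStream[ConsumerRecord[String, String]] =
      KafkaUtils.createDirectStream[String, String](
        ssc,
        PreferConsistent,
        Subscribe[String, String](Array("input-topic"), kafkaParams)
      )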
Code:
stream.foreachRDD { rdd =>
  val kafkaProducer = getKafkaProducer()   // a producer is obtained for every batch
  // do some work on each RDD...
  rdd.foreach { record =>
    kafkaProducer.send(new ProducerRecord(outputTopic, record._1, record._2))
  }
  kafkaProducer.close()                    // producer closed at the end of the batch
}
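For completeness, this is roughly what the "get kafkaProducer" step would look like. The helper name getKafkaProducer, the output topic, the broker list, and the serializer settings are my placeholders, not the actual code or configuration:

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

    // Hypothetical helper behind "get kafkaProducer" above;
    // broker addresses and serializers are illustrative only.
    def getKafkaProducer(): KafkaProducer[String, String] = {
      val props = new Properties()
      props.put("bootstrap.servers", "kafkabroker1:9092,kafkabroker2:9092,kafkabroker3:9092")
      props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
      new KafkaProducer[String, String](props)
    }

Note that every KafkaProducer constructed this way opens its own TCP connections to the brokers and starts a background sender thread, both of which are only released once close() completes.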