I am using Spark Streaming (2.1) to send some data to Kafka (a 0.10 version, with a thin wrapper around it).
I wrapped my own Kafka producer like this:
import java.util.Properties

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

case class MyKafkaProducer[KT, VT](props: Properties) extends Serializable {
  // KafkaProducer itself is not serializable; the lazy val means only the
  // Properties are shipped, and the real producer is created on first use.
  private lazy val producer = new KafkaProducer[KT, VT](props)
  def send(record: ProducerRecord[KT, VT], callback: Callback = null) = producer.send(record, callback)
  def flush(): Unit = producer.flush()
  def close(): Unit = producer.close()
}
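For context, a minimal standalone sketch of how the wrapper behaves: no connection is opened until the first send(), which is what makes the wrapper safe to serialize into a closure (the broker address, serializers, and topic below are hypothetical placeholders):

// hypothetical setup -- adjust brokers/serializers to your cluster
val props = new Properties()
props.put("bootstrap.servers", "broker1:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.ByteArraySerializer")

val wrapper = MyKafkaProducer[Array[Byte], Array[Byte]](props)
// the first send() triggers `new KafkaProducer` and opens the connections
wrapper.send(new ProducerRecord("some-topic", "k".getBytes("UTF-8"), "v".getBytes("UTF-8")))
wrapper.flush()
wrapper.close()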
I have tried the following three ways of sending data to Kafka, and each has problems. The most common problem (with 1 and 2) is too many connections to Kafka, observed on the Kafka cluster machines rather than in the application logs.
With the third approach I did not notice a connection problem (I do not know whether it hits one as well; can someone help analyze this?), but it cannot send after the producer is closed.
1. Broadcast

// too many connections
val producer = ssc.sparkContext.broadcast(MyKafkaProducer[Array[Byte], Array[Byte]](getProperties(conf))).value
dataStream.foreachRDD {
  _.foreachPartition { it =>
    it.foreach { row =>
      val value = row.toString.getBytes("UTF-8")
      // resend on failure, up to retryTimes attempts
      def run(deep: Int = 0): Unit = {
        if (deep < retryTimes) {
          val record = new ProducerRecord[Array[Byte], Array[Byte]](topic, key, value)
          producer.send(record, new Callback {
            override def onCompletion(recordMetadata: RecordMetadata, e: Exception) = if (e != null) run(deep + 1)
          })
        }
      }
      run()
    }
    producer.flush()
  }
}
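As an aside, the broadcast pattern is usually written by keeping the Broadcast handle and dereferencing it inside the partition closure; calling .value on the driver, as above, makes every task closure capture and serialize its own copy of the wrapper, so each task's lazy val ends up opening its own connections. A sketch under the same assumptions (topic, key, conf, getProperties, and retryTimes as in the original):

// keep the Broadcast handle; dereference it on the executor
val producerBc = ssc.sparkContext.broadcast(MyKafkaProducer[Array[Byte], Array[Byte]](getProperties(conf)))
dataStream.foreachRDD {
  _.foreachPartition { it =>
    val producer = producerBc.value  // one cached copy per executor JVM
    it.foreach { row =>
      val value = row.toString.getBytes("UTF-8")
      producer.send(new ProducerRecord[Array[Byte], Array[Byte]](topic, key, value))
    }
    producer.flush()
  }
}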
2. A producer per record

// too many connections too
// blocked (sending is too slow); maybe creating a producer has too much overhead?
dataStream.foreachRDD {
  _.foreach { row =>
    val producer = MyKafkaProducer[Array[Byte], Array[Byte]](getProperties(conf))
    val value = row.toString.getBytes("UTF-8")
    def run(deep: Int = 0): Unit = {
      if (deep < retryTimes) {
        val record = new ProducerRecord[Array[Byte], Array[Byte]](topic, key, value)
        producer.send(record, new Callback {
          override def onCompletion(recordMetadata: RecordMetadata, e: Exception) = if (e != null) run(deep + 1)
        })
      }
    }
    run()
    producer.flush()
    producer.close()
  }
}
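Incidentally, the retry closure is repeated verbatim in each variant; factoring it into a helper that takes an already-created producer makes the three variants easier to compare. A hypothetical helper (sendWithRetry is my own name, not an existing API):

// hypothetical helper: retry a failed send, up to retryTimes attempts in total
def sendWithRetry(producer: MyKafkaProducer[Array[Byte], Array[Byte]],
                  record: ProducerRecord[Array[Byte], Array[Byte]],
                  retryTimes: Int): Unit = {
  def run(deep: Int): Unit =
    if (deep < retryTimes) {
      producer.send(record, new Callback {
        override def onCompletion(recordMetadata: RecordMetadata, e: Exception): Unit =
          if (e != null) run(deep + 1)  // invoked on the producer's I/O thread
      })
    }
  run(0)
}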
3. A producer per task

// do not know whether there are too many connections
// cannot send after the producer is closed
dataStream.foreachRDD {
  _.foreachPartition { it =>
    Thread.sleep(1000)
    val producer = MyKafkaProducer[Array[Byte], Array[Byte]](getProperties(conf))
    it.foreach { row =>
      val value = row.toString.getBytes("UTF-8")
      def run(deep: Int = 0): Unit = {
        if (deep < retryTimes) {
          val record = new ProducerRecord[Array[Byte], Array[Byte]](topic, key, value)
          producer.send(record, new Callback {
            override def onCompletion(recordMetadata: RecordMetadata, e: Exception) = if (e != null) run(deep + 1)
          })
        }
      }
      run()
    }
    producer.flush()
    producer.close()
  }
}
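One pattern I have seen recommended for this kind of pipeline (an assumption on my part, not something from the code above) is a lazily initialized producer per executor JVM that is flushed per batch but only closed by a JVM shutdown hook; that avoids both per-batch connection churn and retries racing against close(). A sketch, reusing the sendWithRetry helper from above:

// one producer per executor JVM, closed only when the JVM exits
object ProducerSingleton {
  @volatile private var instance: MyKafkaProducer[Array[Byte], Array[Byte]] = _

  def getOrCreate(props: Properties): MyKafkaProducer[Array[Byte], Array[Byte]] = {
    if (instance == null) synchronized {
      if (instance == null) {
        instance = MyKafkaProducer[Array[Byte], Array[Byte]](props)
        sys.addShutdownHook { instance.flush(); instance.close() }
      }
    }
    instance
  }
}

dataStream.foreachRDD {
  _.foreachPartition { it =>
    val producer = ProducerSingleton.getOrCreate(getProperties(conf))
    it.foreach { row =>
      val value = row.toString.getBytes("UTF-8")
      sendWithRetry(producer, new ProducerRecord[Array[Byte], Array[Byte]](topic, key, value), retryTimes)
    }
    producer.flush()  // do not close here; later batches reuse the same producer
  }
}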
I am new to this area and do not know how to analyze it. I have some questions: