Can someone help me? I am seeing a performance problem when publishing messages to Kafka with the following code:
message.foreachPartition { part =>
  // A new producer is created and closed for every partition in every batch.
  val producer = new KafkaProducer[String, String](props)
  part.foreach { msg =>
    val message = new ProducerRecord[String, String](topic, msg._1, msg._2)
    producer.send(message)
  }
  producer.close()
}
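For reference, with a pattern like this most of the wall-clock time typically goes into constructing and closing a producer per partition rather than into send() itself, and throughput also depends heavily on the producer's batching settings. A minimal sketch of properties that commonly matter; the broker address and all values here are illustrative assumptions, not tuned recommendations:

import java.util.Properties

// Illustrative producer configuration; every value below is an assumption to tune.
val props = new Properties()
props.put("bootstrap.servers", "<broker:port>")   // placeholder address
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("acks", "1")                  // wait for the partition leader only
props.put("linger.ms", "5")             // give sends a chance to batch
props.put("batch.size", "65536")        // larger batches, fewer requests
props.put("compression.type", "snappy") // compress batches on the wire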
So I optimized the performance based on a post I found. Below is the code I wrote:
import scala.collection.JavaConverters._
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

val kafkaSink = sparkContext.broadcast(KafkaSink(kafkaProps))

resultRDD.foreach { message =>
  kafkaSink.value.send(outputTopic, message._1, message._2)
}

// Wraps a producer factory so the (non-serializable) KafkaProducer is
// created lazily on each executor instead of being shipped from the driver.
class KafkaSink(createProducer: () => KafkaProducer[String, String]) extends Serializable {
  lazy val producer = createProducer()

  def send(topic: String, key: String, value: String): Unit =
    producer.send(new ProducerRecord(topic, key, value))
}

object KafkaSink {
  def apply(config: Map[String, Object]): KafkaSink = {
    val f = () => {
      val producer = new KafkaProducer[String, String](config.asJava)
      // Close the producer when the executor JVM shuts down.
      sys.addShutdownHook {
        producer.close()
      }
      producer
    }
    new KafkaSink(f)
  }
}
But the program gets stuck, and not a single message is published to Kafka. I checked the logs and could only find the following in the YARN log files:
producer.KafkaProducer: Closing the Kafka producer with timeoutMillis = 9223372036854775807 ms
Please let me know what I am missing. The Spark version is 1.6.0. Currently, publishing the messages takes about 8 seconds, at roughly 300,000 messages per 20-second batch interval.

Thanks.
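One way to make a hang like this visible is to pass a Callback to send(), so broker or metadata errors get logged instead of queuing silently. A sketch using the standard callback-based overload of KafkaProducer.send (the helper name here is hypothetical):

import org.apache.kafka.clients.producer.{Callback, KafkaProducer, ProducerRecord, RecordMetadata}

// Hypothetical diagnostic helper: logs send failures on the executor
// instead of letting them disappear into the producer's buffer.
def sendWithLogging(producer: KafkaProducer[String, String],
                    topic: String, key: String, value: String): Unit =
  producer.send(new ProducerRecord(topic, key, value), new Callback {
    override def onCompletion(metadata: RecordMetadata, exception: Exception): Unit =
      if (exception != null)
        System.err.println(s"Kafka send failed for key=$key: ${exception.getMessage}")
  })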
Answer 0 (score: 0)
Since there is no direct way to write messages to Kafka from Spark Structured Streaming before version 2.2, I would try using a ForeachWriter.

Create the KafkaSink ForeachWriter:
import java.util.Properties

import org.apache.kafka.clients.producer._
import org.apache.spark.sql.ForeachWriter

// ForeachWriter lifecycle: Spark calls open() once per partition per trigger,
// process() for every row, and close() at the end of the partition.
class KafkaSink(topic: String, servers: String) extends ForeachWriter[(String, String)] {

  val kafkaProperties = new Properties()
  kafkaProperties.put("bootstrap.servers", servers)
  kafkaProperties.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
  kafkaProperties.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

  var producer: KafkaProducer[String, String] = _

  def open(partitionId: Long, version: Long): Boolean = {
    producer = new KafkaProducer(kafkaProperties)
    true
  }

  def process(value: (String, String)): Unit = {
    // Sends "key:value" as the record payload (no Kafka key), as in the original.
    producer.send(new ProducerRecord(topic, value._1 + ":" + value._2))
  }

  def close(errorOrNull: Throwable): Unit = {
    producer.close()
  }
}
Write messages using the sink writer:
val topic = "<topic2>"
val brokers = "<server:ip>"

val writer = new KafkaSink(topic, brokers)

val query =
  streamingSelectDF
    .writeStream
    .foreach(writer)
    .outputMode("update")
    .start()
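For completeness, the caveat above only applies before Spark 2.2: from 2.2 onward, Structured Streaming ships a built-in Kafka sink, so the custom ForeachWriter is unnecessary. A sketch, assuming the stream already has string key and value columns (the column names, checkpoint path, and broker address are illustrative):

// Spark 2.2+: built-in Kafka sink; the output columns must be named
// "key" (optional) and "value" (required).
val query = streamingSelectDF
  .selectExpr("CAST(key AS STRING) AS key", "CAST(value AS STRING) AS value")
  .writeStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "<server:ip>")
  .option("topic", "<topic2>")
  .option("checkpointLocation", "/tmp/kafka-sink-checkpoint") // required by the kafka sink
  .start()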