Is it possible to send a String array through a Kafka Producer object? I want to take some messages from 'topic1' (lines of text), split each line into individual words, and send them to another topic.
import akka.actor.ActorSystem
import akka.kafka.{ConsumerSettings, ProducerMessage, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.{Consumer, Producer}
import akka.stream.ActorMaterializer
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.ProducerRecord
import org.apache.kafka.common.serialization.{ByteArrayDeserializer, ByteArraySerializer, StringDeserializer, StringSerializer}

object KafkaConsumer extends App {
  implicit val actorSystem = ActorSystem("test-actor-system")
  implicit val streamMaterializer = ActorMaterializer()
  implicit val executionContext = actorSystem.dispatcher
  val log = actorSystem.log

  // PRODUCER config
  val producerSettings = ProducerSettings(
    actorSystem,
    new ByteArraySerializer,
    new StringSerializer)
    .withBootstrapServers("localhost:9092")
    .withProperty("auto.create.topics.enable", "true")

  // CONSUMER config
  val consumerSettings = ConsumerSettings(
    system = actorSystem,
    keyDeserializer = new ByteArrayDeserializer,
    valueDeserializer = new StringDeserializer)
    .withBootstrapServers("localhost:9092")
    .withGroupId("kafka-sample")
    .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  // -----------------------------------------------------------------------//
  // ROUTE OF THE APP
  Consumer.committableSource(consumerSettings, Subscriptions.topics("topic1"))
    .map { msg =>
      println(s"topic1 -> topic2: $msg")
      ProducerMessage.Message(
        new ProducerRecord[Array[Byte], String]("topic2", msg.record.value),
        msg.committableOffset)
    }
    .runWith(Producer.committableSink(producerSettings))
}
Answer 0 (score: 1)
The Akka Streams example creates a simple stream that reads one message, produces to Kafka with a Sink, and commits the offset of the consumed message. If you need to read one or more messages and publish many words per consumed batch, you should do something more elaborate with the Akka Streams Graph API.
This example uses Graphs: it builds a Source from Kafka and uses groupedWithin to read a bunch of messages and extract the words they contain.
Two simple Flows are created: one to obtain the last offset to commit, and another to extract the words. A Source stage is then created that broadcasts each batch of consumed Kafka messages to both flows and zips the results into a tuple (CommittableOffset, Array[String]). The messages are produced with the runForeach function. Note that, because of Future.sequence, the messages are not produced in order.
Although the sample looks long, it compiles and works correctly with "com.typesafe.akka" %% "akka-stream-kafka" % "0.14":
import java.util.Properties
import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.{CommittableMessage, CommittableOffset}
import akka.kafka.{ConsumerSettings, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.{ActorMaterializer, SourceShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Source, Zip}
import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{
ByteArrayDeserializer,
ByteArraySerializer,
StringDeserializer,
StringSerializer
}
import scala.concurrent.Future
import scala.util.{Failure, Success}
import scala.concurrent.duration._
object SplitSource extends App {
  implicit val actorSystem = ActorSystem("test-actor-system")
  implicit val streamMaterializer = ActorMaterializer()
  implicit val executionContext = actorSystem.dispatcher
  val log = actorSystem.log

  // PRODUCER config
  val producerSettings = ProducerSettings(actorSystem,
                                          new ByteArraySerializer,
                                          new StringSerializer)
    .withBootstrapServers("localhost:9092")
    .withProperty("auto.create.topics.enable", "true")

  // CONSUMER config
  val consumerSettings =
    ConsumerSettings(system = actorSystem,
                     keyDeserializer = new ByteArrayDeserializer,
                     valueDeserializer = new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("kafka-sample4")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  // Plain Java producer config used by the KafkaProducer below
  implicit val producerConfig = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("key.serializer", classOf[StringSerializer].getName)
    props.setProperty("value.serializer", classOf[StringSerializer].getName)
    props
  }
  lazy val kafkaProducer = new KafkaProducer[String, String](producerConfig)

  // Create a Scala Future from the Java producer's blocking send
  private def publishToKafka(id: String, data: String) = {
    Future {
      kafkaProducer
        .send(new ProducerRecord("outTopic", id, data))
        .get()
    }
  }
  def getKafkaSource =
    Consumer
      .committableSource(consumerSettings, Subscriptions.topics("inTopic"))
      // It consumes 10 messages or waits 30 seconds to push downstream
      .groupedWithin(10, 30 seconds)
  val getStreamSource = GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    val in = getKafkaSource

    // Broadcast to two flows: one to obtain the last offset to commit
    // and another to build the Seq with the words to publish
    val br = b.add(Broadcast[Seq[CommittableMessage[Array[Byte], String]]](2))
    val zipResult = b.add(Zip[CommittableOffset, Array[String]]())

    // Flow that picks the offset of the last message in the batch
    val flowCommit = Flow[Seq[CommittableMessage[Array[Byte], String]]]
      .map(_.last.committableOffset)

    // Flow that creates the list of all words in all consumed messages
    val flowWords =
      Flow[Seq[CommittableMessage[Array[Byte], String]]].map(input => {
        input.map(_.record.value()).mkString(" ").split(" ")
      })

    // Build the stage
    in ~> br ~> flowCommit ~> zipResult.in0
          br ~> flowWords  ~> zipResult.in1

    SourceShape(zipResult.out)
  }
  Source.fromGraph(getStreamSource).runForeach { msgs =>
    // Publish all words and, when all futures complete, commit the last Kafka offset
    val futures = msgs._2.map(publishToKafka("outTopic", _)).toList

    // Produces in parallel!! Use flatMap to make it in order
    Future.sequence(futures).onComplete {
      case Success(_) =>
        // Once all futures are done, commit the last consumed message
        msgs._1.commitScaladsl()
      case Failure(e) =>
        log.error(e, "Failed to publish words; offset not committed")
    }
  }
}
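As the comment inside runForeach notes, Future.sequence publishes in parallel. If ordering matters, one option (a hypothetical helper, not part of the original answer) is to chain the futures with flatMap, so each word is published only after the previous one completes:

// Hypothetical helper: chains each publish with flatMap, reusing
// publishToKafka and the implicit execution context defined above,
// so the records reach Kafka in order.
def publishInOrder(words: Seq[String]): Future[Unit] =
  words.foldLeft(Future.successful(())) { (acc, word) =>
    acc.flatMap(_ => publishToKafka("outTopic", word)).map(_ => ())
  }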
The Akka Streams API lets you create awesome processing pipelines.
Answer 1 (score: 0)
You should use mapConcat before map, because it "transforms each input element into an Iterable of output elements that is then flattened into the output stream."
The complete additional lines would look like this:
Subscriptions.topics("topic1"))
.mapConcat { msg => msg.record.value().split(" ").toList }
.map { ...
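Note that after mapConcat the stream elements are plain words, so the committable offset used in the question's map stage is no longer available. For completeness, here is a minimal sketch of the whole stream (an assumption, not part of the original answer) that uses the plain source/sink pair instead and leaves offset management to the consumer's auto-commit setting:

// Sketch only: plainSource/plainSink avoid the committable offset,
// which is lost once mapConcat turns each record into bare words.
Consumer.plainSource(consumerSettings, Subscriptions.topics("topic1"))
  .mapConcat(msg => msg.value().split(" ").toList)
  .map(word => new ProducerRecord[Array[Byte], String]("topic2", word))
  .runWith(Producer.plainSink(producerSettings))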