Kafka - Scala - handling multiple messages

Date: 2017-07-28 09:19:10

Tags: scala apache-kafka akka akka-stream

Is it possible to send an array of strings through a Kafka Producer object? I want to take some messages (lines of text) from 'topic1', split them into single words and send them to another topic.

    object KafkaConsumer extends App {

      implicit val actorSystem = ActorSystem("test-actor-system")
      implicit val streamMaterializer = ActorMaterializer()
      implicit val executionContext = actorSystem.dispatcher
      val log = actorSystem.log


      // PRODUCER config
      val producerSettings = ProducerSettings(
        actorSystem,
        new ByteArraySerializer,
        new StringSerializer)
        .withBootstrapServers("localhost:9092")
        .withProperty("auto.create.topics.enable", "true")

      // CONSUMER config
      val consumerSettings = ConsumerSettings(
        system = actorSystem,
        keyDeserializer = new ByteArrayDeserializer,
        valueDeserializer = new StringDeserializer)
        .withBootstrapServers("localhost:9092")
        .withGroupId("kafka-sample")
        .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")
      // -----------------------------------------------------------------------//

      // ROUTE OF THE APP
      Consumer
        .committableSource(consumerSettings, Subscriptions.topics("topic1"))
        .map { msg =>
          println(s"topic1 -> topic2: $msg")
          ProducerMessage.Message(
            new ProducerRecord[Array[Byte], String]("topic2", msg.record.value),
            msg.committableOffset)
        }
        .runWith(Producer.commitableSink(producerSettings))
    }

2 Answers:

Answer 0 (score: 1)

Your Akka Streams example creates a simple stream that reads one message, produces it to Kafka through a Sink, and commits the offset of the consumed message. If you need to read one or more messages and publish all the words found in the consumed batch, you should work with the Akka Streams Graph API.

This sample uses the Graph API: it builds a Source from Kafka and uses groupedWithin to read a batch of messages and extract the words they contain.

Two simple flows are created, one to obtain the last offset to commit and another to obtain the words. Then a Source stage is built that broadcasts the messages consumed from Kafka to both flows and zips the results into a tuple (CommittableOffset, Array[String]). The runForeach call is what produces the messages. Note that Future.sequence does not produce the messages in order.

Although the sample looks long, it compiles and works correctly with "com.typesafe.akka" %% "akka-stream-kafka" % "0.14".
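For reference, a minimal build.sbt sketch with that dependency (only the akka-stream-kafka coordinate comes from this answer; the Scala version is an assumption for that era):

    // build.sbt -- minimal sketch; only the akka-stream-kafka version is from
    // the answer, the Scala version is an assumption
    scalaVersion := "2.12.2"

    libraryDependencies += "com.typesafe.akka" %% "akka-stream-kafka" % "0.14"

The complete sample follows: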
import java.util.Properties

import akka.actor.ActorSystem
import akka.kafka.ConsumerMessage.{CommittableMessage, CommittableOffset}
import akka.kafka.{ConsumerSettings, ProducerSettings, Subscriptions}
import akka.kafka.scaladsl.Consumer
import akka.stream.{ActorMaterializer, SourceShape}
import akka.stream.scaladsl.{Broadcast, Flow, GraphDSL, Source, Zip}

import org.apache.kafka.clients.consumer.ConsumerConfig
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
import org.apache.kafka.common.serialization.{
  ByteArrayDeserializer,
  ByteArraySerializer,
  StringDeserializer,
  StringSerializer
}

import scala.concurrent.Future
import scala.util.{Failure, Success}
import scala.concurrent.duration._

object SplitSource extends App {

  implicit val actorSystem = ActorSystem("test-actor-system")
  implicit val streamMaterializer = ActorMaterializer()
  implicit val executionContext = actorSystem.dispatcher
  val log = actorSystem.log

  // PRODUCER config
  val producerSettings = ProducerSettings(actorSystem,
                                          new ByteArraySerializer,
                                          new StringSerializer)
    .withBootstrapServers("localhost:9092")
    .withProperty("auto.create.topics.enable", "true")

  // CONSUMER config
  val consumerSettings =
    ConsumerSettings(system = actorSystem,
                     keyDeserializer = new ByteArrayDeserializer,
                     valueDeserializer = new StringDeserializer)
      .withBootstrapServers("localhost:9092")
      .withGroupId("kafka-sample4")
      .withProperty(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest")

  implicit val producerConfig = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", "localhost:9092")
    props.setProperty("key.serializer", classOf[StringSerializer].getName)
    props.setProperty("value.serializer", classOf[StringSerializer].getName)
    props
  }

  lazy val kafkaProducer = new KafkaProducer[String, String](producerConfig)

  // Wrap the Java producer's blocking send().get() in a Scala Future
  private def publishToKafka(id: String, data: String) = {
    Future {
      kafkaProducer
        .send(new ProducerRecord("outTopic", id, data))
        .get()
    }
  }

  def getKafkaSource =
    Consumer
      .committableSource(consumerSettings, Subscriptions.topics("inTopic"))
      // Groups up to 10 messages, or whatever arrived within 30 seconds,
      // before pushing the batch downstream
      .groupedWithin(10, 30 seconds)

  val getStreamSource = GraphDSL.create() { implicit b =>
    import GraphDSL.Implicits._

    val in = getKafkaSource

    // Broadcast to two flows: one to obtain the last offset to commit,
    // the other to return the Seq of words to publish
    val br = b.add(Broadcast[Seq[CommittableMessage[Array[Byte], String]]](2))
    val zipResult = b.add(Zip[CommittableOffset, Array[String]]())
    val flowCommit = Flow[Seq[CommittableMessage[Array[Byte], String]]].map(_.last.committableOffset)

    // Flow that creates the list of all words in all consumed messages
    val flowWords =
      Flow[Seq[CommittableMessage[Array[Byte], String]]].map(input => {
        input.map(_.record.value()).mkString(" ").split(" ")
      })

    // Wire the graph: broadcast each batch to both flows, zip their outputs
    in ~> br ~> flowCommit ~> zipResult.in0
          br ~> flowWords  ~> zipResult.in1

    SourceShape(zipResult.out)
  }

  Source.fromGraph(getStreamSource).runForeach { msgs =>
    {
      // Publish all words and, once all futures complete, commit the last
      // consumed Kafka offset
      val futures = msgs._2.map(publishToKafka("outTopic", _)).toList

      // Produces in parallel! For ordered publishing, see the sequential
      // alternative sketched after this listing.
      Future.sequence(futures).onComplete {
        case Success(_) =>
          // Once all futures are done, commit the last consumed message
          msgs._1.commitScaladsl()
        case Failure(ex) =>
          log.error(ex, "Publishing to Kafka failed; offset not committed")
      }
    }
  }

}
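If ordering matters, here is a minimal sketch of a sequential alternative built on the publishToKafka helper above (the fold-based chaining is an illustration, not part of the sample's code):

    // Sequential alternative to Future.sequence: each word is published only
    // after the Future for the previous one has completed.
    def publishInOrder(words: Seq[String]): Future[Unit] =
      words.foldLeft(Future.successful(())) { (acc, word) =>
        acc.flatMap(_ => publishToKafka("outTopic", word).map(_ => ()))
      }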

The Akka Streams API lets you build impressive processing pipelines.

Answer 1 (score: 0)

You should use mapConcat before the map stage, because it "transforms each input element into an Iterable of output elements that is then flattened into the output stream".

The complete added lines would look like this:

Subscriptions.topics("topic1"))
  .mapConcat { msg => msg.record.value().split(" ").toList }
  .map { ...
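Note that splitting this way drops the committable offset before the map stage. For illustration, a fuller sketch that carries the offset through with each word (the tuple pass-through is an assumption, not part of this answer; the settings, topics and sink are the question's):

    Consumer
      .committableSource(consumerSettings, Subscriptions.topics("topic1"))
      .mapConcat { msg =>
        // One element per word; each word carries its message's offset, so the
        // same offset is committed once per word (redundant but harmless)
        msg.record.value().split(" ").toList.map(word => (word, msg.committableOffset))
      }
      .map { case (word, offset) =>
        ProducerMessage.Message(
          new ProducerRecord[Array[Byte], String]("topic2", word),
          offset)
      }
      .runWith(Producer.commitableSink(producerSettings)) // sink name as in the question's code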