Kafka and Spark Streaming Simple Producer Consumer

Date: 2017-11-19 08:51:06

Tags: scala apache-kafka spark-streaming producer-consumer

I don't know why the data sent by the producer never reaches the consumer. I'm using the Cloudera virtual machine. I'm trying to write a simple producer-consumer pair where the producer uses Kafka and the consumer uses Spark Streaming.

Producer code in Scala:

import java.util.Properties
import org.apache.kafka.clients.producer._

object kafkaProducer {

  def main(args: Array[String]) {
    // Broker address plus String serializers for both key and value
    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092")
    props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
    props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")

    val producer = new KafkaProducer[String, String](props)

    val TOPIC = "test"

    // Send one record per second for 50 seconds.
    // generator is not shown in the question; a stand-in sketch follows below.
    for (i <- 1 to 50) {
      Thread.sleep(1000) // every 1 second
      val record = new ProducerRecord[String, String](TOPIC, generator.getID().toString(), generator.getRandomValue().toString())
      producer.send(record)
    }

    producer.close()
  }
}
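
The generator object is not included in the question. A minimal stand-in sketch that makes the producer compile, assuming it supplies an incrementing ID as the key and a random number as the value (both assumptions), could look like this:

import scala.util.Random

// Hypothetical helper standing in for the asker's unshown generator object
object generator {
  private val rand = new Random()
  private var id = 0
  def getID(): Int = { id += 1; id }            // incrementing message key
  def getRandomValue(): Int = rand.nextInt(100) // random message payload
}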

Consumer code in Scala:

import org.apache.spark.rdd.RDD
import org.apache.spark.SparkConf
import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._

object kafkaConsumer {
  def main(args: Array[String]) {

    var totalCount = 0L
    val sparkConf = new SparkConf().setMaster("local[1]").setAppName("AnyName").set("spark.driver.host", "localhost")
    val ssc = new StreamingContext(sparkConf, Seconds(2))
    ssc.checkpoint("checkpoint")
    // Receiver-based stream reading the "test" topic with one receiver thread
    val stream = KafkaUtils.createStream(ssc, "localhost:9092", "spark-streaming-consumer-group", Map("test" -> 1))

    // Count the events received in each 2-second batch
    stream.foreachRDD((rdd: RDD[_], time: Time) => {
      val count = rdd.count()
      println("\n-------------------")
      println("Time: " + time)
      println("-------------------")
      println("Received " + count + " events\n")
      totalCount += count
    })

    ssc.start()
    Thread.sleep(20 * 1000) // let the stream run for 20 seconds
    ssc.stop()

    if (totalCount > 0) {
      println("PASSED")
    } else {
      println("FAILED")
    }
  }
}

1 answer:

Answer 0 (score: 0)

The problem was solved by changing the following line in the consumer code:

        val stream = KafkaUtils.createStream(ssc, "localhost:9092", "spark-streaming-consumer-group", Map("test" -> 1))

The second argument should be the ZooKeeper port, 2181 rather than 9092; ZooKeeper takes care of connecting to the Kafka broker on port 9092 automatically.
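
With that change applied, the line reads:

        val stream = KafkaUtils.createStream(ssc, "localhost:2181", "spark-streaming-consumer-group", Map("test" -> 1))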

Note: Kafka should be started from the terminal before running the producer and the consumer.
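
For reference, with a stock Kafka distribution that typically means starting ZooKeeper and the broker, then creating the topic; exact paths and service management differ on the Cloudera VM, where these may already run as managed services:

    bin/zookeeper-server-start.sh config/zookeeper.properties
    bin/kafka-server-start.sh config/server.properties
    bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic test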