Kafka - consuming from Spark

Time: 2017-11-13 09:34:23

Tags: apache-spark apache-kafka avro confluent-schema-registry

I followed this document and it works well. Now I am trying to consume the connector data from Spark. Is there any reference I can use? Since I am using Confluent, it differs quite a bit from the original Kafka reference documentation.

So far I have put together the code below. The problem is that the record value cannot be cast to java.lang.String (and I am not sure it is being consumed correctly).

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val brokers = "http://127.0.0.1:9092"
val topics = List("postgres-accounts2")
val sparkConf = new SparkConf().setAppName("KafkaWordCount")
//sparkConf.setMaster("spark://sda1:7077,sda2:7077")
sparkConf.setMaster("local[2]")
sparkConf.set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
sparkConf.registerKryoClasses(Array(classOf[org.apache.avro.generic.GenericData.Record]))

val ssc = new StreamingContext(sparkConf, Seconds(2))
ssc.checkpoint("checkpoint")

// Create a direct Kafka stream with brokers and topics
//val topicsSet = topics.split(",")

val kafkaParams = Map[String, Object](
  "schema.registry.url" -> "http://127.0.0.1:8081",
  "bootstrap.servers" -> "http://127.0.0.1:9092",
  "key.deserializer" -> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
  "value.deserializer" -> "io.confluent.kafka.serializers.KafkaAvroDeserializer",
  "group.id" -> "use_a_separate_group_id_for_each_stream",
  "auto.offset.reset" -> "earliest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

val messages = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](topics, kafkaParams)
)

val data = messages.map(record => {
  println(record)
  println("value : " + record.value().toString()) // error: java.lang.ClassCastException: org.apache.avro.generic.GenericData$Record cannot be cast to java.lang.String
  //println(Json.parse(record.value() + ""))

  (record.key, record.value)
})
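From the stack trace, the value is actually an org.apache.avro.generic.GenericData$Record, not a String. A minimal workaround sketch that casts the value to the GenericRecord interface and reads a field by name; the field name id below is hypothetical:

import org.apache.avro.generic.GenericRecord

val data = messages.map { record =>
  // The deserializer really produced an Avro record, so cast to the
  // GenericRecord interface instead of relying on the String type parameter.
  val value = record.value().asInstanceOf[GenericRecord]
  println("value : " + value.toString)   // Avro renders the record as JSON-style text
  (record.key, value.get("id"))          // "id" is a hypothetical field name
}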

1 Answer:

Answer 0 (score: 0)

Bring the stream's value type in sync with the value deserializer, as below. It will then provide the proper functions and types.

KafkaUtils.createDirectStream[String, GenericRecord]
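For reference, a fuller sketch of what the corrected stream might look like, assuming the value type is Avro's GenericRecord interface (which is what the Confluent KafkaAvroDeserializer returns for generic records) and that the connector schema has a hypothetical field named id:

import org.apache.avro.generic.GenericRecord
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

// Type the stream with GenericRecord so it matches what the
// KafkaAvroDeserializer actually produces -- no more ClassCastException.
val messages = KafkaUtils.createDirectStream[String, GenericRecord](
  ssc,
  PreferConsistent,
  Subscribe[String, GenericRecord](topics, kafkaParams)
)

val data = messages.map { record =>
  val value = record.value()             // statically a GenericRecord now
  println("value : " + value.toString)   // Avro renders the record as JSON-style text
  (record.key, value.get("id"))          // "id" is a hypothetical field name
}

With the types aligned, record.value() is a GenericRecord, so fields can be read by name without any casting.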