Question

These are Avro records serialized with the Confluent Platform.

I would like to find a working example like this one:

https://github.com/seanpquig/confluent-platform-spark-streaming/blob/master/src/main/scala/example/StreamingJob.scala

but for Spark Structured Streaming.

 kafka
   .select("value")
   .map { row =>

     // this gives me test == testRehydrated
     val test = Foo("bar")
     val testBytes = AvroWriter[Foo].toBytes(test)
     val testRehydrated = AvroReader[Foo].fromBytes(testBytes)

     // this yields mangled Foo data
     val bytes = row.getAs[Array[Byte]]("value")
     val rehydrated = AvroReader[Foo].fromBytes(bytes)

     rehydrated
   }

Answer 1

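We have been working on this library, which might help: ABRiS (Avro Bridge for Spark)

It provides APIs for integrating Spark with Avro in read and write operations (both streaming and batch). It also supports Confluent Kafka and integrates with Schema Registry.

Disclaimer: I work for ABSA and I am the main developer behind this library.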

Answer 2

If you want to read their data, I think you have to use the Confluent Platform decoder.

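import java.util.Properties
import io.confluent.kafka.serializers.{AbstractKafkaAvroSerDeConfig, KafkaAvroDecoder, KafkaAvroDeserializerConfig}
import kafka.utils.VerifiableProperties

// getSchemaRegistryUrl() is not shown here; it is expected to return the Schema Registry URL.
def decoder: KafkaAvroDecoder = {
  val props = new Properties()
  props.put(AbstractKafkaAvroSerDeConfig.SCHEMA_REGISTRY_URL_CONFIG, getSchemaRegistryUrl())
  // Decode into generated SpecificRecord classes instead of GenericRecord
  props.put(KafkaAvroDeserializerConfig.SPECIFIC_AVRO_READER_CONFIG, "true")
  new KafkaAvroDecoder(new VerifiableProperties(props))
}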
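
A likely reason the raw bytes in the question decode to mangled data: the Confluent serializer does not write plain Avro. It prepends a 5-byte header (a zero "magic byte" followed by a 4-byte big-endian schema ID) before the Avro body, so a plain Avro reader starts at the wrong offset. Below is a minimal sketch of splitting that header off, using only the JDK. `ConfluentPayload` and `ConfluentWireFormat` are names made up for this illustration and are not part of any Confluent library.

```scala
import java.nio.ByteBuffer

// Hypothetical holder for this sketch: schema ID from the header plus the raw Avro body.
final case class ConfluentPayload(schemaId: Int, avroBytes: Array[Byte])

object ConfluentWireFormat {
  private val MagicByte: Byte = 0x0
  private val HeaderSize = 5 // 1 magic byte + 4-byte schema ID

  // Splits a Confluent-framed Kafka value into its schema ID and Avro body.
  def split(value: Array[Byte]): Either[String, ConfluentPayload] =
    if (value == null || value.length < HeaderSize)
      Left("value is shorter than the 5-byte Confluent header")
    else if (value(0) != MagicByte)
      Left(s"unexpected magic byte: ${value(0)}")
    else {
      // ByteBuffer reads big-endian by default, which matches the header layout.
      val schemaId = ByteBuffer.wrap(value, 1, 4).getInt
      Right(ConfluentPayload(schemaId, value.drop(HeaderSize)))
    }
}
```

Stripping the header is only half of the job: decoding `avroBytes` still requires the writer schema registered under `schemaId`, which is the lookup the Confluent decoder performs against Schema Registry. That is why using the decoder (or a library that wraps it) is usually simpler than skipping the first five bytes by hand.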

How to read binary serialized Avro (Confluent Platform) from Kafka using Spark Streaming
