Question

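I am streaming data into a Kafka topic. I send three values to the topic, and all of them are of type String. I want to encode the data in binary form and then send it to the topic.

I am using Spark version 2.3.2.

However, I am relatively new to this whole topic and hope you can help me.

I have a key serializer and a value serializer: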

val stringSerializer = "org.apache.kafka.common.serialization.StringSerializer"
val kafkaAvroSerializer = "io.confluent.kafka.serializers.KafkaAvroSerializer"

My class looks like this:

import java.net.InetSocketAddress
import org.apache.kafka.clients.producer.ProducerRecord

class SenderKafka[Key, Infos, DataOutput](address: Seq[InetSocketAddress], topic: String, prot: String, keyToString: ((Key, Infos, DataOutput) => (String, String)) with Serializable)
  extends (Iterator[(Key, Infos, Worked[DataOutput])] => Unit) {

  def apply(iter: Iterator[(Key, Infos, Worked[DataOutput])]): Unit = {
    // producer and params are helpers defined elsewhere (not shown here)
    val streamOutput = producer(params(address, prot))

    iter.foreach { tuple =>
      // lowercase `infos`: an uppercase name in a pattern is matched as an existing value, not bound
      val (key, infos, Worked(dataOutput, _)) = tuple
      val (keyString, dataOutputString) = keyToString(key, infos, dataOutput)
      // key and value are both sent as plain strings
      streamOutput.send(new ProducerRecord(topic, keyString, dataOutputString))
    }
    streamOutput.flush()
    streamOutput.close()
  }
}
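
For context, if the value were sent as Avro-encoded bytes instead of a String, the producer settings would need a byte-array value serializer. The following is only a minimal sketch under that assumption: `ProducerSettings`, `avroBytesProps` and the `bootstrapServers` parameter are hypothetical names and are not the `params` helper used in the class above.

```scala
import java.util.Properties

object ProducerSettings {
  // Hypothetical helper: String keys, raw byte-array values.
  // `bootstrapServers` is an assumed parameter, e.g. "localhost:9092".
  def avroBytesProps(bootstrapServers: String): Properties = {
    val props = new Properties()
    props.setProperty("bootstrap.servers", bootstrapServers)
    props.setProperty("key.serializer",
      "org.apache.kafka.common.serialization.StringSerializer")
    props.setProperty("value.serializer",
      "org.apache.kafka.common.serialization.ByteArraySerializer")
    props
  }
}
```

With settings like these, the producer would be typed as `KafkaProducer[String, Array[Byte]]` and each record as `ProducerRecord[String, Array[Byte]]`, so the encoded bytes could be passed where `dataOutputString` is passed today.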

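Here are my questions:

Is it possible to save the schema as an .avsc file in the resources folder (src/main/resources), so that I do not have to use a schema registry at first?
I have a method that can encode the data, but the data and the schema need to have specific types.
The parameters I need for encoding are the following:
I need the "value" as a Row, e.g. [false, nika, 25, male]
the Spark schema as a StructType
the Avro schema as org.apache.avro.Schema$RecordSchema
In the encoder, I use EncoderFactory (binaryEncoder), create a writer (GenericRecord), and then write the data to a byte array (java.io.ByteArrayOutputStream)
Can I replace the iterator?
Should I encode the data inside my Kafka output stream?
Can I somehow set up the String values so that they go through my encoder method?
Is there another solution for converting String data to Avro? I know that I cannot use the to_avro / from_avro methods from Databricks.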
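
To make the encoding step described above concrete, here is a minimal sketch of it, assuming the Apache Avro Java library is on the classpath. The `AvroBytes` object, the resource path passed to `loadSchema`, and the schema with its field names (`flag`, `name`, `age`, `gender`) are hypothetical and only mirror the example row [false, nika, 25, male]; this is not the encoder method from the question.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object AvroBytes {
  // Hypothetical schema; the same JSON could live in src/main/resources/person.avsc.
  val schemaJson: String =
    """{"type":"record","name":"Person","fields":[
      |{"name":"flag","type":"boolean"},
      |{"name":"name","type":"string"},
      |{"name":"age","type":"int"},
      |{"name":"gender","type":"string"}]}""".stripMargin

  // Parses a schema from a classpath resource, e.g. "/person.avsc" (assumed file name).
  def loadSchema(resourcePath: String): Schema = {
    val in = getClass.getResourceAsStream(resourcePath)
    require(in != null, s"schema resource not found: $resourcePath")
    try new Schema.Parser().parse(in) finally in.close()
  }

  // Writes one record as raw Avro binary (no schema and no registry header in the bytes).
  def encode(record: GenericRecord): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](record.getSchema).write(record, encoder)
    encoder.flush()
    out.toByteArray
  }

  // Reads the bytes back; the reader must be given the same schema the writer used.
  def decode(schema: Schema, bytes: Array[Byte]): GenericRecord = {
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }

  // Builds the example record from the question's sample values.
  def exampleRecord(schema: Schema): GenericRecord = {
    val rec = new GenericData.Record(schema)
    rec.put("flag", Boolean.box(false))
    rec.put("name", "nika")
    rec.put("age", Int.box(25))
    rec.put("gender", "male")
    rec
  }
}
```

Because raw Avro binary does not carry its schema, a consumer of such bytes would need access to the same .avsc file in order to decode them.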

How to encode binary Avro data and stream it to a Kafka topic

0 answers: