Question

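I am trying to deserialize an Avro byte stream into a Scala case class object. Basically, I have a Kafka stream carrying Avro-encoded data. The schema now has an additional field, and I am trying to update the Scala case class to include the new field. The case class looks like this: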

/** Case class to hold the Device data. */
case class DeviceData(deviceId: String,
                sw_version: String,
                timestamp: String,
                reading: Double,
                new_field: Option[String] = None
               ) {
  def this() = this("na", "na", "na", 0, None)
}

The Avro schema is as follows:

{
  "type": "record",
  "name": "some_name",
  "namespace": "some_namespace",
  "fields": [
    {
      "name": "deviceId",
      "type": "string"
    },
    {
      "name": "sw_version",
      "type": "string"
    }, 
    {
      "name": "timestamp",
      "type": "string"
    },
    {
      "name": "reading",
      "type": "double"
    },
    {
      "name": "new_field",
      "type": ["null", "string"],
      "default": null
    }]}

When data is received, I get the following exception:

java.lang.RuntimeException: java.lang.InstantiationException

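I can receive the data with a consumer written in Python, so I know the data is being transmitted correctly and in the right format. I suspect the problem lies in how I create the case class constructor. I tried this: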

/** Case class to hold the Device data. */
case class DeviceData(deviceId: String,
                sw_version: String,
                timestamp: String,
                reading: Double,
                new_field: Option[String]
               )  {
  def this() = this("na", "na", "na", 0, Some("na"))
}

But no luck.
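A minimal sketch with plain JDK reflection (not Avro's own code) of why the no-argument constructor matters: reflection-based readers create the target object first and populate its fields afterwards. `ReflectiveInstantiation` and `tryNewInstance` are hypothetical names. Per the JDK documentation, `Constructor.newInstance` throws `InstantiationException` when the declaring class is abstract, which is true of `scala.Option`; whether Avro actually tries to instantiate the `Option[String]` field type in this case is an assumption that this sketch does not verify.

```scala
// Sketch only: plain JDK reflection, no Avro involved.
case class DeviceData(deviceId: String,
                      sw_version: String,
                      timestamp: String,
                      reading: Double,
                      new_field: Option[String] = None) {
  // Auxiliary no-arg constructor that reflection-based readers can call
  def this() = this("na", "na", "na", 0.0, None)
}

object ReflectiveInstantiation {
  // Try to create an instance via the class's no-arg constructor;
  // on failure, return the simple name of the reflective exception.
  def tryNewInstance(cls: Class[_]): Either[String, Any] =
    try Right(cls.getDeclaredConstructor().newInstance())
    catch {
      case e: ReflectiveOperationException => Left(e.getClass.getSimpleName)
    }

  def main(args: Array[String]): Unit = {
    // Concrete class with a public no-arg constructor
    println(tryNewInstance(classOf[DeviceData]))
    // scala.Option is abstract, so its constructor cannot be invoked
    println(tryNewInstance(classOf[Option[_]]))
  }
}
```

If `DeviceData` is nested inside another class (a REPL wrapper, for example), its constructor gains an outer-instance parameter, and the first attempt can then fail with `NoSuchMethodException` instead.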

The deserializer code (excerpt):

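import org.apache.avro.io.{BinaryDecoder, DatumReader, DecoderFactory}

// reader and decoder for reading avro records
// (reader is presumably assigned elsewhere; that part is not in this excerpt)
private var reader: DatumReader[T] = null
private var decoder : BinaryDecoder = null
// the second argument lets the factory reuse the previous decoder instance
decoder = DecoderFactory.get.binaryDecoder(message, decoder)
reader.read(null.asInstanceOf[T], decoder)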

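I couldn't find any other examples of writing a case class constructor for Avro deserialization. Last year I posted a related question, java.lang.NoSuchMethodException for init method in Scala case class, and based on the answer there I was able to implement my current code, which has worked fine ever since.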

Answer 1

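I solved this by taking a completely different approach. I used the Confluent Kafka client as provided in the example here. I also have a Confluent schema registry, which is very easy to set up using the containerized all-in-one solution that comes with Kafka and the schema registry: https://github.com/jfrazee/schema-registry-examples/tree/master/src/main/scala/io/atomicfinch/examples/flink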

I had to add the Confluent dependencies and repository to my pom.xml file. This goes in the repositories section:

<repository>
    <id>confluent</id>
    <url>http://packages.confluent.io/maven/</url>
</repository>

And this goes in the dependencies section:

<dependency>
    <groupId>org.apache.flink</groupId>
    <artifactId>flink-avro-confluent-registry</artifactId>
    <version>1.8.0</version>
</dependency>
<dependency>
    <groupId>io.confluent</groupId>
    <artifactId>kafka-avro-serializer</artifactId>
    <!-- For Confluent Platform 5.2.1 -->
    <version>5.2.1</version>
</dependency>

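Using the code provided at https://docs.confluent.io/current/quickstart/ce-docker-quickstart.html, I can talk to the Confluent schema registry. Based on the schema ID in the Avro message header, the client downloads the schema from the registry and returns a GenericRecord object, from which I can easily extract any fields of interest and create a new DataStream of DeviceData objects.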

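// `env` and `properties` are defined elsewhere in the job; the
// ConfluentRegistryDeserializationSchema class is assumed to come from the
// linked example project.
val kafka_consumer = new FlinkKafkaConsumer010("prod.perfwarden.minute",
  new ConfluentRegistryDeserializationSchema[GenericRecord](classOf[GenericRecord], "http://localhost:8081"),
  properties)
val device_data_stream = env
  .addSource(kafka_consumer)
  .map({x => new DeviceData(x.get("deviceId").toString,
    x.get("sw_version").toString,
    x.get("timestamp").toString,
    x.get("reading").toString.toDouble,
    // new_field is a nullable union and the case class expects Option[String];
    // calling toString directly on a null value would throw
    Option(x.get("new_field")).map(_.toString))})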
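The mapping step above can also be factored into a small, testable function. This is a sketch under a few assumptions: field values arrive as `AnyRef` (Avro strings may be `org.apache.avro.util.Utf8`, which is why each value goes through `toString`), only `new_field` may be null, and `DeviceDataMapping.fromLookup` is a hypothetical helper. In the Flink job it could be called as something like `fromLookup(name => x.get(name))`.

```scala
case class DeviceData(deviceId: String,
                      sw_version: String,
                      timestamp: String,
                      reading: Double,
                      new_field: Option[String] = None)

object DeviceDataMapping {
  // Build DeviceData from any field lookup, e.g. GenericRecord.get(String)
  def fromLookup(get: String => AnyRef): DeviceData = {
    // Required fields are assumed non-null; toString also covers Avro's Utf8
    def str(name: String): String = get(name).toString
    DeviceData(
      deviceId = str("deviceId"),
      sw_version = str("sw_version"),
      timestamp = str("timestamp"),
      reading = get("reading") match {
        case d: java.lang.Double => d.doubleValue() // Avro doubles arrive boxed
        case other               => other.toString.toDouble
      },
      // Nullable union ["null", "string"]: a null value becomes None
      new_field = Option(get("new_field")).map(_.toString)
    )
  }
}
```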

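The Confluent Kafka client takes care of deserializing the Avro byte stream according to the schema, including default values. Setting up the schema registry and using the Confluent Kafka client may take a little getting used to, but it is probably the better long-term solution. Just my 2 cents.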
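The schema-ID lookup described above works because Confluent's serializers prefix every message with a small header: one magic byte (`0`), then the schema ID as a 4-byte big-endian integer, followed by the Avro binary payload. The sketch below uses only `java.nio.ByteBuffer` to split a message into those parts. `ConfluentWireFormat` is a hypothetical name, and both the registry lookup and the Avro decoding are left out.

```scala
import java.nio.ByteBuffer

object ConfluentWireFormat {
  final case class Framed(schemaId: Int, avroPayload: Array[Byte])

  // Split a Confluent-framed message into schema ID and Avro payload
  def parse(message: Array[Byte]): Either[String, Framed] =
    if (message.length < 5) Left("message too short for Confluent framing")
    else {
      val buf = ByteBuffer.wrap(message) // ByteBuffer is big-endian by default
      val magic = buf.get()
      if (magic != 0) Left(s"unknown magic byte: $magic")
      else {
        val schemaId = buf.getInt() // bytes 1-4
        val payload = new Array[Byte](buf.remaining())
        buf.get(payload) // the rest is the Avro-encoded record
        Right(Framed(schemaId, payload))
      }
    }
}
```

This framing is also why a plain `BinaryDecoder` cannot decode Confluent-serialized messages as-is: the first five bytes are not part of the Avro record.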

java.lang.InstantiationException when deserializing a byte stream into a Scala case class object
