Question

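We have written a Flink job in Scala that represents its state with case classes (Avro classes generated from avsc files by avrohugger). We want to serialize our state with Avro so that we can perform state migration when we update our model. Our understanding is that Avro serialization is supported out of the box since Flink 1.7. We added the flink-avro module to the classpath, but when restoring from a saved snapshot we noticed that Flink still tries to use Kryo serialization. Relevant code snippet: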

import org.apache.flink.streaming.api.scala._

case class Foo(id: String, timestamp: java.time.Instant)

val env = StreamExecutionEnvironment.getExecutionEnvironment
val conf = env.getConfig
conf.disableForceKryo()
conf.enableForceAvro()

// MyFlinkKafkaConsumer and JsonParser are the job's own source and parser
val rawDataStream: DataStream[String] = env.addSource(MyFlinkKafkaConsumer)

val parsedDataStream: DataStream[Foo] = rawDataStream.flatMap(new JsonParser[Foo])

// do something useful with it

env.execute("my-job")

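When performing a state migration on Foo (e.g. by adding a field and redeploying the job), I see that Flink tries to deserialize it with Kryo, which obviously fails. How can I make sure Avro serialization is being used?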
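One way to see what Flink actually does, rather than inferring it from the restore failure, is to ask the Scala API for the `TypeInformation` it derives for `Foo` and for the serializer that information creates under the same `ExecutionConfig` flags. This is a minimal sketch, assuming Flink 1.7's Scala API (`flink-scala`) is on the classpath; `SerializerCheck` is a hypothetical name. As far as I can tell from the `ExecutionConfig` Javadoc, `enableForceAvro()` targets POJO types, while the Scala API describes a case class with a `CaseClassTypeInfo`, so the printed output is worth checking before assuming Avro is involved. Exactly which serializer classes get printed depends on the Flink version and is not asserted here.

```scala
import org.apache.flink.api.common.ExecutionConfig
import org.apache.flink.api.common.typeinfo.TypeInformation
import org.apache.flink.api.scala._ // provides the createTypeInformation macro

case class Foo(id: String, timestamp: java.time.Instant)

object SerializerCheck {
  // Same flags as in the job from the question.
  def config(): ExecutionConfig = {
    val c = new ExecutionConfig
    c.disableForceKryo()
    c.enableForceAvro()
    c
  }

  // The type information the Scala API macro derives for Foo.
  val fooInfo: TypeInformation[Foo] = createTypeInformation[Foo]

  def main(args: Array[String]): Unit = {
    // Shows how each field of Foo is typed (e.g. whether timestamp is a generic type).
    println(fooInfo)
    // The concrete serializer class Flink would use for Foo under this config.
    println(fooInfo.createSerializer(config()).getClass.getName)
  }
}
```

If `timestamp` shows up as a generic type in the first line, that field is handled by Flink's generic-type fallback rather than by a dedicated serializer.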

Update

Per https://issues.apache.org/jira/browse/FLINK-10897, POJO state serialization with Avro is, AFAIK, only supported from 1.8 onwards. I tried it with the latest 1.8 RC and a simple WordCount POJO that extends SpecificRecord:

/** MACHINE-GENERATED FROM AVRO SCHEMA. DO NOT EDIT DIRECTLY */
import scala.annotation.switch

case class WordWithCount(var word: String, var count: Long) extends 
  org.apache.avro.specific.SpecificRecordBase {
  def this() = this("", 0L)
  def get(field$: Int): AnyRef = {
    (field$: @switch) match {
      case 0 => {
        word
      }.asInstanceOf[AnyRef]
      case 1 => {
        count
      }.asInstanceOf[AnyRef]
      case _ => throw new org.apache.avro.AvroRuntimeException("Bad index")
    }
  }
  def put(field$: Int, value: Any): Unit = {
    (field$: @switch) match {
      case 0 => this.word = {
        value.toString
      }.asInstanceOf[String]
      case 1 => this.count = {
        value
      }.asInstanceOf[Long]
      case _ => throw new org.apache.avro.AvroRuntimeException("Bad index")
    }
    ()
  }
  def getSchema: org.apache.avro.Schema = WordWithCount.SCHEMA$
}

object WordWithCount {
  val SCHEMA$ = new org.apache.avro.Schema.Parser().parse(
    "{\"type\":\"record\",\"name\":\"WordWithCount\",\"fields\":[{\"name\":\"word\",\"type\":\"string\"},{\"name\":\"count\",\"type\":\"long\"}]}")
}

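However, this did not work out of the box either. We then tried defining our own type information with flink-avro's AvroTypeInfo, but that failed because Avro looks for a SCHEMA$ property on the class (SpecificData:285), and Java reflection cannot find SCHEMA$ in the Scala companion object.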
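The reflection failure described above can be reproduced without Flink or Avro. This is a minimal, dependency-free sketch, assuming (as the question reports for SpecificData) that the lookup asks the record class itself for a declared field named `SCHEMA$`. `WordWithCount` here is a hypothetical stripped-down stand-in whose schema is a plain `String`, and `SchemaLookupDemo` is likewise illustrative. It shows that the Scala class declares no such field; the value lives on the companion's module class `WordWithCount$`, behind a private backing field and a public accessor method.

```scala
import java.lang.reflect.Modifier
import scala.util.Try

// Hypothetical stand-in for the avrohugger-generated class:
// the schema is a plain String instead of an org.apache.avro.Schema.
case class WordWithCount(word: String, count: Long)

object WordWithCount {
  val SCHEMA$ : String =
    """{"type":"record","name":"WordWithCount","fields":[{"name":"word","type":"string"},{"name":"count","type":"long"}]}"""
}

object SchemaLookupDemo {
  // Java-style lookup: a field literally named SCHEMA$ declared on the record class.
  def javaStyleLookup(cls: Class[_]): Option[AnyRef] =
    Try(cls.getDeclaredField("SCHEMA$").get(null)).toOption

  // Scala keeps the value on the companion's module class (WordWithCount$),
  // reachable through its static MODULE$ instance and the SCHEMA$ accessor method.
  def companionLookup(cls: Class[_]): Option[AnyRef] = Try {
    val moduleClass = Class.forName(cls.getName + "$")
    val module = moduleClass.getField("MODULE$").get(null)
    moduleClass.getMethod("SCHEMA$").invoke(module)
  }.toOption

  // The backing field exists only on the module class, and it is private.
  def backingFieldIsPrivate(cls: Class[_]): Boolean = {
    val moduleClass = Class.forName(cls.getName + "$")
    val fields = moduleClass.getDeclaredFields.filter(_.getName == "SCHEMA$")
    fields.nonEmpty && fields.forall(f => Modifier.isPrivate(f.getModifiers))
  }

  def main(args: Array[String]): Unit = {
    val cls = classOf[WordWithCount]
    println(s"Java-style field lookup: ${javaStyleLookup(cls)}") // None
    println(s"Companion accessor lookup found it: ${companionLookup(cls).isDefined}") // true
    println(s"Backing field is private: ${backingFieldIsPrivate(cls)}")
  }
}
```

This is why a reflection-based lookup written for Java-generated classes comes up empty for Scala-generated ones, even though the schema is present.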

Answer 1

I could never get reflection to work, because Scala's fields are private under the hood. AFAIK the only solution is to update Flink to use Avro's non-reflection-based constructors in AvroInputFormat (compare).

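In a pinch, apart from switching to Java, one could fall back to Avro's GenericRecord, perhaps using avro4s to generate them from avrohugger's Standard format (note that avro4s will derive its own schema from the generated Scala types).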
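A minimal sketch of the GenericRecord fallback, using only the plain Avro Java API (assumes `org.apache.avro:avro` is on the classpath; `GenericRecordFallback` is a hypothetical name, the schema is the WordWithCount schema from the question, and avro4s is not used). Writer and reader are handed the `Schema` object directly, so nothing has to locate a `SCHEMA$` field by reflection.

```scala
import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

object GenericRecordFallback {
  // Same schema as the generated WordWithCount class, passed around explicitly.
  val schema: Schema = new Schema.Parser().parse(
    """{"type":"record","name":"WordWithCount","fields":[{"name":"word","type":"string"},{"name":"count","type":"long"}]}"""
  )

  def build(word: String, count: Long): GenericRecord = {
    val rec = new GenericData.Record(schema)
    rec.put("word", word)
    rec.put("count", count) // boxed to java.lang.Long
    rec
  }

  // Binary round trip: both sides receive the Schema object, no class lookup needed.
  def roundTrip(rec: GenericRecord): GenericRecord = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](schema).write(rec, encoder)
    encoder.flush()
    val decoder = DecoderFactory.get().binaryDecoder(out.toByteArray, null)
    new GenericDatumReader[GenericRecord](schema).read(null, decoder)
  }

  def main(args: Array[String]): Unit = {
    val copy = roundTrip(build("flink", 1L))
    // Decoded strings come back as org.apache.avro.util.Utf8, hence toString.
    println(s"${copy.get("word").toString} -> ${copy.get("count")}")
  }
}
```

The trade-off is losing the typed case class API: fields are read by name and cast, which is the price of avoiding the reflection-based specific-record path.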

How to use Avro serialization for Scala case classes with Flink 1.7?
