Converting a java.util.HashMap to JSON in Scala after serializing/deserializing an Avro record

Date: 2019-08-02 18:26:05

Tags: json scala hashmap avro

I am trying to serialize/deserialize an Avro map complex type using Scala.

After deserialization, I cannot convert the HashMap to JSON with Jackson.

I expect the following output:

{"MyKey2":"MyValue2","MyKey1":"MyValue1"}

But I get the following output instead:

{"MyKey2":{"bytes":"TXlWYWx1ZTI=","length":8,"byteLength":8},"MyKey1":{"bytes":"TXlWYWx1ZTE=","length":8,"byteLength":8}}

Any clue on how to handle the HashMap after deserialization? Code:

import java.io.ByteArrayOutputStream

import com.fasterxml.jackson.databind.ObjectMapper
import org.apache.avro.Schema
import org.apache.avro.generic.GenericData.Record
import org.apache.avro.generic.GenericRecord
import org.apache.avro.io._
import org.apache.avro.specific.{SpecificDatumReader, SpecificDatumWriter}

object ScalaSandbox {

  def main(args: Array[String]): Unit = {

    //Avro Schema and Schema Parser
    val userSchema =
      """
        |{
        |  "type":"record",
        |  "name":"myrecord",
        |  "fields": [
        |    {"name": "test_str", "type":"string"},
        |    {"name": "test_map", "type": ["null", {"type": "map", "values": "string"}]}
        |  ]
        |}
      """.stripMargin
    val parser = new Schema.Parser()
    val schema = parser.parse(userSchema)

    //Create Record
    val f2map = new java.util.HashMap[String,String]
    f2map.put("MyKey1", "MyValue1")
    f2map.put("MyKey2", "MyValue2")
    val avroRecord: Record = new Record(schema)
    avroRecord.put("test_str", "test")
    avroRecord.put("test_map", f2map)

    //Serialize Record to Avro
    val writer = new SpecificDatumWriter[GenericRecord](schema)
    val out = new ByteArrayOutputStream()
    val encoder: BinaryEncoder = EncoderFactory.get().binaryEncoder(out, null)
    writer.write(avroRecord, encoder)
    encoder.flush()
    out.close()
    val serializedBytes: Array[Byte] = out.toByteArray()

    //Deserialize Record from Avro
    val reader: DatumReader[GenericRecord] = new SpecificDatumReader[GenericRecord](schema)
    val decoder: Decoder = DecoderFactory.get().binaryDecoder(serializedBytes, null)
    val userData: GenericRecord = reader.read(null, decoder)

    //Convert HashMap to JSON
    val test_str: String = userData.get("test_str").toString
    val test_map: java.util.HashMap[String,String] = userData.get("test_map").asInstanceOf[java.util.HashMap[String,String]]
    val example = new Example(test_str, test_map)

    println("toString of HashMap: " + example.get_map.toString) // {MyKey2=MyValue2, MyKey1=MyValue1}
    println("writeValueAsString of Hashmap: " + example.get_map_json) // prints the {"bytes": ...} structure shown above, not {"MyKey2":"MyValue2","MyKey1":"MyValue1"}
  }

  class Example(str_field: String, map_field: java.util.HashMap[String,String]) {
    val mapper = new ObjectMapper()
    def get_str: String = str_field
    def get_map: java.util.HashMap[String,String] = map_field
    def get_map_json: String = mapper.writeValueAsString(map_field)
  }

}

2 Answers:

Answer 0 (score: 0)

Change the mapper.writeValueAsString call in the Example class. There may be an issue with how the Jackson library handles it.

mapper.writeValueAsString(map_field.toString.replaceAll("=", ":"))
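Note that this workaround is fragile: HashMap.toString does not quote keys or values, so the string that comes out of replaceAll is not valid JSON, and passing it through writeValueAsString just emits it as a single quoted JSON string literal. A stdlib-only sketch of what the toString/replaceAll step actually produces:

```scala
object ReplaceAllSketch extends App {
  val m = new java.util.HashMap[String, String]()
  m.put("MyKey1", "MyValue1")

  // HashMap.toString renders "{MyKey1=MyValue1}"; replacing "=" with ":"
  // still leaves the keys and values unquoted, so this is not valid JSON.
  val hacked = m.toString.replaceAll("=", ":")
  println(hacked) // {MyKey1:MyValue1}
}
```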

Answer 1 (score: 0)

The deserialized map cannot be parsed correctly with the Jackson library because, since Avro 1.5, the Avro map complex data type uses org.apache.avro.util.Utf8 for its keys and values.

By treating the deserialized map object as an instance of java.util.HashMap[Utf8,Utf8], I was able to convert the map's key/value pairs to JSON, in a very inefficient way.
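One way to sketch that conversion with only the standard library (the helper name toStringMap is mine, not from the original answer; it relies only on the fact that Avro's Utf8 implements CharSequence, and on Scala 2.12 you would use scala.collection.JavaConverters instead of scala.jdk.CollectionConverters) is to copy the Utf8 entries into a plain String map before handing it to Jackson:

```scala
import scala.jdk.CollectionConverters._

object Utf8MapSketch extends App {
  // Hypothetical helper: converts every key and value via toString, so a
  // downstream JSON mapper serializes them as ordinary strings instead of
  // the {"bytes": ...} structure seen with raw Utf8 values.
  def toStringMap[K <: CharSequence, V <: CharSequence](m: java.util.Map[K, V]): java.util.Map[String, String] =
    m.asScala.map { case (k, v) => k.toString -> v.toString }.toMap.asJava

  // Stand-in for a deserialized HashMap[Utf8, Utf8]; StringBuilder is used
  // here only because it is another CharSequence available in the stdlib.
  val utf8Like = new java.util.HashMap[CharSequence, CharSequence]()
  utf8Like.put(new java.lang.StringBuilder("MyKey1"), new java.lang.StringBuilder("MyValue1"))

  println(toStringMap(utf8Like)) // {MyKey1=MyValue1}
}
```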

In any case, I was mistakenly trying to hand-roll something that the Avro library itself can do easily with jsonEncoder.

So, assuming we have deserialized some Avro payload into a GenericRecord, we can convert it to JSON as follows:

  // requires: java.io.ByteArrayOutputStream, java.nio.charset.Charset,
  // org.apache.avro.generic.{GenericDatumWriter, GenericRecord} and org.apache.avro.io.EncoderFactory
  def convertGenericRecordtoJson(record: GenericRecord): String = {
    val outputStream = new ByteArrayOutputStream()
    val jsonEncoder = EncoderFactory.get().jsonEncoder(record.getSchema, outputStream)
    val datumWriter = new GenericDatumWriter[GenericRecord](record.getSchema)
    datumWriter.write(record, jsonEncoder)
    jsonEncoder.flush()
    outputStream.flush()
    new String(outputStream.toByteArray, Charset.forName("UTF-8"))
  }

This function produces a valid JSON string (the union type of test_map makes the Avro JSON encoding wrap the map in a "map" key):

{"test_str":"test","test_map":{"map":{"MyKey2":"MyValue2","MyKey1":"MyValue1"}}}