Question

I'm new to Avro schemas. I created the following schema based on a reference JSON, but I can't work out how to create a serializer for it.

{
  "name": "Name",
  "type": "record",
  "namespace": "NameSpace",
  "fields": [
    {
      "name": "discussions",
      "comment": "discussion ID.",
      "type": {
        "type": "array",
        "items": {
          "name": "discussionsRecord",
          "comment": "discussion Identifier.",
          "type": "record",
          "fields": [
            {
              "name": "discussionId",
              "type": "long"
            },
            {
              "name": "channelType",
              "comment": "channel Type Identification.",
              "type": "int"
            },
            {
              "name": "data",
              "comment": "The following block is to capture channel values.",
              "type": {
                "type": "array",
                "items": 
                [
                  {
                    "name": "dataRecord",
                    "type": "record",
                    "fields": [
                      {
                        "name": "pulse",
                        "comment": "Pulse.",
                        "type": "long"
                      },
                      {
                        "name": "communicationName",
                        "comment": "communication Identification.",
                        "type": {
                          "name": "communicationNameEnumType",
                          "comment": "enum for communication Names.",
                          "type": "enum",
                          "symbols": ["cold", "rainIntensity", "heat"]
                        }
                      },
                      {
                        "name": "communicationValue",
                        "comment": "communication Values.",
                        "type": "double"
                      },
                      {
                        "name": "classValue",
                        "comment": "communication class.",
                        "type": {
                          "name": "classValueEnumType",
                          "comment": "enum for Class types.",
                          "type": "enum",
                          "symbols": ["Dark", "Logical"]
                        }
                      }
                    ]
                  }
                ]
              }
            }
          ]
        }
      }
    }
  ]
}

Answer 1

If you have an AVSC schema, you can create a SparkSQL schema from it like this (Scala):

import org.apache.avro.Schema
import org.apache.spark.sql._
import org.apache.spark.sql.avro.SchemaConverters

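// The AVSC schema as a JSON string
val avroSchema: String = ...
// toSqlType returns a SchemaType; its dataType field holds the Spark SQL type
val sparkSchema = SchemaConverters.toSqlType(new Schema.Parser().parse(avroSchema))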
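
To make the conversion concrete, here is a minimal sketch, assuming Spark 3.x with the spark-avro module on the classpath. The schema string is a hypothetical, cut-down version of the one in the question (only two of its fields), and the object name AvroToSparkSchema is made up for illustration. It pulls the StructType out of the SchemaType wrapper that toSqlType returns.

```scala
import org.apache.avro.Schema
import org.apache.spark.sql.avro.SchemaConverters
import org.apache.spark.sql.types.StructType

object AvroToSparkSchema {
  // Hypothetical, shortened schema used only for illustration.
  val avroSchema: String =
    """{
      |  "type": "record",
      |  "name": "discussionsRecord",
      |  "fields": [
      |    {"name": "discussionId", "type": "long"},
      |    {"name": "channelType", "type": "int"}
      |  ]
      |}""".stripMargin

  // toSqlType returns SchemaType(dataType, nullable);
  // an Avro record maps to a Spark StructType.
  val sparkSchema: StructType =
    SchemaConverters
      .toSqlType(new Schema.Parser().parse(avroSchema))
      .dataType
      .asInstanceOf[StructType]

  def main(args: Array[String]): Unit = {
    // Print the converted schema as an indented tree.
    println(sparkSchema.treeString)
  }
}
```

The cast is only safe because the top-level Avro type here is a record; for other top-level types, dataType would not be a StructType.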

Otherwise, to_avro() serializes a column of an existing DataFrame, using that column's schema, into Avro binary output.
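
As a rough sketch of the to_avro() route, assuming Spark 3.x (where the function lives in org.apache.spark.sql.avro.functions) with spark-avro on the classpath: the sample rows, the object name ToAvroSketch, and the helper encode are all hypothetical and only reuse two field names from the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.avro.functions.to_avro
import org.apache.spark.sql.functions.{col, struct}

object ToAvroSketch {
  // Pack every column into one struct and encode it as a single
  // Avro binary column named "value".
  def encode(df: DataFrame): DataFrame =
    df.select(to_avro(struct(df.columns.map(col): _*)).as("value"))

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[1]")
      .appName("to-avro-sketch")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical sample rows, for illustration only.
    val df = Seq((1L, 0), (2L, 1)).toDF("discussionId", "channelType")

    // The result has one binary column holding the Avro-encoded rows.
    encode(df).printSchema()

    spark.stop()
  }
}
```

Here the Avro schema is derived from the DataFrame's own schema, so no AVSC file is needed for writing; a reader would still need a matching Avro schema to decode the bytes.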

Serializer for an Avro schema
