Python 3.4 and Avro: unable to convert a simple message to Avro based on a schema?

Asked: 2019-02-22 19:24:02

Tags: python python-3.x avro

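I am creating an Avro message for each record (from a CSV file), with a metadata header attached, i.e. using a nested schema.

I am using Python 3.4 and have installed the required module, avro-python3. My record data is in CSV form with a header row.

Basically, I already have the code that builds the required message and the metadata header.

My AVSC file (sample only):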

Schema: {"name": "person","type": "record","fields": [{"name": "address","type": {"type" : "record","name" : "AddressUSRecord","fields" : [{"name": "streetaddress", "type": "string"},{"name": "city", "type":"string"},{"name": "pin", "type":"long"}]}}]}

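My record is created as well (shown pretty-printed).

For pin: 123.456 (a float value)

However, when I try to convert the above record to Avro format using the avsc file mentioned, it fails with the message "The data is not an example of the schema".

Code: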

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import csv
import json
# header class to give header data. Just simple assignment
from header import Header
# body class to give body, just simple assignment for now.
from pnlData import PnlData
import os
import sys

if __name__ == "__main__":
    schemaFile = "/path/tardisPnl.avsc"
    outFile = "/path/SampleOutLanding.avro"
    schema = avro.schema.Parse(open(schemaFile, "r").read())

    a = Header()
    a.generateMessageId() #Simple text generated for now
    a.generateTimestamp() #Simple number generated for now
    #print(a.__dict__)
    b = PnlData()
    b.generatePnlData() #Simple value assigned as seen in example
    #print(b.__dict__)

    landingMessage = {}
    landingMessage["header"] = a.__dict__
    landingMessage["pnlData"] = b.__dict__

    #print (json.dumps(landingMessage))

    writer = DataFileWriter(open(outFile, "wb"), DatumWriter(), schema)
    try:
        writer.append(landingMessage)
    except Exception as e:
        print('Error: %s ' % (e))

    writer.close()

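I tried converting the above Avro schema to a JSON schema and then generated sample JSON data from it (using an online tool) to check whether my data object is correct. In fact, I built my record from the sample data generated from the schema.

However, when I use them and run the code, it always fails.

I am not very familiar with Avro, so I would like to understand what I am missing here. Why do such simple data and such a simple schema not work?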


I first tried the following simple record (from the same online sample tool) and schema, and that works.

Simple avsc:

{"name": "person","type": "record","fields": [{"name": "firstname", "type": "string"},{"name": "lastname", "type": "string"},{"name": "address","type": {"type" : "record","name" : "AddressUSRecord","fields" : [{"name": "streetaddress", "type": "string"},{"name": "city", "type":"string"}]}}]}

Simple data (again pretty-printed):

{
  "firstname": "ABCDEFGHIJKLMN",
  "lastname": "ABCDEFGHIJKLMNOPQRSTUVWXYZAB",
  "address": {
    "streetaddress": "ABCDEFGHIJKLMN",
    "city": "ABCDEFGHIJKLMNO"
  }
}

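If I build the dictionary above and pass it through the same code (unchanged) together with this avsc file, it works fine.

The only difference between my avsc and the (simple) sample avsc is an additional nested attribute, and so on. I cannot figure out why slightly more complex data is not handled.

1 Answer:

Answer 0: (score: 0)

The fastavro library has a validate function that can help you track this down.

Using the data and schema you provided, like so: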

schema = {
   "type":"record",
   "name":"SomeName",
   "doc":"This schema contains the metadata fields wrapped in a header field which follows the official SA MessageHeader schema.",
   "fields":[
      {
         "name":"header",
         "type":{
            "type":"record",
            "name":"MessageHeader",
            "fields":[
               {
                  "name":"messageId",
                  "type":"string"
               },
               {
                  "name":"businessId",
                  "type":"string"
               },
               {
                  "name":"batchId",
                  "type":"string"
               },
               {
                  "name":"sourceSystem",
                  "type":"string"
               },
               {
                  "name":"secondarySourceSystem",
                  "type":"string"
               },
               {
                  "name":"sourceSystemCreationTimestamp",
                  "type":"long"
               },
               {
                  "name":"sentBy",
                  "type":"string"
               },
               {
                  "name":"sentTo",
                  "type":"string"
               },
               {
                  "name":"messageType",
                  "type":"string"
               },
               {
                  "name":"schemaVersion",
                  "type":"string"
               },
               {
                  "name":"processing",
                  "type":"string"
               },
               {
                  "name":"sourceLocation",
                  "type":"string"
               }
            ]
         }
      },
      {
         "name":"pnlData",
         "type":{
            "type":"record",
            "name":"pnlDataDetails",
            "fields":[
               {
                  "name":"granularity",
                  "type":"string"
               },
               {
                  "name":"pnl_type",
                  "type":"string"
               },
               {
                  "name":"pnl_subtype",
                  "type":"string"
               },
               {
                  "name":"date",
                  "type":"int"
               },
               {
                  "name":"book",
                  "type":"string"
               },
               {
                  "name":"currency",
                  "type":"string"
               },
               {
                  "name":"category",
                  "type":"string"
               },
               {
                  "name":"subcategory",
                  "type":"string"
               },
               {
                  "name":"riskcategory",
                  "type":"string"
               },
               {
                  "name":"market_name",
                  "type":"string"
               },
               {
                  "name":"risk_order",
                  "type":"string"
               },
               {
                  "name":"tenor",
                  "type":"string"
               },
               {
                  "name":"product",
                  "type":"string"
               },
               {
                  "name":"trade_id",
                  "type":"string"
               },
               {
                  "name":"pnl_local",
                  "type":"long"
               },
               {
                  "name":"pnl_cde",
                  "type":"long"
               },
               {
                  "name":"pnl_status",
                  "type":"string"
               }
            ]
         }
      }
   ]
}

record = {
    "pnlData": {
        "pnl_cde": 997.8100000024,
        "pnl_status": "locked",
        "granularity": "detailed view",
        "book": "8271",
        "date": 20181130,
        "subcategory": "None",
        "pnl_local": 997.7899999917,
        "pnl_subtype": "Regular",
        "tenor": "None",
        "pnl_type": "Daily",
        "risk_order": "None",
        "market_name": "None",
        "trade_id": "None",
        "category": "None",
        "product": "None",
        "currency": "cad",
        "riskcategory": "None"
    },
    "header": {
        "sentBy": "SYSTEM",
        "businessId": "T1",
        "messageId": "pnl_0001",
        "processing": "RealTime",
        "messageType": "None",
        "sourceLocation": "None",
        "sentTo": "SA",
        "secondarySourceSystem": "None",
        "schemaVersion": "1.6T",
        "sourceSystem": "SYSTEM",
        "sourceSystemCreationTimestamp": 1236472051,
        "batchId": "None"
    }
}

import fastavro

fastavro.validation.validate(record, schema)

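The error I get is: "SomeName.pnlData.pnlDataDetails.pnl_local is <997.7899999917> of type <class 'float'> expected long"

In other words, the schema declares pnl_local (and likewise pnl_cde) as long, but the record holds float values there. Either declare those fields as double in the schema, or convert the values to integers before writing.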
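To make the type rule behind that error concrete, here is a minimal standard-library sketch. It is not the avro or fastavro API: `find_mismatches` and the `PRIMITIVES` table are hypothetical helpers written for illustration, the schema is trimmed to a few of the pnlData fields, and only records and a handful of primitive types are handled (no unions, arrays, or integer range checks). It lists every field whose Python type does not fit the declared Avro type, then shows that declaring the two amount fields as double clears the mismatches.

```python
import copy

# Python types accepted per Avro primitive in this simplified sketch.
# Any other Avro type name raises KeyError here.
PRIMITIVES = {
    "string": (str,),
    "int": (int,),
    "long": (int,),
    "float": (int, float),
    "double": (int, float),
}


def find_mismatches(datum, schema, path=""):
    """Return a list of (field path, declared type, actual Python type name)."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        if not isinstance(datum, dict):
            return [(path, "record", type(datum).__name__)]
        problems = []
        for field in schema["fields"]:
            name = field["name"]
            child = path + "." + name if path else name
            if name not in datum:
                problems.append((child, "field", "missing"))
            else:
                problems.extend(find_mismatches(datum[name], field["type"], child))
        return problems
    type_name = schema["type"] if isinstance(schema, dict) else schema
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(datum, bool) or not isinstance(datum, PRIMITIVES[type_name]):
        return [(path, type_name, type(datum).__name__)]
    return []


# Trimmed version of the schema and record from above.
schema = {
    "type": "record",
    "name": "SomeName",
    "fields": [
        {
            "name": "pnlData",
            "type": {
                "type": "record",
                "name": "pnlDataDetails",
                "fields": [
                    {"name": "date", "type": "int"},
                    {"name": "pnl_local", "type": "long"},
                    {"name": "pnl_cde", "type": "long"},
                    {"name": "pnl_status", "type": "string"},
                ],
            },
        }
    ],
}

record = {
    "pnlData": {
        "date": 20181130,
        "pnl_local": 997.7899999917,
        "pnl_cde": 997.8100000024,
        "pnl_status": "locked",
    }
}

mismatches = find_mismatches(record, schema)
for problem in mismatches:
    print("%s: declared %s, got %s" % problem)

# One possible fix: declare the fractional amounts as double in the schema.
fixed_schema = copy.deepcopy(schema)
for field in fixed_schema["fields"][0]["type"]["fields"]:
    if field["name"] in ("pnl_local", "pnl_cde"):
        field["type"] = "double"

remaining = find_mismatches(record, fixed_schema)
```

Changing the schema keeps the fractional part of the amounts. Converting the values with int() would also satisfy a long field, but it silently truncates them, so that route only makes sense if the amounts are meant to be whole numbers.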