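I am creating an Avro message for each record of a CSV file, with a metadata header (i.e. a nested schema).
I am using Python 3.4 and have installed the required module, avro-python3. My record data is in CSV form, with a header row.
Basically, I have code that builds the required message and the metadata header.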
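For context, a minimal sketch of how a CSV row might be turned into such a nested message. The CSV content, the column-to-type mapping and the helper names are assumptions made for illustration; only the `header` and `pnlData` keys come from the code further down. The point it shows is that `csv.DictReader` yields every value as a string, so numeric columns have to be converted before they can match an Avro `int`, `long` or `double` field.

```python
import csv
import io

# Hypothetical CSV content; the real file and its columns are not shown.
CSV_TEXT = "book,date,pnl_local,currency\n8271,20181130,997.79,cad\n"

# Hypothetical mapping from column name to converter. Columns that are
# not listed stay as strings, which is what csv.DictReader produces.
CONVERTERS = {"date": int, "pnl_local": float}


def rows_to_bodies(text):
    """Parse CSV text and convert each row's values to Python types."""
    reader = csv.DictReader(io.StringIO(text))
    bodies = []
    for row in reader:
        body = {}
        for name, value in row.items():
            body[name] = CONVERTERS.get(name, str)(value)
        bodies.append(body)
    return bodies


def build_message(header, body):
    """Wrap a header dict and a body dict into one nested message."""
    return {"header": header, "pnlData": body}


# The header here is a placeholder with a single field.
messages = [build_message({"messageId": "pnl_0001"}, body)
            for body in rows_to_bodies(CSV_TEXT)]
print(messages[0]["pnlData"])
```

Whether a column should become `int` or `float` depends on the type declared for it in the schema, so the converter mapping has to be kept in step with the avsc file.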
My AVSC file (sample only):
Schema: {"name": "person","type": "record","fields": [{"name": "address","type": {"type" : "record","name" : "AddressUSRecord","fields" : [{"name": "streetaddress", "type": "string"},{"name": "city", "type":"string"},{"name": "pin", "type":"long"}]}}]}
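My record is created as well (shown here pretty-printed).
For pin: 123.456 (a float value)
However, when I try to convert this record to Avro format using the avsc file above, it fails with "The datum is not an example of the schema".
Code: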
import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import csv
import json
# header class to give header data. Just simple assignment
from header import Header
# body class to give body, just simple assignment for now.
from pnlData import PnlData
import os
import sys

if __name__ == "__main__":
    schemaFile = "/path/tardisPnl.avsc"
    outFile = "/path/SampleOutLanding.avro"
    schema = avro.schema.Parse(open(schemaFile, "r").read())
    a = Header()
    a.generateMessageId()  # Simple text generated for now
    a.generateTimestamp()  # Simple number generated for now
    #print(a.__dict__)
    b = PnlData()
    b.generatePnlData()  # Simple value assigned as seen in example
    #print(b.__dict__)
    landingMessage = {}
    landingMessage["header"] = a.__dict__
    landingMessage["pnlData"] = b.__dict__
    #print (json.dumps(landingMessage))
    writer = DataFileWriter(open(outFile, "wb"), DatumWriter(), schema)
    try:
        writer.append(landingMessage)
    except Exception as e:
        print('Error: %s ' % (e))
    writer.close()
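
The error message names the whole datum rather than the field at fault. To narrow it down, a small hand-rolled check can walk the datum and the schema together and report the path of each mismatch. The sketch below is illustrative only and is not the avro library's own validator: `find_mismatches` and its helpers are hypothetical names, it covers only records and a few primitive types, and the `streetaddress` and `city` values are made up. The schema and the `pin` value are the ones from the sample above.

```python
# Illustrative sketch only: this is NOT the avro library's validator.
# It handles records and a few primitive types; unions, arrays, maps,
# enums and int/long range checks are left out.

def _is_integer(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, int) and not isinstance(value, bool)


def _is_number(value):
    return isinstance(value, (int, float)) and not isinstance(value, bool)


PRIMITIVE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "boolean": lambda v: isinstance(v, bool),
    "int": _is_integer,
    "long": _is_integer,
    "float": _is_number,
    "double": _is_number,
}


def find_mismatches(datum, schema, path=""):
    """Return a list of 'path: expected X, got Y' strings (empty if none)."""
    if isinstance(schema, dict) and schema.get("type") == "record":
        here = path + "." + schema["name"] if path else schema["name"]
        if not isinstance(datum, dict):
            return ["%s: expected record, got %s" % (here, type(datum).__name__)]
        problems = []
        for field in schema["fields"]:
            field_path = here + "." + field["name"]
            if field["name"] not in datum:
                problems.append("%s: missing field" % field_path)
            else:
                problems.extend(
                    find_mismatches(datum[field["name"]], field["type"], field_path))
        return problems
    if isinstance(schema, dict):
        # e.g. {"type": "string"}: unwrap and check the inner type.
        return find_mismatches(datum, schema.get("type"), path)
    check = PRIMITIVE_CHECKS.get(schema) if isinstance(schema, str) else None
    if check is None:
        return ["%s: type %r is not handled by this sketch" % (path, schema)]
    if check(datum):
        return []
    return ["%s: expected %s, got %s" % (path, schema, type(datum).__name__)]


person_schema = {
    "name": "person", "type": "record",
    "fields": [{"name": "address", "type": {
        "type": "record", "name": "AddressUSRecord",
        "fields": [{"name": "streetaddress", "type": "string"},
                   {"name": "city", "type": "string"},
                   {"name": "pin", "type": "long"}]}}],
}
# streetaddress and city are made-up values; pin is the float from the question.
person_record = {"address": {"streetaddress": "1 Main St",
                             "city": "Springfield",
                             "pin": 123.456}}

problems = find_mismatches(person_record, person_schema)
print(problems)
# → ['person.address.AddressUSRecord.pin: expected long, got float']
```

With an integer such as 123456 for `pin`, the same call returns an empty list.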
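I tried converting the Avro schema above to a JSON schema and then generated sample JSON data from that schema (with an online tool) to check whether my data object is correct. In fact, I built my record from the sample data generated from the schema.
But when I use them and run the code, it always fails.
I am not very familiar with Avro, so I need to understand what I am missing here. Why does such simple data and schema not work?
I first tried the following simple record (sample generated with the same online tool) and schema, and that works.
Simple avsc: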
{"name": "person","type": "record","fields": [{"name": "firstname", "type": "string"},{"name": "lastname", "type": "string"},{"name": "address","type": {"type" : "record","name" : "AddressUSRecord","fields" : [{"name": "streetaddress", "type": "string"},{"name": "city", "type":"string"}]}}]}
Simple data (again pretty-printed):
{
    "firstname": "ABCDEFGHIJKLMN",
    "lastname": "ABCDEFGHIJKLMNOPQRSTUVWXYZAB",
    "address": {
        "streetaddress": "ABCDEFGHIJKLMN",
        "city": "ABCDEFGHIJKLMNO"
    }
}
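If I create the dictionary above and pass it, together with this avsc file, to the same code as above (unchanged), it works fine.
The only difference between my avsc and the (simple) sample avsc is one extra nested attribute, and so on. I cannot find the reason why slightly more complex data cannot be handled.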
Answer 0 (score: 0)
The fastavro library has a validate function that can help you with this.
Using the data and schema you provided, as follows:
schema = {
    "type": "record",
    "name": "SomeName",
    "doc": "This schema contains the metadata fields wrapped in a header field which follows the official SA MessageHeader schema.",
    "fields": [
        {
            "name": "header",
            "type": {
                "type": "record",
                "name": "MessageHeader",
                "fields": [
                    {"name": "messageId", "type": "string"},
                    {"name": "businessId", "type": "string"},
                    {"name": "batchId", "type": "string"},
                    {"name": "sourceSystem", "type": "string"},
                    {"name": "secondarySourceSystem", "type": "string"},
                    {"name": "sourceSystemCreationTimestamp", "type": "long"},
                    {"name": "sentBy", "type": "string"},
                    {"name": "sentTo", "type": "string"},
                    {"name": "messageType", "type": "string"},
                    {"name": "schemaVersion", "type": "string"},
                    {"name": "processing", "type": "string"},
                    {"name": "sourceLocation", "type": "string"}
                ]
            }
        },
        {
            "name": "pnlData",
            "type": {
                "type": "record",
                "name": "pnlDataDetails",
                "fields": [
                    {"name": "granularity", "type": "string"},
                    {"name": "pnl_type", "type": "string"},
                    {"name": "pnl_subtype", "type": "string"},
                    {"name": "date", "type": "int"},
                    {"name": "book", "type": "string"},
                    {"name": "currency", "type": "string"},
                    {"name": "category", "type": "string"},
                    {"name": "subcategory", "type": "string"},
                    {"name": "riskcategory", "type": "string"},
                    {"name": "market_name", "type": "string"},
                    {"name": "risk_order", "type": "string"},
                    {"name": "tenor", "type": "string"},
                    {"name": "product", "type": "string"},
                    {"name": "trade_id", "type": "string"},
                    {"name": "pnl_local", "type": "long"},
                    {"name": "pnl_cde", "type": "long"},
                    {"name": "pnl_status", "type": "string"}
                ]
            }
        }
    ]
}
record = {
    "pnlData": {
        "pnl_cde": 997.8100000024,
        "pnl_status": "locked",
        "granularity": "detailed view",
        "book": "8271",
        "date": 20181130,
        "subcategory": "None",
        "pnl_local": 997.7899999917,
        "pnl_subtype": "Regular",
        "tenor": "None",
        "pnl_type": "Daily",
        "risk_order": "None",
        "market_name": "None",
        "trade_id": "None",
        "category": "None",
        "product": "None",
        "currency": "cad",
        "riskcategory": "None"
    },
    "header": {
        "sentBy": "SYSTEM",
        "businessId": "T1",
        "messageId": "pnl_0001",
        "processing": "RealTime",
        "messageType": "None",
        "sourceLocation": "None",
        "sentTo": "SA",
        "secondarySourceSystem": "None",
        "schemaVersion": "1.6T",
        "sourceSystem": "SYSTEM",
        "sourceSystemCreationTimestamp": 1236472051,
        "batchId": "None"
    }
}
import fastavro
fastavro.validation.validate(record, schema)
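The error I get is as follows: "SomeName.pnlData.pnlDataDetails.pnl_local is <997.7899999917> of type <class 'float'> expected long"
In other words, the schema declares pnl_local (and likewise pnl_cde) as long, but the record holds float values, so either the schema type or the data has to change.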
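
One way to resolve a float-versus-long mismatch like this is to declare the affected fields as `double`, which keeps the fractional part of the values. The sketch below assumes that is what is wanted for `pnl_local` and `pnl_cde`. `retype_fields` is a hypothetical helper, not part of fastavro or avro, and the schema is a trimmed copy of the one above. In practice, editing the two types directly in the avsc file has the same effect.

```python
import copy


def retype_fields(schema, field_names, new_type):
    """Return a copy of a record schema with the named fields set to new_type."""
    result = copy.deepcopy(schema)

    def walk(node):
        # Only record schemas carry a "fields" list; descend into nested records.
        if isinstance(node, dict) and node.get("type") == "record":
            for field in node["fields"]:
                if field["name"] in field_names:
                    field["type"] = new_type
                else:
                    walk(field["type"])

    walk(result)
    return result


# Trimmed copy of the schema above, keeping only three pnlData fields.
trimmed_schema = {
    "type": "record",
    "name": "SomeName",
    "fields": [{
        "name": "pnlData",
        "type": {
            "type": "record",
            "name": "pnlDataDetails",
            "fields": [
                {"name": "pnl_local", "type": "long"},
                {"name": "pnl_cde", "type": "long"},
                {"name": "pnl_status", "type": "string"},
            ],
        },
    }],
}

fixed_schema = retype_fields(trimmed_schema, {"pnl_local", "pnl_cde"}, "double")
fixed_types = [(f["name"], f["type"])
               for f in fixed_schema["fields"][0]["type"]["fields"]]
print(fixed_types)
# → [('pnl_local', 'double'), ('pnl_cde', 'double'), ('pnl_status', 'string')]
```

The alternative is to keep `long` in the schema and convert the values to integers before writing, for example with `int(round(value))`, at the cost of dropping the fractional part.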