I am trying to create an SQL table from an Avro file that contains the table structure:
{
"type" : "record",
"name" : "warranty",
"doc" : "Schema generated by Kite",
"fields" : [ {
"name" : "id",
"type" : "long",
"doc" : "Type inferred from '1'"
}, {
"name" : "train_id",
"type" : "long",
"doc" : "Type inferred from '21691'"
}, {
"name" : "siemens_nr",
"type" : "string",
"doc" : "Type inferred from 'Loco-001'"
}, {
"name" : "uic_nr",
"type" : "long",
"doc" : "Type inferred from '193901'"
}, {
"name" : "Configuration",
"type" : "string",
"doc" : "Type inferred from 'ZP28'"
}, {
"name" : "Warranty_Status",
"type" : "string",
"doc" : "Type inferred from 'Out_of_Warranty'"
}, {
"name" : "Warranty_Data_Type",
"type" : "string",
"doc" : "Type inferred from 'Real_based_on_preliminary_acceptance_date'"
}, {
"name" : "of_progression",
"type" : "long",
"doc" : "Type inferred from '100'"
}, {
"name" : "Delivery_Date",
"type" : "string",
"doc" : "Type inferred from '18/12/2009'"
}, {
"name" : "Warranty_on_Delivery_Date",
"type" : "string",
"doc" : "Type inferred from '18/12/2013'"
}, {
"name" : "Customer_Status",
"type" : "string",
"doc" : "Type inferred from 'homologation'"
}, {
"name" : "Commissioning_Date",
"type" : "string",
"doc" : "Type inferred from '6/10/2010'"
}, {
"name" : "Preliminary_acceptance_date",
"type" : "string",
"doc" : "Type inferred from '6/01/2011'"
}, {
"name" : "Warranty_Start_Date",
"type" : "string",
"doc" : "Type inferred from '6/01/2011'"
}, {
"name" : "Warranty_End_Date",
"type" : "string",
"doc" : "Type inferred from '6/01/2013'"
}, {
"name" : "Effective_End_Warranty_Date",
"type" : [ "null", "string" ],
"doc" : "Type inferred from 'null'",
"default" : null
}, {
"name" : "Level_2_in_function",
"type" : "string",
"doc" : "Type inferred from '17/07/2015'"
}, {
"name" : "Baseline",
"type" : "string",
"doc" : "Type inferred from '2.10.23.4'"
}, {
"name" : "TC_report",
"type" : "string",
"doc" : "Type inferred from 'A480140'"
}, {
"name" : "Last_version_Date",
"type" : "string",
"doc" : "Type inferred from 'A-23/09/2015'"
} ]
}
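To do this, I am using (if you have a simpler alternative, that would be great)
So with Python I get a result like this:
{'name':'id','type':'long','doc':'blablabla'}
My question is: how can I create an SQL table in Python from this result?
Thanks for your help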
Answer 0 (score: 0)
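Using the json module, you can parse your string into a dictionary, which gives you an array of field definitions. You then iterate over that array to generate the SQL statement.
Note: you need some mechanism to map Avro field types to SQL field types, especially when the type is a union such as "type" : [ "null", "string" ].
Here is a working example of code that builds an SQL CREATE TABLE statement from the string: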
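As a rough sketch of such a mapping: the SQL type names and the helper name avro_type_to_sql below are my own assumptions, not part of Avro or of any library, so adjust them for your database. It covers primitive types and simple nullable unions only.

```python
# Hypothetical mapping from Avro primitive types to generic SQL column types.
# The SQL type names are assumptions; adjust them for your database.
AVRO_TO_SQL = {
    "boolean": "BOOLEAN",
    "int": "INTEGER",
    "long": "BIGINT",
    "float": "REAL",
    "double": "DOUBLE PRECISION",
    "string": "VARCHAR(255)",
    "bytes": "BLOB",
}


def avro_type_to_sql(avro_type):
    """Return (sql_type, nullable) for a primitive Avro type or a simple union."""
    nullable = False
    if isinstance(avro_type, list):
        # A union such as ["null", "string"] means the column may be NULL.
        non_null = [t for t in avro_type if t != "null"]
        nullable = len(non_null) < len(avro_type)
        if len(non_null) != 1:
            raise ValueError("unsupported union: %r" % (avro_type,))
        avro_type = non_null[0]
    # Complex types (records, arrays, maps) are not handled in this sketch.
    if not isinstance(avro_type, str) or avro_type not in AVRO_TO_SQL:
        raise ValueError("unsupported Avro type: %r" % (avro_type,))
    return AVRO_TO_SQL[avro_type], nullable


print(avro_type_to_sql("long"))
print(avro_type_to_sql(["null", "string"]))
```

The second value in the returned pair tells you whether to emit NOT NULL for the column.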
import json
schema_str = """{
"type" : "record",
"name" : "warranty",
"doc" : "Schema generated by Kite",
"fields" : [ {
"name" : "id",
"type" : "long",
"doc" : "Type inferred from '1'"
}, {
"name" : "train_id",
"type" : "long",
"doc" : "Type inferred from '21691'"
}, {
"name" : "siemens_nr",
"type" : "string",
"doc" : "Type inferred from 'Loco-001'"
} ]
}"""
schema = json.loads(schema_str)
fields = schema['fields']
sql_string = 'CREATE TABLE ' + schema['name'] + ' ( \n'
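for field in fields:
    # field['type'] must be a plain string here; union types such as
    # ["null", "string"] are lists and need to be mapped to a single type first
    sql_string = sql_string + field['name'] + ' ' + field['type'] + ', \n'
sql_string = sql_string[:-3] + '\n)'  # get rid of last comma and close the field list
print(sql_string)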
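If you also want to actually create the table from Python, a minimal sketch using the standard-library sqlite3 module might look like the following. The choice of SQLite, the SQL_TYPES mapping, and the helper column_definition are assumptions for illustration, and the schema is cut down to three fields from the question.

```python
import json
import sqlite3

# Assumed type names for SQLite; other databases will need different ones.
SQL_TYPES = {"long": "INTEGER", "string": "TEXT"}

schema_str = """{
  "type": "record",
  "name": "warranty",
  "fields": [
    {"name": "id", "type": "long"},
    {"name": "siemens_nr", "type": "string"},
    {"name": "Effective_End_Warranty_Date", "type": ["null", "string"], "default": null}
  ]
}"""


def column_definition(field):
    """Build one column definition from an Avro field dictionary."""
    avro_type = field["type"]
    nullable = False
    if isinstance(avro_type, list):
        # Treat ["null", X] as a nullable column of type X.
        nullable = "null" in avro_type
        avro_type = [t for t in avro_type if t != "null"][0]
    suffix = "" if nullable else " NOT NULL"
    return '"%s" %s%s' % (field["name"], SQL_TYPES[avro_type], suffix)


schema = json.loads(schema_str)
# Joining the column definitions avoids having to trim a trailing comma.
columns = ",\n  ".join(column_definition(f) for f in schema["fields"])
ddl = 'CREATE TABLE "%s" (\n  %s\n)' % (schema["name"], columns)
print(ddl)

# Run the statement against a throwaway in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute(ddl)
created = [row[1] for row in conn.execute('PRAGMA table_info("warranty")')]
print(created)
conn.close()
```

Table and column names cannot be passed as query parameters, so they are interpolated into the statement here; that is only reasonable if the schema comes from a source you trust.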