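I need to convert a CSV spec file to a YAML file for a project requirement. I wrote a small piece of Python code for this, but it does not work as expected. I cannot use any online converter because the client I am working for will not accept that. Here is my Python code: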
import csv
csvfile = open('custInfo.csv', 'r')
datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []
yaml_pretext = "sourceTopic : 'BIG_PARTY'"
yaml_pretext += "\n"+'validationRequired : true'+"\n"
yaml_pretext += "\n"+'columnMappingEntityList :'+"\n"
for row_index, row in enumerate(datareader):
    if row_index == 0:
        # the first row holds the column headings
        data_headings = row
    else:
        # new_yaml = open('outfile.yaml', 'w')
        yaml_text = ""
        for cell_index, cell in enumerate(row):
            lineSeperator = "    "
            cell_heading = data_headings[cell_index].lower().replace(" ", "_").replace("-", "")
            if (cell_heading == "source"):
                # "source" starts a new list item
                lineSeperator = '  - '
            cell_text = lineSeperator+cell_heading + " : " + cell.replace("\n", ", ") + "\n"
            yaml_text += cell_text
        print(yaml_text)
csvfile.close()
The CSV file has 4 columns; here is its current content:
source               destination       type    childFields
fra:AppData          app_data          array   application_id,institute_nm
fra:ApplicationId    application_id    string  null
fra:InstituteName    institute_nm      string  null
fra:CustomerData     customer_data     array   name,customer_address,telephone_number
fra:Name             name              string  null
fra:CustomerAddress  customer_address  array   street,pincode
fra:Street           street            string  null
fra:Pincode          pincode           string  null
fra:TelephoneNumber  telephone_number  string  null
This is the YAML I get as output:
  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm
  - source : fra:ApplicationId
    destination : application_id
    type : string
    childfields : null
  - source : fra:InstituteName
    destination : institute_nm
    type : string
    childfields : null
  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number
  - source : fra:Name
    destination : name
    type : string
    childfields : null
  - source : fra:CustomerAddress
    destination : customer_address
    type : array
    childfields : street,pincode
  - source : fra:Street
    destination : street
    type : string
    childfields : null
  - source : fra:Pincode
    destination : pincode
    type : string
    childfields : null
  - source : fra:TelephoneNumber
    destination : telephone_number
    type : string
    childfields : null
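When the type is array, I need the child entries to be output nested under childfields rather than as new top-level items. So the desired output would be: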
  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null
      - source : fra:InstituteName
        destination : institute_nm
        type : string
        childfields : null
  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number
      - source : fra:Name
        destination : name
        type : string
        childfields : null
      - source : fra:CustomerAddress
        destination : customer_address
        type : array
        childfields : street,pincode
          - source : fra:Street
            destination : street
            type : string
            childfields : null
          - source : fra:Pincode
            destination : pincode
            type : string
            childfields : null
      - source : fra:TelephoneNumber
        destination : telephone_number
        type : string
        childfields : null
How can I get this?
Answer 0 (score: 2)
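You are currently not using any YAML library to generate the output. This is bad practice, because you never check whether the strings you emit contain YAML special characters that would need to be quoted.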
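To illustrate the quoting point, here is a minimal sketch using PyYAML's `yaml.safe_dump`. The `note` key and its value are made up for the example; they are not part of the asker's data.

```python
import yaml  # PyYAML, the same library used in the code further down

# Hypothetical cell values that would be risky in hand-built "key : value" lines.
record = {
    "source": "fra:AppData",      # valid as a plain scalar
    "note": "contains: a colon",  # ": " would otherwise start a nested mapping
    "childfields": "null",        # the string "null", not a YAML null
}

# The library decides which values need quotes.
dumped = yaml.safe_dump(record, default_flow_style=False)
print(dumped)

# Loading the text back yields the original strings unchanged.
assert yaml.safe_load(dumped) == record
```

With plain string concatenation, the second and third values would be read back as a nested mapping and as a null respectively, not as the strings that were intended.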
Next, this is not valid YAML:
    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null
childfields cannot have both a scalar value (application_id,institute_nm) and a sequence value (starting at the item - source : fra:ApplicationId).
Try building the structure with lists and dicts, then dump that structure:
import yaml, csv
csvfile = open('custInfo.csv', 'r')
datareader = csv.reader(csvfile, delimiter=",", quotechar='"')
result = list()
type_index = -1
child_fields_index = -1
for row_index, row in enumerate(datareader):
    if row_index == 0:
        # let's do this once here
        data_headings = list()
        for heading_index, heading in enumerate(row):
            fixed_heading = heading.lower().replace(" ", "_").replace("-", "")
            data_headings.append(fixed_heading)
            if fixed_heading == "type":
                type_index = heading_index
            elif fixed_heading == "childfields":
                child_fields_index = heading_index
    else:
        content = dict()
        is_array = False
        for cell_index, cell in enumerate(row):
            if cell_index == child_fields_index and is_array:
                # expand the comma-separated names into a list of child entries
                content[data_headings[cell_index]] = [{
                    "source" : "fra:" + value.capitalize(),
                    "destination" : value,
                    "type" : "string",
                    "childfields" : "null"
                } for value in cell.split(",")]
            else:
                content[data_headings[cell_index]] = cell
            # remembered for the next cell; relies on childfields directly following type
            is_array = (cell_index == type_index) and (cell == "array")
        result.append(content)
csvfile.close()
print(yaml.dump(result))
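Note that the code above only expands one level, treats every child as a string, and guesses the child's source with capitalize() (which gives fra:Application_id, not fra:ApplicationId). If the child rows should instead be taken from the CSV itself, including nested arrays such as customer_address, one possible approach is to look each child up by its destination. The following is only a sketch under assumptions: the names `SAMPLE_CSV` and `nest_rows` are made up, the file is assumed to be comma-separated with the childFields cell quoted, and every name listed in childFields is assumed to have its own row.

```python
import csv
import io

# Rows copied from the question; comma-separated layout with quoted
# childFields is an assumption, since the question shows the data as a table.
SAMPLE_CSV = """source,destination,type,childFields
fra:AppData,app_data,array,"application_id,institute_nm"
fra:ApplicationId,application_id,string,null
fra:InstituteName,institute_nm,string,null
fra:CustomerData,customer_data,array,"name,customer_address,telephone_number"
fra:Name,name,string,null
fra:CustomerAddress,customer_address,array,"street,pincode"
fra:Street,street,string,null
fra:Pincode,pincode,string,null
fra:TelephoneNumber,telephone_number,string,null
"""


def child_names_of(row):
    """Return the destinations listed in childFields, or [] for non-arrays."""
    if row["type"] != "array":
        return []
    return [name.strip() for name in row["childFields"].split(",")]


def nest_rows(rows):
    """Return top-level entries with child rows nested under 'childfields'."""
    by_destination = {row["destination"]: row for row in rows}
    # Every destination mentioned as a child is not a top-level entry.
    all_children = {name for row in rows for name in child_names_of(row)}

    def build(row):
        names = child_names_of(row)
        return {
            "source": row["source"],
            "destination": row["destination"],
            "type": row["type"],
            # None is dumped as a YAML null; arrays get a list of entries.
            "childfields": [build(by_destination[n]) for n in names] or None,
        }

    return [build(r) for r in rows if r["destination"] not in all_children]


rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))
nested = nest_rows(rows)
print([entry["destination"] for entry in nested])
```

The resulting `nested` list can then be passed to yaml.dump (or yaml.safe_dump) exactly like `result` above. A child name without a matching row raises a KeyError in this sketch, which may or may not be what you want.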