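Converting CSV to YAML with a Python script

Asked: 2017-10-11 05:57:39

Tags: python csv yaml

I need to convert a CSV spec file to a YAML file for a project requirement. I wrote a small piece of Python code for this, but it does not work as expected. I cannot use an online converter because the client I work for will not accept that. Here is my Python code: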

import csv
csvfile = open('custInfo.csv', 'r')

datareader = csv.reader(csvfile, delimiter=',', quotechar='"')
data_headings = []

yaml_pretext = "sourceTopic : 'BIG_PARTY'"
yaml_pretext += "\n"+'validationRequired : true'+"\n"
yaml_pretext += "\n"+'columnMappingEntityList :'+"\n"
for row_index, row in enumerate(datareader):
    if row_index == 0:
        data_headings = row
    else:
        # new_yaml = open('outfile.yaml', 'w')
        yaml_text = ""
        for cell_index, cell in enumerate(row):
            lineSeperator = "    "
            cell_heading = data_headings[cell_index].lower().replace(" ", "_").replace("-", "")
            if (cell_heading == "source"):
                lineSeperator = '  - '

            cell_text = lineSeperator+cell_heading + " : " + cell.replace("\n", ", ") + "\n"

            yaml_text += cell_text
        print(yaml_text)

csvfile.close()

The CSV file has 4 columns. At the moment it looks like this:

source               destination        type     childFields
fra:AppData          app_data           array    application_id,institute_nm
fra:ApplicationId    application_id     string   null
fra:InstituteName    institute_nm       string   null
fra:CustomerData     customer_data      array    name,customer_address,telephone_number
fra:Name             name               string   null
fra:CustomerAddress  customer_address   array    street,pincode
fra:Street           street             string   null
fra:Pincode          pincode            string   null
fra:TelephoneNumber  telephone_number   string   null

This is the YAML I get as output:

  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm

  - source : fra:ApplicationId
    destination : application_id
    type : string
    childfields : null

  - source : fra:InstituteName
    destination : institute_nm
    type : string
    childfields : null

  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number

  - source : fra:Name
    destination : name
    type : string
    childfields : null

  - source : fra:CustomerAddress
    destination : customer_address
    type : array
    childfields : street,pincode

  - source : fra:Street
    destination : street
    type : string
    childfields : null

  - source : fra:Pincode
    destination : pincode
    type : string
    childfields : null

  - source : fra:TelephoneNumber
    destination : telephone_number
    type : string
    childfields : null

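When the type is array, I need the rows named in childFields to be output as children nested under that entry, not as new top-level entries. So the expected output would be: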

  - source : fra:AppData
    destination : app_data
    type : array
    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null

      - source : fra:InstituteName
        destination : institute_nm
        type : string
        childfields : null

  - source : fra:CustomerData
    destination : customer_data
    type : array
    childfields : name,customer_address,telephone_number
      - source : fra:Name
        destination : name
        type : string
        childfields : null

      - source : fra:CustomerAddress
        destination : customer_address
        type : array
        childfields : street,pincode
          - source : fra:Street
            destination : street
            type : string
            childfields : null

          - source : fra:Pincode
            destination : pincode
            type : string
            childfields : null

      - source : fra:TelephoneNumber
        destination : telephone_number
        type : string
        childfields : null

How can I get this?

1 Answer:

Answer 0: (score: 2)

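You are currently not using any YAML library to generate the output. This is bad practice, because you never check whether the strings you write contain YAML special characters that need quoting.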
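To illustrate the quoting concern, here is a minimal sketch. It assumes PyYAML (the yaml module imported in the code further down) and uses made-up cell values; the name loads_back is a hypothetical helper for this example only.

```python
import yaml

# Hypothetical cell values: one contains ": ", one is the literal text "null".
row = {"source": "fra:AppData", "note": "key: value", "childfields": "null"}

# Naive concatenation, as in the question's script: nothing is quoted.
naive = "".join("%s : %s\n" % (key, value) for key, value in row.items())

# A YAML library decides for each value whether it has to be quoted.
dumped = yaml.safe_dump(row, default_flow_style=False)


def loads_back(text, expected):
    """Return True if the text parses as YAML and yields the expected data."""
    try:
        return yaml.safe_load(text) == expected
    except yaml.YAMLError:
        return False


print(loads_back(dumped, row))
print(loads_back(naive, row))
```

The dumped text should load back to the same strings, while the hand-built text does not: an unquoted null is read as an empty value rather than as the string "null", and an unquoted value containing ": " is not a plain string in YAML.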

Next, this is not valid YAML:

    childfields : application_id,institute_nm
      - source : fra:ApplicationId
        destination : application_id
        type : string
        childfields : null

childfields cannot have both a scalar value (application_id,institute_nm) and a sequence value (starting with the item - source : fra:ApplicationId).
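One way to check this yourself is to feed both shapes to a parser. The sketch below assumes PyYAML; the two snippets are shortened versions of the fragment above, and parses is a hypothetical helper name.

```python
import yaml

# The shape from the question: a scalar followed by an indented sequence.
invalid = (
    "childfields : application_id,institute_nm\n"
    "  - source : fra:ApplicationId\n"
    "    destination : application_id\n"
)

# One valid alternative: the key holds only a sequence of mappings.
valid = (
    "childfields :\n"
    "  - source : fra:ApplicationId\n"
    "    destination : application_id\n"
)


def parses(text):
    """Return True if PyYAML accepts the text as YAML."""
    try:
        yaml.safe_load(text)
        return True
    except yaml.YAMLError:
        return False


print(parses(invalid))
print(parses(valid))
```

In the valid variant the names of the children are no longer repeated as a scalar; they are already present as the destination of each nested item.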

Try building the structure out of lists and dicts, then dump that structure:

import csv
import yaml

result = []
type_index = -1
child_fields_index = -1

with open('custInfo.csv', 'r') as csvfile:
    datareader = csv.reader(csvfile, delimiter=",", quotechar='"')
    for row_index, row in enumerate(datareader):
        if row_index == 0:
            # Normalise the headings once and remember the two columns we need.
            data_headings = []
            for heading_index, heading in enumerate(row):
                fixed_heading = heading.lower().replace(" ", "_").replace("-", "")
                data_headings.append(fixed_heading)
                if fixed_heading == "type":
                    type_index = heading_index
                elif fixed_heading == "childfields":
                    child_fields_index = heading_index
        else:
            content = {}
            is_array = False
            for cell_index, cell in enumerate(row):
                if cell_index == child_fields_index and is_array:
                    # Expand the comma-separated names into a list of mappings.
                    # The source is derived from the name, not looked up in the CSV.
                    content[data_headings[cell_index]] = [{
                        "source": "fra:" + value.capitalize(),
                        "destination": value,
                        "type": "string",
                        "childfields": "null"
                    } for value in cell.split(",")]
                else:
                    content[data_headings[cell_index]] = cell
                    # Recomputed for every cell, so the flag only reaches the
                    # childfields check if the type column comes directly before it.
                    is_array = (cell_index == type_index) and (cell == "array")
            result.append(content)

print(yaml.dump(result))
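The code above expands one level only and derives each child's source from its name. If the children should instead be taken from the matching CSV rows and nested to any depth, something like the following sketch could work. It is not part of the original answer and rests on several assumptions: Python 3, a comma-delimited file in which multi-value childFields cells are quoted, unique destination values, no circular references, and every child name appearing as some row's destination. The names SAMPLE_CSV and build_tree are hypothetical.

```python
import csv
import io

# Assumed sample input: comma-delimited, multi-value childFields cells quoted.
SAMPLE_CSV = '''source,destination,type,childFields
fra:AppData,app_data,array,"application_id,institute_nm"
fra:ApplicationId,application_id,string,null
fra:InstituteName,institute_nm,string,null
fra:CustomerData,customer_data,array,"name,customer_address,telephone_number"
fra:Name,name,string,null
fra:CustomerAddress,customer_address,array,"street,pincode"
fra:Street,street,string,null
fra:Pincode,pincode,string,null
fra:TelephoneNumber,telephone_number,string,null
'''


def build_tree(csv_text):
    """Nest each row under the array row that lists it in childFields."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    by_destination = {row["destination"]: row for row in rows}

    # Collect every name that some array row claims as a child.
    child_names = set()
    for row in rows:
        if row["type"] == "array":
            child_names.update(row["childFields"].split(","))

    def to_node(row):
        node = {
            "source": row["source"],
            "destination": row["destination"],
            "type": row["type"],
        }
        if row["type"] == "array":
            # Children become a sequence of nested mappings (recursive).
            node["childfields"] = [
                to_node(by_destination[name])
                for name in row["childFields"].split(",")
            ]
        return node

    # Rows that no array refers to stay at the top level.
    return [to_node(row) for row in rows if row["destination"] not in child_names]


tree = build_tree(SAMPLE_CSV)
print([node["destination"] for node in tree])
```

The resulting list can then be handed to yaml.safe_dump(tree, default_flow_style=False). In this sketch non-array entries simply have no childfields key; whether to emit an explicit null there instead is a design choice the question leaves open.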