Question

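I'm very new to Python, but I'd appreciate your help in guiding me to create a simple script that reads a bunch of .yaml files (about 300 files in the same directory), extracts one section (only the electives) from each .yaml file, and converts it to CSV.

Example of the contents of a .yaml file: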

code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
    - label: Electives
      options:
        - Studio1
        - Studio2
        - Studio3
    - label: OtherElectives
      options:
        - Class1
        - Development2
        - lateclass1
   specialisations:
    - label: Honours

The output I'd like to see in the CSV:

.yaml file name | Electives   | Studio1
.yaml file name | Electives   | Studio2
.yaml file name | Electives   | Studio3
.yaml file name | OtherElectives   | class1
.yaml file name | OtherElectives   | Development2
.yaml file name | OtherElectives   | lateclass1

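I think this should be a relatively simple script to write, but I'm looking for some help writing it. I'm very new to this, so please be patient. I've written a few VBA macros, so I hope I can pick it up relatively quickly.

Ideally I'd like a complete solution, with some guidance on how the code works.

Thanks in advance for all your help. I hope my question is clear.

Here is my first attempt (although I didn't spend long on it):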

import yaml
with open ('program_4803','r') as f:
    doc = yaml.load(f)
    txt=doc["electiveGroups"]["options"]
    file = open(“test.txt”,”w”) 
        file.write(“txt”) 
        file.close()

It's very incomplete at the moment, as you can probably tell, but I'm trying my hardest!

Answer 1

To parse YAML files, use the Python yaml library.

Example here: Parsing a YAML file in Python, and accessing the data?

To write to a file, you don't need the csv library:

file = open("testfile.txt", "w")
file.write("Hello World")
file.close()

The code above writes to a file; you can iterate over the parsed YAML result and write the output to the file accordingly.
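
As a rough sketch of that iteration (not the answerer's own code): assuming the parsed YAML has the shape shown in the question, the loop below writes one line per elective option using only `write`. The function name `write_elective_lines`, the `" | "` separator, and the file name are made up for illustration, and the parsed dict is typed out by hand where the real script would get it from the yaml library.

```python
import io


def write_elective_lines(parsed, source_name, out):
    """Write one 'file | label | option' line per elective option.

    `parsed` is assumed to be the dict produced by parsing one YAML file;
    `out` is any open, writable text file object.
    """
    for degree in parsed["degrees"]:
        for group in degree["electiveGroups"]:
            for option in group["options"]:
                out.write(" | ".join([source_name, group["label"], option]) + "\n")


# Hand-written stand-in for the parsed content of one file (hypothetical data).
parsed = {
    "code": 9313,
    "degrees": [
        {
            "name": "Design",
            "electiveGroups": [
                {"label": "Electives", "options": ["Studio1", "Studio2"]},
            ],
        }
    ],
}

buffer = io.StringIO()  # stands in for open("testfile.txt", "w")
write_elective_lines(parsed, "program_4803.yaml", buffer)
print(buffer.getvalue(), end="")
```

Joining by hand like this breaks if a label or option ever contains the separator; that is the case the csv module takes care of.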

Answer 2

This may help:

import yaml
import csv

yaml_file_names = ['data.yaml', 'data2.yaml']


rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        data = yaml.safe_load(f)  # recent PyYAML requires a Loader for yaml.load; safe_load suits plain data

        for each_dict in data['degrees']:
            for each_nested_dict in each_dict['electiveGroups']:
                for each_option in each_nested_dict['options']:
                    # write to csv yaml_file_name, each_nested_dict['label'], each_option
                    rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])



with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
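
To see why three nested loops are needed, it may help to look at the Python object the parser hands back. The literal below is a hand-written approximation of what loading data2.yaml (shown further down) would produce; it is typed out rather than loaded, so treat it as an assumption. It is followed by the lookups that each loop level performs.

```python
# Approximate Python equivalent of data2.yaml after parsing (hand-written).
data = {
    "code": 9313,
    "degrees": [  # a YAML list becomes a Python list
        {
            "name": "Design",
            "coreCourses": ["ABCD1", "ABCD2", "ABCD3"],
            "electiveGroups": [  # list of dicts, one per group
                {"label": "Electives", "options": ["Studio1"]},
                {"label": "E2", "options": ["Class1"]},
            ],
            "specialisations": [{"label": "Honours"}],
        }
    ],
}

first_degree = data["degrees"][0]                 # outer loop: each degree
second_group = first_degree["electiveGroups"][1]  # middle loop: each group
first_option = second_group["options"][0]         # inner loop: each option

print(second_group["label"], first_option)  # E2 Class1

# "electiveGroups" is not a top-level key, so a direct
# doc["electiveGroups"] lookup on the parsed file raises KeyError.
print("electiveGroups" in data)  # False
```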

I tested this code with two mock input YAML files, data.yaml and data2.yaml, whose contents are as follows:

data.yaml:

code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
    - label: Electives
      options:
        - Studio1
        - Studio2
        - Studio3
    - label: OtherElectives
      options:
        - Class1
        - Development2
        - lateclass1
  specialisations:
  - label: Honours

and data2.yaml:

code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
    - label: Electives
      options:
        - Studio1
    - label: E2
      options:
        - Class1
  specialisations:
  - label: Honours

The generated output CSV file is:

data.yaml|Electives|Studio1
data.yaml|Electives|Studio2
data.yaml|Electives|Studio3
data.yaml|OtherElectives|Class1
data.yaml|OtherElectives|Development2
data.yaml|OtherElectives|lateclass1
data2.yaml|Electives|Studio1
data2.yaml|E2|Class1

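By the way, the last two lines of the YAML input you posted in your question are not indented correctly.

Since you said you need to parse 300 YAML files in one directory, you can use Python's glob module, as follows: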

import yaml
import csv
import glob


yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']

rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        data = yaml.safe_load(f)  # recent PyYAML requires a Loader for yaml.load; safe_load suits plain data

        for each_dict in data['degrees']:
            for each_nested_dict in each_dict['electiveGroups']:
                for each_option in each_nested_dict['options']:
                    # write to csv yaml_file_name, each_nested_dict['label'], each_option
                    rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])



with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
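
One detail worth noting: `glob.glob('./*.yaml')` returns paths that keep the directory prefix from the pattern (for example `./data.yaml`), and the order of the results is not guaranteed. If the first CSV column should hold only the bare file name, a sketch along these lines may help. It swaps in `pathlib` from the standard library instead of `glob`, and the helper name `list_yaml_files` is hypothetical.

```python
from pathlib import Path


def list_yaml_files(directory="."):
    """Return the .yaml files in `directory` as Path objects, sorted by name."""
    return sorted(Path(directory).glob("*.yaml"))


for path in list_yaml_files():
    # `path` is what you would pass to open(); `path.name` is the bare
    # file name (no directory part), suitable for the first CSV column.
    print(path.name)
```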

Edit: As you asked in the comments to skip the YAML files that have no electiveGroups section, here is the updated program:

import yaml
import csv
import glob


yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']

rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        data = yaml.safe_load(f)  # recent PyYAML requires a Loader for yaml.load; safe_load suits plain data

        for each_dict in data['degrees']:
            try:
                for each_nested_dict in each_dict['electiveGroups']:
                    for each_option in each_nested_dict['options']:
                        # write to csv yaml_file_name, each_nested_dict['label'], each_option
                        rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
            except KeyError:
                print("No electiveGroups or options key found in", each_yaml_file)


with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
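
One caveat with the try/except approach: if a later group in a degree lacks options, the rows from earlier groups have already been appended and the remaining groups of that degree are skipped. A possible alternative, sketched here with a hypothetical helper `extract_rows`, uses `dict.get` with empty fallbacks so that each missing piece is skipped on its own. It takes the already-parsed data, so it would slot in where the nested loops are.

```python
def extract_rows(data, file_name):
    """Collect [file_name, label, option] rows, skipping missing sections."""
    rows = []
    # `or {}` / `or []` also cover keys that are present but empty (parsed as None).
    for degree in (data or {}).get("degrees") or []:
        for group in degree.get("electiveGroups") or []:
            label = group.get("label", "")
            for option in group.get("options") or []:
                rows.append([file_name, label, option])
    return rows


# Hypothetical parsed files: one complete, one without electiveGroups.
complete = {"degrees": [{"electiveGroups": [{"label": "E2", "options": ["Class1"]}]}]}
no_electives = {"degrees": [{"name": "Design", "coreCourses": ["ABCD1"]}]}

print(extract_rows(complete, "data2.yaml"))      # [['data2.yaml', 'E2', 'Class1']]
print(extract_rows(no_electives, "data3.yaml"))  # []
```

The trade-off is that nothing is printed for files that are skipped, so the try/except version may still be preferable if you want a log of which files lack the section.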

Need a script that extracts content from YAML files and outputs it as a CSV file
