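I am very new to Python, but I would appreciate your help in guiding me to create a simple script that reads a bunch of .yaml files (about 300 files in the same directory), extracts one particular section (only that section) from each .yaml file, and converts it to CSV.
Example contents of a .yaml file: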
code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
  - label: Electives
    options:
    - Studio1
    - Studio2
    - Studio3
  - label: OtherElectives
    options:
    - Class1
    - Development2
    - lateclass1
specialisations:
- label: Honours
I would like to see this output in the CSV:
.yaml file name | Electives | Studio1
.yaml file name | Electives | Studio2
.yaml file name | Electives | Studio3
.yaml file name | OtherElectives | class1
.yaml file name | OtherElectives | Development2
.yaml file name | OtherElectives | lateclass1
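I think this would be a relatively simple script to write, but I am looking for some help writing it. I am very new to this, so please be patient. I have written a few VBA macros, so I hope I can pick it up relatively quickly.
Ideally I would like a complete solution, with some guidance on how the code works.
Thanks in advance for all your help. I hope my question is clear.
Here is my first attempt (although it did not take long):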
import yaml
with open ('program_4803','r') as f:
    doc = yaml.load(f)
    txt=doc["electiveGroups"]["options"]
file = open("test.txt","w")
file.write("txt")
file.close()
This is very incomplete at the moment, as you can probably tell, but I am trying my best!
Answer 0: (score: 0)
To parse a YAML file, use the Python yaml library (PyYAML).
See the example here: Parsing a YAML file in Python, and accessing the data?
To write to a file, you do not need the csv library:
file = open("testfile.txt","w")
file.write("Hello World")
file.close()
The code above writes to a file; you can iterate over the result of parsing the YAML and write the output to the file accordingly.
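The iteration described above could look roughly like the following sketch. It assumes the parsed YAML is a dictionary shaped like the sample in the question (`degrees`, then `electiveGroups`, then `options`). The dictionary literal stands in for the parser's result, and the names `parsed`, `format_lines`, `program_4803.yaml`, and `electives.txt` are made up for illustration.

```python
# Stand-in for what the YAML parser would return for one file (assumed shape).
parsed = {
    "code": 9313,
    "degrees": [
        {
            "name": "Design",
            "electiveGroups": [
                {"label": "Electives", "options": ["Studio1", "Studio2"]},
                {"label": "OtherElectives", "options": ["Class1"]},
            ],
        }
    ],
}


def format_lines(file_name, data):
    """Return one 'file | label | option' line per elective option."""
    lines = []
    for degree in data["degrees"]:
        for group in degree["electiveGroups"]:
            for option in group["options"]:
                lines.append(f"{file_name} | {group['label']} | {option}")
    return lines


lines = format_lines("program_4803.yaml", parsed)

# Plain file writing: one line per row, no csv module involved.
with open("electives.txt", "w") as out:
    for line in lines:
        out.write(line + "\n")
```

Building the lines in a function first keeps the nested loops separate from the file handling, which makes it easier to reuse the same loops for many files.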
Answer 1: (score: 0)
This might help:
import yaml
import csv

yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load() requires an explicit Loader in PyYAML 6 and later
        data = yaml.safe_load(f)

    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one output row: yaml_file_name, each_nested_dict['label'], each_option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('output_csv_file.csv', 'w', newline='') as out:
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
I tested this code with two mock input YAML files, data.yaml and data2.yaml, whose contents are as follows.
data.yaml:
code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
  - label: Electives
    options:
    - Studio1
    - Studio2
    - Studio3
  - label: OtherElectives
    options:
    - Class1
    - Development2
    - lateclass1
  specialisations:
  - label: Honours
and data2.yaml:
code: 9313
degrees:
- name: Design
  coreCourses:
  - ABCD1
  - ABCD2
  - ABCD3
  electiveGroups: #this is the section i need to extract
  - label: Electives
    options:
    - Studio1
  - label: E2
    options:
    - Class1
  specialisations:
  - label: Honours
The generated output CSV file is:
data.yaml|Electives|Studio1
data.yaml|Electives|Studio2
data.yaml|Electives|Studio3
data.yaml|OtherElectives|Class1
data.yaml|OtherElectives|Development2
data.yaml|OtherElectives|lateclass1
data2.yaml|Electives|Studio1
data2.yaml|E2|Class1
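By the way, in the YAML input you posted in your question, the last two lines are not indented correctly.
Since you said you need to parse 300 YAML files in one directory, you can use Python's glob module, like this: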
import yaml
import csv
import glob

# every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load() requires an explicit Loader in PyYAML 6 and later
        data = yaml.safe_load(f)

    for each_dict in data['degrees']:
        for each_nested_dict in each_dict['electiveGroups']:
            for each_option in each_nested_dict['options']:
                # one output row: yaml_file_name, each_nested_dict['label'], each_option
                rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('output_csv_file.csv', 'w', newline='') as out:
    # default quoting only quotes a field that itself contains the delimiter
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
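One detail worth noting: `glob.glob('./*.yaml')` returns each path with the directory prefix it was given, so the first CSV column would contain `./data.yaml` rather than `data.yaml`. If only the bare file name is wanted, as in the output requested in the question, `os.path.basename` is one option. The sketch below is illustrative only; the helper name `list_yaml_files` and the temporary directory are assumptions made for the demonstration.

```python
import glob
import os
import tempfile


def list_yaml_files(directory):
    """Return (full_path, bare_file_name) pairs for each .yaml file, sorted by path."""
    paths = sorted(glob.glob(os.path.join(directory, "*.yaml")))
    return [(path, os.path.basename(path)) for path in paths]


# Demonstration in a throwaway directory holding two .yaml files and one other file.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("data.yaml", "data2.yaml", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    found = list_yaml_files(tmp)
    names = [bare for _, bare in found]
```

The full path is still needed to open each file, so the helper returns both; the bare name is what would go into the CSV row.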
Edit: As you asked in the comments to skip those YAML files that do not have an electiveGroups section, here is the updated program:
import yaml
import csv
import glob

# every .yaml file in the current directory
yaml_file_names = glob.glob('./*.yaml')
# yaml_file_names = ['data.yaml', 'data2.yaml']
rows_to_write = []

for idx, each_yaml_file in enumerate(yaml_file_names):
    print("Processing file ", idx+1, "of", len(yaml_file_names), "file name:", each_yaml_file)
    with open(each_yaml_file) as f:
        # safe_load parses plain data; yaml.load() requires an explicit Loader in PyYAML 6 and later
        data = yaml.safe_load(f)

    for each_dict in data['degrees']:
        try:
            for each_nested_dict in each_dict['electiveGroups']:
                for each_option in each_nested_dict['options']:
                    # one output row: yaml_file_name, each_nested_dict['label'], each_option
                    rows_to_write.append([each_yaml_file, each_nested_dict['label'], each_option])
        except KeyError:
            # a missing electiveGroups or options key skips the rest of this degree
            print("No electiveGroups or options key found in", each_yaml_file)

# newline='' lets the csv module control line endings (avoids blank rows on Windows)
with open('output_csv_file.csv', 'w', newline='') as out:
    # default quoting only quotes a field that itself contains the delimiter
    csv_writer = csv.writer(out, delimiter='|')
    csv_writer.writerows(rows_to_write)
    print("Output file output_csv_file.csv created")
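As a possible alternative to the try/except, `dict.get` can be used to treat a missing section as an empty list. This is only a sketch under assumptions: the dictionaries below stand in for parsed YAML files, the function name `extract_rows` and the file name `data3.yaml` are hypothetical, and every group is assumed to have a `label` key. One behavioural difference: with try/except, a group lacking `options` abandons the remaining groups of that degree, whereas this version skips only the group that lacks the key.

```python
def extract_rows(file_name, data):
    """Collect [file_name, label, option] rows, skipping missing or empty sections."""
    rows = []
    # "or []" also covers a key that is present but empty (parsed as None).
    for degree in data.get("degrees") or []:
        for group in degree.get("electiveGroups") or []:
            for option in group.get("options") or []:
                rows.append([file_name, group["label"], option])
    return rows


# Dictionaries standing in for parsed YAML files (assumed shapes).
with_electives = {"degrees": [{"name": "Design", "electiveGroups": [
    {"label": "Electives", "options": ["Studio1"]},
    {"label": "NoOptionsHere"},
    {"label": "E2", "options": ["Class1"]},
]}]}
without_electives = {"degrees": [{"name": "Design", "coreCourses": ["ABCD1"]}]}

rows = (extract_rows("data.yaml", with_electives)
        + extract_rows("data3.yaml", without_electives))
```

The resulting `rows` list can be passed to `csv_writer.writerows` exactly as `rows_to_write` is in the program above.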