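I want to merge two YAML files that contain list elements: (A) and (B) are merged into a new file (C).
If (B) also defines a value for an existing property of a list item, I want it to overwrite the value from (A).
If a property is not defined in (A) but is defined in (B), I want to add it to the list item.
If a list item from (B) does not exist in (A), I want to add it as well.
YAML file A: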
list:
  - id: 1
    name: "name-from-A"
  - id: 2
    name: "name-from-A"
YAML file B:
list:
  - id: 1
    name: "name-from-B"
  - id: 2
    title: "title-from-B"
  - id: 3
    name: "name-from-B"
    title: "title-from-B"
The merged YAML file (C) that I want to produce:
list:
  - id: 1
    name: "name-from-B"
  - id: 2
    name: "name-from-A"
    title: "title-from-B"
  - id: 3
    name: "name-from-B"
    title: "title-from-B"
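I need this functionality in a Bash script, but Python is available in my environment.
Is there a standalone YAML processor (such as yq) that can do this?
How could I implement something like this in a Python script?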
Answer 0 (score: 1)
You can do this with the ruamel.yaml Python package.
If Python is already installed, run the following command in a terminal:
pip install ruamel.yaml
Python code adapted from here (tested and working):
import ruamel.yaml

yaml = ruamel.yaml.YAML()

# Load the YAML files
with open('/test1.yaml') as fp:
    data = yaml.load(fp)
with open('/test2.yaml') as fp:
    data1 = yaml.load(fp)

# dict recording which ids have been merged
merged = dict()

# Merge the 'list' from test1.yaml into the 'list' from test2.yaml.
# Note: i.update(j) lets values from test1.yaml overwrite those from
# test2.yaml, so test1.yaml should hold the overriding values (file B).
for i in data1['list']:
    for j in data['list']:
        # if same 'id'
        if i['id'] == j['id']:
            i.update(j)
            merged[i['id']] = True

# Append the entries from test1.yaml whose ids were not found in test2.yaml
for j in data['list']:
    if not merged.get(j['id'], False):
        data1['list'].append(j)

# Create a new file with the merged YAML
with open('/merged.yaml', 'w') as yaml_file:
    yaml.dump(data1, yaml_file)
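The matching step above compares every pair of items. As a rough sketch of the same id-based merge on plain Python data, the following uses a dict index keyed by id instead of the nested loop. `merge_by_id` and its parameters are hypothetical names chosen for illustration, not part of ruamel.yaml, and the sketch assumes every list item is a mapping that carries the key.

```python
def merge_by_id(base_items, override_items, key="id"):
    """Merge two lists of dicts by `key`; values from override_items win."""
    # Copy the base items so the input lists are left unmodified
    merged = [dict(item) for item in base_items]
    # Index the copies by their id for direct lookup
    index = {item[key]: item for item in merged}
    for item in override_items:
        if item[key] in index:
            # Known id: overwrite existing attributes and add new ones
            index[item[key]].update(item)
        else:
            # Unknown id: append a copy as a new list entry
            new_item = dict(item)
            merged.append(new_item)
            index[item[key]] = new_item
    return merged


# The lists from files A and B in the question, as plain Python data
list_a = [
    {"id": 1, "name": "name-from-A"},
    {"id": 2, "name": "name-from-A"},
]
list_b = [
    {"id": 1, "name": "name-from-B"},
    {"id": 2, "title": "title-from-B"},
    {"id": 3, "name": "name-from-B", "title": "title-from-B"},
]
result = merge_by_id(list_a, list_b)
```

Here the second argument takes precedence, which matches the (A) then (B) order described in the question.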
Answer 1 (score: 1)
You can merge YAML files passed on the command line:
import sys
import yaml


def merge_dict(m_list, s):
    for m in m_list:
        if m['id'] == s['id']:
            m.update(**s)
            return
    m_list.append(s)


merged_list = []
for f in sys.argv[1:]:
    with open(f) as s:
        for source in yaml.safe_load(s)['list']:
            merge_dict(merged_list, source)
print(yaml.dump({'list': merged_list}), end='')
Result:
list:
- id: 1
  name: name-from-B
- id: 2
  name: name-from-A
  title: title-from-B
- id: 3
  name: name-from-B
  title: title-from-B
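Because the files are processed in the order they are given, a later file overrides an earlier one. The sketch below illustrates that with in-memory lists instead of files; `merge_lists` is a hypothetical wrapper added for illustration, and the sample data is a shortened version of the lists from the question.

```python
def merge_dict(m_list, s):
    """Update the entry with the same 'id', or append s if there is none."""
    for m in m_list:
        if m['id'] == s['id']:
            m.update(**s)
            return
    m_list.append(s)


def merge_lists(*lists):
    """Merge any number of item lists; later lists take precedence."""
    merged_list = []
    for items in lists:
        for item in items:
            # Copy each item so the input lists are not modified
            merge_dict(merged_list, dict(item))
    return merged_list


list_a = [{'id': 1, 'name': 'name-from-A'}]
list_b = [{'id': 1, 'name': 'name-from-B'}, {'id': 2, 'title': 'title-from-B'}]

a_then_b = merge_lists(list_a, list_b)  # B overrides A
b_then_a = merge_lists(list_b, list_a)  # A overrides B
```

So to get the result asked for in the question, file A should be passed first and file B second.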
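Answer 2 (score: 0)
Based on the answers (thanks, everyone), I created a solution that handles, in a fairly generic way, all the merging I need at the moment (I need to use it on many different kinds of Kubernetes descriptors).
It is based on ruamel.yaml.
It handles multi-level lists and merges list elements not only by index but also by proper item identity.
It is more complex than I had hoped (it traverses the YAML tree).
The script and its core methods: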
import ruamel.yaml
from ruamel.yaml.comments import CommentedMap, CommentedSeq
#
# Merges a node from B with its pair in A
#
# If the node exists in both A and B, it will merge
# all children in sync
#
# If the node only exists in A, it will do nothing.
#
# If the node only exists in B, it will add it to A and stop
#
# attrPath DOES NOT include attrName
#
def mergeAttribute(parentNodeA, nodeA, nodeB, attrName, attrPath):
    # If both are None, there is nothing to merge
    if (nodeA is None) and (nodeB is None):
        return

    # If nodeA is None but nodeB has a value, we simply set it in A
    if (nodeA is None) and (parentNodeA is not None):
        parentNodeA[attrName] = nodeB
        return

    if attrPath == '':
        attrPath = attrName
    else:
        attrPath = attrPath + '.' + attrName

    if isinstance(nodeB, CommentedSeq):
        # The attribute is a list, we need to merge specially
        mergeList(nodeA, nodeB, attrPath)
    elif isinstance(nodeB, CommentedMap):
        # A simple object to be merged
        mergeObject(nodeA, nodeB, attrPath)
    else:
        # Primitive type, simply overwrites
        parentNodeA[attrName] = nodeB

#
# Lists object attributes and merges the attribute values if possible
#
def mergeObject(nodeA, nodeB, attrPath):
    for attrName in nodeB:
        subNodeA = None
        if attrName in nodeA:
            subNodeA = nodeA[attrName]

        subNodeB = None
        if attrName in nodeB:
            subNodeB = nodeB[attrName]

        mergeAttribute(nodeA, subNodeA, subNodeB, attrName, attrPath)

#
# Merges two lists by properly identifying each item in both lists
# (using the merge-directives).
#
# If an item of listB is identified in listA, it will be merged onto the item
# of listA
#
def mergeList(listA, listB, attrPath):
    # Iterating the list from B
    for itemInB in listB:
        itemInA = findItemInList(listA, itemInB, attrPath)
        if itemInA is None:
            listA.append(itemInB)
            continue

        # Present in both, we need to merge them
        mergeObject(itemInA, itemInB, attrPath)

#
# Finds an item in the list by using the appropriate ID field defined for that
# attribute-path.
#
# If there is no id attribute defined for the list, it returns None
#
def findItemInList(listA, itemB, attrPath):
    if attrPath not in listsWithId:
        # No id field defined for the list, only "dumb" merging is possible
        return None

    # Finding out the name of the id attribute in the list items
    idAttrName = listsWithId[attrPath]

    idB = None
    if idAttrName is not None:
        idB = itemB[idAttrName]

    # Looking for the item by its ID
    for itemA in listA:
        idA = None
        if idAttrName is not None:
            idA = itemA[idAttrName]
        if idA == idB:
            return itemA

    return None

# ------------------------------------------------------------------------------
yaml = ruamel.yaml.YAML()

# Load the merge directives
with open('merge-directives.yaml') as fp:
    mergeDirectives = yaml.load(fp)
listsWithId = mergeDirectives['lists-with-id']

# Load the yaml files
with open('a.yaml') as fp:
    dataA = yaml.load(fp)
with open('b.yaml') as fp:
    dataB = yaml.load(fp)

mergeObject(dataA, dataB, '')

# Create a new file with the merged yaml
with open('c.yaml', 'w') as fp:
    yaml.dump(dataA, fp)
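A helper configuration file (merge-directives.yaml) tells the script how the elements of (possibly multi-level) lists are identified.
For the data structure in the original question, only the 'list: "id"' entry is needed, but I have included a few other keys to demonstrate usage.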
#
# Lists that contain identifiable elements.
#
# Each sub-key is a property path denoting the list element in the YAML
# data structure.
#
# The value is the name of the attribute in the list element that
# identifies the list element so that pairing can be made.
#
lists-with-id:
  list: "id"
  list.sub-list: "id"
  a.listAttrShared: "name"
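The keys under lists-with-id are dotted property paths in which list items keep the path of their list (no index is added), so a nested list under `list` is addressed as `list.sub-list`. As a rough aid for writing such a file, the sketch below prints the path of every list in a plain Python structure. `list_paths` is a hypothetical helper, not part of the script above, and the sample data is a shortened stand-in for a.yaml.

```python
def list_paths(node, path=""):
    """Yield the dotted path of every list found in nested dicts and lists."""
    if isinstance(node, dict):
        for key, value in node.items():
            child_path = key if path == "" else path + "." + key
            if isinstance(value, list):
                yield child_path
            yield from list_paths(value, child_path)
    elif isinstance(node, list):
        for item in node:
            # Items keep the path of the list itself (no index is added)
            yield from list_paths(item, path)


# Shortened stand-in for the structure of a.yaml
data = {
    "a": {"listAttrShared": [{"name": "a1"}, {"name": "a2"}]},
    "list": [
        {
            "id": 1,
            "sub-list": [{"id": "s1"}, {"id": "s2"}],
            "sub-list-with-no-identification": ["comment 1", "comment 2"],
        },
        {"id": 2},
    ],
}

# Deduplicate, because the same path is reported once per parent item
paths = sorted(set(list_paths(data)))
for p in paths:
    print(p)
```

Each printed path is a candidate key for lists-with-id; lists left out of the file are merged by plain appending.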
It has not been tested extensively, but here are two test files that are more complete than the ones in the original question.
a.yaml:
a:
  attrShared: value-from-a
  listAttrShared:
    - name: a1
    - name: a2
  attrOfAOnly: value-from-a
list:
  - id: 1
    name: "name-from-A"
    sub-list:
      - id: s1
        name: "name-from-A"
        comments: "doesn't exist in B, so left untouched"
      - id: s2
        name: "name-from-A"
    sub-list-with-no-identification:
      - "comment 1"
      - "comment 2"
  - id: 2
    name: "name-from-A"
b.yaml:
a:
  attrShared: value-from-b
  listAttrShared:
    - name: b1
    - name: b2
  attrOfBOnly: value-from-b
list:
  - id: 1
    name: "name-from-B"
    sub-list:
      - id: s2
        name: "name-from-B"
        title: "title-from-B"
        comments: "overwrites name in A with name in B + adds title from B"
      - id: s3
        name: "name-from-B"
        comments: "only exists in B so added to A's list"
    sub-list-with-no-identification:
      - "comment 3"
      - "comment 4"
  - id: 2
    title: "title-from-B"
  - id: 3
    name: "name-from-B"
    title: "title-from-B"
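To make the expected outcome easier to check without any YAML library, here is a condensed sketch of the same recursive idea on plain dicts and lists. It is not the script above: `merge` and `LISTS_WITH_ID` are hypothetical names, the directives are hard-coded instead of being read from merge-directives.yaml, and the inputs are shortened versions of the two test files.

```python
# Hypothetical in-memory stand-in for merge-directives.yaml
LISTS_WITH_ID = {"list": "id", "list.sub-list": "id", "a.listAttrShared": "name"}


def merge(node_a, node_b, path=""):
    """Merge node_b into node_a (in place where possible) and return the result."""
    if isinstance(node_a, dict) and isinstance(node_b, dict):
        for key, value_b in node_b.items():
            child_path = key if path == "" else path + "." + key
            if key in node_a:
                node_a[key] = merge(node_a[key], value_b, child_path)
            else:
                # Only in B: add it to A
                node_a[key] = value_b
        return node_a
    if isinstance(node_a, list) and isinstance(node_b, list):
        id_attr = LISTS_WITH_ID.get(path)
        for item_b in node_b:
            match = None
            if id_attr is not None:
                # Assumes the items of an identifiable list are mappings
                match = next(
                    (a for a in node_a if a.get(id_attr) == item_b.get(id_attr)),
                    None,
                )
            if match is None:
                # No id configured or no matching item: plain append
                node_a.append(item_b)
            else:
                # List items keep the path of their list
                merge(match, item_b, path)
        return node_a
    # Scalars or mismatched types: the value from B wins
    return node_b


doc_a = {
    "a": {"attrShared": "value-from-a", "listAttrShared": [{"name": "a1"}]},
    "list": [
        {"id": 1, "name": "name-from-A",
         "sub-list": [{"id": "s2", "name": "name-from-A"}]},
        {"id": 2, "name": "name-from-A"},
    ],
}
doc_b = {
    "a": {"attrShared": "value-from-b", "listAttrShared": [{"name": "b1"}]},
    "list": [
        {"id": 1, "name": "name-from-B",
         "sub-list": [{"id": "s2", "title": "title-from-B"},
                      {"id": "s3", "name": "name-from-B"}]},
        {"id": 2, "title": "title-from-B"},
        {"id": 3, "name": "name-from-B"},
    ],
}
merged = merge(doc_a, doc_b)
```

With these inputs, `attrShared` takes the value from B, the two `listAttrShared` entries are kept side by side because their names differ, item `s2` gains a title while keeping its name from A, and items `s3` and `3` are appended.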