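How do I convert a JSON file in which some field values are multi-line strings with embedded newlines ("\n") into YAML, so that the values with embedded newlines, and only those values, are written in literal block notation?
For example, given the following JSON: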
{
  "01ea672a": {
    "summary": "A short one-line summary",
    "description": "first line\nsecond line",
    "content": "1st line\n2nd line\n"
  }
}
it should produce YAML like the following (details may differ):
---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line
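I would prefer a solution in a scripting language, whether Python, Perl, Ruby or something else, or one that uses a command-line conversion tool such as Catmandu.
The online converter json2yaml.com can do this, but I would rather not try it on a 40 MB file.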
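For reference, the difference between `|-` and `|` in the expected output follows from YAML's chomping indicators: `-` strips the final line break, no indicator keeps exactly one, and `+` keeps all trailing line breaks. A minimal sketch of that choice, assuming the goal is to round-trip the string exactly (the helper name `literal_header` is hypothetical, not part of any library):

```python
def literal_header(text):
    """Return the literal block header that preserves the trailing newlines of `text`.

    Hypothetical helper, for illustration only.
    """
    trailing = len(text) - len(text.rstrip("\n"))
    if trailing == 0:
        return "|-"   # strip: the value has no final line break
    if trailing == 1:
        return "|"    # clip: the value ends in exactly one line break
    return "|+"       # keep: the value ends in several line breaks


print(literal_header("first line\nsecond line"))  # |-
print(literal_header("1st line\n2nd line\n"))     # |
```

A YAML library normally makes this choice itself when it dumps a literal scalar, so the helper only explains the output rather than replacing anything.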
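Answer 0 (score: 3)
ruamel.yaml (disclaimer: I am the author of that library) can already round-trip your expected output without losing any information (including key order):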
import sys
import ruamel.yaml
yaml_str = """---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line
"""
yaml = ruamel.yaml.YAML()
yaml.explicit_start = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
gives:
---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line
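If you add:
print(type(data['01ea672a']['description']), type(data['01ea672a']))
you will see that these are a LiteralScalarString from ruamel.yaml.scalarstring and a CommentedMap from ruamel.yaml.comments, respectively. You can create the latter on the fly by handing the type to the JSON loader; it preserves key order because it behaves like an ordered dict. The former has to be "forced" after loading, because json.loads has no 'parse_string' option that could do this during loading. ruamel.yaml has a utility function, walk_tree, for that.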
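With that knowledge it is easy to do the complete conversion from JSON to YAML: hand CommentedMap to the JSON loader (for example through the object_pairs_hook argument of json.load), call walk_tree on the loaded data, and dump the result with a YAML instance configured as above.
This again gives exactly the output you expected.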
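The two preparation steps described above (an order-preserving mapping type handed to the JSON loader, then a walk over the loaded tree) can be illustrated without ruamel.yaml installed. This is a stdlib-only sketch, not the library's implementation: `LiteralStr` and `mark_multiline` are hypothetical stand-ins for LiteralScalarString and walk_tree, and OrderedDict stands in for CommentedMap.

```python
import json
from collections import OrderedDict


class LiteralStr(str):
    """Hypothetical marker type: a string a dumper should write as a literal block."""


def mark_multiline(node):
    """Walk dicts and lists in place, wrapping strings that contain a newline."""
    if isinstance(node, dict):
        items = list(node.items())
    elif isinstance(node, list):
        items = list(enumerate(node))
    else:
        return  # scalars other than multi-line strings are left alone
    for key, value in items:
        if isinstance(value, str) and "\n" in value:
            node[key] = LiteralStr(value)
        else:
            mark_multiline(value)


json_text = (
    '{"01ea672a": {"summary": "A short one-line summary", '
    '"description": "first line\\nsecond line", '
    '"content": "1st line\\n2nd line\\n"}}'
)
# object_pairs_hook receives the key/value pairs in document order.
data = json.loads(json_text, object_pairs_hook=OrderedDict)
mark_multiline(data)

for key, value in data["01ea672a"].items():
    print(key, type(value).__name__)
```

Only the two multi-line values change type; the one-line summary stays a plain str, which is what lets a dumper pick the literal style selectively.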
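Answer 1 (score: 2)
You can do this with the low-level event API. Just parse the JSON as YAML to get an event stream (which works because YAML is a superset of JSON), then modify the events as follows: set every collection to block style, set each scalar that is a mapping value (not a key) and contains a newline to literal style (|), and make every other scalar plain.
Finally, emit the modified events. Here is a solution with PyYAML: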
import yaml
from yaml.events import *
events = []
class Level:
    def __init__(self, is_mapping):
        self.is_mapping = is_mapping
        self.is_value = True
levels = []
with open("in.json", 'r') as stream:
    for event in yaml.parse(stream):
        # inside a mapping, nodes alternate between key and value
        if len(levels) > 0 and levels[-1].is_mapping:
            levels[-1].is_value = not levels[-1].is_value
        if isinstance(event, yaml.CollectionStartEvent):
            levels.append(Level(isinstance(event, MappingStartEvent)))
            event.flow_style = False  # block style for every collection
        elif isinstance(event, CollectionEndEvent):
            levels.pop()
        elif isinstance(event, ScalarEvent):
            if len(levels) > 0 and levels[-1].is_value:
                # literal style only for values that contain a newline
                event.style = '|' if "\n" in event.value else ''
            else:
                event.style = ''
            event.implicit = (True, True)
        events.append(event)
with open("out.yaml", 'w') as stream:
    yaml.emit(events, stream)
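Note: PyYAML supports YAML 1.1, which in some edge cases is not a superset of JSON. To be safe, you could use ruamel instead, which implements YAML 1.2, but I am not familiar with its code, which is why I provide a PyYAML solution.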
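The key/value bookkeeping is the subtle part of the loop above, so here it is in isolation. This is a sketch with plain tuples standing in for PyYAML events; the tuple format and the name `literal_candidates` are hypothetical and exist only so the logic runs without PyYAML.

```python
def literal_candidates(events):
    """Flag each scalar as eligible for literal style (True) or not (False).

    `events` uses hypothetical tuples: ("map_start",), ("map_end",),
    ("seq_start",), ("seq_end",) and ("scalar", text). A scalar is
    eligible when it is a mapping value or a sequence item.
    """
    flags = []
    levels = []  # one [is_mapping, is_value] pair per open collection
    for event in events:
        kind = event[0]
        # Inside a mapping, every node alternates between key and value.
        if levels and levels[-1][0]:
            levels[-1][1] = not levels[-1][1]
        if kind in ("map_start", "seq_start"):
            levels.append([kind == "map_start", True])
        elif kind in ("map_end", "seq_end"):
            levels.pop()
        elif kind == "scalar":
            flags.append((event[1], bool(levels) and levels[-1][1]))
    return flags


stream = [
    ("map_start",),
    ("scalar", "a"), ("scalar", "x\ny"),
    ("scalar", "b"), ("seq_start",), ("scalar", "i"), ("seq_end",),
    ("scalar", "c"), ("map_start",), ("scalar", "d"), ("scalar", "e"), ("map_end",),
    ("map_end",),
]
for text, eligible in literal_candidates(stream):
    print(repr(text), eligible)
```

Starting each level with the flag set to True means sequence items always count as values, while the toggle at the top of the loop turns the first node of a mapping into a key.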
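Answer 2 (score: 0)
As it turns out, I was able to adapt dnozay's answer to the question "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?".
It turned out to be a bit faster than flyx's answer, but it needs an extra trick (borrowed from the modifications in drbild/json2yaml) to preserve the order of keys.
The main part is using Representer.add_representer: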
# Python 2 code: unicode and basestring do not exist in Python 3
import yaml
class maybe_literal_str(str): pass
class maybe_literal_unicode(unicode): pass
def change_maybe_style(representer):
    def new_maybe_representer(dumper, data):
        scalar = representer(dumper, data)
        if isinstance(data, basestring) and "\n" in data:
            scalar.style = '|'
        else:
            scalar.style = None
        return scalar
    return new_maybe_representer
from yaml.representer import SafeRepresenter
# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly
represent_maybe_literal_str = change_maybe_style(SafeRepresenter.represent_str)
represent_maybe_literal_unicode = change_maybe_style(SafeRepresenter.represent_unicode)
# I needed to use it in yaml.safe_dump() with older PyYAML,
# hence explicit Dumper=yaml.SafeDumper
yaml.add_representer(maybe_literal_str, represent_maybe_literal_str,
                     Dumper=yaml.SafeDumper)
yaml.add_representer(maybe_literal_unicode, represent_maybe_literal_unicode,
                     Dumper=yaml.SafeDumper)
For this to work, I had to wrap each string in one of those two classes:
def wrap_strings(arg):
    """Wrap {str,unicode} arguments in maybe_literal_{str,unicode}"""
    if isinstance(arg, str):
        return maybe_literal_str(arg)
    elif isinstance(arg, unicode):
        return maybe_literal_unicode(arg)
    else:
        return arg
I used this hacky function to modify the structure in place:
def transform(obj, leaf_callback):
    try:
        # is it dict or something like it?
        enum = obj.iteritems()
    except AttributeError:
        # if not dict-like, it is list-like object
        enum = enumerate(obj)
    for k, v in enum:
        # is value 'v' collection or scalar (leaf value)?
        if isinstance(v, (dict, list)):
            transform(v, leaf_callback)
        else:
            newval = leaf_callback(v)
            if newval is not None:
                obj[k] = newval
The conversion from JSON to YAML is then done with:
import json
def convert_dom(json_file, yaml_file):
    loaded_json = json.load(json_file)
    transform(loaded_json, wrap_strings)
    yaml.safe_dump(loaded_json, yaml_file,
                   explicit_start=True,  # start with "---\n"
                   default_flow_style=False)
with open('in.json', 'r') as json_file:
    with open('out.yaml', 'w') as yaml_file:
        # call the function defined above (convert_events is not defined here)
        convert_dom(json_file, yaml_file)