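JSON to YAML: converting fields with embedded newlines ("\n") to literal blocks ("|")

Time: 2017-11-08 14:24:18

Tags: json yaml file-conversion

How can I convert a JSON file in which some field values are multi-line strings with embedded newlines (such as "\n") to YAML, so that the values with embedded newlines, and only those values, are written using literal block notation?

For example, given the following JSON: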

{
   "01ea672a": {
        "summary": "A short one-line summary",
        "description": "first line\nsecond line",
        "content": "1st line\n2nd line\n"
   }
}

the conversion should produce something like the following YAML (details may differ):

---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line

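I would prefer a solution in a scripting language, whether Python, Perl, Ruby, or something else, or one that uses a command-line conversion tool such as Catmandu.

The online converter at json2yaml.com can do this, but I would rather not try it on a 40 MB file.

3 answers:

Answer 0: (score: 3)

ruamel.yaml (disclaimer: I am the author of that library) can already round-trip your expected output without losing any information (including key order):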

import sys
import ruamel.yaml

yaml_str = """---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line
"""

yaml = ruamel.yaml.YAML()
yaml.explicit_start = True
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)

This gives:

---
01ea672a:
  summary: A short one-line summary
  description: |-
    first line
    second line
  content: |
    1st line
    2nd line

If you add:

print(type(data['01ea672a']['description']), type(data['01ea672a']))

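you will find that these are a LiteralScalarString from ruamel.yaml.scalarstring and a CommentedMap from ruamel.yaml.comments, respectively. You can create the latter directly by handing the type to the JSON loader; it preserves key order because it behaves like an ordered dict.

The former has to be "forced" after loading, because json.loads has no 'parse_string' option that would let you do this during loading. ruamel.yaml provides a utility function for this, walk_tree, in ruamel.yaml.scalarstring.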

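With this knowledge, the complete conversion from JSON to YAML is easy: load the JSON with json.loads, handing it CommentedMap as the mapping type, call walk_tree on the result, and dump the data as before.

This again gives exactly the output you expect.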

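Answer 1: (score: 2)

You can do this with the low-level event API. Parse the JSON as YAML to get an event stream (this works because YAML is a superset of JSON), then modify the events as follows:

  • If it is a collection start event, make it block style (the JSON style is called flow-style in YAML).
  • If it is a scalar key, use plain style.
  • If it is a scalar value, make it literal style if the value contains a newline, and plain style otherwise.

Finally, emit the modified events. Here is a solution with PyYAML: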

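import yaml
from yaml.events import (CollectionStartEvent, CollectionEndEvent,
                         MappingStartEvent, ScalarEvent)

events = []

class Level:
  """One open collection; is_value tells whether the current item is a value."""
  def __init__(self, is_mapping):
    self.is_mapping = is_mapping
    self.is_value = True

levels = []

with open("in.json", 'r') as stream:
  for event in yaml.parse(stream):
    # inside a mapping, events alternate between key and value
    if len(levels) > 0 and levels[-1].is_mapping:
      levels[-1].is_value = not levels[-1].is_value
    if isinstance(event, CollectionStartEvent):
      levels.append(Level(isinstance(event, MappingStartEvent)))
      event.flow_style = False  # block style instead of JSON's flow style
    elif isinstance(event, CollectionEndEvent):
      levels.pop()
    elif isinstance(event, ScalarEvent):
      if len(levels) > 0 and levels[-1].is_value:
        # literal block for multi-line values, plain style otherwise
        event.style = '|' if "\n" in event.value else ''
      else:
        event.style = ''  # keys are always plain
      # caveat: this also writes JSON strings such as "123" as plain scalars
      event.implicit = (True, True)
    events.append(event)

with open("out.yaml", 'w') as stream:
  yaml.emit(events, stream)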
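
To make the event stream that the loop above works on more concrete, here is a small sketch that lists the events yaml.parse produces for a short JSON document. The sample document and the variable names are invented for illustration.

```python
import yaml
from yaml.events import ScalarEvent

# Hypothetical sample input; "\\n" becomes the JSON escape \n.
json_text = '{"key": "line 1\\nline 2", "items": [1, 2]}'

events = list(yaml.parse(json_text))
event_names = [type(event).__name__ for event in events]
scalar_values = [event.value for event in events if isinstance(event, ScalarEvent)]

for name in event_names:
    print(name)
print(scalar_values)
```

Keys and values both arrive as ScalarEvent objects, which is why the solution has to track, per open mapping, whether the next scalar is a key or a value.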

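Note: PyYAML supports YAML 1.1, which, in some edge cases, is not a superset of JSON. To be sure, you could use ruamel instead, which implements YAML 1.2, but I am not familiar with its code, which is why I provided a PyYAML solution.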

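Answer 2: (score: 0)

It turns out that I was able to adapt dnozay's answer to the question "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?".

It turns out to be a bit faster than flyx's answer, but you need some extra tricks (borrowed from the modifications in drbild/json2yaml) to preserve key order.

The main part is using Representer.add_representer: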

# Python 2 code: it relies on unicode, basestring and dict.iteritems()
import json
import yaml

class maybe_literal_str(str): pass
class maybe_literal_unicode(unicode): pass

def change_maybe_style(representer):
    def new_maybe_representer(dumper, data):
        scalar = representer(dumper, data)
        if isinstance(data, basestring) and "\n" in data:
            scalar.style = '|'
        else:
            scalar.style = None
        return scalar
    return new_maybe_representer

from yaml.representer import SafeRepresenter

# represent_str does handle some corner cases, so use that
# instead of calling represent_scalar directly 
represent_maybe_literal_str     = change_maybe_style(SafeRepresenter.represent_str)
represent_maybe_literal_unicode = change_maybe_style(SafeRepresenter.represent_unicode)

# I needed to use it in yaml.safe_dump() with older PyYAML,
# hence the explicit Dumper=yaml.SafeDumper
yaml.add_representer(maybe_literal_str, represent_maybe_literal_str,
                     Dumper=yaml.SafeDumper)
yaml.add_representer(maybe_literal_unicode, represent_maybe_literal_unicode,
                     Dumper=yaml.SafeDumper)

To make it work, I had to wrap the strings in one of these two classes:

def wrap_strings(arg):
    """Wrap {str,unicode} arguments in maybe_literal_{str,unicode}"""
    if isinstance(arg, str):
        return maybe_literal_str(arg)
    elif isinstance(arg, unicode):
        return maybe_literal_unicode(arg)
    else:
        return arg

I used this hacky function to modify the structure in place:

def transform(obj, leaf_callback):
    try:
        # is it dict or something like it?
        enum = obj.iteritems()
    except AttributeError:
        # if not dict-like, it is list-like object
        enum = enumerate(obj)
    for k, v in enum:
        # is value 'v' collection or scalar (leaf value)?
        if isinstance(v, (dict, list)):
            transform(v, leaf_callback)
        else:
            newval = leaf_callback(v)
            if newval is not None:
                obj[k] = newval
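
For readers on Python 3, the same in-place traversal can be sketched as follows. This is an adaptation and not the answer's original code: it uses items() in place of iteritems() and an explicit isinstance check in place of the try/except. The sample data and the upper-casing callback are only there to show the effect.

```python
def transform(obj, leaf_callback):
    """Apply leaf_callback to every scalar leaf of nested dicts and lists, in place."""
    items = obj.items() if isinstance(obj, dict) else enumerate(obj)
    for key, value in items:
        if isinstance(value, (dict, list)):
            # Collection: recurse into it.
            transform(value, leaf_callback)
        else:
            # Scalar leaf: replace it only if the callback returns something.
            new_value = leaf_callback(value)
            if new_value is not None:
                obj[key] = new_value


# Hypothetical sample data and callback.
doc = {"a": "x", "b": ["y", {"c": "z"}], "n": 1}
transform(doc, lambda v: v.upper() if isinstance(v, str) else None)
print(doc)
```

Assigning to an existing key or index while iterating is safe here because the size of the container never changes.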

The conversion from JSON to YAML is then done with:

def convert_dom(json_file, yaml_file):
    loaded_json = json.load(json_file)
    transform(loaded_json, wrap_strings)
    yaml.safe_dump(loaded_json, yaml_file,
                   explicit_start=True, # start with "---\n"
                   default_flow_style=False)


with open('in.json', 'r') as json_file:
    with open('out.yaml', 'w') as yaml_file:
        convert_dom(json_file, yaml_file)
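
The code in this answer targets Python 2. A minimal Python 3 sketch of the same representer idea might look like the block below. The names LiteralStr, represent_literal_str, wrap_strings and convert are hypothetical, and the wrapper builds a new structure where the answer modifies it in place. It also assumes PyYAML 5.1 or later, where safe_dump accepts sort_keys=False, so the insertion order of the loaded dicts is kept without any extra trick.

```python
import json

import yaml


class LiteralStr(str):
    """Marker type (hypothetical name) for strings that may need the '|' style."""


def represent_literal_str(dumper, data):
    # Reuse the stock string representer, then switch multi-line values to '|'.
    node = dumper.represent_str(str(data))
    if "\n" in data:
        node.style = "|"
    return node


yaml.add_representer(LiteralStr, represent_literal_str, Dumper=yaml.SafeDumper)


def wrap_strings(obj):
    """Return a copy of obj with every string value wrapped in LiteralStr."""
    if isinstance(obj, dict):
        return {key: wrap_strings(value) for key, value in obj.items()}
    if isinstance(obj, list):
        return [wrap_strings(value) for value in obj]
    if isinstance(obj, str):
        return LiteralStr(obj)
    return obj


def convert(json_text):
    """Convert a JSON string to YAML text (hypothetical helper)."""
    data = wrap_strings(json.loads(json_text))
    return yaml.safe_dump(data, explicit_start=True,
                          default_flow_style=False, sort_keys=False)


sample = ('{"01ea672a": {"summary": "A short one-line summary", '
          '"description": "first line\\nsecond line", '
          '"content": "1st line\\n2nd line\\n"}}')
yaml_text = convert(sample)
print(yaml_text)
```

Dictionary keys are left as ordinary strings, so only values are candidates for the literal block style.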