Python YAML dump: special characters and multi-line strings

Date: 2018-06-19 09:37:46

Tags: python yaml pyyaml ruamel.yaml

I have a my_yaml.yml file with the following content:

my_yaml:
  person: >
    John|Doe|48,
    Jack|Black|39
  skills:
    - name: superhero
      abilities:
        - swim
        - run
  special_chars:
    - '! | " "'
    - '+ | " "'
    - '\ | " "'
    - 'Á | "A"'
    - 'É | "E"'
    - 'Ű | "U"'
    - 'Û | "U"'

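I want to load it and then dump it to a my_yaml_new.yml file in exactly the same format, with the same characters that the original input file has. My code is: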

import yaml
my_yaml = yaml.load(open('my_yaml.yml', encoding='utf8'))  # without "utf8" encoding I get "'charmap' codec can't decode byte..." error

I can dump to the console, but 1) the order of abilities and name has changed :(

yaml.dump(my_yaml, default_flow_style=False, allow_unicode=True)

The result is:

'my_yaml:\n  person: >\n    John|Doe|48, Jack|Black|39\n  skills:\n  - abilities:\n    - swim\n    - run\n    name: superhero\n  special_chars:\n  - \'! | " "\'\n  - + | " "\n  - \\ | " "\n  - Á | "A"\n  - É | "E"\n  - Ű | "U"\n  - Û | "U"\n'

When I try to dump to a file:

with open('my_yaml_new.yml', 'w') as outfile:
    yaml.dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)

2) Because of the character Û, I get the following error:

  

UnicodeEncodeError: 'charmap' codec can't encode character '\xdb' in position 0: character maps to <undefined>

If I remove this line from the input my_yaml.yml file, the dump above succeeds, but 3) the multi-line person string becomes a single line :(

my_yaml:
  person: >
    John|Doe|48, Jack|Black|39
  skills:
  - abilities:
    - swim
    - run
    name: superhero
  special_chars:
  - '! | " "'
  - + | " "
  - \ | " "
  - Á | "A"
  - É | "E"
  - Ű | "U"

4) My single quotes (') have also disappeared from special_chars :(

5) And note that the elements of skills are not indented :(

I tried these solutions without success, and import ruamel.yaml as yaml did not help either.

Update

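OK, the following great package solves problems 1) and 4), and I can replace > with | on the multi-line values, so 3) is solved as well. Perhaps 5) is not a big problem. But I am still struggling with special characters such as Û and Ǘ, so I am still looking for a solution to problem 2) ...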

from ruamel import yaml

my_yaml = yaml.round_trip_load(open('my_yaml.yml', encoding='utf8'), preserve_quotes=True)
with open('my_yaml_new.yml', 'w') as outfile:
    yaml.round_trip_dump(my_yaml, outfile, default_flow_style=False, allow_unicode=True)
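
The replacement of > with | mentioned in the update can be applied to the file text before it is parsed. The following is a minimal sketch using only the standard library. The helper name fold_to_literal and the regular expression are assumptions made for illustration, not part of PyYAML or ruamel.yaml. It naively rewrites every "key: >" line ending, including the chomping indicators >- and >+, and does not handle indentation indicators such as >2.

```python
import re

# Hypothetical helper (not part of PyYAML or ruamel.yaml): rewrite the block
# scalar indicator '>' (folded) to '|' (literal) at the end of "key: >" lines.
_FOLDED = re.compile(r'(:[ \t]+)>([+-]?[ \t]*)$', re.MULTILINE)


def fold_to_literal(text):
    """Return text with folded block scalar headers turned into literal ones."""
    return _FOLDED.sub(r'\1|\2', text)


sample = (
    "my_yaml:\n"
    "  person: >\n"
    "    John|Doe|48,\n"
    "    Jack|Black|39\n"
)
converted = fold_to_literal(sample)
print(converted)
```

Because this changes the scalar style, the loaded value changes too: a literal scalar keeps the line break between the two person entries, whereas the folded original joins them with a space.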

1 Answer:

Answer 0: (score: 1)

I am not sure why you are running into Unicode problems. If you have your my_yaml.yml and a program try.py:

import sys
import ruamel.yaml

with open('my_yaml.yml') as fp:
    yaml_str = fp.read().replace(': >\n', ': |\n')

yaml = ruamel.yaml.YAML()
yaml.indent(mapping=2, sequence=4, offset=2)
yaml.preserve_quotes = True
data = yaml.load(yaml_str)
new_file = 'my_yaml_new.yml'
with open(new_file, 'w') as ofp:
    yaml.dump(data, ofp)

then this produces:

my_yaml:
  person: |
    John|Doe|48,
    Jack|Black|39
  skills:
    - name: superhero
      abilities:
        - swim
        - run
  special_chars:
    - '! | " "'
    - '+ | " "'
    - '\ | " "'
    - 'Á | "A"'
    - 'É | "E"'
    - 'Ű | "U"'
    - 'Û | "U"'

in virtual environments for both Python 2 and Python 3, with ruamel.yaml 0.15.40.

I used:

for n in 2 3 ; do  mktmpenv -p /opt/python/$n/bin/python -qq -i ruamel.yaml; python --version; python try.py; deactivate; done

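which of course relies on (the latest) versions of Python 2 and 3 being installed under /opt/python/2 and /opt/python/3 respectively (as they are on my Linux development system).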

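Note that Unicode is not a problem, and that yaml.indent(mapping=2, sequence=4, offset=2) preserves the source indentation. You still need to change the folded multi-line scalar to literal style (which I do when reading in the input), because ruamel.yaml does not support preserving it (mainly because there is no easy way to transparently indicate the original fold points).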
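
One possible explanation for the UnicodeEncodeError in problem 2), offered as an assumption rather than a confirmed diagnosis: on Python 3, open() without an encoding argument uses the platform's preferred encoding, which on some Windows setups is a legacy code page. Code page 1250, for example, can encode Ű but not Û. The sketch below uses only the standard library; cp1250 and the temporary output path are chosen purely for illustration.

```python
import os
import tempfile

text = "- 'Ű | \"U\"'\n- 'Û | \"U\"'\n"

# A legacy code page such as cp1250 (assumed here for illustration) has no Û,
# so encoding fails with a 'charmap' UnicodeEncodeError.
try:
    text.encode('cp1250')
    legacy_ok = True
except UnicodeEncodeError as exc:
    legacy_ok = False
    print(exc)

# Passing encoding='utf-8' explicitly makes the write independent of the
# platform default encoding.
path = os.path.join(tempfile.mkdtemp(), 'my_yaml_new.yml')
with open(path, 'w', encoding='utf-8') as outfile:
    outfile.write(text)

# Read the file back with the same encoding to confirm nothing was lost.
with open(path, encoding='utf-8') as infile:
    round_tripped = infile.read()
```

If this is indeed the cause, opening the output file with open(new_file, 'w', encoding='utf-8') before handing it to the dump call should avoid the error on Python 3.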