PyYAML interprets string as timestamp

Asked: 2015-03-03 15:14:30

Tags: python pyyaml

It looks as though PyYAML interprets the string 10:01 as a duration in seconds:

>>> import yaml
>>> yaml.load("time: 10:01")
{'time': 601}

The official documentation does not reflect that: PyYAML documentation

Any suggestions on how to read 10:01 as a string?

3 Answers:

Answer 0: (score: 4)

Put it in quotes:

>>> import yaml
>>> yaml.load('time: "10:01"')
{'time': '10:01'}

This tells YAML that it is a literal string and stops it from trying to treat the value as a number.
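A minimal sketch comparing the quoted form with two other ways of getting a string back, assuming a PyYAML version that provides `yaml.safe_load` and `yaml.BaseLoader` (the variable names are only illustrative):

```python
import yaml

# Unquoted: the YAML 1.1 resolver treats 10:01 as a base-60 integer.
unquoted = yaml.safe_load("time: 10:01")

# Quoted: the scalar is no longer "plain", so no implicit typing is applied.
quoted = yaml.safe_load("time: '10:01'")

# Explicit !!str tag: forces the string type without quotes.
tagged = yaml.safe_load("time: !!str 10:01")

# BaseLoader: skips implicit typing entirely, so every scalar stays a string.
all_strings = yaml.load("time: 10:01", Loader=yaml.BaseLoader)

print(unquoted)     # {'time': 601}
print(quoted)       # {'time': '10:01'}
print(tagged)       # {'time': '10:01'}
print(all_strings)  # {'time': '10:01'}
```

Note that `BaseLoader` is a blunt tool: real integers and booleans in the same document come back as strings too, so quoting is usually the better fix when you control the YAML.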

Answer 1: (score: 1)

Since you are using a parser for YAML 1.1, you should expect it to implement what the specification (Example 2.19) indicates:

sexagesimal: 3:25:45

Sexagesimals are further explained here:

Using ":" allows expressing integers in base 60, which is convenient for time and angle values.
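To make the base-60 arithmetic concrete, here is a small standard-library sketch of the conversion. The function name `sexagesimal_to_int` is made up for illustration and is not part of PyYAML:

```python
def sexagesimal_to_int(text):
    """Convert a YAML 1.1 style base-60 integer such as '10:01' to an int."""
    sign = -1 if text.startswith("-") else 1
    digits = text.lstrip("+-").replace("_", "")
    value = 0
    # Each ':'-separated group is one base-60 "digit", most significant first.
    for part in digits.split(":"):
        value = value * 60 + int(part)
    return sign * value

print(sexagesimal_to_int("10:01"))    # 601   (10 * 60 + 1)
print(sexagesimal_to_int("3:25:45"))  # 12345 (3 * 3600 + 25 * 60 + 45)
```

This is why 10:01 in the question comes back as 601.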

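Not every detail implemented in PyYAML is covered by the documentation you refer to; you should treat that only as an introduction.

You are not the only one who found this interpretation confusing, and in YAML 1.2 sexagesimals were dropped from the specification. Although that specification has been out for about eight years, the changes have never been implemented in PyYAML.

The easiest way to solve this is to upgrade to ruamel.yaml (disclaimer: I am the author of that package). You will get YAML 1.2 behaviour (unless you explicitly specify that you want YAML 1.1), which interprets 10:01 as a string: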

from ruamel import yaml

import warnings
warnings.simplefilter('ignore', yaml.error.UnsafeLoaderWarning)

data = yaml.load("time: 10:01")
print(data)

which gives:

{'time': '10:01'}

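The warnings.simplefilter call is only necessary because you use .load() instead of .safe_load(). The former is unsafe and, when used on uncontrolled YAML input, can lead to a wiped disk or worse. There is seldom a reason not to use .safe_load().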

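Answer 2: (score: 0)

If you want to monkeypatch the pyyaml library so that it does not have this behavior for a resolver of your choice (there is no neat way to do this), the code below works. The problem is that the regex that is used for int includes some code to match timestamps. As far as I can tell there is no spec for this behavior; it is just deemed "good practice" to treat strings like 30:00 and 40:11:11:11:11 as integers.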

import yaml
import re

def partition_list(somelist, predicate):
    truelist = []
    falselist = []
    for item in somelist:
        if predicate(item):
            truelist.append(item)
        else:
            falselist.append(item)
    return truelist, falselist

@classmethod
def init_implicit_resolvers(cls):
    """ 
    Create this class's own copy of yaml_implicit_resolvers from the superclass.
    Code taken from add_implicit_resolver; this should be refactored elsewhere.
    """
    if 'yaml_implicit_resolvers' not in cls.__dict__:
        implicit_resolvers = {}
        for key in cls.yaml_implicit_resolvers:
            implicit_resolvers[key] = cls.yaml_implicit_resolvers[key][:]
        cls.yaml_implicit_resolvers = implicit_resolvers

@classmethod
def remove_implicit_resolver(cls, tag, verbose=False):
    cls.init_implicit_resolvers()
    removed = {}
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vremoved, v2 = partition_list(v, lambda x: x[0] == tag)
        if vremoved:
            cls.yaml_implicit_resolvers[key] = v2
            removed[key] = vremoved
    return removed

@classmethod
def _monkeypatch_fix_int_no_timestamp(cls):
    bad = '|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+'
    for key in cls.yaml_implicit_resolvers:
        v = cls.yaml_implicit_resolvers[key]
        vcopy = v[:]
        n = 0
        for k in range(len(v)):
            # Only touch the int resolver whose pattern contains the base-60 alternative.
            if v[k][0] == 'tag:yaml.org,2002:int' and bad in v[k][1].pattern:
                n += 1
                p = v[k][1]
                p2 = re.compile(p.pattern.replace(bad,''), p.flags)
                vcopy[k] = (v[k][0], p2)
        if n > 0:
            cls.yaml_implicit_resolvers[key] = vcopy

yaml.resolver.Resolver.init_implicit_resolvers = init_implicit_resolvers
yaml.resolver.Resolver.remove_implicit_resolver = remove_implicit_resolver
yaml.resolver.Resolver._monkeypatch_fix_int_no_timestamp = _monkeypatch_fix_int_no_timestamp

Then, if you do this:

class MyResolver(yaml.resolver.Resolver):
    pass

t1 = MyResolver.remove_implicit_resolver('tag:yaml.org,2002:timestamp')
MyResolver._monkeypatch_fix_int_no_timestamp()

class MyLoader(yaml.SafeLoader, MyResolver):
    pass

text = '''
a: 3
b: 30:00
c: 30z
d: 40:11:11:11
'''

print(yaml.safe_load(text))
print(yaml.load(text, Loader=MyLoader))

it prints:

{'a': 3, 'b': 1800, 'c': '30z', 'd': 8680271}
{'a': 3, 'b': '30:00', 'c': '30z', 'd': '40:11:11:11'}

This shows that the default yaml behavior is left unchanged, while your private loader class leaves these values as strings.
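To see in isolation what stripping that alternative from the regex does, here is a standard-library sketch. The `SEXAGESIMAL` string is the same as `bad` in the code above; the surrounding decimal-only pattern is a simplified stand-in written for this example, not PyYAML's full int pattern (which, as far as I know, also covers binary, octal and hex):

```python
import re

# The base-60 alternative, identical to the `bad` string in the monkeypatch.
SEXAGESIMAL = r"|[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+"

# Simplified, hypothetical int pattern: plain decimals plus the base-60 form.
with_base60 = re.compile(r"^(?:[-+]?(?:0|[1-9][0-9_]*)" + SEXAGESIMAL + r")$")

# Same trick as the monkeypatch: delete the alternative and recompile.
without_base60 = re.compile(
    with_base60.pattern.replace(SEXAGESIMAL, ""), with_base60.flags
)

# For each scalar: (matches before patching, matches after patching).
results = {
    scalar: (bool(with_base60.match(scalar)), bool(without_base60.match(scalar)))
    for scalar in ["3", "30:00", "40:11:11:11", "30z"]
}
for scalar, (before, after) in results.items():
    print(scalar, before, after)
```

Plain integers such as 3 still match after the patch, while 30:00 and 40:11:11:11 stop matching the int pattern and therefore fall through to being plain strings.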