Question

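I'm looking for a way to validate a dictionary where one of the constraints is an "is in" constraint, and the values considered valid come from the dictionary being validated.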

For example, imagine the following pseudo-schema:

{
    "notions" : [ string ],
    "category" : [ is in notions ]
}

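To be very clear, I'll also spell out the constraints of this pseudo-schema. These are the constraints I want to validate, with d being the dictionary to validate: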

set(d.keys()) == {"notions", "category"}
isinstance(d["notions"], list)
all(isinstance(notion, str) for notion in d["notions"])
isinstance(d["category"], list)
all(element in d["notions"] for element in d["category"])
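
As a reference point, the constraints listed above can be written as one plain-Python check. This is only a sketch of the hand-written validation that a schema-based solution is meant to replace. The function name `check_constraints` is made up for illustration, and the key `category` is taken from the pseudo-schema.

```python
def check_constraints(d):
    """Return True if the dictionary d satisfies the constraints listed above."""
    # Exactly the two expected keys, nothing more and nothing less.
    if set(d.keys()) != {"notions", "category"}:
        return False
    # "notions" must be a list of strings.
    if not isinstance(d["notions"], list):
        return False
    if not all(isinstance(notion, str) for notion in d["notions"]):
        return False
    # "category" must be a list as well.
    if not isinstance(d["category"], list):
        return False
    # The dynamic "is in" constraint: valid values come from d itself.
    return all(element in d["notions"] for element in d["category"])
```

The last line is the part that static schemas usually cannot express, because the set of allowed values is only known once `d["notions"]` has been read.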

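Please don't ask whether this particular data structure makes sense. It doesn't. I just made up an example to illustrate my problem. My actual dictionary schema is much more complex and contains multiple references to values within the dictionary. That's why I want to avoid defining and validating the constraints by hand and would prefer a schema-based solution.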

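I've looked into several schema validation libraries but haven't found this feature in any of them. Is there a solution based on some library, perhaps with a small tweak? I'd rather not reinvent the wheel.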

Answer 1

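Generally, schema validators try to avoid pulling data into the validator. For example, the JSON Schema standard is debating adding $data access in schemas, but has not implemented it yet (even though they have several use cases for it).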

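The general objection is that making the validation schema depend on the data being validated makes it much harder to keep validation context-free (which simplifies implementation and makes it easier to validate in parallel), and makes it much harder to reason about or statically analyze the schema (because the schema changes at runtime with the data).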

That said, the Colander project can do what you want, because it lets you define validators easily in Python code.

For example:

import colander

class Foo(colander.MappingSchema):
    @colander.instantiate()
    class notions(colander.SequenceSchema):
        notion = colander.SchemaNode(colander.String())

    @colander.instantiate()
    class category(colander.SequenceSchema):
        cat = colander.SchemaNode(colander.String())

    def validator(self, node, cstruct):
        """Validate that all category values are listed in notions"""
        notions = set(cstruct['notions'])
        if not notions.issuperset(cstruct['category']):
            raise colander.Invalid(
                node['category'], 
                "All categories must be listed in notions"
            )

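Note that the validator is defined at the level where both notions and category are defined, because a validator only has access to the "local" section of the data being validated (by the time it runs, validation of all child nodes has already taken place). If you defined the validator on category alone, it could not access the list of notions. Defined at this level, it can also rely on the notions list having been validated already. The validator raises an Invalid exception with the category schema node as its first argument, to lay the blame squarely on the values in that list.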
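
The ordering described above (children first, then a parent-level check that can see both fields) can be illustrated without Colander. The following is a minimal plain-Python sketch, not Colander's implementation; `validate_children` and `validate_parent` are hypothetical names chosen for this example.

```python
def validate_children(data):
    """Step 1: validate each child on its own; no child can see its siblings."""
    for key in ("notions", "category"):
        value = data[key]
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            raise ValueError(f"{key}: expected a list of strings")
    return data


def validate_parent(data):
    """Step 2: the parent-level check runs afterwards and sees both children."""
    validate_children(data)  # from here on, both lists are known to be valid
    unknown = set(data["category"]) - set(data["notions"])
    if unknown:
        # Blame the 'category' field, as the Colander validator above does.
        raise ValueError(f"category: not listed in notions: {sorted(unknown)}")
    return data
```

A check attached only to the category list would receive just that list, which is why the cross-field rule has to live one level up.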

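Colander schemas validate when you deserialize. You can think of the input to the Schema.deserialize() method as unvalidated data (a Colander serialization) and the output as application-ready data (appdata) that has been validated and cleaned up. That is because Colander will also fill in default values where they are missing, can produce tuples, sets, datetime values and more, and supports data preparation (cleaning up HTML and so on) as it processes data with a schema.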

With some demo input, the schema above validates and, on success, returns the validated structure:

>>> schema = Foo()
>>> schema.deserialize({'notions': [], 'category': []})
{'notions': [], 'category': []}
>>> schema.deserialize({'notions': ['foo', 'bar'], 'category': []})
{'notions': ['foo', 'bar'], 'category': []}
>>> schema.deserialize({'notions': ['foo', 'bar'], 'category': ['foo']})
{'notions': ['foo', 'bar'], 'category': ['foo']}
>>> schema.deserialize({'notions': ['foo', 'bar'], 'category': ['foo', 'spam']})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/.../site-packages/colander/__init__.py", line 2381, in deserialize
    self.validator(self, appstruct)
  File "<string>", line 17, in validator
colander.Invalid: {'category': 'All categories must be listed in notions'}

Answer 2

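Your dictionary is too complex, and you are going about this the wrong way. Consider creating classes and storing objects of those classes in the dictionary. Those classes can in turn contain objects of other classes. That way you avoid nesting dictionaries. Create functions in each class to validate its data.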
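
One possible reading of this suggestion, sketched with the standard library's dataclasses. The class name `Document` and its `validate` method are hypothetical, and the fields simply mirror the example from the question.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Document:
    """Hypothetical class holding the two fields from the question's example."""
    notions: List[str] = field(default_factory=list)
    category: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Validate on construction so that building an invalid object fails.
        self.validate()

    def validate(self):
        if not all(isinstance(n, str) for n in self.notions):
            raise ValueError("notions must contain only strings")
        # The "is in" constraint: every category must appear in notions.
        unknown = [c for c in self.category if c not in self.notions]
        if unknown:
            raise ValueError(f"category values not listed in notions: {unknown}")
```

Building the object from a dictionary with `Document(**d)` also rejects unexpected keys, since an unknown keyword argument raises a TypeError. The trade-off is the one the question wants to avoid: every constraint is still written by hand.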

Python dictionary schema validation with a dynamic "is in" constraint
