Question

I have a nested structure, read from a JSON string, that looks similar to the following...

[
  {
    "id": 1,
    "type": "test",
    "sub_types": [
      {
        "id": "a",
        "type": "sub-test",
        "name": "test1"
      },
      {
        "id": "b",
        "name": "test2",
        "key_value_pairs": [
          {
            "key": 0,
            "value": "Zero"
          },
          {
            "key": 1,
            "value": "One"
          }
        ]
      }
    ]
  }
]

I need to extract and pivot the data, ready to be inserted into a database...

[
  (1, "b", 0, "Zero"),
  (1, "b", 1, "One")
]

I'm doing the following...

data_list = [
  (
    type['id'],
    sub_type['id'],
    key_value_pair['key'],
    key_value_pair['value']
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type
  for key_value_pair in sub_type['key_value_pairs']
]

So far, so good.

However, I now need to enforce some constraints. For example...

if type['type'] == 'test': raise ValueError('[test] types can not contain key_value_pairs.')

But I can't work out how to fit that into the comprehension, and I don't want to resort to loops. My best idea so far is...

def make_row(type, sub_type, key_value_pair):
    if type['type'] == 'test': raise ValueError('sub-types of a [test] type can not contain key_value_pairs.')
    return (
        type['id'],
        sub_type['id'],
        key_value_pair['key'],
        key_value_pair['value']
    )

data_list = [
  make_row(
    type,
    sub_type,
    key_value_pair
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type
  for key_value_pair in sub_type['key_value_pairs']
]

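This works, but it runs the check for every key_value_pair, which is redundant. (Each set of key-value pairs can contain thousands of pairs, and a single check is enough to know that they are all fine.)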

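There are also other rules like this one that apply at different levels of the hierarchy. For example, "test" types can only contain "sub_test" sub-types.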

What options are there other than the above?

More elegant?
More extensible?
More performant?
More "Pythonic"?

Answer 1

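You should read up on how to validate your JSON data and specify explicit schema constraints using JSON Schema: https://playground.jsreport.net/w/anon/BJa5OBWD-2 It lets you set required keys, specify default values, add type validation, and so on.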

There is a Python implementation of it: jsonschema

Example:

from jsonschema import Draft6Validator

schema = {
    "$schema": "https://json-schema.org/schema#",

    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "email": {"type": "string"},
    },
    "required": ["email"]
}
Draft6Validator.check_schema(schema)
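
Note that check_schema only confirms that the schema itself is well formed. As a rough sketch of how the question's rule might be expressed and then applied to data, the schema below is a hypothetical one written for this illustration (it is not from the original answer), and `good` and `bad` are made-up sample inputs. It assumes the jsonschema package is installed.

```python
from jsonschema import Draft6Validator

# Hypothetical schema: a top-level object whose "type" is "test" may not
# have any sub_type that carries "key_value_pairs".
schema = {
    "type": "array",
    "items": {
        "type": "object",
        "required": ["id", "type"],
        "anyOf": [
            # Either the object is not a "test" type ...
            {"properties": {"type": {"not": {"const": "test"}}}},
            # ... or none of its sub_types has key_value_pairs.
            {"properties": {"sub_types": {
                "items": {"not": {"required": ["key_value_pairs"]}}}}},
        ],
    },
}
Draft6Validator.check_schema(schema)  # the schema itself is valid
validator = Draft6Validator(schema)

pairs = [{"key": 0, "value": "Zero"}]
good = [{"id": 2, "type": "other",
         "sub_types": [{"id": "b", "key_value_pairs": pairs}]}]
bad = [{"id": 1, "type": "test",
        "sub_types": [{"id": "b", "key_value_pairs": pairs}]}]

# iter_errors reports every violation instead of stopping at the first.
good_errors = list(validator.iter_errors(good))
bad_errors = list(validator.iter_errors(bad))
for error in bad_errors:
    print(list(error.path), error.message)
```

The error reported for an anyOf failure is fairly generic, so for rules like this one a hand-written check with a specific message may still be easier to act on.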

Answer 2

I would just use plain loops, but if you put the statement into a function, you can add it to the first conditional check:

def type_check(type):
    if type['type'] == 'test':
        raise ValueError('sub-types of a [test] type can not contain key_value_pairs.')
    return True


data_list = [
  (
    type['id'],
    sub_type['id'],
    key_value_pair['key'],
    key_value_pair['value']
  )
  for type in my_parsed_json_array
  if 'sub_types' in type
  for sub_type in type['sub_types']
  if 'key_value_pairs' in sub_type and type_check(type)
  for key_value_pair in sub_type['key_value_pairs']
]
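
For comparison, the plain-loop version mentioned above might look roughly like the sketch below. The names `build_rows`, `sample` and `type_` are hypothetical and chosen only for this illustration. The check runs once per sub_type that has key_value_pairs, before any pair is read.

```python
def build_rows(parsed):
    """Flatten parsed JSON into (type id, sub_type id, key, value) rows."""
    rows = []
    for type_ in parsed:  # type_ avoids shadowing the built-in type()
        for sub_type in type_.get('sub_types', []):
            if 'key_value_pairs' not in sub_type:
                continue
            # Checked once per sub_type, not once per key/value pair.
            if type_['type'] == 'test':
                raise ValueError(
                    'sub-types of a [test] type can not contain key_value_pairs.')
            for pair in sub_type['key_value_pairs']:
                rows.append(
                    (type_['id'], sub_type['id'], pair['key'], pair['value']))
    return rows


# Made-up input whose top-level type is allowed to have key_value_pairs.
sample = [{
    "id": 2,
    "type": "other",
    "sub_types": [
        {"id": "a", "name": "test1"},
        {"id": "b", "key_value_pairs": [
            {"key": 0, "value": "Zero"},
            {"key": 1, "value": "One"},
        ]},
    ],
}]
rows = build_rows(sample)
```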

Answer 3

You could try an architecture along these lines:

def validate_top(obj):
    if obj['type'] in BAD_TYPES:
        raise ValueError("oof")
    elif obj['type'] not in IRRELEVANT_TYPES: # actually need to include this
        yield obj

def validate_middle(obj):
    # similarly for the next nested level of data

# and so on

[
    make_row(r)
    for t in validate_top(my_json)
    for m in validate_middle(t)
    # etc...
    for r in validate_last(whatever)
]

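The general pattern I'm using here is to process the data with generators (functions, not expressions) and then collect it with a comprehension.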
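
Applied to the data in the question, the pattern could be sketched as follows. This is a minimal illustration, not the answerer's exact code: `types_with_sub_types`, `checked_sub_types`, `collect_rows` and `sample` are hypothetical names, and only the key_value_pairs rule is implemented.

```python
def types_with_sub_types(parsed):
    """Yield the top-level objects that have a 'sub_types' list."""
    for t in parsed:
        if 'sub_types' in t:
            yield t


def checked_sub_types(t):
    """Yield the sub_types of t that carry key_value_pairs.

    The constraint is checked here, once per sub_type, so the innermost
    loop over the pairs never repeats it.
    """
    for s in t['sub_types']:
        if 'key_value_pairs' not in s:
            continue
        if t['type'] == 'test':
            raise ValueError(
                'sub-types of a [test] type can not contain key_value_pairs.')
        yield s


def collect_rows(parsed):
    """Collect the rows with a comprehension over the generators."""
    return [
        (t['id'], s['id'], kv['key'], kv['value'])
        for t in types_with_sub_types(parsed)
        for s in checked_sub_types(t)
        for kv in s['key_value_pairs']
    ]


# Made-up input whose top-level type is allowed to have key_value_pairs.
sample = [{
    "id": 2,
    "type": "other",
    "sub_types": [
        {"id": "a", "name": "test1"},
        {"id": "b", "key_value_pairs": [
            {"key": 0, "value": "Zero"},
            {"key": 1, "value": "One"},
        ]},
    ],
}]
rows = collect_rows(sample)
```

Rules for other levels of the hierarchy would go into the generator for that level, which keeps each check next to the data it concerns.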

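In simpler cases, where it isn't worthwhile to separate out multiple levels of processing (or they don't naturally exist), you can still write a single generator and then do something like list(generator(source)). In my opinion this is still cleaner than using an ordinary function and building the list manually, because it still separates the "processing" concern from the "collecting" concern.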
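
A single-generator version of that idea might look like the sketch below, where `iter_rows` and `source` are hypothetical names used only for this illustration. Because generators are lazy, the ValueError is raised only when the rows are consumed, for example by list().

```python
def iter_rows(parsed):
    """Yield (type id, sub_type id, key, value) tuples one at a time."""
    for t in parsed:
        for s in t.get('sub_types', ()):
            pairs = s.get('key_value_pairs')
            if pairs is None:
                continue
            # One check per group of pairs, before any pair is yielded.
            if t['type'] == 'test':
                raise ValueError(
                    'sub-types of a [test] type can not contain key_value_pairs.')
            for kv in pairs:
                yield (t['id'], s['id'], kv['key'], kv['value'])


# Made-up input whose top-level type is allowed to have key_value_pairs.
source = [{
    "id": 2,
    "type": "other",
    "sub_types": [
        {"id": "b", "key_value_pairs": [
            {"key": 0, "value": "Zero"},
            {"key": 1, "value": "One"},
        ]},
    ],
}]

# Processing happens in the generator; collecting is a single list() call.
data_list = list(iter_rows(source))
```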

Python: raising errors within a list comprehension (or a better alternative)
