Writing a nested JSON list of dictionaries to CSV

Time: 2021-05-27 12:30:16

Tags: json python-3.x csv

Question

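I am trying to write the following nested list of dictionaries, each of which contains further lists of dictionaries, to a CSV file. I have tried several approaches but cannot get it written correctly: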

JSON data

[
    {
        "Basic_Information_Source": [
            {
                "Image": "image1.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277274
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image1_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277539
            }
        ],
        "Values": [
            {
                "Value1": 75.05045463635267,
                "Value2": 0.006097560975609756,
                "Value3": 0.045083481733371615,
                "Value4": 0.008639858263904898
            }
        ]
    },
    {
        "Basic_Information_Source": [
            {
                "Image": "image2.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1786254
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image2_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1782197
            }
        ],
        "Values": [
            {
                "Value1": 85.52662890580055,
                "Value2": 0.0005464352720450282,
                "Value3": 0.013496113910369758,
                "Value4": 0.003800236380811839
            }
        ]
    }
]

Working code

I tried the following code and it runs, but it only saves the headers and then dumps each underlying list into the CSV file as text:

import json
import csv

def Convert_CSV():

    ar_enc_file = open('analysis_results_enc.json','r')
    json_data = json.load(ar_enc_file)

    keys = json_data[0].keys()
    
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(json_data)

    ar_enc_file.close()

Convert_CSV()

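Working output / problem

The output writes the following headers:

  • Basic_Information_Source
  • Basic_Information_Destination
  • Values

It then dumps all of the remaining data under each header as a list, like this: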

[{'Image': 'image1.png', 'Image_Format': 'PNG', 'Image_Mode': 'RGB', 'Image_Width': 574, 'Image_Height': 262, 'Image_Size': 277274}]

Expected output / sample

Expected Output

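I am trying to produce the type of output above for every dictionary in the array of dictionaries.

What is the proper way to write this?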

1 answer:

Answer 0 (score: 1)

I'm sure someone will come up with a more elegant solution. That said:

You have a few problems.

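  • Your entries are inconsistent in length, so their fields do not line up (Values has four fields, the others have six).
  • Even once you pad the data, you still have an intermediate list that needs to be flattened.
  • After that, you still have separate pieces of data that need to be merged together.
  • DictWriter, AFAIK, expects data in the form [{'column': 'entry'}, {'column': 'entry'}], so even after all of the previous steps the data is still not in the right format.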
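
To make the last point concrete, here is a minimal sketch of how csv.DictWriter treats nested versus flat values. The helper name write_rows and the shortened sample values are my own, purely for illustration; the output goes to an in-memory buffer rather than a file.

```python
import csv
import io


def write_rows(rows):
    """Write a list of dicts to an in-memory CSV and return the text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Nested: the cell value is a list, so the writer falls back to str(value)
nested_csv = write_rows([{"Values": [{"Value1": 75.05, "Value2": 0.006}]}])

# Flat: one scalar per column, which is the shape DictWriter is built for
flat_csv = write_rows([{"Value1": 75.05, "Value2": 0.006}])

print(nested_csv)
print(flat_csv)
```

The nested case reproduces the text dump from the question, because the csv module stringifies any non-string value before writing it.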

So let's get started.

The first two points can be handled together.

def pad_list(lst, size, padding=None):
    # we wouldn't have to make a copy but I prefer to
    # avoid the possibility of getting bitten by mutability
    _lst = lst[:]
    for _ in range(len(lst), size):
        _lst.append(padding)
    return _lst


# this expects already parsed json data
def flatten(json_data):
    lst = []
    for dct in json_data:
        # here we're just setting a max size of all dict entries
        # this is in case the shorter entry is in the first iteration
        max_size = 0
        # we initialize a dict for each of the list entries
        # this is in case you have inconsistent lengths between lists
        flattened = dict()
        for k, v in dct.items():
            entries = list(next(iter(v), dict()).values())
            flattened[k] = entries
            max_size = max(len(entries), max_size)
        # here we append the padded version of the values for the dict
        lst.append({k: pad_list(v, max_size) for k, v in flattened.items()})
    return lst

So now we have a flattened list of dict whose values are list objects of consistent length. Essentially:

[
    {
        "Basic_Information_Source": [
            "image1.png",
            "PNG",
            "RGB",
            574,
            262,
            277274
        ],
        "Basic_Information_Destination": [
            "image1_dst.png",
            "PNG",
            "RGB",
            574,
            262,
            277539
        ],
        "Values": [
            75.05045463635267,
            0.006097560975609756,
            0.045083481733371615,
            0.008639858263904898,
            None,
            None
        ]
    }
]
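
The flattening step leans on the next(iter(v), dict()) idiom, so here is a small sketch of it in isolation. The name first_entry_values and the sample inputs are made up for illustration, and this pad_list is just a shorter variant of the one above.

```python
def first_entry_values(lst):
    """Return the values of the first dict in lst, or [] if lst is empty."""
    # next(iter(lst), dict()) yields the first element and falls back to
    # an empty dict when the list has no elements at all
    return list(next(iter(lst), dict()).values())


def pad_list(lst, size, padding=None):
    """Return a copy of lst extended with padding up to the given size."""
    return lst + [padding] * max(0, size - len(lst))


values = first_entry_values([{"Value1": 1.5, "Value2": 2.5}])
empty = first_entry_values([])
# Note: any dict after the first one in the inner list is ignored
only_first = first_entry_values([{"a": 1}, {"a": 2}])
padded = pad_list(values, 4)

print(values, empty, only_first, padded)
```

One consequence worth knowing: if an inner list ever holds more than one dict, only the first is picked up, so this only fits data shaped like the sample in the question.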

But this list contains multiple dict entries that need to be merged, not just one.

So we need to merge.

# this should be self explanatory
def merge(flattened):
    merged = dict()
    for dct in flattened:
        for k, v in dct.items():
            if k not in merged:
                merged[k] = []
            merged[k].extend(v)
    return merged

This gives us something close to this:

{
    "Basic_Information_Source": [
        "image1.png",
        "PNG",
        "RGB",
        574,
        262,
        277274,
        "image2.png",
        "PNG",
        "RGB",
        1600,
        1066,
        1786254
    ],
    "Basic_Information_Destination": [
        "image1_dst.png",
        "PNG",
        "RGB",
        574,
        262,
        277539,
        "image2_dst.png",
        "PNG",
        "RGB",
        1600,
        1066,
        1782197
    ],
    "Values": [
        75.05045463635267,
        0.006097560975609756,
        0.045083481733371615,
        0.008639858263904898,
        None,
        None,
        85.52662890580055,
        0.0005464352720450282,
        0.013496113910369758,
        0.003800236380811839,
        None,
        None
    ]
}

But wait, we still need to format it for the writer.

Our data needs to be in the format [{'column_1': 'entry', 'column_2': 'entry'}, {'column_1': 'entry', 'column_2': 'entry'}].

So we format:

def format_for_writer(merged):
    formatted = []
    for k, v in merged.items():
        for i, item in enumerate(v):
            # on the first pass this will append an empty dict
            # on subsequent passes it will be ignored
            # and add keys into the existing dict
            if i >= len(formatted):
                formatted.append(dict())
            formatted[i][k] = item
    return formatted
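
What format_for_writer does is essentially a transpose from columns to rows. As a sketch of the same idea using zip, assuming every column list has the same length (which the padding step is meant to provide), something like the following should be equivalent. The name format_with_zip and the tiny merged sample are hypothetical.

```python
def format_with_zip(merged):
    """Turn {column: [values, ...]} into [{column: value}, ...], one dict per row."""
    columns = list(merged.keys())
    # zip(*...) walks all column lists in parallel, yielding one row at a time;
    # it stops at the shortest list, hence the equal-length assumption
    return [dict(zip(columns, row)) for row in zip(*merged.values())]


merged = {"Source": ["image1.png", "PNG"], "Values": [75.05, None]}
rows = format_with_zip(merged)
print(rows)
```

The loop version above is more forgiving with uneven columns, since it simply leaves the missing keys out of the later rows instead of dropping those rows.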

Finally, we have a nice, clean, formatted data structure that we can hand to our writer function.

def convert_csv(formatted):
    keys = formatted[0].keys()
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(formatted)

Full code with the JSON string:

import json
import csv

json_raw = """\
[
    {
        "Basic_Information_Source": [
            {
                "Image": "image1.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277274
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image1_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 574,
                "Image_Height": 262,
                "Image_Size": 277539
            }
        ],
        "Values": [
            {
                "Value1": 75.05045463635267,
                "Value2": 0.006097560975609756,
                "Value3": 0.045083481733371615,
                "Value4": 0.008639858263904898
            }
        ]
    },
    {
        "Basic_Information_Source": [
            {
                "Image": "image2.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1786254
            }
        ],
        "Basic_Information_Destination": [
            {
                "Image": "image2_dst.png",
                "Image_Format": "PNG",
                "Image_Mode": "RGB",
                "Image_Width": 1600,
                "Image_Height": 1066,
                "Image_Size": 1782197
            }
        ],
        "Values": [
            {
                "Value1": 85.52662890580055,
                "Value2": 0.0005464352720450282,
                "Value3": 0.013496113910369758,
                "Value4": 0.003800236380811839
            }
        ]
    }
]
"""


def pad_list(lst, size, padding=None):
    _lst = lst[:]
    for _ in range(len(lst), size):
        _lst.append(padding)
    return _lst


def flatten(json_data):
    lst = []
    for dct in json_data:
        max_size = 0
        flattened = dict()
        for k, v in dct.items():
            entries = list(next(iter(v), dict()).values())
            flattened[k] = entries
            max_size = max(len(entries), max_size)
        lst.append({k: pad_list(v, max_size) for k, v in flattened.items()})
    return lst


def merge(flattened):
    merged = dict()
    for dct in flattened:
        for k, v in dct.items():
            if k not in merged:
                merged[k] = []
            merged[k].extend(v)
    return merged


def format_for_writer(merged):
    formatted = []
    for k, v in merged.items():
        for i, item in enumerate(v):
            if i >= len(formatted):
                formatted.append(dict())
            formatted[i][k] = item
    return formatted


def convert_csv(formatted):
    keys = formatted[0].keys()
    with open('test.csv', 'w', encoding='utf8', newline='')  as output_file:
        dict_writer = csv.DictWriter(output_file, keys)
        dict_writer.writeheader()
        dict_writer.writerows(formatted)


def main():
    json_data = json.loads(json_raw)
    flattened = flatten(json_data)
    merged = merge(flattened)
    formatted = format_for_writer(merged)
    convert_csv(formatted)


if __name__ == '__main__':
    main()
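
If what you actually want is one CSV row per top-level entry, the following is a rough sketch of an alternative layout. That is an assumption on my part, since I can only guess at the expected output. The section.key column naming, the helper names flatten_record and records_to_csv, and the trimmed sample are all my own choices, not something from the question.

```python
import csv
import io


def flatten_record(record):
    """Flatten one top-level entry into a single dict with prefixed keys."""
    row = {}
    for section, entries in record.items():
        for index, entry in enumerate(entries):
            # Only add an index to the prefix when a section holds several dicts
            prefix = section if len(entries) == 1 else f"{section}_{index}"
            for key, value in entry.items():
                row[f"{prefix}.{key}"] = value
    return row


def records_to_csv(records):
    """Return CSV text with one row per record and one column per nested field."""
    rows = [flatten_record(record) for record in records]
    # Collect column names in first-seen order across all rows
    fieldnames = list(dict.fromkeys(key for row in rows for key in row))
    buffer = io.StringIO()
    # restval fills in columns that a particular row does not have
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


sample = [
    {"Basic_Information_Source": [{"Image": "image1.png", "Image_Width": 574}],
     "Values": [{"Value1": 75.05}]},
    {"Basic_Information_Source": [{"Image": "image2.png", "Image_Width": 1600}],
     "Values": [{"Value1": 85.52}]},
]
csv_text = records_to_csv(sample)
print(csv_text)
```

With this layout no padding is needed, because each nested field gets its own column instead of sharing a column with unrelated fields.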