Question

I have a very large JSON object that I need to split into smaller objects, then write those smaller objects to files.

Sample data

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

Desired output (in this example, the data is split in half)

output_file1.json = [{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202}]

output_file2.json = [{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]

Current code

import pandas as pd
import itertools
import json
from itertools import zip_longest


def grouper(iterable, n, fillvalue=None):
    args = [iter(iterable)] * n
    return zip_longest(fillvalue=fillvalue, *args)

raw = '[{"id":"1","num":"2182","count":-17}{"id":"111","num":"3182","count":-202}{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

#split the data into manageable chunks + write to files

for i, group in enumerate(grouper(raw, 4)):
    with open('outputbatch_{}.json'.format(i), 'w') as outputfile:
        json.dump(list(group), outputfile)

Current output of the first file, "outputbatch_0.json":

["[", "{", "\"", "i"]

I feel like I am making this much harder than it needs to be.

Answer 1

Assuming raw is supposed to be a valid JSON string (I added the missing commas), here is a simple but working solution.

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)

def split_in_files(json_data, amount):
    # Number of records per file; any remainder goes into the last file.
    step = len(json_data) // amount
    pos = 0
    for i in range(amount - 1):
        with open('output_file{}.json'.format(i+1), 'w') as file:
            json.dump(json_data[pos:pos+step], file)
            pos += step
    # The last file takes everything that is left.
    with open('output_file{}.json'.format(amount), 'w') as file:
        json.dump(json_data[pos:], file)

split_in_files(json_data, 2)
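
One caveat with the function above: the remainder of the integer division all lands in the last file, so splitting 10 records into 4 files gives sizes 2, 2, 2 and 4. As a minimal sketch (the helper name split_evenly is hypothetical and not part of the original answer), the remainder can be spread out so that chunk sizes differ by at most one:

```python
import json


def split_evenly(items, amount):
    """Split items into `amount` consecutive chunks whose sizes differ by at most one.

    Hypothetical helper for illustration; it only builds the chunks and
    leaves writing them to files to the caller.
    """
    if amount < 1:
        raise ValueError("amount must be at least 1")
    base, extra = divmod(len(items), amount)
    chunks = []
    pos = 0
    for i in range(amount):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[pos:pos + size])
        pos += size
    return chunks


raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
json_data = json.loads(raw)
chunks = split_evenly(json_data, 2)
```

Each element of chunks can then be passed to json.dump in the same way as the slices in split_in_files.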

Answer 2

This assumes raw is valid JSON. The saving part is not worked out in detail.

import json

raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'

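# Parse with json.loads rather than eval: eval executes arbitrary code
# and fails on JSON literals such as true, false and null.
raw_list = json.loads(raw)
# Pair consecutive records: (0, 1), (2, 3), ...
raw__zipped = list(zip(raw_list[0::2], raw_list[1::2]))

# Give each pair its own file so later pairs do not overwrite earlier ones.
for i, item in enumerate(raw__zipped):
    with open('a_{}.json'.format(i), 'w') as f:
        json.dump(item, f)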
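
Be aware that zip stops at the shorter of its inputs, so with an odd number of records the last record is silently dropped. Here is a minimal sketch that keeps it by slicing the parsed list in steps. The helper name chunked and the temporary output directory are assumptions for illustration only:

```python
import json
import os
import tempfile


def chunked(items, size):
    """Yield consecutive slices of at most `size` items; the last may be shorter."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
records = json.loads(raw)

# Write into a fresh temporary directory (an assumption for this sketch).
out_dir = tempfile.mkdtemp()
paths = []
for i, chunk in enumerate(chunked(records, 2)):
    path = os.path.join(out_dir, 'outputbatch_{}.json'.format(i))
    with open(path, 'w') as f:
        json.dump(chunk, f)
    paths.append(path)
```

Because the slicing happens on the parsed list and not on the raw string, every file contains whole records.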

Answer 3

If you only need the data split in half, you can use slicing.


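The shared code does not handle edge cases such as an odd-length list. In short, once the JSON is parsed you can do everything with it that you can do with a list.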
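
A minimal sketch of the slicing idea follows. It is not the answerer's original snippet, and the helper name halve is hypothetical. Rounding the midpoint up means that an odd-length list puts the extra record in the first half:

```python
import json


def halve(items):
    """Return (first_half, second_half); the first half gets the extra item if the length is odd."""
    middle = (len(items) + 1) // 2
    return items[:middle], items[middle:]


raw = '[{"id":"1","num":"2182","count":-17},{"id":"111","num":"3182","count":-202},{"id":"222","num":"4182","count":12},{"id":"33333","num":"5182","count":12}]'
data = json.loads(raw)
first_half, second_half = halve(data)
```

Each half is an ordinary list, so it can be written out with json.dump like any other list.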

How do I use Python to parse a JSON object into smaller objects?
