Question

I have several large JSON files (6+ GB) in the following format:

$ cat myfile.json
[
{
    "foo": "bar",
    "foobar": 2,
    ...
},
{
    "foo": "oof",
    "foobar": 4,
    ...
},
{
    ...
}
]

I need to edit every item ({ ... }) in the file (mostly by removing many fields) and write the result to another file.

If I use a plain json.load(f), I get a MemoryError:

import json

with open("myfile.json", "r") as f:
    data = json.load(f)

File "/usr/lib/python3.6/codecs.py", line 321, in decode
    (result, consumed) = self._buffer_decode(data, self.errors, final)
MemoryError

So I would like to open the file as a stream and read the data one dictionary at a time.
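For illustration, here is a rough standard-library sketch of what reading "one dictionary at a time" could look like. It reads the file in chunks and uses json.JSONDecoder.raw_decode to pull out one array element whenever the buffer holds a complete one. The name iter_json_array and the chunk_size default are hypothetical, and the sketch assumes every element of the top-level array is an object or an array (a bare number split across two chunks could be misread).

```python
import json


def iter_json_array(fp, chunk_size=1 << 16):
    """Yield the elements of a top-level JSON array one at a time.

    fp is a file opened in text mode. Assumption: each element is a JSON
    object or array, so an incomplete element always fails to decode.
    """
    decoder = json.JSONDecoder()
    buf, pos, started = "", 0, False
    while True:
        chunk = fp.read(chunk_size)
        # Drop the part that was already consumed, then append new text.
        buf = buf[pos:] + chunk
        pos = 0
        while True:
            # Skip whitespace and the commas that separate elements.
            while pos < len(buf) and buf[pos] in " \t\r\n,":
                pos += 1
            if pos == len(buf):
                break  # buffer exhausted, read more
            if not started:
                if buf[pos] != "[":
                    raise ValueError("expected a top-level JSON array")
                started = True
                pos += 1
                continue
            if buf[pos] == "]":
                return  # end of the array
            try:
                obj, pos = decoder.raw_decode(buf, pos)
            except json.JSONDecodeError:
                break  # element is probably incomplete, read more
            yield obj
        if not chunk:
            raise ValueError("input ended before the closing ']'")
```

Used as `for record in iter_json_array(f): ...`, only the current chunk plus one partially read element should be held in memory. Note that a genuinely malformed element is indistinguishable from an incomplete one here, so the error would only surface at the end of the file.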

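I could skip opening the file as JSON altogether, read it line by line, and parse the result to rebuild the dictionaries, but that seems very illogical.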

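I could use ijson, which seems to be designed for this purpose, but I cannot get exactly what I want (I only manage to get values by prefix, not whole dictionaries). Maybe I just don't understand it...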
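As far as I can tell, ijson.items with the prefix "item" yields each element of a top-level array as a complete dictionary, which may be what is needed here. Below is a minimal sketch under that assumption. The helper names keep_fields and filter_json_array, the file paths, and the set of wanted keys are all hypothetical.

```python
import json


def keep_fields(record, wanted):
    """Return a copy of record that contains only the keys in wanted."""
    return {k: v for k, v in record.items() if k in wanted}


def filter_json_array(src_path, dst_path, wanted):
    """Stream src_path item by item and write the trimmed items to dst_path."""
    # Imported here so keep_fields stays usable where ijson is not installed.
    import ijson  # third-party package: pip install ijson

    with open(src_path, "rb") as src, open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("[\n")
        # The prefix "item" selects every element of the top-level array,
        # so each iteration gives one whole dictionary.
        for i, record in enumerate(ijson.items(src, "item")):
            if i:
                dst.write(",\n")
            # ijson returns non-integer numbers as decimal.Decimal by default;
            # default=float lets json.dumps serialize them.
            dst.write(json.dumps(keep_fields(record, wanted), default=float))
        dst.write("\n]\n")
```

A call such as `filter_json_array("myfile.json", "out.json", {"foo"})` would then write a new array that keeps only the "foo" field of each item, without holding the whole input in memory.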

How can I do this?

Opening a huge JSON file

0 answers: