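I have JSON files with the same names in different folders, in the folder structure shown below:

folder1/
    file1.json
    file2.json
    file3.json
folder2/
    file1.json
    file2.json
    file3.json
    file4.json
folder3/
    file1.json
    file2.json
    file3.json
    file4.json
    file5.json
....

What is the best way to combine the JSON files available across all the folders into a single JSON file? The keys in file1.json ...

So far I can think of the following approaches, but they feel slow because each JSON file is about 5 MB.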
import json
from pathlib import Path

output_dir = Path(location_of_output_folder)
output_dir.mkdir(parents=True, exist_ok=True)

# find all the folders
root_dir = Path(root_location_for_folders)
folders = [fld for fld in root_dir.iterdir() if fld.is_dir()]

# find all the unique file names
all_filenames = []
for fld in folders:
    for f in fld.glob('*.json'):
        all_filenames.append(f.name)

## Approach 1
# Join the file that possibly exists across all the folders by collecting its contents in a list
for f in set(all_filenames):
    f_data = []
    for fld in folders:
        if (fld / f).is_file():
            with open(fld / f, 'r') as fp:
                f_data.append(json.load(fp))
    with open(output_dir / f, 'w') as fp:
        json.dump(f_data, fp, indent=4)

## Approach 2
# Join the file that possibly exists across all the folders by merging its contents into a dict
for f in set(all_filenames):
    f_data = {}
    for fld in folders:
        if (fld / f).is_file():
            with open(fld / f, 'r') as fp:
                f_data.update(json.load(fp))
    with open(output_dir / f, 'w') as fp:
        json.dump(f_data, fp, indent=4)
Is there a better (faster) way? I only care about run time and am not interested in a "Pythonic" solution.

Thanks
Update #1: Files with the same file name should be merged. Sorry if I was unclear. Each file has only a few keys, which are similar across all files (l1, l2, l3, l4).

Example

a. Structure of file1.json in folder1

{
    k1: {
        l1: 11,
        l2: 12,
        l3: 13,
        l4: 14,
    },
    k2: {
        l1: 21,
        l2: 22,
        l3: 23,
        l4: 24,
    }
    .....
}

b. Structure of file2.json in folder2

{
    k8: {
        l1: 41,
        l2: 42,
        l3: 43,
        l4: 44,
    },
    k9: {
        l1: 51,
        l2: 52,
        l3: 53,
        l4: 54,
    }
    .....
}
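To make the intended merge concrete, here is a minimal sketch of what Approach 2 produces for two objects shaped like the examples above. The variable names and the helper merge_objects are hypothetical, and the keys are quoted so that the data is valid JSON (the examples above are abbreviated). It assumes the top-level keys do not collide; if they did, the later object would silently win.

```python
import json

# Hypothetical stand-ins for two same-named files from different folders.
from_folder1 = {
    "k1": {"l1": 11, "l2": 12, "l3": 13, "l4": 14},
    "k2": {"l1": 21, "l2": 22, "l3": 23, "l4": 24},
}
from_folder2 = {
    "k8": {"l1": 41, "l2": 42, "l3": 43, "l4": 44},
    "k9": {"l1": 51, "l2": 52, "l3": 53, "l4": 54},
}


def merge_objects(objects):
    """Merge top-level keys left to right; later objects overwrite duplicates."""
    merged = {}
    for obj in objects:
        merged.update(obj)
    return merged


merged = merge_objects([from_folder1, from_folder2])
print(json.dumps(merged, indent=4))
```

The result is a single object holding k1, k2, k8 and k9, each with its own l1 to l4 values.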
Answer 0 (score: 1)
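You don't need to parse the input JSON files at all; just read them as text files, which is much faster (basically one system call per file). Then combine them into one global JSON list by writing [ at the beginning, ] at the end, and , after each file's contents except the last. Fine, the lines will not be indented inside the top-level list, but who cares? Here is a basic implementation:

infiles = [...]  # the whole list of input JSON files
outfile = 'out.json'

with open(outfile, 'w') as o:
    o.write('[')
    for infile in infiles[:-1]:  # loop over all files except the last one
        with open(infile, 'r') as i:
            o.write(i.read().strip() + ',\n')
    with open(infiles[-1]) as i:  # special treatment for the last file
        o.write(i.read().strip() + ']\n')

Note that this implementation holds the input files in RAM one at a time, so, unlike the other approaches, it easily handles a very long list of files.

One last point: if you really want all the inner lines indented, you can simply read each file line by line (using the readline() method on the file) and prefix each line with 4 spaces before writing it to the output file. But you will lose performance...

Edit: a slightly modified version, with the code factored a bit more.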
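As a minimal sketch, the text-concatenation idea from this answer can be factored into small helpers and applied once per file name, which is what the question's update asks for. This is not the answer author's code: the function names group_by_name, concat_as_json_list and merge_tree are hypothetical, and the sketch assumes every input file is non-empty, valid JSON and that the output directory lies outside the input tree.

```python
from collections import defaultdict
from pathlib import Path


def group_by_name(root_dir):
    """Map each JSON file name to the paths sharing it, one level below root_dir."""
    groups = defaultdict(list)
    for file_path in sorted(Path(root_dir).glob("*/*.json")):
        groups[file_path.name].append(file_path)
    return groups


def concat_as_json_list(infiles, outfile):
    """Write the raw text of each input file into one JSON list, without parsing."""
    with open(outfile, "w", encoding="utf-8") as out:
        out.write("[")
        for index, infile in enumerate(infiles):
            if index:  # separator before every file except the first
                out.write(",\n")
            out.write(Path(infile).read_text(encoding="utf-8").strip())
        out.write("]\n")


def merge_tree(root_dir, output_dir):
    """Produce one combined file per distinct file name found under root_dir."""
    output_dir = Path(output_dir)
    output_dir.mkdir(parents=True, exist_ok=True)
    for name, paths in group_by_name(root_dir).items():
        concat_as_json_list(paths, output_dir / name)
```

Like the basic implementation, this yields a list of objects per file name (the equivalent of Approach 1), not a single merged object, and only one input file is held in memory at a time.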
Answer 1 (score: 0)

This is the simplest code I can think of:

import json
from glob import glob
from os import makedirs, path

# Directories
input_dir = "in"
output_file = "out/out.json"

# Get the list of files; without recursive=True, "**" matches one directory level, like "*"
files = glob(path.join(input_dir, "**", "*.json"))

# Data object
data = {}

# Merge all files (later files overwrite duplicate top-level keys)
for file in files:
    with open(file) as fp:
        data.update(json.load(fp))

# Create output directory
makedirs(path.dirname(output_file), exist_ok=True)

# Dump data
with open(output_file, "w") as fp:
    json.dump(data, fp)
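The code in this answer merges every file into one output. A minimal sketch of the same parse-and-update idea, grouped by file name as the question's update requests, could look like the following. The function name merge_by_filename and its parameters are hypothetical. The sketch also writes compact output with json.dumps; in CPython, as far as I know, json.dumps without indent can use the C-accelerated encoder, so this may be faster than json.dump(..., indent=4), but that is an expectation to verify by measurement rather than a guarantee.

```python
import json
from collections import defaultdict
from glob import glob
from os import makedirs, path


def merge_by_filename(input_dir, output_dir):
    """Merge same-named JSON objects found one folder below input_dir."""
    groups = defaultdict(list)
    for file in sorted(glob(path.join(input_dir, "*", "*.json"))):
        groups[path.basename(file)].append(file)

    makedirs(output_dir, exist_ok=True)
    for name, files in groups.items():
        data = {}
        for file in files:
            with open(file, encoding="utf-8") as fp:
                data.update(json.load(fp))  # later folders win on duplicate keys
        with open(path.join(output_dir, name), "w", encoding="utf-8") as fp:
            # Compact output: no indentation and no spaces after separators.
            fp.write(json.dumps(data, separators=(",", ":")))
    return sorted(groups)
```

Calling merge_by_filename("in", "out") would write out/file1.json, out/file2.json and so on, one per distinct file name, and return the list of names it handled.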
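Answer 2 (score: 0)

Edit: I am aware that this solution no longer meets the requirements; I will update it shortly.

Setting aside for a moment the question of whether this matters, here is what I came up with.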
import glob
import json

file_names = glob.glob('../resources/json_files/*.json')

json_list = []
for curr_f_name in file_names:
    with open(curr_f_name) as curr_f_obj:
        json_list.append(json.load(curr_f_obj))

with open('../out/json_merge_out.json', 'w') as out_file:
    json.dump(json_list, out_file, indent=4)
Contents of the directory of JSON files:

example_1.json:

{
    "fruit": "Apple",
    "size": "Large",
    "color": "Red"
}

example_2.json:
{
    "quiz": {
        "sport": {
            "q1": {
                "question": "Which one is correct team name in NBA?",
                "options": [
                    "New York Bulls",
                    "Los Angeles Kings",
                    "Golden State Warriros",
                    "Huston Rocket"
                ],
                "answer": "Huston Rocket"
            }
        },
        "maths": {
            "q1": {
                "question": "5 + 7 = ?",
                "options": [
                    "10",
                    "11",
                    "12",
                    "13"
                ],
                "answer": "12"
            },
            "q2": {
                "question": "12 - 8 = ?",
                "options": [
                    "1",
                    "2",
                    "3",
                    "4"
                ],
                "answer": "4"
            }
        }
    }
}
Contents of the output file json_merge_out.json:
[
    {
        "quiz": {
            "sport": {
                "q1": {
                    "question": "Which one is correct team name in NBA?",
                    "options": [
                        "New York Bulls",
                        "Los Angeles Kings",
                        "Golden State Warriros",
                        "Huston Rocket"
                    ],
                    "answer": "Huston Rocket"
                }
            },
            "maths": {
                "q1": {
                    "question": "5 + 7 = ?",
                    "options": [
                        "10",
                        "11",
                        "12",
                        "13"
                    ],
                    "answer": "12"
                },
                "q2": {
                    "question": "12 - 8 = ?",
                    "options": [
                        "1",
                        "2",
                        "3",
                        "4"
                    ],
                    "answer": "4"
                }
            }
        }
    },
    {
        "fruit": "Apple",
        "size": "Large",
        "color": "Red"
    }
]
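Answer 3 (score: -1)

If run time really is what you care about, you could go straight to C++ or C. As @Barmar said in the comments, I don't think you can optimize this setup much, since you need to open all the files anyway.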
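Before rewriting anything in C or C++, it may be worth measuring where the time actually goes. The following is a minimal timing sketch, not a benchmark: it generates synthetic files (the sizes, names and helper functions are all hypothetical) and compares parsing and re-dumping against plain text concatenation using time.perf_counter. The two variants produce different output shapes (a merged object versus a list of objects), so the comparison only indicates the cost of parsing and serializing.

```python
import json
import tempfile
import time
from pathlib import Path


def make_sample_files(directory, count=5, keys=2000):
    """Create synthetic JSON files shaped like the question's example."""
    paths = []
    for n in range(count):
        data = {f"k{n}_{i}": {"l1": i, "l2": i, "l3": i, "l4": i} for i in range(keys)}
        sample = Path(directory) / f"sample{n}.json"
        sample.write_text(json.dumps(data, indent=4), encoding="utf-8")
        paths.append(sample)
    return paths


def parse_and_dump(paths, outfile):
    """Parse every file, merge into one dict, and dump with indentation."""
    merged = {}
    for p in paths:
        with open(p, encoding="utf-8") as fp:
            merged.update(json.load(fp))
    with open(outfile, "w", encoding="utf-8") as fp:
        json.dump(merged, fp, indent=4)


def concat_text(paths, outfile):
    """Join the raw file contents into a JSON list without parsing."""
    with open(outfile, "w", encoding="utf-8") as out:
        out.write("[")
        for index, p in enumerate(paths):
            if index:
                out.write(",\n")
            out.write(p.read_text(encoding="utf-8").strip())
        out.write("]\n")


def timed(func, *args):
    """Return the wall-clock seconds taken by one call to func."""
    start = time.perf_counter()
    func(*args)
    return time.perf_counter() - start


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sample_paths = make_sample_files(tmp)
        t_parse = timed(parse_and_dump, sample_paths, Path(tmp) / "parsed_out.json")
        t_text = timed(concat_text, sample_paths, Path(tmp) / "text_out.json")
        print(f"parse and dump: {t_parse:.3f} s")
        print(f"text concat:    {t_text:.3f} s")
```

Running it on real data (by replacing make_sample_files with the actual list of paths) would give a better picture than the synthetic files do.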