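I am trying to iterate over a directory containing 4000 JSON files to build one combined JSON object holding all the elements of those files. When I do this, only about half of the JSON files seem to get merged. How can I make sure that every JSON file is iterated over?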
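import json
import os

profile_directory_1 = 'some/path'  # directory holding the .json files

json_files = [x for x in os.listdir(profile_directory_1) if x.endswith('.json')]
company_profiles_1 = dict()
for json_file in json_files:
    # join with the same directory that was listed above
    json_file_path = os.path.join(profile_directory_1, json_file)
    with open(json_file_path, 'r', encoding='utf-8') as f:
        company_profiles_1.update(json.load(f))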
I expected len(company_profiles_1) to be over 4000, since the directory contains more than 4000 JSON files, but I only get 2161.
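One likely explanation, assuming each file holds a single top-level JSON object: `dict.update` replaces the value of any key that is already present, so `len(company_profiles_1)` counts distinct top-level keys across all files, not files. If many files share keys (for example, the same company appearing in several files), the merged dict ends up smaller than the file count even though every file was read. The sketch below counts files separately from keys and records which keys occur in more than one file; the function name `merge_json_dir` is hypothetical.

```python
import json
import os
from collections import defaultdict


def merge_json_dir(directory):
    """Merge every *.json file in `directory` into one dict.

    Returns (merged, files_loaded, duplicates), where `duplicates` maps each
    key that appeared in more than one file to the list of those files.
    Assumes each file contains a single top-level JSON object.
    """
    merged = {}
    files_loaded = 0
    seen_in = defaultdict(list)  # key -> names of the files that defined it

    for name in sorted(os.listdir(directory)):
        if not name.endswith('.json'):
            continue
        with open(os.path.join(directory, name), 'r', encoding='utf-8') as f:
            data = json.load(f)
        files_loaded += 1
        for key in data:
            seen_in[key].append(name)
        merged.update(data)  # a later file overwrites an earlier value for the same key

    duplicates = {k: v for k, v in seen_in.items() if len(v) > 1}
    return merged, files_loaded, duplicates
```

If `files_loaded` matches the number of files on disk but `len(merged)` is smaller, the gap comes from duplicate keys rather than from skipped files, and `duplicates` shows which ones collided.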
Answer 0 (score: 1)
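I have been working with many JSON files in one directory, and this is how I solved it. I used 55,000+ JSON files; it took 298 seconds to go through all of them and build a DataFrame.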
import json
import os
import time

import pandas as pd

start_time = time.time()
d = {'date': [], 'action': [], 'account': [], 'flag': [], 'day': [], 'month': [], 'year': [], 'reqid': []}
json_dir = 'C:\\Users\\Username\\Documents\\Jsons'
for files in os.listdir(json_dir):
    x = os.path.join(json_dir, files)
    with open(x, encoding="Latin-1") as w:
        data = json.load(w)
    # starts at index 1, so the first 'aer' entry of each file is skipped
    for i in range(1, len(data['variables']['aer'])):
        d['date'].append(data['variables']['aer'][i]['date'])
        d['action'].append(data['variables']['aer'][i]['action'])
        d['account'].append(data['variables']['aer'][i]['account'])
        d['flag'].append(data['variables']['aer'][i]['flag'])
        d['day'].append(data['variables']['aer'][i]['day'])
        d['month'].append(data['variables']['aer'][i]['month'])
        d['year'].append(data['variables']['aer'][i]['year'])
        # one reqid per entry keeps every column the same length
        d['reqid'].append(data['reqid'])

df = pd.DataFrame(d)
print(f'{len(df)} rows in {time.time() - start_time:.0f} s')
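You can also wrap the body of the loop in try: with except ValueError: and except KeyError:, so that a file that is not valid JSON, or one that is missing an expected field, is skipped instead of stopping the whole run.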
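A minimal sketch of that idea, assuming the same `data['variables']['aer']` layout and top-level `reqid` field as the code above. A file that is not valid JSON raises `json.JSONDecodeError`, a subclass of `ValueError`; a file missing a field raises `KeyError`. Both are skipped and recorded, so you can see exactly which files did not make it in. The names `collect_rows`, `FIELDS` and `COLUMNS` are hypothetical.

```python
import json
import os

# Field names taken from the answer's code; 'reqid' comes from the top level.
FIELDS = ['date', 'action', 'account', 'flag', 'day', 'month', 'year']
COLUMNS = FIELDS + ['reqid']


def collect_rows(directory):
    """Collect one list per column from every *.json file in `directory`.

    Returns (columns, skipped): `columns` maps each column name to a list,
    and `skipped` lists (filename, reason) for files that could not be used.
    """
    columns = {name: [] for name in COLUMNS}
    skipped = []

    for name in sorted(os.listdir(directory)):
        if not name.endswith('.json'):
            continue
        try:
            with open(os.path.join(directory, name), encoding='latin-1') as f:
                data = json.load(f)
            entries = data['variables']['aer']
            # Build this file's rows before touching `columns`, so a KeyError
            # halfway through cannot leave the lists with different lengths.
            # entries[1:] skips the first entry, like range(1, ...) above.
            rows = [[entry[field] for field in FIELDS] + [data['reqid']]
                    for entry in entries[1:]]
        except ValueError as exc:  # json.JSONDecodeError subclasses ValueError
            skipped.append((name, f'invalid JSON: {exc}'))
            continue
        except KeyError as exc:
            skipped.append((name, f'missing key {exc}'))
            continue
        for row in rows:
            for column, value in zip(COLUMNS, row):
                columns[column].append(value)

    return columns, skipped
```

Collecting a file's rows before appending them is the main design choice here: with per-field `append` calls inside the `try`, a `KeyError` on a later field would leave some columns one element longer than others, and `pd.DataFrame(d)` would then refuse to build. Printing `skipped` at the end shows which files were left out and why.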
If you want to check how many JSON files have been iterated over, you can build a list of the file names as you go:
import json
import os

json_dir = 'C:\\Users\\Username\\Documents\\Jsons'
d = {'date': [], 'action': [], 'account': [], 'flag': [], 'day': [], 'month': [], 'year': [], 'reqid': []}
num_of_jsons = []
for files in os.listdir(json_dir):
    num_of_jsons.append(files)  # record every file the loop reaches
    x = os.path.join(json_dir, files)
    with open(x, encoding="Latin-1") as w:
        data = json.load(w)
    for i in range(1, len(data['variables']['aer'])):
        d['date'].append(data['variables']['aer'][i]['date'])
        d['action'].append(data['variables']['aer'][i]['action'])
        d['account'].append(data['variables']['aer'][i]['account'])
        d['flag'].append(data['variables']['aer'][i]['flag'])
        d['day'].append(data['variables']['aer'][i]['day'])
        d['month'].append(data['variables']['aer'][i]['month'])
        d['year'].append(data['variables']['aer'][i]['year'])
        d['reqid'].append(data['reqid'])

print(len(num_of_jsons), 'files visited')
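A list of visited names only confirms that the loop reached everything `os.listdir` returned. As a cross-check, the sketch below uses `pathlib` to count JSON files three ways, assuming they are all meant to sit in one directory: what an `endswith('.json')` filter selects, top-level files whose extension is `.json` in any letter case (an upper-case `.JSON` fails the filter), and `.json` files in subdirectories, which `os.listdir` does not descend into. The function name `json_file_counts` is hypothetical.

```python
from pathlib import Path


def json_file_counts(directory):
    """Count JSON files three ways to spot files a top-level loop never opens.

    - 'loop_filter': top-level names ending in lowercase '.json', i.e. what
      `x.endswith('.json')` over os.listdir() selects
    - 'any_case':    top-level files whose suffix is .json in any letter case
    - 'recursive':   .json files (any case) anywhere below `directory`
    """
    root = Path(directory)
    top = [p for p in root.iterdir() if p.is_file()]
    return {
        'loop_filter': sum(1 for p in top if p.name.endswith('.json')),
        'any_case': sum(1 for p in top if p.suffix.lower() == '.json'),
        'recursive': sum(1 for p in root.rglob('*')
                         if p.is_file() and p.suffix.lower() == '.json'),
    }
```

If all three counts agree with each other and with the number of files the loop processed, every file was visited, and a smaller merged result comes from the data itself (for example, duplicate keys) rather than from files that were never opened.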