How do I iterate over an entire directory?

Time: 2019-08-22 01:05:43

Tags: python json

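I am trying to iterate over a directory containing 4000 JSON files in order to create one combined JSON file that holds all the elements of the individual files. When I try this, only about half of the JSON files end up in the result. How can I make sure that every JSON file is iterated over?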

json_files = [x for x in os.listdir(profile_directory_1) if x.endswith('.json')]
company_profiles_1 = dict()
for json_file in json_files:
    json_file_path = os.path.join('some/path', json_file)
    with open(json_file_path, 'r', encoding='utf-8') as f:
        company_profiles_1.update(json.load(f))

I expect len(company_profiles_1) to be over 4000, because the directory contains more than 4000 JSON files, but I only get 2161.

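1 answer:

Answer 0: (score: 1)

I have been working with multiple JSON files in a directory, and this is how I handled it. I had 55,000+ JSON files, and it took 298 seconds to go through all of them and build the DataFrame.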

import json
import os
import time

import pandas as pd

start_time = time.time()
json_dir = 'C:\\Users\\Username\\Documents\\Jsons'
d = {'date':[],'action':[],'account':[],'flag':[],'day':[],'month':[],'year':[],'reqid':[]}
for filename in os.listdir(json_dir):
    path = os.path.join(json_dir, filename)
    with open(path, encoding="Latin-1") as w:
        data = json.load(w)
    # The slice starts at index 1, so the first 'aer' entry of each file is skipped.
    for entry in data['variables']['aer'][1:]:
        for key in ('date', 'action', 'account', 'flag', 'day', 'month', 'year'):
            d[key].append(entry[key])
        d['reqid'].append(data['reqid'])

# Build the DataFrame from the collected columns and report the elapsed time.
df = pd.DataFrame(d)
print(f"Processed in {time.time() - start_time:.0f} seconds")

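You can also wrap the per-file work in try: with except ValueError: and except KeyError:, so that a file with invalid JSON or a missing key does not stop the whole loop.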
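A minimal sketch of that error handling, under a few assumptions: the function name `load_aer_rows` and the `skipped` list are hypothetical, only the `date` and `reqid` fields are collected for brevity, and every `aer` entry is read. `json.JSONDecodeError` is a subclass of `ValueError`, so the first handler also covers invalid JSON.

```python
import json
import os


def load_aer_rows(json_dir):
    """Collect rows from each .json file in json_dir, recording files that fail."""
    rows = []
    skipped = []  # (file name, reason) for every file that could not be used
    for filename in sorted(os.listdir(json_dir)):
        if not filename.endswith('.json'):
            continue
        path = os.path.join(json_dir, filename)
        try:
            with open(path, encoding="Latin-1") as f:
                data = json.load(f)
            # Build the rows for this file first, so a failure halfway through
            # does not leave partial data in the result.
            file_rows = [
                {'date': entry['date'], 'reqid': data['reqid']}
                for entry in data['variables']['aer']
            ]
        except ValueError as exc:  # includes json.JSONDecodeError
            skipped.append((filename, f"invalid JSON: {exc}"))
        except KeyError as exc:
            skipped.append((filename, f"missing key: {exc}"))
        else:
            rows.extend(file_rows)
    return rows, skipped
```

Comparing `len(skipped)` with the number of files in the directory shows how many files were dropped and why, instead of silently losing them.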

If you want to check how many JSON files were traversed, you can collect the file names in a list:

json_dir = 'C:\\Users\\Username\\Documents\\Jsons'
d = {'date':[],'action':[],'account':[],'flag':[],'day':[],'month':[],'year':[],'reqid':[]}
num_of_jsons = []  # names of the files that were visited
for filename in os.listdir(json_dir):
    num_of_jsons.append(filename)
    path = os.path.join(json_dir, filename)
    with open(path, encoding="Latin-1") as w:
        data = json.load(w)
    # The slice starts at index 1, so the first 'aer' entry of each file is skipped.
    for entry in data['variables']['aer'][1:]:
        for key in ('date', 'action', 'account', 'flag', 'day', 'month', 'year'):
            d[key].append(entry[key])
        d['reqid'].append(data['reqid'])

print(len(num_of_jsons))  # number of files visited
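
As for the count of 2161 in the question, one possible explanation (an assumption, since the files themselves are not shown) is that several files share the same top-level keys. `dict.update` overwrites an existing key instead of adding a new one, so the merged dict can end up smaller than the number of files even when every file was read. The sketch below uses a hypothetical helper, `merge_json_dir`, that counts the files it reads and records such collisions. It assumes each file holds a JSON object at the top level, and unlike `update` it keeps the first value seen for a key.

```python
import json
import os


def merge_json_dir(json_dir):
    """Merge the top-level objects of all .json files and report key collisions."""
    merged = {}
    collisions = {}  # key -> files whose value for that key was not merged
    n_files = 0
    for filename in sorted(os.listdir(json_dir)):
        if not filename.endswith('.json'):
            continue
        n_files += 1
        with open(os.path.join(json_dir, filename), encoding='utf-8') as f:
            data = json.load(f)
        for key, value in data.items():
            if key in merged:
                # dict.update would silently replace the earlier value here.
                collisions.setdefault(key, []).append(filename)
            else:
                merged[key] = value
    return merged, collisions, n_files
```

If `n_files` matches the number of files in the directory while `len(merged)` is smaller, the loop visited everything and the difference comes from duplicate keys. If one entry per file is wanted, keying the result by file name (`merged[filename] = data`) avoids collisions entirely.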