Python - MemoryError when recursively collecting file information

Time: 2016-10-06 16:18:27

Tags: python recursion

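I'm writing a script to process all directories and subdirectories of a starting folder, but I'm running into a memory error (the exception is MemoryError). My guess is that my data_dicts list is getting too large, but I'm not sure. Any suggestions would be greatly appreciated.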

import os

# example data dictionary
data_dict = {
    'filename': 'data.csv',
    'folder':   'R:/',
    'size':     300000
}

def get_file_sizes_folder(data_dicts, starting_folder):
    # Given a list of file information dictionaries and a folder, iterate over the files
    # in the folder to get their information and append it to the list.
    # Also recurse through subdirectories
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict.copy())
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(data_dicts, entry.path))

    return data_dicts

d = get_file_sizes_folder([], 'R:/')    

2 answers:

Answer 0: (score: 3)

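You should not be passing data_dicts as an argument to your function get_file_sizes_folder(). Every recursive call receives the same list, appends to it, and returns it, so the line data_dicts.extend(get_file_sizes_folder(data_dicts, entry.path)) extends the list with itself. The list therefore doubles each time a subdirectory finishes processing, which produces many, many duplicate entries and growth that is exponential in the number of directories. No wonder your computer runs out of memory so quickly!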

Instead, use only starting_folder as an argument, and just create a new list on the first line of the function, like so:

def get_file_sizes_folder(starting_folder):
    # Given a folder, iterate over the files in the folder to get their
    # information and append it to a new list.
    # Also recurse through subdirectories
    data_dicts = []
    for entry in os.scandir(starting_folder):
        if not entry.name.startswith('.'):
            if entry.is_file():
                size = entry.stat().st_size
                filename = entry.name
                folder = os.path.dirname(entry.path)
                temp_dict = {'filename': filename, 'size': size, 'folder': folder}
                data_dicts.append(temp_dict)
            else:
                print(entry.path)
                data_dicts.extend(get_file_sizes_folder(entry.path))
    return data_dicts
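To see the duplication without touching a real disk, here is a minimal sketch that replays both patterns on a small in-memory tree. The nested dict standing in for a folder hierarchy and the names collect_buggy and collect_fixed are assumptions made up for this illustration; they are not part of the question's code.

```python
def collect_buggy(results, tree):
    # Mimics the question: the shared list is passed down AND
    # extended with the return value, which is that same list.
    for name, child in tree.items():
        if isinstance(child, dict):      # a "directory"
            results.extend(collect_buggy(results, child))
        else:                            # a "file"
            results.append(name)
    return results


def collect_fixed(tree):
    # Mimics the fix: each call builds and returns its own list.
    results = []
    for name, child in tree.items():
        if isinstance(child, dict):
            results.extend(collect_fixed(child))
        else:
            results.append(name)
    return results


# Hypothetical tree: three files, two nested "directories".
tree = {"a.txt": 1, "sub1": {"b.txt": 1, "sub2": {"c.txt": 1}}}

buggy = collect_buggy([], tree)
fixed = collect_fixed(tree)
print(len(buggy), len(fixed))  # prints: 12 3
```

With only two nested directories the buggy version already holds four copies of every entry; each additional directory doubles the list again.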

Answer 1: (score: 1)

You shouldn't be doing the recursion yourself at all. Use os.walk.

Example:

import os

def get_file_sizes_folder(starting_folder):
    data_dicts = list()
    for root, _, files in os.walk(starting_folder):
        data_dicts.extend({
            'filename': f, 
            'size': os.path.getsize(os.path.join(root, f)),
            'folder': root,
        } for f in files)

    return data_dicts

d = get_file_sizes_folder('R:/')
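Note that the os.walk example above does not skip names starting with '.', which the code in the question did. A possible variant that restores that filter is sketched below. The function name get_visible_file_sizes and the temporary directory layout are assumptions for illustration only; the in-place assignment to dirs relies on os.walk's default top-down mode, where pruning the list stops the walk from descending into those directories.

```python
import os
import tempfile


def get_visible_file_sizes(starting_folder):
    # Hypothetical helper: like the os.walk answer, but skips dot-prefixed names.
    data_dicts = []
    for root, dirs, files in os.walk(starting_folder):
        # Prune hidden directories in place so os.walk never enters them.
        dirs[:] = [d for d in dirs if not d.startswith('.')]
        for f in files:
            if f.startswith('.'):
                continue
            data_dicts.append({
                'filename': f,
                'size': os.path.getsize(os.path.join(root, f)),
                'folder': root,
            })
    return data_dicts


# Small self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.makedirs(os.path.join(tmp, 'sub'))
    os.makedirs(os.path.join(tmp, '.hidden'))
    samples = [
        ('a.csv', b'12345'),
        (os.path.join('sub', 'b.csv'), b'12'),
        (os.path.join('.hidden', 'c.csv'), b'1'),
        ('.secret', b'1'),
    ]
    for rel_path, payload in samples:
        with open(os.path.join(tmp, rel_path), 'wb') as fh:
            fh.write(payload)

    found = get_visible_file_sizes(tmp)
    sizes = {d['filename']: d['size'] for d in found}
    print(sizes)
```

Only a.csv and sub/b.csv should be reported here; the file inside .hidden and the .secret file are filtered out.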